MANAGING DATA FOR LONG-TERM STEWARDSHIP

PART II

POTENTIAL SOLUTIONS

CHAPTER 5

5.0 POTENTIAL SOLUTIONS

This chapter provides potential solutions for overcoming the gaps and limitations associated with current practices as they pertain to the generation, preservation, and future accessibility of stewardship data (summarized in Table 5-1).

Solutions Diagram

Section 5.1 presents suggestions for developing criteria to identify stewardship data. Section 5.2 presents some suggested revisions to DOE records retention schedules to ensure preservation of stewardship data. Section 5.3 presents a preliminary set of metadata descriptors for preserving information on the content, quality, condition, and other characteristics of stewardship data. Section 5.4 provides an understanding of what might be required to develop and maintain a system for providing future access to stewardship data.

Table 5-1. Gaps, Issues, and Potential Solutions

Gaps and Issues | Potential Solutions
Data are generated, but not necessarily identified as stewardship data. | Develop criteria for identifying stewardship data.
Very few types of stewardship data are permanently preserved. | Modify existing records retention schedules.
Sufficient contextual information is not always preserved. | Develop metadata standards for stewardship data.
Preserved data are not readily accessible. | Develop a system to access stewardship data.

5.1 Develop Criteria for Identifying Stewardship Data

As noted in Chapter 4, it appears that current requirements and practices, if continued into the future, will be sufficient to generate stewardship data. However, there does not appear to be a clear requirement to identify the specific types of information that will be needed by future generations. Discriminating criteria must be applicable to two main types of information, each of which poses a different challenge in terms of identification:

  • Identification criteria applicable to historical records will need to account for the varying types and amount of contextual information available for these records. Some historical records will have only minimal information available from which to determine stewardship value (e.g., title of document, one-line description of document or data). Other historical records will have an abstract or summary; however, these will most likely have been written from a perspective other than long-term stewardship and thus may not contain sufficient information for determining their stewardship value.
  • Identification criteria applicable to present and future information can be more comprehensive and prescriptive, but will depend on the timely development of a broad consensus on stewardship data needs.

Identification criteria also will need to be comprehensive to meet the needs of the future activities related to long-term stewardship that were identified in Chapter 2. While some types of information will support most or all of these future activities, others may require unique types of data. For example, different information is needed to support emergency response activities than to conduct compliance oversight or to do community planning. Because the emphasis of work differs among these areas, criteria to identify necessary information to support the work may also differ.

To assist in the development of preliminary criteria for identifying stewardship data, a group of functional area experts was asked to identify a set of criteria for each of the 12 types of stewardship data identified in Chapter 2. These experts also were asked to test these criteria against existing data to examine the effectiveness of the criteria in identifying records of value for stewardship. (See Appendix D for a more complete description.)

General criteria that can be used for screening information include:

  • Content information: Did the document, record, or data contain the necessary information?
  • Vintage: Did it cover the period of interest?
  • Currency: Was it the most recent edition of the work?
  • Stature: Was it used for site decision making, such as a federal facility agreement?
  • Administrative pedigree: Has it received the necessary reviews for release of information?
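
The five general criteria above can be recorded as yes/no answers for each record under review. The following is a minimal sketch only: the class and field names are hypothetical, and it simply reports which criteria a record does not meet, since this chapter does not specify how many criteria a record must satisfy to be retained.

```python
from dataclasses import dataclass


@dataclass
class ScreeningResult:
    """Answers to the five general screening questions for one record.

    Field names are illustrative assumptions, not taken from any DOE system.
    """
    has_needed_content: bool    # Content information
    covers_period: bool         # Vintage
    is_latest_edition: bool     # Currency
    used_in_decisions: bool     # Stature
    reviewed_for_release: bool  # Administrative pedigree


# Maps each field to the criterion name used in the list above.
CRITERION_LABELS = {
    "has_needed_content": "Content information",
    "covers_period": "Vintage",
    "is_latest_edition": "Currency",
    "used_in_decisions": "Stature",
    "reviewed_for_release": "Administrative pedigree",
}


def unmet_criteria(result: ScreeningResult) -> list[str]:
    """Return the names of the general criteria the record does not meet."""
    return [
        label
        for field_name, label in CRITERION_LABELS.items()
        if not getattr(result, field_name)
    ]
```

A reviewer could then decide, under whatever policy is adopted, whether a record with unmet criteria still merits retention.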

    Functional area criteria varied among the 12 types of stewardship data and often focused on content issues. For example, criteria for the existing hazards data type included the following:

    Additional criteria apply across all of the individual data types:

    To help sites identify stewardship data, DOE may need to provide policy-level guidance that defines stewardship data, describes its importance to DOE, and outlines how to recognize data with stewardship value. The guidance could further describe:

    Criteria for identifying stewardship data should be developed as soon as possible to prevent their future loss.

    5.2 Modify Existing Records Retention Schedules

    As noted in Chapter 4, many current data retention requirements and practices call for retention periods shorter than the long time periods needed for stewardship purposes. For example, DOE records retention schedules require some records to be archived permanently; others to be destroyed after periods of time that range up to 80 years; and others to be destroyed even before cleanup is complete. Data preservation practices are governed by a variety of laws and regulations, each of which applies to some, but not all, types of stewardship data. While it would be advantageous to attempt to modify data preservation requirements under federal laws such as RCRA, CERCLA, and AEA, DOE has more flexibility in modifying existing records retention schedules.

    Existing DOE records retention schedules do not appear to cover all of the 12 stewardship data types (see Appendix B). Even within a particular data category, it does not appear the records schedules require all relevant stewardship data to be preserved permanently. While some of this information will be preserved permanently pursuant to other requirements (e.g., property transfer), stewardship data preservation would be greatly enhanced by:

    Records retention schedules should be revised as soon as possible to prevent the future loss of stewardship data. It will be important to couple any attempts to modify the records schedules with the development of clear criteria for identifying stewardship data. Simply increasing the number of data types to be permanently archived without corresponding criteria to identify the subset of data required for long-term stewardship would lead to an unworkable situation in which nearly every piece of information was permanently archived.

    5.3 Develop Metadata for Stewardship Data

    As noted in Chapter 4, current requirements and practices do not seem to require the preservation of sufficient contextual information to allow future generations to be able to understand the nature and context of stewardship data that are preserved. For hard copy records, much of this contextual information is provided by indexes and other finding tools, but there do not appear to be any requirements for the use of standardized indexes or pointers. Several types of federal metadata standards apply to some types of electronic records (e.g., geospatially referenced information), but not all types of electronic records appear to be covered by these requirements. As DOE sites move toward electronic-based information management systems (see Figure 5-1), improved metadata standards will become more important for stewardship data. Metadata standards also are an appropriate focus because geospatial referencing may be an effective tool for users to locate stewardship data.

    Figure 5-1. Estimated Percentage of Records Created and Stored in Electronic Format

    Much of the information required to support long-term stewardship activities involves spatially referenced data. For example, future users may wish to know:

    The metadata standards developed for spatial data are a logical starting point for preserving the essential contextual information about stewardship data. The existing Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata primarily describes published data sets, covering their structure, content, and data quality issues without making specific reference to the particular needs of stewardship for DOE facilities. For stewardship purposes, DOE needs to develop metadata standards that are well-suited to describing both published and non-published data. DOE also will require additional stewardship-specific information not explicitly referenced in the generic FGDC standards.

    DOE needs at least two different types of metadata standards. First, in the near term, there is a need to screen a broad range of data sets for their potential stewardship value. By necessity, this screening must be a rapid process and may often need to depend on comparatively sketchy information in existing indexing systems or on documentation prepared by individuals who are not subject matter experts. Second, once a data set is identified as having stewardship value, the content, quality, condition, and other characteristics of the data set need to be documented sufficiently to ensure the long-term usability of the data.

    The FGDC core metadata elements (i.e., the mandatory elements) are an appropriate starting point for rapid screening of data sets for stewardship value. These elements are relatively easy to complete for a data set, and if formal records of the data sets already evaluated have been maintained, the amount of detail provided by the core metadata elements should be sufficient to avoid repeatedly re-evaluating the stewardship potential of the same data sets. Moreover, providing the amount of detail required by the full FGDC structure can take substantially more time, particularly when creating after-the-fact documentation for existing data sets. Therefore, the full FGDC metadata elements do not appear appropriate during the screening process. Table 5-2 lists the data fields of a metadata form whose output meets the minimum data collection requirements of the FGDC core elements.

    Table 5-2. Data Fields Included in a Metadata Form that Meet FGDC Core Elements

    Data Fields
    Identity of this entry (for future tracking and updating)
    Originator
    Publication date
    Title of data set
    Edition
    Presentation form (e.g., map, atlas)
    Publication place
    Publisher
    Online linkage (URL)
    Abstract
    Purpose
    Supplemental information
    Beginning and ending dates
    Currentness reference
    Progress
    Intended data set maintenance and update frequency
    Bounding coordinates (West, East, North, South)
    Theme keywords
    Theme keywords reference
    Place keywords
    Place keywords reference
    Limits on data accessibility
    Limits on use of data
    Browse graphic URL
    Browse graphic caption
    Browse graphic file type
    Spatial data type
    Distribution organization
    Distribution contact position/person
    Address type
    Street address, city, state or province, postal code, country, phone, fax, e-mail
    Data set name as known by distributor
    Liability held by distributor
    Date of last metadata entry or update (year, month, date)
    On-screen forms are available on the Internet at: http://www.fgdc.gov/Clearinghouse/MetadataESystem/metaform.html
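
For rapid screening, a metadata entry built from the fields in Table 5-2 could be checked for completeness before it is filed. The sketch below is illustrative only: the field names paraphrase a subset of Table 5-2, and the choice of which fields to treat as required is an assumption made for this example, not a statement of what the FGDC core elements mandate.

```python
# Hypothetical subset of Table 5-2 fields treated as required for screening.
REQUIRED_FIELDS = (
    "entry_id",
    "originator",
    "publication_date",
    "title",
    "abstract",
    "purpose",
    "bounding_coordinates",
    "theme_keywords",
    "metadata_date",
)


def is_blank(value) -> bool:
    """Treat None, whitespace-only strings, and empty collections as blank."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missing_fields(entry: dict) -> list[str]:
    """List the required fields that are absent or blank in a metadata entry."""
    return [name for name in REQUIRED_FIELDS if is_blank(entry.get(name))]
```

An entry with no missing fields would be ready for the screening record; an entry with gaps could be returned to its preparer before the data set is evaluated.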

    Although data sets can be documented comparatively quickly using only the mandatory data fields, the full FGDC structure is designed to describe fully the characteristics of a data set and would be required for future use of the data. Data sets with likely stewardship value merit published descriptions sufficient to let potential users identify the suitability of a data set to their purpose, obtain the information, and contact the creators of the data for further information if necessary. This is the essence of the FGDC and Government Information Locator Service (GILS) approaches outlined above. However, the original creators of the data will not be available over the long life-cycle of stewardship data, so it will be important to complete metadata documentation sufficiently to ensure utility of the data over decades or centuries. Therefore, even the full set of FGDC metadata standards may not be sufficient for stewardship data.

    To document key data sets for future stewardship use, it will be necessary to go beyond the existing FGDC metadata standards. The FGDC process provides a suitable framework, because supplemental profiles tailored to a specific type of data can be formally proposed and approved. However, additional information to be documented for stewardship may include:

    Depending on the eventual design of a system to manage stewardship data, several other types of metadata elements may be required. The FGDC standards, for example, would allow a user to identify and obtain all records that pertain to a given building at a site for a particular time period. However, the standards by themselves would not ensure that the user would be able to merge the data from a variety of sources into a single data set. If a decision is made to merge a number of stewardship data sets together into a unified data set, then additional information will need to be collected and preserved. This would include the complete specifications defining the information content of the data set together with the transformation rules used to incorporate the source data into the unified data set. A stewardship archive could take this approach or preserve the source data sets in their original form.

    5.4 Develop a System to Access Stewardship Data

    As noted in Chapter 4, perhaps the most difficult challenge involving stewardship data is their accessibility. Under current requirements and practices, people get access to archived information primarily by request. Information is preserved in a number of places, so it is difficult for users to know that relevant data may exist and where to look for these data. When information is located, it may take more than a year to retrieve it. The lack of adequate contextual information may make it difficult or impossible to use any information that is retrieved.

    This discussion outlines the elements of a system DOE could adopt to deliver stewardship data to appropriate users. The discussion considers both the requirements of an overall system to manage stewardship data and the types of roles, responsibilities, and other practices that need to be established to manage and operate such a system.

    5.4.1 ELEMENTS OF A STEWARDSHIP DATA SYSTEM

    Any system for managing stewardship data must be able to perform two key functions: (1) maintain physical control of stewardship data from the time they are identified until they are no longer needed (if such a time can be identified); and (2) enable appropriate users to find and retrieve these data in a timely manner. The first requirement is essentially an inventory or asset control problem; the system must be able to track the location and status of all physical and electronic units of stewardship data and ensure that these data are being adequately preserved. The second requirement is essentially an accessibility problem; the system must allow appropriate users to identify, find, and obtain all units of stewardship data that may be of interest.
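
The two functions described above can be sketched as a small index that tracks where each unit of data is held and lets users search it. This is a toy illustration under stated assumptions: the class names, fields, and keyword search are hypothetical and do not describe any existing DOE system.

```python
from dataclasses import dataclass


@dataclass
class DataUnit:
    """One physical or electronic unit of stewardship data."""
    unit_id: str
    title: str
    keywords: set[str]
    location: str   # where the unit is held
    custodian: str  # entity with physical control
    status: str = "preserved"


class StewardshipIndex:
    """Tracks the location and status of data units and supports searching."""

    def __init__(self) -> None:
        self._units: dict[str, DataUnit] = {}

    def register(self, unit: DataUnit) -> None:
        """Function 1 (physical control): add a unit to the inventory."""
        self._units[unit.unit_id] = unit

    def transfer(self, unit_id: str, new_custodian: str,
                 new_location: str) -> None:
        """Record a change of custody without removing the unit from the index."""
        unit = self._units[unit_id]
        unit.custodian = new_custodian
        unit.location = new_location

    def find(self, keyword: str) -> list[DataUnit]:
        """Function 2 (accessibility): return units tagged with the keyword."""
        wanted = keyword.lower()
        return [
            unit for unit in self._units.values()
            if wanted in {k.lower() for k in unit.keywords}
        ]
```

Because a transfer updates the index entry rather than deleting it, a search still finds the unit after custody changes and reports its new location.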

    Both physical control and accessibility must be maintained throughout the full life-cycle of stewardship information. One of the unique challenges for stewardship data is maintaining accessibility even when physical control is transferred from one entity to another. There are five key elements to include in a stewardship data system:

    Two systems under development may be useful to consider when designing a stewardship data system. First, the National Environmental Data Index (NEDI) is being developed to assist in integrating a diverse set of environmental data generated by several federal agencies. The design principles for this index may provide guidance for how to integrate a diverse set of stewardship data, currently in multiple formats, across all DOE sites. Second, DOE has begun to develop a data repository and retrieval system for the proposed geologic repository at Yucca Mountain, Nevada. This system is designed to improve the accessibility, traceability, and transparency of data critical for decisions related to granting permits for the proposed geologic repository. The system includes an electronic archive, indexing system, interface, and search engine, and currently handles requests for about 15,000 pages of information per month. The Openness Advisory Panel of the Secretary of Energy's Advisory Board concluded that existing technologies and expertise are sufficient to extrapolate experience at Yucca Mountain to the entire Department. 1

    Design Principles for the National Environmental Data Index (NEDI):
    • Support multiple metadata standards
    • Use Internet and other communication links, as needed
    • Develop the system as a distributed data index
    • Support distributed searches (through FIPS 192/Z39.50)
    • Use existing standards and off-the-shelf software
    • Support multiple interfaces to span the range of user needs
    • Allow for multiple access points to NEDI

    5.4.2 PRACTICES AND PROCESSES

    A system for managing stewardship data will need to establish and codify practices to ensure physical control and accessibility of all information from the time it is identified as stewardship data until it is no longer needed for stewardship purposes, or indefinitely. These practices must be clear and simple enough to be followed by DOE and current contractors as well as future site stewards, particularly when responsibility for a given activity shifts from one entity to another. These practices also need to ensure that appropriate users can access stewardship data for decades or centuries. Any system for managing stewardship data must establish a process for:

    1 The Prospects for Introducing a Comprehensive Electronic Records Management System into the Department of Energy. Secretary of Energy Advisory Board, Openness Advisory Panel, Draft Subgroup Preliminary Assessment Report, November, 1997.

    5.4.3 INSTITUTIONAL FRAMEWORK FOR STEWARDSHIP DATA

    Although DOE sites can take many steps now to begin implementing practices and processes for addressing stewardship data needs, a more systematic approach is needed to coordinate and focus efforts across all DOE organizations and sites. To do this effectively, DOE needs to develop an institutional framework to generate, preserve, and provide access to stewardship data. Since the stewardship mission differs significantly from missions of existing organizations within DOE, a specialized stewardship data entity would likely be the most effective means of providing for stewardship data needs. A distinct stewardship data entity also would mean that funding for long-term stewardship can be addressed directly through the annual budget process, rather than dispersed as an indirect cost in a variety of DOE offices.

    It is impossible to determine how many entities might be involved in stewardship at the local, state, regional, and/or national levels or whether these would be government agencies, non-governmental organizations, or commercial enterprises. A variety of options exist for developing an institutional framework and for distributing responsibilities associated with managing stewardship data among current and future stewardship entities. The following three options for managing stewardship data describe the range of possibilities for designing such a framework:

    One of the key functions of a stewardship data entity would be coordination of stewardship information management activities at all the sites for which DOE is responsible as they complete cleanup and other missions and prepare for closure or transfer and for long-term stewardship. Another key function would be to maintain the electronic archive and indexing/metadata system critical for data accessibility. This entity or function could be located at the field level, preferably at a site with a well-defined long-term mission. Ideally the site would already have the resources, personnel, expertise, and technologies needed for the stewardship data functions.

    Establishing a new data function in a central facility would require some investment of effort and funds. Funds are currently being spent on data retention with no assurance that the systems and data needed by long-term stewards will be available. Given the findings cited in Chapter 4, a single, effective data preservation system would reduce costs because it would prevent the loss of records, eliminate the need to regenerate information, and possibly help avoid site closure delays.

    Last Updated 03/16/1999 (jrjb)