Metadata Education Project

Metadata Element Definitions

Based on the FGDC Content Standards for Digital Geospatial Metadata

Developed by the Wyoming Geographic Information Science Center (WyGISC)

Identification Information (title, originator, abstract, keywords, etc.)
Data Quality Information (sources, processes, accuracy, etc.)
Spatial Organization Information
Spatial Reference Information (projection, datum, coordinate system)
Entity and Attribute Information (description of tables, fields, records, domains, etc.)
Distribution Information (where do you get the data, what format, cost, etc.)
Metadata Reference Information


Identification Information


Title: should include location and theme at a minimum. It should not be just a computer file name.

Originator: the person or organization responsible for creating the dataset or process (similar to the author(s) in a traditional literature citation).

Publication date: Year when dataset or product was (or will be) ready for distribution. "Unpublished" is acceptable.

Abstract: "This dataset represents ...." - mention its source scale and geographic extent, and give a brief description of its source(s).

Abstract/Purpose examples:

Purpose: Why was it developed? - mention specific project requirements if necessary.

Time Period of Content: The date(s) that the data was collected, OR, the oldest to most recent date of ANY sources contributing to the dataset.

Currentness Reference: "Ground condition" or "publication date". Data produced from satellite and photographic images or GPS-collected would have a currentness reference of "ground condition" based on the date the photo or image was taken. However, data that was digitized from previously published maps would have a currentness reference of "publication date" for the date when the map was published.

Status: complete, in work or planned

Maintenance and Update Frequency: As Needed, None Planned, Annually, Monthly, etc.

Spatial Domain: the bounding box or "footprint" encompassing the dataset or study area, expressed in terms of minimum and maximum latitude and longitude in decimal degrees. Alternatively, another description of the area can be given such as quadrangle names, county names, etc.
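As a rough sketch of how the bounding coordinates might be derived, the Python snippet below computes a bounding box from a list of (longitude, latitude) pairs in decimal degrees. The function name and the sample coordinates are illustrative assumptions and are not part of the standard.

```python
def bounding_box(points):
    """Return (west, east, north, south) in decimal degrees.

    `points` is a list of (longitude, latitude) pairs. This simple
    min/max approach assumes the dataset does not cross the
    180-degree meridian.
    """
    lons = [lon for lon, lat in points]
    lats = [lat for lon, lat in points]
    return min(lons), max(lons), max(lats), min(lats)


# Hypothetical sample points (longitude, latitude), roughly within Wyoming.
sample = [(-110.5, 44.6), (-104.8, 41.1), (-107.2, 43.0)]
west, east, north, south = bounding_box(sample)
print("West:", west, "East:", east, "North:", north, "South:", south)
```

Western longitudes and southern latitudes are negative in decimal degrees, so the westernmost coordinate is the minimum longitude.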

Thematic Keywords: Examples - streams, hydrography, geology, political, name of project, name of image sensor (Thematic Mapper), GPS if collected by GPS, etc. Socioeconomic dataset example; wildlife model dataset example

Place Keywords: Examples - Wyoming, county name, quad name, park name, or watershed name - list in a hierarchical fashion from Wyoming down to smallest unit necessary.

Access Constraints: usually none, unless you have sensitive data that is only available to certain people, or data that cannot be distributed for free.

Use Constraints: Restrictions or legal prerequisites for using the data. Basic example; detailed example: from a project with national standards, includes appropriate/inappropriate uses; detailed example: includes statements about altering data and citation of data

Dataset Credit: (optional) Credit the people/organizations that produced the source information or were involved in different stages of production (such as collection, digitization, analysis, review, etc.).

Native Environment: Software and version (e.g. Arc/Info 7.04, MIPS, Pathfinder Office) the data was created in, and platform (PC, Unix, Mac).


Data Quality


Attribute or Thematic Accuracy: How sure are you that your attributes are correctly labeled? Cite quantitative accuracy if available and any measures taken to verify attributes. There are three types of accuracy descriptions:

Examples:

Logical Consistency: Did you check for bad values and conditions? Does your data meet all the topological requirements needed (e.g. are there any overlapping polygons, gaps between polygons, or unconnected lines)? This can also include consistency of domains (e.g. are the values of X between 0 and 100?).
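A domain check of the kind mentioned above (are the values of X between 0 and 100?) could be sketched in Python as follows. The function name, the attribute name, and the sample values are hypothetical, chosen only for illustration.

```python
def out_of_domain(values, low=0, high=100):
    """Return (index, value) pairs for values outside the range [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]


# Hypothetical attribute values that should fall between 0 and 100.
percent_cover = [12, 100, -5, 47, 130]
bad = out_of_domain(percent_cover)
print(bad)  # [(2, -5), (4, 130)]
```

The records flagged this way, and what was done about them, are the kind of detail worth noting in the logical consistency report.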

Completeness Report: What is the completeness of the data? If it is in development, how much is still left to do? Was there some data that could not be collected (ran out of time/money or just not available)?

How complete is the attributing for data that was collected? Are there any records missing information because it wasn't available? For instance, for the land cover data set there are attributes for "crown closure", but not all of Wyoming is forested so only about 20% of the polygons are coded for crown closure. This section is to assure the data users that any polygons that are labeled "0" (or left blank if it is a character field) were so labeled intentionally and not merely overlooked in the attributing process. Good example; adequate example
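One way to arrive at a figure like the "about 20%" above is to count the records that carry a code for the attribute. The following Python sketch assumes that 0, an empty string, or a missing value all mean "not coded"; the function name and the sample values are hypothetical.

```python
def percent_coded(values, uncoded=(0, "", None)):
    """Return the percentage of records whose attribute value is coded.

    Values listed in `uncoded` are treated as intentionally blank.
    """
    if not values:
        return 0.0
    coded = sum(1 for v in values if v not in uncoded)
    return 100.0 * coded / len(values)


# Hypothetical crown closure codes for ten polygons; 0 means "not forested".
crown_closure = [0, 2, 0, 0, 0, 0, 3, 0, 0, 0]
print(percent_coded(crown_closure))  # 20.0
```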

Horizontal Positional Accuracy: Degree of compliance to the spatial registration standard. This standard may vary depending on how the data is registered - for instance, accuracy of GPS data is based on different criteria than accuracy of registered maps or images.

GPS data: the type of GPS equipment (mapping grade, survey grade), settings, number of satellites used, logging intervals of position, and post-processing techniques (such as differential correction) all contribute to the final accuracy of the data.

Maps or images: note the RMS (root mean square) error and the number of registration points used. For images registered to existing coverages, note the final image-to-coverage RMS error, the number of links used in registration, and the maximum positional offset accepted for the links.
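For reference, a minimal Python sketch of an RMS error calculation from registration residuals is given below. It assumes each residual is the (dx, dy) offset between where a registration point landed and where it should be, in the same units; individual software packages may report RMS error somewhat differently, and the sample residuals are made up.

```python
import math


def rms_error(residuals):
    """Root mean square error of a list of (dx, dy) residuals."""
    if not residuals:
        raise ValueError("at least one residual is required")
    squared = sum(dx * dx + dy * dy for dx, dy in residuals)
    return math.sqrt(squared / len(residuals))


# Hypothetical residuals at four registration points, in map units.
residuals = [(3.0, 4.0), (0.0, 0.0), (-3.0, -4.0), (0.0, 0.0)]
print(round(rms_error(residuals), 3))  # 3.536
```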

In addition to registration error, error should be quantified, or at the very least estimated, for each of these steps in the production process:

Example:

The map was digitized from USGS 1:24,000-scale base maps, with an inherited error of +/- 40 feet according to USGS national mapping standards. In addition to this inherited error, the map was registered with 16 tics and an RMS error of 0.006, corresponding to +/- 12 ft for this scale. Line shift error is not greater than 0.01 map inches (+/- 20 ft). Unquantifiable errors may be associated with coordinate shift (fuzzy tolerance was set to 0.99 meters, single precision), snapping (set to 2 meters), and source media (unfolded paper maps were used). Check plots were created to check digitized lines against source maps and any lines off by more than a line width were corrected, but the number of errors detected and corrected was not quantified.
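The ground distances quoted in this example follow from the map scale: a distance in map inches is multiplied by the scale denominator and divided by 12 to give feet on the ground. A short Python sketch of that arithmetic (the function name is an assumption made for illustration):

```python
def map_inches_to_ground_feet(map_inches, scale_denominator):
    """Convert a distance measured on the map (inches) to ground feet."""
    return map_inches * scale_denominator / 12.0


scale = 24000  # 1:24,000-scale base maps, as in the example above

# RMS error of 0.006 map inches: roughly 12 ft on the ground.
rms_feet = map_inches_to_ground_feet(0.006, scale)

# Line shift of 0.01 map inches: roughly 20 ft on the ground.
shift_feet = map_inches_to_ground_feet(0.01, scale)

print(round(rms_feet, 1), round(shift_feet, 1))
```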

Other methods of determining positional accuracy include comparing the data to independent sources of higher accuracy when they are available, or using internal programs for detecting positional shifts, such as used by the USGS.

In some rare cases, positional accuracy may not apply. For instance, the 1:100,000-scale land cover map for Wyoming was interpreted from satellite imagery with a specific minimum mapping unit (100 ha). Because 100 ha units do not apply to actual vegetation boundaries (it is too generalized), positional accuracy was not a concern for this dataset.

Examples:

Lineage (Sources): List all the sources used in compiling your data. Usually these are source maps, source photography, source images, people involved in GPS point collection, and may also include publications or people that were sources for attribute information. Example

Each source should include:

In some cases, there may be a large number of source maps from a series such as the USGS 1:24,000 or 1:100,000 quadrangle series used to create one digital data set. In such a case it is not practical to list each source map individually in this section. In this case, they can be grouped under one source (e.g. USGS 1:24,000 quadrangle maps) and the time period of content for this source would be a range of dates - the oldest to most recently published map. The actual names of the source maps can be listed (along with publication date) in one of the process steps, or attached in a separate file or table (INFO or dBase) to go with the coverage. If the names/dates of the source maps are attached in a separate file, the name of this file and its contents should be described in the entity/attribute section of the metadata.

Process Steps: this section is used to describe how the dataset was created. Includes relevant steps in the production of the data, or major changes/updates to the data. Digitization of a map could be one step, attributing another step; any unions/intersections, appended or subtracted features or areas of extent should be recorded as individual steps (with dates) as best as possible. Special modeling processes or decision rules used should also be put in this section. In some cases it may be worthwhile to mention a contact for a process if the process was not done in-house (for instance, if the data set was sent to a specialist to be reviewed).

Changes/updates: if an error is found and corrected in a dataset, a detailed description of the correction should be added as a process step with its own date so that data users can determine the status of the data they currently possess.

Examples:


Spatial Organization Information

This section contains information about how spatial information is represented in the dataset. There are two primary ways:

A dataset composed only of points, or a raster dataset composed only of cells (pixels), is the simplest to describe.

Datasets with lines, nodes, polygons and polygon topology are more difficult to describe and involve more complicated terminology, based on the Spatial Data Transfer Standard (SDTS) federal information processing standard.

raster example, vector example


Spatial Reference Information


The coordinate system, projection and datum that the dataset is in. You can also record spatial reference information for the data's sources - for instance, GPS data can be collected in latitude/longitude coordinates in the WGS84 datum, then converted to UTM zone 12 coordinates in the NAD83 datum. Make sure to record projection parameters when necessary, such as latitude of origin, central meridian, false eastings/northings, zones, units (meters, feet), etc.
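When recording a UTM zone and its central meridian, it can help to check them against the longitude of the data. The Python sketch below uses the standard 6-degree zone layout and ignores the special zone exceptions near Norway and Svalbard. It does not perform any projection or datum conversion, and the function names and sample longitude are assumptions made for illustration.

```python
import math


def utm_zone(longitude):
    """Return the UTM zone number (1-60) for a longitude in decimal degrees."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    # Longitude 180 is assigned to zone 60 rather than a nonexistent zone 61.
    return min(int(math.floor((longitude + 180.0) / 6.0)) + 1, 60)


def central_meridian(zone):
    """Return the central meridian (decimal degrees) of a UTM zone."""
    return zone * 6 - 183


lon = -110.0  # hypothetical GPS longitude in western Wyoming
zone = utm_zone(lon)
print(zone, central_meridian(zone))  # 12 -111
```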

Abscissa resolution and ordinate resolution: the smallest distance that can exist between two points. This value is almost always the same for the x axis (abscissa) and the y axis (ordinate) but may differ for non-square pixels. For vector data, this is usually the cluster tolerance, i.e. the minimum distance below which two points are automatically merged into one. For raster data, this is usually the width and height of a pixel.

Planar distance units: the units of measure the coordinates are represented in. Commonly used units: decimal degrees, meters, feet.

Examples:


Entity and Attribute Information


Description of the content of all entities and attributes (see definitions below) for each feature associated with the dataset.
Contains two main sub-sections, which can be used individually or together to describe the dataset:

The Overview description is unstructured and information is usually written in paragraph format. The Detailed Description is a much more structured method of describing the data dictionary, hierarchically organized. For simple datasets with only one or two entity types and a limited number of attributes and attribute domains, the Overview section may be sufficient. For more complex databases, it is important to enter information in both the Overview and Detailed Descriptions. The Overview Description should be used as a summary, to explain the structure and relation of different entities; the Detailed Description gives specifics about each of the entity's attributes and attribute domains, including definitions and sources.

In the case of some complex databases, a data dictionary may already have been compiled in a database format. In that case it would be redundant to transfer the information into the Detailed Description format of the FGDC's content standard. The Overview Description can instead be used to reference the separate data dictionary, its format, and how to access it.

Definitions:


Distribution Information

This section is optional, used only if the dataset is distributed or shared. Contains information about the distributor and options for obtaining the dataset.

Distribution liability: statement of liability assumed by the distributor. Example and detailed example.

Standard order process: this section is repeatable. For instance, the same dataset may be available two different ways: it can be ordered on CD or some other media, or it may be available for download over the internet. This section includes information on where to obtain the data, what format it is in, and any applicable fees. Example


Metadata Reference Information

Contains the name and contact information for the person responsible for creating and/or maintaining the metadata document. Also includes the date the metadata was last modified, and the version of the standard the metadata conforms to (version 1: 1994; version 2: 1998).


 
 

 

 

(307) 766-2532  |  webmaster-wygisc@uwyo.edu