Data Management and Sharing

Documentation

In general terms, documentation is the supplemental material that provides the information needed to read, understand, identify, and reuse data.
Documentation varies between disciplines but is generally presented as readme files, data dictionaries, codebooks, glossaries, definition files, lab notebooks, or other supporting documents.

Best Practices
  • Documentation should be in nonproprietary or open file formats (.txt, .csv, .ods).
  • Documents should describe the content of data files, define values, and explain variables and parameters.
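
As one illustration of these practices, the sketch below writes a small data dictionary to CSV, an open format. The column headings and the three example variables are hypothetical and are not prescribed by this guide; adapt them to the needs of your project.

```python
import csv
import io

# Hypothetical variables for an illustrative dataset; replace with your own.
DATA_DICTIONARY = [
    {"variable": "site_id", "description": "Sampling site identifier",
     "type": "string", "units": "", "allowed_values": "S01 to S20"},
    {"variable": "sample_date", "description": "Date the sample was collected",
     "type": "date", "units": "", "allowed_values": "YYYYMMDD"},
    {"variable": "soil_ph", "description": "Soil pH measured in the lab",
     "type": "float", "units": "pH", "allowed_values": "0 to 14"},
]

# Assumed column layout: one row per variable, one column per property.
FIELDNAMES = ["variable", "description", "type", "units", "allowed_values"]


def write_data_dictionary(rows, handle):
    """Write one row per variable to an open, plain-text CSV file."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Written to an in-memory buffer here; pass an open file handle
# (for example open("data_dictionary.csv", "w", newline="")) to save it.
buffer = io.StringIO()
write_data_dictionary(DATA_DICTIONARY, buffer)
print(buffer.getvalue())
```

Keeping the dictionary in a plain CSV file means it can be opened alongside the data without any special software.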

File Formats and Naming Conventions

Best Practices

  • Use descriptive names that identify content and version without being too long (fewer than 25 characters).
  • Names may also indicate the researcher, equipment, lab, or date. This varies by the needs of the project.
  • Avoid special characters such as ! @ # $ % ^ & *.
  • Add versions or dates to file names.
  • When using dates, use numerals in year, month, day order (YYYYMMDD). Example: 1/26/21 would be 20210126.
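
The naming practices above can be sketched as a small helper. This is a minimal sketch rather than a required convention: the function name, the underscore separators, the two-digit version suffix, and the choice to count the 25-character limit without the file extension are all assumptions made for illustration.

```python
import re
from datetime import date

MAX_LENGTH = 25  # the guide suggests names shorter than 25 characters


def build_file_name(description, when, version, extension):
    """Build a name such as 'soil_pH_20210126_v02.csv'.

    Assumed layout: description, YYYYMMDD date, then version.
    """
    # Replace spaces with underscores, then drop anything that is not a
    # letter, digit, underscore, or hyphen (removes ! @ # $ % ^ & * etc.).
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "", description.replace(" ", "_"))
    # Dates are written as numerals, starting with the year and month.
    stem = f"{cleaned}_{when:%Y%m%d}_v{version:02d}"
    if len(stem) >= MAX_LENGTH:
        raise ValueError(f"'{stem}' has {len(stem)} characters; shorten it")
    return f"{stem}.{extension}"


print(build_file_name("soil pH!", date(2021, 1, 26), 2, "csv"))
# prints: soil_pH_20210126_v02.csv
```

Putting the date in year, month, day order also means that sorting file names alphabetically sorts them chronologically.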

File Formats 

Consider using file types that can be opened without proprietary software. These options include:

  • Video images: MOV, MPEG, AVI, MXF
  • Text: XML, PDF/A, HTML, ASCII, UTF-8
  • Sounds: WAVE, AIFF, MP3, MXF
  • Containers: TAR, GZIP, ZIP
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
  • Tables: CSV
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF      
  • Web archives: WARC

Metadata

Metadata is broadly described as "data about data," although metadata standards vary between disciplines. Metadata provides contextual information about the collected data, indicating the creator, creation date, format, subject, and other important details.

At a minimum, metadata should contain the 15 elements identified by Dublin Core standards (text below is from the Dublin Core guide):

  • Title: A name given to the resource. Typically a Title will be a name by which the resource is formally known.
  • Creator: An entity primarily responsible for making the resource. Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.
  • Subject: The topic of the resource. Typically the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary.
  • Description: An account of the resource. Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource.
  • Publisher: An entity responsible for making the resource available. Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.
  • Contributor: An entity responsible for making contributions to the resource. Examples of a Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity.
  • Coverage: The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant. Spatial topic and spatial applicability may be a named place or a location specified by its geographic coordinates. Temporal topic may be a named period, date, or date range. A jurisdiction may be a named administrative entity or a geographic place to which the resource applies. Recommended best practice is to use a controlled vocabulary such as the Thesaurus of Geographic Names [TGN]. Where appropriate, named places or time periods can be used in preference to numeric identifiers such as sets of coordinates or date ranges.
  • Date: A point or period of time associated with an event in the lifecycle of the resource. Date may be used to express temporal information at any level of granularity. Recommended best practice is to use an encoding scheme, such as the W3CDTF profile of ISO 8601 [W3CDTF].
  • Type: The nature or genre of the resource. Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary [DCMITYPE]. To describe the file format, physical medium, or dimensions of the resource, use the Format element.
  • Format: The file format, physical medium, or dimensions of the resource. Examples of dimensions include size and duration. Recommended best practice is to use a controlled vocabulary such as the list of Internet Media Types [MIME].
  • Identifier: An unambiguous reference to the resource within a given context. Recommended best practice is to identify the resource by means of a string conforming to a formal identification system.
  • Source: A related resource from which the described resource is derived. The described resource may be derived from the related resource in whole or in part. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system.
  • Rights: Information about rights held in and over the resource. Typically, rights information includes a statement about various property rights associated with the resource, including intellectual property rights.
  • Language: A language of the resource. Recommended best practice is to use a controlled vocabulary such as RFC 4646 [RFC4646].
  • Relation: A related resource. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system.
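
As a rough illustration of how such a record might be stored, the sketch below writes a few Dublin Core elements to XML using Python's standard library. The element names and namespace follow the Dublin Core Metadata Element Set; the wrapping "metadata" element, the function name, and all example values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 Dublin Core element names.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def build_record(values):
    """Return an XML element for the record and the names still missing."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        if name in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = values[name]
    missing = [name for name in DC_ELEMENTS if name not in values]
    return root, missing


# Hypothetical example values, for illustration only.
record, missing = build_record({
    "title": "Soil pH measurements",
    "creator": "Example Lab",
    "date": "2021-01-26",   # W3CDTF profile of ISO 8601
    "format": "text/csv",   # Internet Media Type
    "language": "en",
})
print(ET.tostring(record, encoding="unicode"))
print("Elements still to fill in:", ", ".join(missing))
```

Reporting the missing elements is one simple way to check a record against the 15-element minimum before depositing data.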

Metadata Standards by Discipline

Metadata requirements vary between disciplines and funding sources. Some of the standards are listed below by area:

General

Humanities

Social Sciences

Natural Sciences

Earth Sciences

Ecology

Geography & Geospatial

Other recommended metadata standards

Guide Editors

Carmen Mitchell, Scholarly Communication Librarian, and Melissa Teetzel, Manager, Grants and Contracts Development, compiled the resources in this guide. Grateful acknowledgement is given to Portland State University Library for allowing us to reuse the content of their Data Management Guide.