This guide is intended for faculty/staff researchers who are planning or overseeing research projects that involve the collection of data, including investigators with planned or active externally funded sponsored projects who are looking for resources on data management. The image below illustrates one example of a data management lifecycle. It is essential to remember that data management is important throughout the entire research process, not just in the planning phase.
Reused from: https://data.library.virginia.edu/data-management/
Data Management is the process by which researchers plan to collect, store, archive, and ultimately share their research data. Many Data Management questions are ones that researchers have long been trained to address in the course of their work:
• What data is collected and created?
• How is the data created or collected?
• What supplemental documentation is needed to understand the data?
• Are there privacy issues associated with your data collection?
• Are there legal issues associated with your data?
• How will your data be stored and backed up during the project?
• How will you ensure data security?
• How much of the collected data will be retained and shared when the project is concluded?
• What is the long-term preservation plan?
• How will the final data be shared?
Answers to these questions will vary by discipline and project.
Data Management guidelines were written to apply across all disciplines. As a result, the definitions and terminology used are often extremely broad. While potentially frustrating, this ambiguity is meant to allow flexibility for researchers as they write a Data Management Plan. Below are some common terms and definitions:
Data refers to any information created during the course of a research project that is needed to validate or recreate the final results of the study. This can include, but is not limited to: test results, statistics, code, images, computer files, survey responses, transcripts, recordings, laboratory logs, or algorithms.
Live data is data currently being created, manipulated, or used for an ongoing research project.
Storage & Stored Data
Short- or long-term storage of active data. This may be on IITS-provided storage like SharePoint, on local hard drives, or in the cloud.
Archived Data is data that is no longer being altered or manipulated, or that has served as final research data for a grant or published study. Archived data is stored in a secure and permanent system and remains accessible to researchers.
Shared Data is data that is made publicly accessible through data repositories like ScholarWorks, FigShare, or other discipline-specific repositories.
Final Research Data
This is data generated during a project that is needed to validate or recreate the results and conclusion of the completed study. The scope of “Final Research Data” may vary between projects, and it is the responsibility of the Principal Investigator to determine and justify the scope of their Final Research Data within their Data Management Plan.
Principal Investigator
The primary researcher responsible for managing the research data. The Principal Investigator may also be the researcher tasked with overseeing a laboratory, or the lead team member responsible for overseeing data collection and creation.
Data Management Plan
The document drafted by the Principal Investigator that outlines the project's data creation, preservation, and sharing policies. Each plan must address these fundamental issues: data collection methods, documentation and metadata, ethics and legal compliance, storage and backup policies, data sharing, and data management responsibilities.
Open Data vs Public Data
Public data is publicly available upon request, while open data is immediately and freely available without an intermediary. Data produced by the National Center for Education Statistics (NCES) that must be requested is "public" while data that can be immediately downloaded from the NCES website is "open." Similarly, if researchers need to directly contact a PI for access to research data, this does not meet open data sharing requirements; conversely, depositing data in an open repository like ScholarWorks, which provides 24/7 immediate access to content, does count as open data.