February 21, 2011

Data Management Resource List

A collection of items I'm pulling together as a potential resource to place on the Research Gateway.

How to Write a Data Management Plan for a National Science Foundation (NSF) Proposal

LSA Joint IT Research Committee Data Management Plan Webpage

Writing DMPs - ICPSR

ICPSR Data Management Webinars

UM Library Data Management Page

Posted by kkwaiser at 04:25 PM | Comments (0)

May 07, 2010

Federal Research Public Access Act of 2009

There is a bill in the House of Representatives' Committee on Oversight and Government Reform (H.R. 5037) that would require federal funding agencies to implement procedures to ensure that data and publications funded by the government become common property. This link should take you there.

Out of curiosity, I called Dingell's office to see if he had a position. It's not his committee so he didn't have one but the rep for the Rep said he would get back with a response.

Posted by kkwaiser at 11:05 AM | Comments (0)

NSF data management requirements

Here's a story in Science about NSF data management plan requirements.

And here is an email response I wrote to the person who drew this to my attention:

I think UMBS may serve as an example of one type of departmental-level response to this. We've just finished drafting a data management policy (please don't circulate):


and we recently inserted a section on data management into a proposal our Director, Knute Nadelhoffer, is on. I think the next step for us will be to draft a generic appendix that meets requirements of the NSF solicitation (http://www.nsf.gov/pubs/2009/nsf09514/nsf09514.htm; see below). We (I) would then be available to the researchers to identify likely future datasets and customize the appendix.


(A-1) Data Management Plan (maximum 1 page): Development and adherence to community-wide standards for collection and presentation of data, such as microarray or interactome data, are highly encouraged. Large-scale datasets must be made available in a format that enables rapid comparison and effective utilization of reproducible information. All proposals must include a detailed data management plan if the project is expected to generate significant digital data for preservation (maximum 1 page). The contents of the data management plan should include:

* The types of data to be produced
* The standards that would be applied for format, metadata content, etc.
* Provisions for archiving and preservation
* Access policies and provisions
* Plans for eventual transition or termination of the data collection after the NSF funding period


Posted by kkwaiser at 11:01 AM | Comments (0)

April 19, 2010

Citing Data Sets

Here are a few good resources on citing data sets:


Posted by kkwaiser at 03:16 PM | Comments (0)

September 09, 2009

The Proposal

UMBS Data Management Policy: DRAFT FOR COMMENT

Provided by
Kyle Kwaiser - Information Manager, UM Biological Station
Knute Nadelhoffer - Director, UM Biological Station

UMBS Data Management Policy (Proposed)

The University of Michigan Biological Station (UMBS), founded in 1909, is dedicated to education and research in field biology and related environmental sciences. The history and status of the UMBS as a leader in environmental education and research creates an obligation to preserve data that describe the ecosystems of Northern Michigan while fostering the development of knowledge that contributes to an understanding of local and global environmental problems and solutions. The UMBS Information Management System is intended to accommodate these responsibilities by achieving the following:

• Ensure the long-term (>20 years) value and viability of data sets collected with UMBS resources through proper metadata documentation and data archiving.

• Protect the near-term and long-term intellectual property rights of those who originate data at the UMBS.

• Facilitate access to UMBS-related data in order to create opportunities for the development of unique research questions and researcher collaborations that will further advance environmental research and education.

This Data Management Policy supports these goals by outlining the basic principles that the UMBS Information Management System, Data Users, and Data Originators will adhere to. This document 1) defines two categories of data recognized by the UMBS Information Management System, 2) outlines metadata requirements that must accompany contributed data, and 3) dictates a Data Usage Agreement that details the rights of Data Originators and responsibilities of Data Users.

This Data Management Policy applies to all data collected by current and future courses and research projects that use UMBS resources (e.g., facilities, properties, lab equipment) in any way. Submission of data collected without UMBS resources (e.g., collected at off-site study areas) that are pertinent to the goals of the UMBS and its students and researchers is highly encouraged. At the time of submission, the following Data Management Policy will be applied to non-UMBS data sets unless written modifications are approved by the Director and Information Manager.

Data Types

The UMBS acknowledges that a right of “first-use” of a Data Set is accorded to the Data Originator. To balance this right with the need to document and archive data sets as they are developed, the UMBS recognizes two types of Data based upon their availability to those not directly involved in the data collection effort.

• Non-Restricted - Data that will be made available publicly following submission to the Information Management System. No usage limitations, aside from those stated in the Data Use Agreement (see below), apply to this type of data. Legacy data sets, data collected by UMBS courses, and some data collected with UMBS funding are examples of Non-Restricted data.

• Restricted - Data to be archived at the UMBS but with public access limited to the metadata because they are part of an ongoing project. Restricted data will automatically convert to Non-Restricted data two (2) years following collection or the publication of major findings, whichever comes first. In order to better protect the right of “first-use” of researchers who collect data at the UMBS, extensions to the Restricted status can be achieved upon approval by the Director and Information Manager. Examples of data that may qualify for a Restricted status-extension include data used by graduate students who are in the process of completing a thesis or dissertation and data that are part of a long-term study. In this case, annual submission of properly documented data for archiving purposes will be required to maintain Restricted status.

In exceptional circumstances, data can receive permanent status as Restricted data. Examples of this include endangered species location information, data protected by licensing or copyright restrictions or data covered by the Human Subjects Act.
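The automatic conversion rule above (two years after collection or publication of major findings, whichever comes first) can be sketched in a few lines of Python. This is only an illustration, not part of the policy: the function names `add_years` and `release_date` are hypothetical, approved extensions are not modeled, and treating a February 29 collection date as February 28 two years later is an assumption.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reachable for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def release_date(collected: date,
                 published: Optional[date] = None,
                 permanent: bool = False) -> Optional[date]:
    """Date on which Restricted data would convert to Non-Restricted.

    Returns None for data granted permanent Restricted status.
    Extensions approved by the Director and Information Manager
    are not modeled in this sketch.
    """
    if permanent:
        return None
    deadline = add_years(collected, 2)
    if published is None:
        return deadline
    # "Whichever comes first": the earlier of the two dates wins.
    return min(deadline, published)
```

For example, data collected on June 1, 2009 with major findings published on March 15, 2010 would convert on the publication date, while unpublished data from the same collection would convert on June 1, 2011.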

UMBS Course Project Data

Data Sets and reports produced by students of UMBS courses present a unique resource and it is the goal of the UMBS Information Management System to archive these resources for future use by students and researchers. As part of a final project, student groups are expected to submit to the UMBS Resident Biologist Bob Vande Kopple 1) a copy of their report as a Portable Document Format (PDF) or Word Document (.doc or .docx), 2) a spreadsheet containing the raw data used to derive the project results (.txt, .csv, .xls or .xlsx format) and 3) an accompanying metadata form, to be supplied by the Information Manager.

It is recognized that the shortened time allotted to students for the development of field research projects introduces a high degree of variability in product quality. To account for this, Faculty Instructors and Teaching Assistants are asked to categorize student products into one of three Tiers (see below). The Information Manager will meet with student groups to oversee metadata creation and data formatting based on this categorization.

All products of UMBS student projects are considered to be Non-restricted Data unless an exception is sought and approved by the Director and Information Manager.

1) Tier 1 Data
• Publication quality data, especially if aggregated among years
• Extensible by future student project groups
• Accompanied by complete metadata
• Well formulated, documented and reproducible methods
• Receives a high priority for archiving

2) Tier 2 Data
• Useful for exploratory analysis by UMBS researchers
• Extensible by future student project groups
• Accompanied by complete metadata
• Some gaps in the documentation of methodology may exist
• Receives a mid-level priority for archiving

3) Tier 3 Data
• Usefulness limited to immediate educational purposes
• No/insufficient accompanying metadata
• Methodology cannot be reproduced
• Product is not extensible among years
• Receives a low priority for archiving

Metadata

The proper documentation of the materials and methods used to collect environmental data is absolutely essential to ensuring long-term viability of data. All Data Sets submitted to the UMBS Information Management System must be accompanied by a completed metadata form, to be provided by the Information Manager, and by three (3) hard copies or a Portable Document Format (PDF) version of any reports or manuscripts derived using the data. Metadata will be made freely available to the public regardless of Data Type.

The failure of researchers to provide copies of appropriately documented data within the specified timeframes will result in the denial of future use of UMBS resources at the discretion of the Director and Executive Committee.

Data Use Agreement (DUA)

The use and application of data made available through the University of Michigan Biological Station (UMBS) Information Management System is subject to the following restrictions and qualifications:

1) The Data User will acknowledge the Data Originator (e.g., Principal Investigator) and the UMBS in any publications, reports, or presentations that use data falling under the auspices of the UMBS Information Management System. Where such products result from the use of UMBS Data, the Data User is strongly urged to consider collaboration and/or co-authorship with the Data Originator.

2) The Data User will provide three (3) hard copies or a Portable Document Format (PDF) version of all manuscripts and reports derived from Data Sets obtained through the UMBS Information Management System to the Data Originator and to the UMBS Resident Biologist Bob Vande Kopple (bvk@umich.edu).

3) The Data User agrees not to disseminate or re-distribute data covered under the UMBS Information Management System beyond the immediate collaboration sphere.

4) Products garnered from data covered under the UMBS Information Management System may be used for non-profit purposes only. The Data User agrees to make these products publicly available in a timely manner.

5) The Data User is fully responsible for all errors in analysis and judgment that are derived from UMBS Data.

6) Violation of any of the terms of this Data Use Agreement by the Data User may result in the immediate forfeiture of all UMBS Data and the denial of future use of the UMBS Information Management System.


While the UMBS strives to provide data of the highest quality, all data secured from the UMBS is provided "as is." The UMBS is not responsible for errors in or conclusions drawn from the use of UMBS Data.


“Data Set” – Digital data and its metadata derived from any research activity such as field observations, collections, laboratory analysis, experiments, or the post-processing of existing data and identified by a unique identifier issued by a recognized cataloging authority such as a site, university, agency, or other organization.
“Data User” - individual to whom access has been granted to this Data Set, including his or her immediate collaboration sphere, defined here as the institutions, partners, students and staff with whom the Data User collaborates, and to whom access must be granted, in order to fulfill the Data User's intended use of the Data Set
“Data Originator” - individual or institution that produced the Data Set
“UMBS Data” – Data sets that are archived or otherwise in the care of the UMBS Information Management System. UMBS Data is subject to all restrictions and requirements outlined in the UMBS Data Management Policy

Note: These definitions are adapted from the LTER Network Data Access Policy, Data Access Requirements, and General Data Use Agreement

Posted by kkwaiser at 12:06 PM | Comments (0)

LTER Data Management Policy

This is the suggested data management policy for LTER sites.


LTER Network Data Access Policy, Data Access Requirements, and General Data Use Agreement

approved by the LTER Coordinating Committee April 6, 2005

Long Term Ecological Research Network Data Access Policy

The LTER data policy includes three specific sections designed to express shared network policies regarding the release of LTER data products, user registration for accessing data, and the licensing agreements specifying the conditions for data use.

LTER Network Data Release Policy

Data and information derived from publicly funded research in the U.S. LTER Network, totally or partially from LTER funds from NSF, Institutional Cost-Share, or Partner Agency or Institution where a formal memorandum of understanding with LTER has been established, are made available online with as few restrictions as possible, on a nondiscriminatory basis. LTER Network scientists should make every effort to release data in a timely fashion and with attention to accurate and complete metadata.


There are two data types:

Type I – data are to be released to the general public according to the terms of the general data use agreement (see Section 3 below) within 2 years from collection and no later than the publication of the main findings from the dataset and,

Type II - data are to be released to restricted audiences according to terms specified by the owners of the data. Type II data are considered to be exceptional and should be rare in occurrence. The justification for exceptions must be well documented and approved by the lead PI and Site Data Manager. Some examples of Type II data restrictions may include: locations of rare or endangered species, data that are covered under prior licensing or copyright (e.g., SPOT satellite data), or covered by the Human Subjects Act. Researchers that make use of Type II Data may be subject to additional restrictions to protect any applicable commercial or confidentiality interests.

While the spirit of this document is to promote maximum availability for ecological data in either Type I or II status, there are criteria by which priority for data release may be determined. Primary observations collected for core research activities directly supported by LTER research must receive the highest priority for data release. Data collected by other sources to which LTER supported research has added value is also a high priority. Other types of data including non-LTER data that was acquired for LTER research, student thesis data, schoolyard LTER data, or legacy data that already suffer from inadequate documentation or format obsolescence may be ranked a lower priority by a site with justifications provided in their data management policy. Finally, some data may be determined of lowest priority for archiving on the grounds that they are interim data that led to final products that carry the scientific value. These might include data files created during stages within an analytic workflow, raw or replicate data values that were subsequently aggregated or processed for release, or individual outputs from stochastic models.


1. Metadata documenting archived/online data sets of all types listed above will be made available when, or before, the dataset itself is released according to the terms above.
2. All metadata will be publicly available regardless of any restrictions on access to the data.
3. All metadata will follow LTER recommended standards and will minimally contain adequate information on proper citation, access, contact information, and discovery. Complete information including methods, structure, semantics, and quality control/assurance is expected for most datasets and is strongly encouraged.

LTER Network Data Access Requirements

The access to all LTER data is subject to requirements set forth by this policy document to enable data providers to track usage, evaluate its impact in the community, and confirm users' acceptance of the terms of acceptable use. These requirements are standardized across the LTER Network to provide contractual exchange of data between Site Data Providers, Network Data Providers, and Data Users that can be encoded into electronic form and exchanged between computers. This will allow direct access to data via a common portal once these requirements have been fulfilled. The following information may be required directly or by proxy prior to the transference of any data object:

1. Name
2. Affiliation
3. Email Address
4. Full Contact Information

* Acceptance of the General Public Use Agreement or Restricted Data Use Agreement, as applicable.

* A Statement of Intended Use that is compliant with the above agreements. Such statements may be submitted explicitly or made implicitly via the data access portal interface.

Data providers wishing to impose further requirements beyond these are encouraged to include them in their Restricted Data Use Agreements accompanying the datasets.

Data Use Agreements

Datasets released by LTER sites or the network will be accompanied by a use agreement that specifies the conditions for data use. For Type I data, this shall be the General Data Use Agreement (see appendix II). This document specifies general roles and the obligations and rights enjoyed by each regarding the use of most datasets released for general public use. For Type II datasets, a Restricted Data Use Agreement must be provided with the dataset that identifies the specific restrictions on the use of the data and their justification. Because these are expected to be unique to the dataset, no template is provided although in most cases the General Data Use Agreement can be modified to serve. Grounds for restricting data may include the need to restrict access to species, habitats or cultural resources protected by legislation; rights of privacy granted by human subjects legislation; or protection of intellectual, financial or legal rights over the data held by a third party.

This policy becomes effective when approved by the LTER Network Coordinating Committee. It may be revised by, or at the request of, the same body.

General Data Use Agreement


“Data Set” – Digital data and its metadata derived from any research activity such as field observations, collections, laboratory analysis, experiments, or the post-processing of existing data and identified by a unique identifier issued by a recognized cataloging authority such as a site, university, agency, or other organization.
“Data User” - individual to whom access has been granted to this Data Set, including his or her immediate collaboration sphere, defined here as the institutions, partners, students and staff with whom the Data User collaborates, and with whom access must be granted, in order to fulfill the Data User's intended use of the Data Set
“Data Set Creator” - individual or institution that produced the Data Set
“Data Set Owner” – individual or institution that holds intellectual property rights to the dataset. Note that this may or may not be defined as a legal copyright. If no other party is designated in the metadata as Data Set Owner, it may be presumed that these rights are held by the Data Set Creator.
“Data Set Distributor” - individual or institution providing access to the Data Sets.
“Data Set Contact” - party designated in the accompanying metadata of the Data Set as the primary contact for the Data Set.

Conditions of Use

The re-use of scientific data has the potential to greatly increase communication, collaboration and synthesis within and among disciplines, and thus is fostered, supported and encouraged. Permission to use this dataset is granted to the Data User free of charge subject to the following terms:

1) Acceptable use. Use of the dataset will be restricted to academic, research, educational, government, recreational, or other not-for-profit professional purposes. The Data User is permitted to produce and distribute derived works from this dataset provided that they are released under the same license terms as those accompanying this Data Set. Any other uses for the Data Set or its derived products will require explicit permission from the dataset owner.
2) Redistribution. The data are provided for use by the Data User. The metadata and this license must accompany all copies made and be available to all users of this Data Set. The Data User will not redistribute the original Data Set beyond this collaboration sphere.
3) Citation. It is considered a matter of professional ethics to acknowledge the work of other scientists. Thus, the Data User will properly cite the Data Set in any publications or in the metadata of any derived data products that were produced using the Data Set. Citation should take the following general form: Creator, Year of Data Publication, Title of Dataset, Publisher, Dataset identifier. For example:

McKee, W. 2001. Vascular plant list on the Andrews Experimental Forest and nearby Research Natural Areas: Long-Term Ecological Research. Corvallis, OR: Forest Science Data Bank: SA002. [Database]. http://www.fsl.orst.edu/lter/data/abstract.cfm?dbcode=SA002. (21 October 2004)

4) Acknowledgement. The Data User should acknowledge any institutional support or specific funding awards referenced in the metadata accompanying this dataset in any publications where the Data Set contributed significantly to its content. Acknowledgements should identify the supporting party, the party that received the support, and any identifying information such as grant numbers. For example:

Data sets were provided by the Forest Science Data Bank, a partnership between the Department of Forest Science, Oregon State University, and the U.S. Forest Service Pacific Northwest Research Station, Corvallis, Oregon. Significant funding for collection of these data was provided by the National Science Foundation Long-Term Ecological Research program (NSF Grant numbers BSR-90-11663 and DEB-96-32921).

5) Notification. The Data User will notify the Data Set Contact when any derivative work or publication based on or derived from the Data Set is distributed. The Data User will provide the data contact with two reprints of any publications resulting from use of the Data Set and will provide copies, or on-line access to, any derived digital products. Notification will include an explanation of how the Data Set was used to produce the derived work.

6) Collaboration. The Data Set has been released in the spirit of open scientific collaboration. Data Users are thus strongly encouraged to consider consultation, collaboration and/or co-authorship with the Data Set Creator.
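As a rough illustration of the general citation form given under Citation above (Creator, Year of Data Publication, Title of Dataset, Publisher, Dataset identifier), here is a minimal Python sketch. It is not part of the LTER agreement: the function name `format_citation` and the choice to end each field with a period are assumptions, and the extra elements in the McKee example (place of publication, URL, access date) are not handled.

```python
def format_citation(creator: str, year: int, title: str,
                    publisher: str, identifier: str) -> str:
    """Join the five citation fields in order, ending each with a period."""
    fields = [creator, str(year), title, publisher, identifier]
    # Strip any trailing period first so "McKee, W." does not become "McKee, W..".
    return " ".join(f.strip().rstrip(".") + "." for f in fields)


print(format_citation(
    creator="McKee, W.",
    year=2001,
    title="Vascular plant list on the Andrews Experimental Forest",
    publisher="Forest Science Data Bank",
    identifier="SA002",
))
```

Keeping the five fields as separate arguments makes it easier to check that none of the required elements is missing before a citation is assembled.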

By accepting this Data Set, the Data User agrees to abide by the terms of this agreement. The Data Owner shall have the right to terminate this agreement immediately by written notice upon the Data User's breach of, or non-compliance with, any of its terms. The Data User may be held responsible for any misuse that is caused or encouraged by the Data User's failure to abide by the terms of this agreement.


While substantial efforts are made to ensure the accuracy of data and documentation contained in this Data Set, complete accuracy of data and metadata cannot be guaranteed. All data and metadata are made available "as is". The Data User holds all parties involved in the production or distribution of the Data Set harmless for damages resulting from its use or interpretation.

Posted by kkwaiser at 09:01 AM | Comments (0)