OMB No. 0925-0001 and 0925-0002, DMS Plan Format Page

DATA MANAGEMENT AND SHARING PLAN

An example from an application focusing on secondary data analysis on data from human subjects.

If any of the proposed research in the application involves the generation of scientific data, this application is subject to the NIH Policy

for Data Management and Sharing and requires submission of a Data Management and Sharing Plan. If the proposed research in the

application will generate large-scale genomic data, the Genomic Data Sharing Policy also applies and should be addressed in this Plan.

Refer to the detailed instructions in the application guide for developing this plan as well as to additional guidance on sharing.nih.gov.

The Plan is recommended not to exceed two pages. Text in italics should be deleted (but this has not been done in the sample below).

There is no “form page” for the Data Management and Sharing Plan. The DMS Plan may be provided in the format shown below.

Element 1: Data Type

A. Types and amount of scientific data expected to be generated in the project:

Summarize the types and estimated amount of scientific data expected to be generated in the project.

The data to be shared will include MRI images and clinical assessments from human research participants. This

application is focused on secondary data analysis from existing data but will also deposit privately held data to a

public repository. The existing data is available from the NIMH Data Archive (NDA) in collections 2134 (148

subjects) and 2433 (47 subjects). In addition, we have data from a previous study involving 155 research

participants with major depressive disorder that have not yet been shared with the research community but will

be uploaded to NDA during the second quarter of the first year of funding. As discussed in the application,

structural MRI scans are available for time points before and after treatment along with relevant clinical data.

B. Scientific data that will be preserved and shared, and the rationale for doing so:

Describe which scientific data from the project will be preserved and shared and provide the rationale

for this decision.

This is a secondary data analysis application, so new data is not being measured. Much of the data is already

available through NDA. Clinical and imaging data from 155 new subjects will be shared.

C. Metadata, other relevant data, and associated documentation:

Briefly list the metadata, other relevant data, and any associated documentation (e.g., study protocols

and data collection instruments) that will be made accessible to facilitate interpretation of the scientific

data.

Preparation for submitting existing data to NDA is largely complete. Within the first six months following the

award, we will submit the Data Submission Agreement to NDA and will create the Data Expected list (see

Standards section) in our new NDA Collection. The policies of our institution mandate that exact dates will not be

shared (see Access section).

Element 2: Related Tools, Software and/or Code:

State whether specialized tools, software, and/or code are needed to access or manipulate shared

scientific data, and if so, provide the name(s) of the needed tool(s) and software and specify how they

can be accessed.

The basic statistical analyses described in the application will be done using R. We plan to use the MRI data

analysis tools in the FMRIB Software Library (FSL) for multi-level modeling of group effects. BrainVoyager

software will be used for anatomical segmentation to isolate regions of interest within individual subjects, and the

AI-powered analyses described in the application will use custom code written with the PyTorch library for

Python. R, FSL, Python, and PyTorch are all freely available to the research community. BrainVoyager is

commercial software, with licenses available for purchase.

All R and Python code (including trained model weights) will be available on our lab Bitbucket page (located by

searching for “labname” on the Bitbucket web site) no later than when publications are submitted. The Bitbucket

page is publicly assessable and will be hosted for at least 5 years after the grant award ends.

Element 3: Standards:

State what common data standards will be applied to the scientific data and associated metadata to

enable interoperability of datasets and resources and provide the name(s) of the data standards that

will be applied and describe how these data standards will be applied to the scientific data generated by

the research proposed in this project. If applicable, indicate that no consensus standards exist.

The data that will be used for some of the proposed secondary data analysis is already in NDA and is formatted

using NDA data dictionaries. The new data we will deposit will also use existing NDA data dictionaries. Since

the data set to be deposited into NDA was collected prior to the publication of NOT-MH-20-067, not all of the

common data elements expected by NIMH are available. However, we will transform some existing demographic

and clinical data into the formats expected for:

1) Age (ndar_subject01)

2) Sex at Birth (ndar_subject01)

3) Patient Health Questionnaire-9 (PHQ-9, cde_phq901 NDA data dictionary).

In addition, information from the Beck Depression Inventory will be deposited for all 155 research participants

using the NDA bdi01 data dictionary. Deposited images will use the NDA image03 data dictionary. Data derived

from the MRI images will be deposited into NDA using the imagingcollection01 data dictionary.

Element 4: Data Preservation, Access, and Associated Timelines

A. Repository where scientific data and metadata will be archived:

Provide the name of the repository(ies) where scientific data and metadata arising from the project will

be archived; see Selecting a Data Repository

All previously unshared data will be deposited to NDA no later than 12 months after the award begins.

B. How scientific data will be findable and identifiable:

Describe how the scientific data will be findable and identifiable, i.e., via a persistent unique identifier or

other standard indexing tools.

Data will be findable for the research community through the NDA collection that will be established when this

application is funded. For all publications, an NDA study will be created, and the data relevant to that publication

will be shared immediately. Each of those studies is assigned a digital object identifier (DOI). This data DOI will

be referenced in the publication to allow the research community easy access to the exact data used in the

publication.

C. When and how long the scientific data will be made available:

Describe when the scientific data will be made available to other users (i.e., no later than time of an

associated publication or end of the performance period, whichever comes first) and for how long data

will be available

The research community will have access to the previously unshared data at the end of the grant award.

Researchers will request data using the standard processes at NDA, and the NDA data access committee will

decide which requests to grant. The standard NDA data access process allows access for one year and is

renewable. Once the data are submitted to NDA, that archive will control the long-term persistence of the data

set.

Element 5: Access, Distribution, or Reuse Considerations

A. Factors affecting subsequent access, distribution, or reuse of scientific data:

NIH expects that in drafting Plans, researchers maximize the appropriate sharing of scientific data.

Describe and justify any applicable factors or data use limitations affecting subsequent access,

distribution, or reuse of scientific data related to informed consent, privacy and confidentiality

protections, and any other considerations that may limit the extent of data sharing. See Frequently

Asked Questions for examples of justifiable reasons for limiting sharing of data.

The two existing data sets from NDA used consents that allow broad data sharing. The new dataset to be

uploaded to NDA also was collected using informed consent terms that allow broad data sharing. Access to data

housed by the NDA requires the completion of a Data Use Certification (see the Get Data section of the NDA

web site), which prohibits any redistribution or attempts to re-identify research participants.

B. Whether access to scientific data will be controlled:

State whether access to the scientific data will be controlled (i.e., made available by a data repository

only after approval).

To request access of the data, researchers will use the standard processes at NDA, and the NDA Data Access

Committee will decide which requests to grant. The standard NDA data access process allows access for one

year and is renewable. Once the data are submitted to NDA, that archive will control the long-term persistence of

the data set. Currently, NDA has no process for deleting or retiring data sets.

C. Protections for privacy, rights, and confidentiality of human research participants:

If generating scientific data derived from humans, describe how the privacy, rights, and confidentiality of

human research participants will be protected (e.g., through de-identification, Certificates of

Confidentiality, and other protective measures).

The NDA GUID tool allows researchers to aggregate data from the same research participant without different

laboratories having to share personally identifiable information about that research participant. The NDA data

dictionaries do not permit personally identifiable information to be shared. NDA maintains a Certificate of

Confidentiality.

For the 155 participants from our previous study, exact dates have been obscured via the Shift and Truncate

method [1], which preserves within-case temporal relations.

Element 6: Oversight of Data Management and Sharing:

Describe how compliance with this Plan will be monitored and managed, frequency of oversight, and by

whom at your institution (e.g., titles, roles).

The Office of Sponsored Programs at University X has created a data management and sharing plan compliance

system as part of their process for submitting the annual NIH progress report. That Office is collecting

information related to the number of research participants that are deposited each reporting year. For this award,

all of the data will be uploaded in the first year, so the data deposition oversight will end then. The Office of

Sponsored Programs will look for the NDA data DOIs when papers are published and will include that information

in the annual progress report.

Validation Schedule (this section is required by NIMH)

Since this is a secondary data analysis application, validation of newly collected data will not occur. The new

data to be deposited to NDA will go through their validation tool when the data are initially uploaded.

Reference

1. Hripcsak, G., Mirhaji, P., Low, A. F., & Malin, B. A. (2016). Preserving temporal relations in clinical data while

maintaining privacy. Journal of the American Medical Informatics Association, 23(6), 1040-1045.