Data Sharing Guidelines and Questions

Kickstarting a discussion by and for the I³ community

Published on Mar 01, 2021
A perennial challenge is streamlining data sharing and finding the best repository to host an open dataset.

Some things we’re working on:

  1. A set of community data-sharing guidelines and best practices

  2. A catalog of datasets used across the community

  3. Experimenting with office hours and other resources to work through related issues

Please find below an initial set of questions to ask yourself when sharing data, and a top-line set of core guidelines. These will evolve over time, and your feedback and troubleshooting are invaluable.

Universal questions for sharing datasets

What kind of data do you want to share? (How large are the files? Should they be compressed? Will they be updated?)

Where is the best place to put the data? (For how long should it be hosted? Should users have to register? What usage statistics do you need? Which institution or affiliation should it be associated with? Will you maintain the data and respond to queries?)
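To get a quick answer to the file-size and compression questions, a short script can help. The Python sketch below is one possible approach, not a recommended tool: `describe_file` is a hypothetical helper, and it assumes gzip is the compression format you care about. It streams a file through the standard-library `zlib` module and reports the raw size, the gzip-compressed size, and their ratio.

```python
import os
import tempfile
import zlib


def describe_file(path: str, chunk_size: int = 1 << 20) -> dict:
    """Return a file's size on disk and its size after gzip compression.

    Hypothetical helper: the file is streamed in chunks, so large files
    are not loaded into memory all at once.
    """
    raw_bytes = os.path.getsize(path)
    # wbits=31 asks zlib to emit a gzip container (header + trailer).
    compressor = zlib.compressobj(level=9, wbits=31)
    gzip_bytes = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            gzip_bytes += len(compressor.compress(chunk))
    gzip_bytes += len(compressor.flush())
    return {
        "path": path,
        "raw_bytes": raw_bytes,
        "gzip_bytes": gzip_bytes,
        # Below 1.0 means compression saves space; close to 1.0 means it barely helps.
        "ratio": gzip_bytes / raw_bytes if raw_bytes else float("nan"),
    }


if __name__ == "__main__":
    # Demo on a throwaway, highly repetitive CSV; substitute your own file paths.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
        tmp.write("id,value\n" + "".join(f"{i},0\n" for i in range(10_000)))
    print(describe_file(tmp.name))
    os.remove(tmp.name)
```

A ratio well below 1.0 suggests that publishing compressed files is worthwhile; already-compressed formats (images, Parquet, zip archives) will usually sit close to 1.0.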

Guidelines for sharing datasets

Describing the project, the data, and the code; external links; terms of use

Describe the project/dataset:

What was the project about, why are you sharing this data, and who should use it?

Who was involved (if more than just yourself), and what are their contact details?

Are there plans or a schedule for updating the data? How can users submit issues or requests? (GitHub has this built in.)

Describe the data:

Data files and datasets (size, the kinds of analysis they support, etc.)

Data schemas and/or field descriptions

How the data is updated (if relevant)
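For the schema and field-description item, one low-effort starting point is to generate a skeleton data dictionary from a CSV file and then fill in the meanings by hand. The sketch below assumes the data is a CSV with a header row; `skeleton_schema` is a hypothetical helper, and its type guesses (integer, number, string) come only from the sampled rows, so someone who knows the data should check them.

```python
import csv
import io
import json


def skeleton_schema(csv_text: str, sample_rows: int = 100) -> list[dict]:
    """Build a field-description skeleton from a CSV header and its first rows.

    Hypothetical helper: types are guessed from sampled, non-empty values only;
    descriptions and units are left blank for the dataset author to complete.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    samples = {name: [] for name in fields}
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for name in fields:
            value = row.get(name)
            if value not in (None, ""):
                samples[name].append(value)

    def guess_type(values: list[str]) -> str:
        # Try the strictest type first; fall back to "string".
        if not values:
            return "unknown"
        for caster, label in ((int, "integer"), (float, "number")):
            try:
                for v in values:
                    caster(v)
                return label
            except ValueError:
                continue
        return "string"

    return [
        {
            "name": name,
            "type": guess_type(samples[name]),
            "description": "",  # TODO: what the field means
            "units": "",        # TODO: units or allowed values, if any
            "example": samples[name][0] if samples[name] else None,
        }
        for name in fields
    ]


if __name__ == "__main__":
    # Placeholder data; replace with the contents of your own CSV file.
    example = "station_id,count,temp_c\nA1,3,4.5\nA2,7,\n"
    print(json.dumps(skeleton_schema(example), indent=2))
```

Publishing the completed dictionary as JSON next to the data files gives users a machine-readable field list; a table in the README serves the same purpose for human readers.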

Describe the code:

Document the source code and process used to create the dataset

How to use and build on the data

Tunable parameters in key steps, explicit notes on replication, etc.
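On replication, one lightweight habit is to write a small manifest alongside each release that records the tunable parameters used, the software environment, and a checksum of every output file. Users can then confirm they have exactly the same files and rerun the process with the same settings. The Python sketch below assumes nothing about your pipeline: `build_manifest` and `sha256_of` are hypothetical helpers, and the parameter names in the demo are placeholders.

```python
import hashlib
import json
import os
import platform
import sys
import tempfile
from datetime import datetime, timezone


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(params: dict, output_paths: list[str]) -> dict:
    """Record the parameters, environment, and output checksums of one run."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "parameters": params,
        # Keyed by file name so the manifest does not leak local paths.
        "outputs": {os.path.basename(p): sha256_of(p) for p in output_paths},
    }


if __name__ == "__main__":
    # Hypothetical parameters for a hypothetical pipeline; replace with your own.
    params = {"min_count": 5, "random_seed": 42}
    with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
        tmp.write(b"id,value\n1,0\n")
    print(json.dumps(build_manifest(params, [tmp.name]), indent=2))
    os.remove(tmp.name)
```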

External links:

URL / link to other datasets/software/code used

URL / link to related papers

Terms of use:

How should it be cited (‘If you use the data, please cite…’)

Any license details (code, third party data, associated copyright or terms of use, etc.)
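The guidelines above can also double as a README checklist. The sketch below uses a hypothetical `render_readme` helper to turn the checklist into a plain-text README skeleton, and a hypothetical `missing_items` helper to list which items are still unanswered. The section titles and prompts paraphrase the guidelines and can be renamed to suit your repository.

```python
from typing import Dict, Optional

# Section titles and prompts paraphrase the guideline checklist; edit freely.
GUIDELINE_SECTIONS = {
    "Description of the project": [
        "What the project was about, why the data is shared, and who should use it",
        "People involved and contact details",
        "Update plans and how to submit issues or requests",
    ],
    "Data": [
        "Data files and datasets (size, suitable analyses)",
        "Schemas and/or field descriptions",
        "How the data is updated (if relevant)",
    ],
    "Code": [
        "Source code and process used to create the dataset",
        "How to use and build on the data",
        "Tunable parameters and notes on replication",
    ],
    "External links": [
        "Other datasets, software, or code used",
        "Related papers",
    ],
    "Terms of use": [
        "How to cite the data",
        "Licenses (code, third-party data, copyright, terms of use)",
    ],
}


def render_readme(title: str, answers: Optional[Dict[str, Dict[str, str]]] = None) -> str:
    """Render a plain-text README skeleton; unanswered prompts are marked TODO."""
    answers = answers or {}
    lines = [title, "=" * len(title), ""]
    for section, prompts in GUIDELINE_SECTIONS.items():
        lines += [section, "-" * len(section)]
        for prompt in prompts:
            answer = answers.get(section, {}).get(prompt, "TODO")
            lines.append(f"* {prompt}: {answer}")
        lines.append("")
    return "\n".join(lines)


def missing_items(answers: Dict[str, Dict[str, str]]) -> list:
    """List 'section / prompt' pairs that still lack a non-empty answer."""
    return [
        f"{section} / {prompt}"
        for section, prompts in GUIDELINE_SECTIONS.items()
        for prompt in prompts
        if not answers.get(section, {}).get(prompt)
    ]


if __name__ == "__main__":
    print(render_readme("Example dataset"))
```

Running `missing_items` before publishing is a quick way to see which parts of the checklist still need attention.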

Here is a link for submitting reference datasets or papers you find useful for working with your datasets: _________________.

We look forward to discussing these guidelines, and any issues you have encountered while preparing to share datasets, in our upcoming virtual office hours.

