
Building the iiindex

A collaborative home for innovation data

Published on Dec 07, 2021

This week, the I3 released the beta version of our collaboratively-edited index for open innovation datasets and tools: iiindex.org. We hope this can become a place where resources across disparate platforms may be shared, annotated, and linked to one another by the research community.

The index is the result of a longer thought process about what a useful home for innovation data might look like. Early on in developing this site, we realised we should not try to replicate the many platforms available for data publication (Dataverse, BigQuery, Zenodo, Dryad, &c.), nor push people to use any one of these — each has its own affordances suited to different projects. Instead, this is a lightweight index of resources, an overlay pointing to their canonical home, with metadata that can be curated and updated by its community of users.

Edit this index! It is kept in sync with the Google sheet you see here.

Design requirements

When we set out to design this site, we had a number of requirements. The first, and perhaps the most important, was that anyone should be able to contribute edits, with the lowest possible friction. Another was that the site should be fully versioned, so changes over time can be tracked and annotated. We also wanted it to be easy to archive a static version of the site that could be kept up with minimal maintenance, so that it can remain online for years in a stable state.

Lastly, the first version of the index was a public Google sheet, a popular workflow that we did not want to disturb. As a result, the site remains editable by anyone via this spreadsheet, while it is managed, hosted and versioned using GitHub infrastructure (so it keeps a full history of changes while remaining an essentially static site). In addition to the sheet, valid pull requests to the Open Innovation Dataset Index GitHub repository are automatically integrated into the site, without contributing accounts requiring write access or prior approval.

The state of the Index (v 0.2)

The index currently houses lists of datasets and tools (and soon, data-publishing platforms) that have been recommended by people and institutions linked to innovation data research.

Each of these lists corresponds to a different tab of the Google Sheet, where each row contains metadata about a particular resource. This metadata includes common fields such as title, DOI, and authors, but also harder-to-find information such as licensing details, derivative and superseding resources, and the range of years each resource covers.

When additions or edits are made to the sheet, a corresponding change is made to a markdown file on GitHub, which contains this metadata (in the file header) and a space for freeform notes, annotations and code samples (in the body of the file). These files are used to generate the website itself. Currently, the site also includes a basic search and tag-based filtering.
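The row-to-file step described above can be sketched roughly as follows. This is an illustration only, not the project's actual sync code: the field names (title, doi, license, years) and the YAML-style `---` header delimiters are assumptions, since the post says only that metadata lives in the file header and notes in the body.

```python
import json


def row_to_markdown(row, notes=""):
    """Render one spreadsheet row as a markdown file with a metadata header.

    `row` maps column names to cell values. The column names and the
    '---' delimited header are assumptions made for this sketch.
    """
    lines = ["---"]
    for key, value in row.items():
        if value in ("", None):
            continue  # leave empty cells out of the header
        # json.dumps quotes and escapes the value; JSON scalars are
        # generally readable by YAML parsers as well.
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    lines.append("")
    lines.append(notes.strip())  # freeform body: notes, annotations, code
    return "\n".join(lines) + "\n"


# Hypothetical row, as it might appear in the datasets tab of the sheet.
example_row = {
    "title": "Example Patent Dataset",
    "doi": "",
    "license": "CC-BY-4.0",
    "years": "1976-2020",
}
print(row_to_markdown(example_row, notes="Freeform notes go here."))
```

Keeping the header machine-readable and the body freeform is what lets the sheet own the structured fields while pull requests add longer-form text to the same file.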

Curated collections

Another question that arose when we designed this site was the role that the Index plays in relation to existing catalogs of innovation data, such as the Lens Labs Apps and Data collection, the NBER Research Data portal, and Google Patents’ project to publish well-used datasets (public and private) as queryable resources on BigQuery. We hope the I3 Index will help track and version changes in these sources, while creating space for others to contribute similar guides and resources without needing the infrastructure to maintain their own website.

As a result, there is a section of the Index dedicated to Collections, where we invite people to contribute new curated sets of resources and to add to existing ones. These can be thought of as a ‘start point’ for a strand of research: for example, take a look at the Essential Patent Analysis Datasets collection that Matt Marx and I curate, which is also rendered on the home page of the site. It is within this section that we also track and link out to collections curated by others in the community.

Contributing to & linking back to the Index

In the near term, a key goal is to gather contributions from a broad section of the innovation research community, with particular focus on the development of collections. If you have a dataset that you think people should know about, please add it to the Google Sheet! Likewise, if you see information about a dataset that’s inaccurate or incomplete, you can correct it directly in the sheet. To add or edit longer-form text, code, notes, or a collection, open a pull request against the GitHub repository. Full instructions (with videos) for each of these can be found on the about page of the index site.

Another form of contribution that we are excited to see is how people use this data to produce other resources. Versioned .csv files are available for each of the different indexes (datasets, tools, data publishing platforms) within the GitHub repository. We also want to let more researchers know about the Index, so if you plan to make use of it in your research, do write about and cite it! As the site does not yet have its own DOI, for now you can simply reference the main URL, https://iiindex.org.
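As a small illustration of reusing those .csv files, the sketch below tallies resources by one column. The column names (title, license, url) and the sample rows are invented for the example; the real files should be checked for their actual headers before adapting it.

```python
import csv
import io
from collections import Counter

# Invented sample standing in for one of the versioned .csv files.
SAMPLE_CSV = """title,license,url
Example Patent Dataset,CC-BY-4.0,https://example.org/patents
Example Citation Dataset,CC0-1.0,https://example.org/citations
Example Inventor Dataset,CC-BY-4.0,https://example.org/inventors
"""


def count_by_column(csv_file, column):
    """Count rows per distinct value of `column` in an open CSV file."""
    reader = csv.DictReader(csv_file)
    # Missing or empty cells are grouped under "unknown".
    return Counter((row.get(column) or "unknown").strip() for row in reader)


license_counts = count_by_column(io.StringIO(SAMPLE_CSV), "license")
print(license_counts.most_common())
```

To run it against a downloaded file, pass `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` sample.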

Next steps

Other than broadening contributions, a key next step is to automate more of the processes that obtain metadata about resources in the index, and run them on a regular basis, flagging any broken links and version changes, to ensure the site stays up-to-date. At present, contributions made through GitHub are augmented with MediaWiki resource search results, and in the near term we’d like to expand that to include calls to common APIs such as Dataverse, BigQuery and GitHub, and also to replicate this behaviour in the Google Sheet.
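A scheduled broken-link check of the kind mentioned above might look something like this minimal sketch. It is not the project's implementation: the resource fields (title, url) are assumed, and it uses only the Python standard library. Some servers reject HEAD requests, so a real check would probably want a GET fallback.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for `url`, or None if it is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, bad URL


def find_broken_links(resources, get_status=http_status):
    """Return (title, url, status) for each resource whose link looks broken."""
    broken = []
    for resource in resources:
        status = get_status(resource["url"])
        if status is None or status >= 400:
            broken.append((resource["title"], resource["url"], status))
    return broken


# Usage with a stubbed status lookup, so the example runs without network access.
stub_statuses = {"https://example.org/ok": 200, "https://example.org/gone": 404}
resources = [
    {"title": "Working resource", "url": "https://example.org/ok"},
    {"title": "Missing resource", "url": "https://example.org/gone"},
]
print(find_broken_links(resources, get_status=stub_statuses.get))
```

Passing the status lookup in as a parameter keeps the flagging logic testable offline; a scheduled job would call `find_broken_links(resources)` with the default network-backed lookup.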

One longer-term goal is to index and link the schemas of datasets as well, so that someone could browse by the indexed variables of different datasets and explore how they might be combined.

This project is still at an early stage, and we are excited to see where it goes, and how people use it. Please email us with any questions, feedback and feature requests.

Thanks to Ian Wetherbee for the recommendation of GitHub Actions as a lightweight way to manage the index, and to Matt Marx, Lia Sheer and Cyril Verluise for support, feedback and contributions. This project was developed by the Innovation Information Initiative, and is supported by a grant from the Alfred P. Sloan Foundation.
