At the end of last year, the first iteration of the iiindex was presented at I3’s 2021 Technical Working Group meeting. Since then, we have released open-source versions of the tool that anyone can use, prototyped OAI-PMH support, expanded metadata scraping and schema indexing, and added functionality for recording relationships between records. The iiindex currently functions as a living repository of pointers to innovation datasets, curated and used by the I3 community; an ongoing record of its current status and features is detailed here. For a full overview of development, see the Status and Updates doc.
Continued development will follow two overarching directions: first, developing the tool itself (expanding what we index, adding features to the platform, and broadening contributions); and second, releasing a general version of the tool for other communities interested in indexing datasets.
To contact us about anything mentioned in this release, or more generally, please email [email protected].
A key step in using datasets to answer research questions is knowing which values particular datasets contain, e.g. whether a dataset includes patent identifiers, PubMed IDs, or trademark information. Having developed tools to index schema fields and fuzzy-match them to collaboratively sourced ‘fields of interest’, our next step is to build a search tool that makes use of this expanded metadata.
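To make the idea of fuzzy-matching schema fields to ‘fields of interest’ more concrete, here is a minimal sketch using the standard library's `difflib.SequenceMatcher`. The field names, their aliases, and the 0.8 similarity threshold are hypothetical, and this is not necessarily the matching method the iiindex itself uses.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical 'fields of interest', each with a few alternative names
# that dataset schemas might use for the same kind of value.
FIELDS_OF_INTEREST = {
    "patent_id": ["patent id", "patent number", "publication number"],
    "pmid": ["pubmed id", "pmid"],
    "trademark_serial": ["trademark serial number", "serial no"],
}


def normalise(name: str) -> str:
    """Lowercase a column name and treat underscores, hyphens and dots as spaces."""
    for ch in "_-.":
        name = name.replace(ch, " ")
    return " ".join(name.lower().split())


def match_field(column: str, threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching field of interest for a schema column, or None
    if no alias reaches the (hypothetical) similarity threshold."""
    col = normalise(column)
    best_field, best_score = None, 0.0
    for field, aliases in FIELDS_OF_INTEREST.items():
        for alias in aliases:
            score = SequenceMatcher(None, col, alias).ratio()
            if score > best_score:
                best_field, best_score = field, score
    return best_field if best_score >= threshold else None
```

For example, columns named `PubMed_ID` and `Patent_Number` would be tagged as `pmid` and `patent_id`, while an unrelated column such as `title` stays unmatched; a search tool could then filter datasets by these tags.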
While some of this work will be technical, much will be based on designing flows that help people find what they need by linking together different pieces of information: imagining interfaces that allow people to ask questions about multiple datasets.
Open Question: What kinds of fields would you like to be able to filter for?
Additionally, now that we have added functionality for recording relationships between datasets, we’d like to invite people to add any relationships they’re aware of and so expand the I3 data citation network.
Ask: Would you like to add a collection?
We also want to make the index easier to find, both for people searching for datasets and for those browsing related pages (e.g. faculty data or research pages). One option might be a ‘find me on the i3 index’ HTML badge.
Ask: Can you link to the iiindex on your faculty webpage?
As the amount of relational data stored by the index grows (e.g. relationships, schemas, and other one-to-many relations), it may become more useful to publish a fully relational version of the index alongside the flattened version that can be seen in the Google Sheet or the archived .csv files. We plan to do this using the Datasette tool, compiling a SQLite database at the same time as the site is compiled. This will open up a second way to query the index and allow us to explore possibilities for incorporating more structured data.
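A minimal sketch of what compiling such a database at build time could look like, using Python's built-in `sqlite3` module. The table layout (`records`, `relationships`, `schema_fields`), the sample rows, and the helper names are hypothetical illustrations, not the index's actual schema. A file produced this way (e.g. `build_index_db("index.db")`) could then be served with `datasette index.db`.

```python
import sqlite3

# Hypothetical flattened rows, as they might be read from the sheet or .csv export.
RECORDS = [
    {"id": "ds-001", "title": "Example patent dataset"},
    {"id": "ds-002", "title": "Example citation linkage"},
]
# Hypothetical one-to-many data that a flat sheet struggles to hold.
RELATIONSHIPS = [("ds-002", "ds-001", "derived_from")]
SCHEMA_FIELDS = [("ds-001", "patent_id"), ("ds-002", "patent_id"), ("ds-002", "pmid")]


def build_index_db(path: str = ":memory:") -> sqlite3.Connection:
    """(Re)build the relational index; dropping tables first makes rebuilds idempotent."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        DROP TABLE IF EXISTS schema_fields;
        DROP TABLE IF EXISTS relationships;
        DROP TABLE IF EXISTS records;
        CREATE TABLE records (id TEXT PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE relationships (
            source_id TEXT NOT NULL REFERENCES records(id),
            target_id TEXT NOT NULL REFERENCES records(id),
            relation  TEXT NOT NULL
        );
        CREATE TABLE schema_fields (
            record_id TEXT NOT NULL REFERENCES records(id),
            field     TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO records VALUES (:id, :title)", RECORDS)
    conn.executemany("INSERT INTO relationships VALUES (?, ?, ?)", RELATIONSHIPS)
    conn.executemany("INSERT INTO schema_fields VALUES (?, ?)", SCHEMA_FIELDS)
    conn.commit()
    return conn


def datasets_with_all_fields(conn: sqlite3.Connection, fields: list) -> list:
    """Return ids of records whose indexed schema includes every field in `fields`."""
    placeholders = ", ".join("?" for _ in fields)
    sql = f"""
        SELECT record_id FROM schema_fields
        WHERE field IN ({placeholders})
        GROUP BY record_id
        HAVING COUNT(DISTINCT field) = ?
        ORDER BY record_id
    """
    return [row[0] for row in conn.execute(sql, [*fields, len(set(fields))])]
```

The helper illustrates the kind of question (“which datasets contain both patent IDs and PubMed IDs?”) that is awkward to answer from a flattened sheet but becomes a single query against a relational version.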
Open Question: Let us know if you have any use cases for a database of this format.
We would like to add integrations with workflow automation tools such as n8n and Zapier, and to formalise the GitHub Actions workflows we use. This would give us an effectively ‘open API’ (currently just a spreadsheet) and create more opportunities for people to do what they like with the data. We have already adapted the open-source version to work with Airtable instead of Google Sheets, but ideally this flexibility would be built into the tool itself, so that integrating a different input source doesn’t require a separate code base.
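One way to build that flexibility in is to hide each input behind a small common interface, so the rest of the build never needs to know where records came from. The sketch below uses hypothetical names (`RecordSource`, `CsvSource`, `StaticSource`, `build_site_data`) and deliberately does not call the Google Sheets or Airtable APIs; a real adapter for either service would implement the same `fetch_records` method.

```python
import csv
import io
from typing import Iterable, Protocol


class RecordSource(Protocol):
    """Hypothetical interface: anything that can yield index records as dicts."""

    def fetch_records(self) -> Iterable[dict]: ...


class CsvSource:
    """Reads records from CSV text, e.g. a spreadsheet exported as CSV."""

    def __init__(self, text: str):
        self.text = text

    def fetch_records(self) -> list:
        return list(csv.DictReader(io.StringIO(self.text)))


class StaticSource:
    """Stands in for an API-backed source such as Airtable; a real adapter would
    page through the service's API and map its field names onto ours."""

    def __init__(self, rows: list):
        self.rows = rows

    def fetch_records(self) -> list:
        return list(self.rows)


REQUIRED = ("id", "title")  # hypothetical minimum fields for a record


def build_site_data(source: RecordSource) -> list:
    """Source-agnostic step: validate and normalise records before compiling the site."""
    records = []
    for row in source.fetch_records():
        missing = [k for k in REQUIRED if not str(row.get(k, "")).strip()]
        if missing:
            raise ValueError(f"record {row!r} is missing {missing}")
        records.append({k: str(v).strip() for k, v in row.items()})
    return sorted(records, key=lambda r: r["id"])
```

With this design, supporting a new input means writing one adapter class rather than maintaining a separate code base.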
Open Question: What integrations do you currently use in your work?
We have released an open-source version of the tool that replicates the core functionality of the iiindex without tying its users to a particular use case. It has already been used by the HelioPhysics Knowledge Network to collaboratively track useful tools, and other groups are experimenting with it. We’d like to continue making this tool as easy to set up and as robust as possible.
Open Question: If you are involved in, or know of, another dataset indexing project, please get in touch.