Add questions to the collaborative notes for this session
To conclude our Spring 2021 Workshop Series, we are hosting a set of small-group workshops on tools and techniques for sharing and editing datasets. When: 1200 EDT / 1600 UTC, Wed, April 28th.
This is an open workshop, suitable for students, colleagues, and collaborators outside of academia who work with and transform public datasets.
We will open with an introduction to how to contribute to the I3 data catalog, and a brief overview of each breakout group. After that, the session will split into breakout rooms, each led by a member of the I3 community. Then we will reconvene to share summaries with the wider group.
Main session + Introduction: https://youtu.be/TsivuAsuPLM
Collaborative Data Design: https://youtu.be/LQMrLWO-efc
Data Cleaning + Reconciliation: https://youtu.be/pDn8L2BUGpE
Using BigQuery + Kaggle for data analysis: https://youtu.be/cwC-MwFCMEU
Collaborative Data Design: using GitHub as a data-sharing platform (Cyril Verluise, slides)
Theoretical models have long been developed with continuous improvement in mind. By contrast, datasets are often shared, if at all, as a snapshot, which makes collective, continuous improvement difficult. In this session, I will argue that collaborative data design is both key to the future of empirical research and easy to implement. I will share practical insights on tools and workflows for implementing a collaborative data design standard. This session will be interactive and is open to anyone interested in developing and contributing to collaborative data design projects.
Data Cleaning and Reconciliation tools (Agnes Cameron + Sam Klein, slides)
A discussion about current tools for entity resolution and data cleaning, building a shared repository of scripts, and demonstrations of tools including OpenRefine.
Using BigQuery and Kaggle for data analysis and distribution (Ian Wetherbee + Jay Yonamine, slides)
Using BigQuery to perform large-scale analysis across multiple sets of patent data, and to distribute and collaborate on large data analyses.
Open Discussion (Adam Jaffe)
Please edit the notes below; an overview will be posted here next week.
Google doc — One section per session.