
ML + data analysis roadmap

Published on Jun 01, 2021

Open tasks

  • Start contributing to the repository (interlace/iii) for code + data.

    • Identify code bases + scripts that belong here (articles, grants, patents)

  • Outline how the Explore code works; link to the code + its status

    • Metrics generation (Neo4j on AWS running a block of code)

    • Visualization code (Explore)

  • Outline how PatCit modules work (Cyril Verluise)

    • Schemas + parsers for new citation types

  • Outline how the current Lever grants data is analyzed

    • How Torque sites work (the platform that hosts their grant docs)

    • Migrate the existing small repo

    • Brainstorm visualizations we could generate from the current proposal docs (which have limited citations + metadata): topic clusters, language level, geocoding?

  • Briefly review + comment on goals for the OAG commons

  • Settle on a toolchain

    • chat: Zulip (interlace)

    • notes + docs: PubPub (regular log; this roadmap)

    • issues: GitHub (iii repo)

General templates

  • Building an overview

    • Index; Concordance

    • Sources; Outputs; Pipelines

    • Registry; Services/endpoints

  • Self-documenting work

    • Roadmap

    • Daily log

    • Weekly chat (Thurs)


Explore: Scaling Science

  • Current explorer: v2

    • Description, JJ’s roadmap

    • Ideal use cases

    • Index: Metrics + implementations

    • ?Product: Lens-like API for a wide range of metrics; a host for computing open metrics on open data

    • ?Product: OWID-style source of citable visuals

    • Dataset product: metrics by year, indexed to [SSID]

  • Defining metrics

Explore: Scaling Grants

  • Current Grants framework

  • Ideas

    • Index: Parallels + partners

    • ?Product (site): catalog of vetted but unfunded grant proposals

    • ?Dataset product: citation/coauthor graph for grants?

    • Viz product: visualizations of Lever/SOLVE proposal networks

Open Academic Graph

  • MAG and what comes next

  • Derivative / supplemental datasets we can contribute to the OAG

  • Use cases

  • Dataset product: concordance of components + derivatives
