
Lens Lab update, June 2020

Published on May 29, 2020

The first year’s objectives

  1. Providing initial bulk patent and cited data dumps to collaborators

  2. Progressing the new Lens patent data architecture to provide a patent API

  3. Investigating the INPADOC database for new data

  4. Providing a Lens Lab portal to the Lens, for easier integration into the workflows of researchers.

1. Data provision and assistance to collaborators

  1. MIT datasets:

    • Patents citing MIT scholarly works from 1950-2018, provided in December 2019

    • Data on 43 top journals accessed via the Lens Scholarly API

  2. Ecole Polytechnique Federale de Lausanne (EPFL)-Sloan collaborator

    • Lens Scholarly API access in January 2020.

  3. Patent full text (stripped of HTML tags) for “Nanotechnology” US patent applications for use in Patent Disclosure - An Economic Analysis Using Computational Linguistics

  4. Innovation datasets on human coronaviruses for the global community: with additional support from the Rockefeller Foundation, the Lens released the human coronavirus data initiative, which led to the public release of more than 38 patent and scholarly datasets, including collections of biological sequences disclosed in patents

  5. Provided MIT and Sloan collaborators with the US full-text dataset from 2018, training datasets, and other contributed data, along with some quality control datasets, under a non-commercial license. The aim is to engage a larger network of engineers and researchers interested in innovation and in new algorithms or applications that improve disambiguation of patent and scholarly data.

2. Improved patent architecture

Building on the Lens's earlier work in this project, the engineers have built a new patent data store, ported the old Lens patent data from the US, EP full-text, and WIPO full-text sources into it, and are in the process of aligning it with a common data model. Next, the improved architecture will be extended to implement Elasticsearch text indexing of patent meta records and to link other functionality.

Figure 1. Refactored architecture of The Lens patent and scholarly data with current (dark blue) and planned (light blue) data sources.

API survey to better understand users' needs

To gain insight into user preferences regarding access to patent data in bulk or through APIs, and into interoperability requirements for data elements, we conducted an online Lens API and web service survey. Preliminary results from 150 respondents are shown in Figure 2.
For updated results, please see the online survey report at:

Figure 2. Preliminary responses from a user survey on the Lens scholarly API, a potential patent API, and their web services.

We are pleased to report that a Beta version of the Lens patent API will be released in a test environment by the end of June. API implementation, documentation, and support infrastructure, including the data schema, will be modeled on the Lens scholarly API. Please see

3. Investigating the INPADOC database

INPADOC is a unique, continuously evolving database that has grown incrementally since 1978. It covers data from about 60M applications from more than 70 patent authorities and now holds almost 300M legal events. This year, the Lens team started investigating its complexity and learning about its features, such as legal status, the timeline of legal events, and the estimated expiration date of a patent. In 2017, the EPO started harmonizing legal events to align with the WIPO ST.27 standard and introduced a classification hierarchy for legal events. By 2019, the EPO had improved the coverage of EP, JP, and US legal event data in the INPADOC database. These improvements include:

  • EP: a revision of the operating processes; availability of more data

  • JP: earlier availability of legal event data; complete revision of the operating processes; gaps closed

  • US: availability of information on the status of US applications and patents.

In the upcoming year, the Lens will implement various features in stages. For example, starting with the Beta release of the patent API, the Lens will release the first stage of legal status activity for published patent applications and granted patents, along with US assignment data, and will provide information on the latest owner of a patent whenever available.

4. Lens Labs portal in the Lens

MIT’s Knowledge Futures Group and the Lens are developing a Lens Labs portal with three aims: to highlight relevant patent datasets; to engage diverse communities with the broader Lens open innovation data, which includes linked scholarly data, MetaRecords, and other knowledge innovation artifacts; and to surface the influence of science and technology on our society through diverse, open, and granular metrics. Requirements for the portal site have been developed, and we are in the process of implementing the various features, linking various resources, and testing the site in the staging environment at

The site will feature links to the Lens API & Data facilities, the MIT bulk patent and scholarly works datasets and associated data schemas, as well as example dashboards for MIT and the Broad Institute. To enhance participation, the following presentations and webinars were given:


  1. Presentation by Aaron Ballagh at the Centre for Behavioural Economics, Society and Technology (BEST) meeting at QUT

  2. Presentation by OA Jefferson at the I3 Technical meeting in December 2019

  3. Presentation by OA Jefferson at the Broad Institute on I3 and patent data collaboration in December 2019

  4. Webinar by OA Jefferson in October 2019 on the Lens open patent data for the Science and Technology Observatory (OST) group in Paris, which advises stakeholders in higher education by analyzing innovation research data and contributes to evaluating the impact of public policies

  5. Webinar on prior art and patent data for IPOS (the Singapore patent office training branch) in March 2020
