The first year’s objectives include:
Providing initial bulk patent and cited data dumps to collaborators.
Progressing the new Lens patent data architecture to provide a patent API.
Investigating the INPADOC database for new data.
Providing a Lens Labs portal to the Lens, for easier integration into researchers’ workflows.
MIT scholarly works (1950-2018): https://s3.us-west-2.amazonaws.com/lens-public/export/6672cb1a-0f74-4fc1-8c6d-84598947dca7/lens-export.zip (372MB) delivered on September 1, 2019.
MIT scholarly works cited by patents (1950-2018): https://s3.us-west-2.amazonaws.com/lens-public/export/8978bdbd-4a83-4542-8ba8-9e4140a4911b/lens-export.zip (87.6MB) delivered on September 1, 2019.
Patents citing MIT scholarly works (1950-2018), delivered in December 2019.
MIT draft patent portfolio (1950-2018), extracted using the search query https://link.lens.org/AuPUS7vlCMf, delivered in December 2019.
Data on 43 top journals accessed via the Lens Scholarly API (this is beyond the scope of this year’s deliverables).
Ecole Polytechnique Federale de Lausanne (EPFL)-Sloan collaborator
Lens Scholarly API access in January 2020.
Provided patent full text (stripped of HTML tags) for “Nanotechnology” US patent applications for use in Nancy Kong and Adam Jaffe’s project on Patent Disclosure - An Economic Analysis Using Computational Linguistics.
Released innovation datasets on human coronaviruses to the global community, with additional support from the Rockefeller Foundation: the Lens launched the human coronavirus data initiative, which led to the public release of more than 38 patent and scholarly datasets, including collections of biological sequences disclosed in patents (https://about.lens.org/covid-19/). This is also beyond the scope of this year’s deliverables.
Provided data publication and sharing advice to Laura for developing the data sharing guidelines and processes for the project.
Provided MIT and Sloan collaborators with the US full-text dataset from 2018, training datasets, and other contributed data, along with some quality control datasets, under a non-commercial license. The aim is to engage a larger network of engineers and researchers interested in innovation and in new algorithms or applications to improve disambiguation of patent and scholarly data.
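As an illustration of the tag-stripping step mentioned above for the Nanotechnology full text, the following is a minimal sketch using only the Python standard library. It is not the procedure the Lens actually used; the helper name `strip_html` and the set of block-level tags are assumptions made for this example.

```python
from html.parser import HTMLParser

# Assumption for this sketch: tags that should act as word boundaries.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "td", "th", "h1", "h2", "h3"}


class _TextExtractor(HTMLParser):
    """Collect text nodes and discard every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Collapse the whitespace left behind by removed tags.
        return " ".join("".join(self._chunks).split())


def strip_html(fragment: str) -> str:
    """Return the plain text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return parser.text()
```

Inline tags such as `<b>` are removed without inserting a space, so a word split by inline markup stays whole, while block-level tags separate adjacent paragraphs.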
Building on the Lens’s earlier work in this project, the engineers have built a new patent data store, ported the old Lens patent data from the US, EP full-text, and WIPO full-text data into it, and are in the process of aligning it with a common data model. The improved architecture will next be extended to implement Elasticsearch text indexing of patent meta records and the linking of other functionality.
Figure 1. Refactored architecture of The Lens patent and scholarly data with current (dark blue) and planned (light blue) data sources.
To gain insight into user preferences regarding access to patent data in bulk or through APIs, and into the interoperability requirements of data elements, we conducted an online Lens API and web service survey. Preliminary results from 150 respondents are shown in Figure 2. For updated results, please see the online survey report at: https://lensorg.typeform.com/report/QM6aMm/id12pfJNEwm6lBUz.
Figure 2. Preliminary responses to a user survey on the Lens Scholarly API, a potential patent API, and associated web services.
We are pleased to report that a Beta version of the Lens patent API will be released in a test environment by the end of June. API implementation, documentation, and support infrastructure, including the data schema, will be modeled on the Lens Scholarly API. Please see https://docs.api.lens.org/.
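Because the patent API is to be modeled on the Lens Scholarly API, a request to the existing scholarly search endpoint gives a rough idea of what a client call might look like. The sketch below only builds the request and does not send it. The endpoint URL, the bearer-token header, and the `query` and `size` body fields are assumptions that should be verified against https://docs.api.lens.org/; `build_search_request` is a hypothetical helper, and the Beta patent API may differ.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the official API documentation.
SCHOLARLY_SEARCH_URL = "https://api.lens.org/scholarly/search"


def build_search_request(token: str, title_terms: str, size: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST search request (hypothetical helper)."""
    body = {
        # Assumed query shape: match the given terms against the title field.
        "query": {"match": {"title": title_terms}},
        "size": size,
    }
    return urllib.request.Request(
        SCHOLARLY_SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; a valid access token is required for that step.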
INPADOC is a unique, continuously evolving database that has been updated incrementally since 1978. It covers data from about 60M applications from more than 70 patent authorities and now holds almost 300M legal events. This year, the Lens team started investigating its complexity and learning about its various features, such as legal status, the timeline of legal events, and the estimated expiration date of a patent. Since 2017, the EPO has been harmonizing legal events to align with the WIPO ST.27 standard and has introduced a classification hierarchy for legal events. By 2019, the EPO had improved the coverage of EP, JP, and US legal event data in the INPADOC database. These improvements include:
EP: a revision of the operating processes; availability of more data
JP: earlier availability of legal event data; complete revision of the operating processes; gaps closed
US: availability of information on the status of US applications and patents.
In the upcoming year, the Lens will implement these features in stages. For example, starting with the Beta release of the patent API, the Lens will release the first stage of legal status activity for a published patent application or a granted patent, along with US assignment data, and will provide information on the latest owner of a patent whenever available.
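To make the legal-event features discussed above more concrete, here is a minimal sketch of how a timeline, a latest owner, and a nominal expiration date could be derived from a list of events. It does not describe the INPADOC schema or the Lens implementation: the `LegalEvent` fields, the `"assignment"` category label, and the 20-years-from-filing rule are simplifying assumptions, and real expiration dates also depend on term adjustments, extensions, fee lapses, and jurisdiction.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class LegalEvent:
    """Hypothetical, simplified legal event record."""
    event_date: date
    category: str                    # assumed label, e.g. "assignment"
    new_owner: Optional[str] = None  # set only for ownership changes


def timeline(events: List[LegalEvent]) -> List[LegalEvent]:
    """Return the events in chronological order."""
    return sorted(events, key=lambda e: e.event_date)


def latest_owner(events: List[LegalEvent], original_applicant: str) -> str:
    """Walk the timeline and keep the most recent recorded owner."""
    owner = original_applicant
    for event in timeline(events):
        if event.category == "assignment" and event.new_owner:
            owner = event.new_owner
    return owner


def estimated_expiration(filing_date: date, term_years: int = 20) -> date:
    """Nominal term only: filing date plus term_years (an assumption)."""
    try:
        return filing_date.replace(year=filing_date.year + term_years)
    except ValueError:
        # Filing on 29 February with a non-leap target year.
        return filing_date.replace(year=filing_date.year + term_years, day=28)
```

Falling back to the original applicant mirrors the "whenever available" caveat: when no assignment has been recorded, the only owner that can be reported is the one named at filing.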
MIT’s Knowledge Futures Group and Lens.org are developing a Lens Labs portal that highlights relevant patent datasets; engages diverse communities with the broader Lens open innovation data, which includes linked scholarly data, MetaRecords, and other knowledge innovation artifacts; and surfaces the influence of science and technology on society through diverse open and granular metrics. Requirements for the portal site on lens.org have now been developed, and we are implementing the various features, linking resources, and testing the site in the staging environment at https://www.lens.org/lens/labs.
The site will feature links to the Lens API & Data facilities, the MIT bulk patent and scholarly works datasets and associated data schemas, as well as example dashboards for MIT and the Broad Institute.
To enhance participation:
Presentation by OA Jefferson at the I3 Technical meeting in December 2019 (https://iii.pubpub.org/pub/2019-wg-agenda/release/3).
Presentation by OA Jefferson at the Broad Institute on I3 and patent data collaboration in December 2019.
Webinar by OA Jefferson on the Lens open patent data for the Science and Technology Observatory (OST) group in Paris, which advises stakeholders in higher education by analyzing innovation research data and contributes to the evaluation of the impact of public policies, in October 2019.
Webinar on prior art and patent data to IPOS (the Singapore patent office training branch) in March 2020.