
Requests for data from the USPTO

Published on Aug 10, 2021

Community requests

  • Form || Responses

  • Discussion (Google Hangouts): Monday 8/16, 1300-1430 ET

  • Ask: ‘Describe any USPTO data that are not currently available publicly, and would foster useful research if systematically assembled and released.’

General groups of requests:

  1. Improve access to current data (public + private)

  2. Ask for or mandate clean identifiers in submissions

  3. Ask for or mandate new data in submissions (race + gender)

  4. Better tracking / mapping (of typos to cleaner names; of assignments)

Summary / TOC

Additional data
Create/use IDs for patent-related drug compounds (1.)
—> Policy proposal: Ask/mandate that applicants provide INNs for compounds

Use standardized company name/ID (2.)
—> Inform: Point to historical PTO work here (BHH)
—> Policy: Ask for self-reported ID? (SJK)

Ungranted patents pre-2000 (6.)
—> Policy: Small statutory change.
—> Process: Create a general environment for confidential research access (similar to Census data centers / via the Chief Economist’s Office?)

Characteristics of assignees + inventors (7., BHH)
—> Policy proposal: Ask the PTO to collect race + gender info. Considerable interest among researchers; find out how far the PTO has gone with this.

Timeline + clustering of related TM registrations (11.)
—> Research: Asking applicants to link related marks? Better a research Q

Mapping assignees + patent similarity
Map assignees to company (2., 8.)
—> Ask: Code a correspondence between misspelled/nonstandard company names
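A minimal sketch of such a correspondence, assuming a hypothetical list of canonical company names is available. Each raw assignee string is reduced to a normalized key (case, accents, punctuation, and trailing legal-form suffixes removed), matched exactly where possible, and otherwise matched with `difflib.SequenceMatcher` above an assumed similarity threshold. The function names, the suffix list, and the 0.8 threshold are illustrative assumptions, not a PTO method; the recorded assignee string is left untouched and the mapping sits alongside it.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical, deliberately short list of legal-form suffixes to ignore.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company",
                  "llc", "ltd", "limited", "plc", "ag", "gmbh", "sa"}

def normalize_name(raw: str) -> str:
    """Reduce an assignee string to a comparison key (the record itself is not changed)."""
    text = unicodedata.normalize("NFKD", raw)
    text = text.encode("ascii", "ignore").decode("ascii").lower()  # drop accents
    text = text.replace("&", " and ")
    tokens = re.findall(r"[a-z0-9]+", text)  # drop punctuation
    # Strip trailing legal-form tokens such as "inc" or "corp".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def build_correspondence(raw_names, canonical_names, threshold=0.8):
    """Map each raw assignee string to a canonical name, or None if nothing is close enough."""
    keyed = {normalize_name(c): c for c in canonical_names}
    mapping = {}
    for raw in raw_names:
        key = normalize_name(raw)
        if key in keyed:  # exact match after normalization
            mapping[raw] = keyed[key]
            continue
        best, score = None, 0.0
        for ckey, canon in keyed.items():  # fall back to fuzzy matching
            s = SequenceMatcher(None, key, ckey).ratio()
            if s > score:
                best, score = canon, s
        mapping[raw] = best if score >= threshold else None
    return mapping
```

Fuzzy matching will produce some false positives (distinct firms with similar names), so non-exact matches in a released correspondence would likely need manual review.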

Track assignee/acquisition history (3., 8., 9.) - cf. dynamic assignee panel (9.)
—> Policy ask: Add obligation of recording change in assignment
—> Inform: In principle this is in PAIR, but it would be nice if it were more convenient to access. Of course, that doesn’t solve the nonreporting problem; I am not sure how big this is. (BHH)
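To illustrate what a more convenient assignment-history file would support, here is a minimal sketch assuming a hypothetical table of recorded transfers (patent number, recording date, new assignee). It returns the owner of record on a given date. The patent number, firm names, and dates are made up. If a merger was never recorded against the patent, the lookup simply keeps returning the pre-merger owner, which is the nonreporting problem noted above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Assignment:
    patent: str     # patent number (hypothetical values in the usage below)
    recorded: date  # date the transfer was recorded
    assignee: str   # owner after this transfer

def owner_as_of(history: list[Assignment], patent: str, when: date) -> Optional[str]:
    """Return the most recent recorded assignee of `patent` on or before `when`, else None."""
    events = sorted((a for a in history if a.patent == patent), key=lambda a: a.recorded)
    owner = None
    for a in events:
        if a.recorded > when:  # later transfers are not yet in effect
            break
        owner = a.assignee
    return owner
```

For example, with transfers to a hypothetical FirmA in 2008 and to FirmB in 2010, a query for 2009 returns FirmA and a query for 2011 returns FirmB.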

Publish pairwise (or closest historical) similarity (5.)
—> Research: Very popular request. Algorithm challenge; better imagined as a research API
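The "max backwards similarity" variant of this request can be stated concretely: for each patent, the highest text similarity to any earlier-granted patent in the same CPC class. A minimal sketch, assuming raw term-count vectors and cosine similarity; a real release would more likely use TF-IDF weighting or embeddings and a far more efficient nearest-neighbor search. The input tuple layout is a hypothetical format.

```python
import math
from collections import Counter, defaultdict

def tf_vector(text):
    """Raw term counts (no TF-IDF weighting, no stemming)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v[t] for t, c in u.items() if t in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def max_backward_similarity(patents):
    """patents: iterable of (patent_id, grant_year, cpc_class, text).
    Returns {patent_id: max cosine similarity to any strictly earlier patent
    in the same class}, or None when the class has no earlier patent."""
    by_class = defaultdict(list)
    for pid, year, cpc, text in patents:
        by_class[cpc].append((year, pid, tf_vector(text)))
    result = {}
    for members in by_class.values():
        members.sort(key=lambda m: m[0])  # oldest first
        for i, (year, pid, vec) in enumerate(members):
            # Same-year patents are excluded; only strictly earlier grants count.
            earlier = [v for (y, _, v) in members[:i] if y < year]
            result[pid] = max((cosine(vec, v) for v in earlier), default=None)
    return result
```

The inner loop is quadratic within each class, which is exactly the cost concern raised in the comments on this request.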

Litigation data
Outcomes of patent litigation (4., 10.)
—> Research: Mostly outside USPTO scope. Ask David Schwartz about this

PTAB data:
Data on ex post outcomes specifically + related metrics of quality (10.)
Data are currently hard to access via Public PAIR (new), and release of rich metadata is limited, e.g. correspondence between applicant and examiner, and disclosure of NPL (non-patent literature)
—> Policy: recently this has become harder to access, no batch access
Todo: Write this up in detail? (OAJ)

Policy change requests (addressed above)
Offer a streamlined way for students/researchers to access private data (6.)
Invite TM registrants to indicate related marks (11.)
Require assignees to reveal the Real Party in Interest, for public benefit. (9.)

Other
Data feed of new USPTO data products (12.)
—> Research: in the PTO newsletter; add to catalog?
Hold more educational workshops for students (13.) —> request to the CEO
—> Req to the CEO: Workshop on what data is available; how to find new data
Where a request can’t be done yet, link to related resources (14.)
—> Research: add to catalog
Help researchers run periodic surveys (new)
—> Policy: maintain a process for this? (BS) PTO could maintain a panel of applicants to opt into surveys

Full responses

  1. [Lucy Xiaolu Wang]

    IDs for patented or patent-related drug compounds
    Category: additional data (pharma ID)

Data wish: clean & processed identifiers of patented or patent-related drug compounds, for understanding IP-related issues in the pharmaceutical sector.

Key features: understanding IP-related issues in the pharmaceutical sector

Research this facilitates: firms' strategic patenting behavior; drug prices and patents. Currently, most health economists don't use patent data, given the lack of relevant training and of easily accessible compound-specific patent data.

  2. [Josh Krieger 1]

    Data wish: Better mapping from patent assignees to standardized company name, company type, and location (e.g., public company: ticker; private company: address and incorporation/registration year; individual).
    Category: mapping + company ID

  3. [Josh Krieger 2]
    Data wish: Assignee and acquisition history
    Research this facilitates: E.g., for a Wyeth Pharma patent granted in 2008, it would be nice to have a file indicating whether or not the patent became owned by Pfizer in the 2009 merger.
    Category: additional data

  4. [Josh Krieger 3]
    Data wish: Outcomes of patent litigation (to go with the Patent litigation dockets file)
    Category: additional data

  5. [Josh Krieger 4]
    Data wish: Pairwise patent text similarity files (or at least max backwards similarity, overall and within CPC)
    Category: additional data + mapping

  6. [Heidi Williams]
    Pre-Nov. 2000 data on PTO applications that were not granted patents
    + A streamlined process to visit/access this data for research.
    Category: additional (private) data, policy

Data wish: Pre-Nov 2000 data on applications to the USPTO that were not granted patents

Key features: The post-Nov 2000 "unsuccessful patent applications" data has been incredibly useful in facilitating a variety of research projects

Other requests: I understand that these data can't be made publicly available, but it would be great to set up -- if possible -- a standardized, streamlined process through which students or others could visit the USPTO to analyze this data for research purposes.

  7. [Yi Qian]
    Characteristics of assignees or innovators, merging individual + firm data.
    + Links to where else (EPO, etc.) IP is being protected.
    + Providing panel/merged data if available. (or listing relevant sources)
    Category: additional data

Data wish: assignees' or innovators' characteristics, and where else (e.g., EPO) the IP is being protected.

Key features: The merge of individual and firm (not just public firms but also private ones) characteristics and other syndicated databases

Research this facilitates: These characteristics could help analyze incentives and responses to innovate in different environments.

Other requests: Panel data or merged data, if available

  8. [Xixi Hu]

    Map assignees to companies (name, ticker, other ID).
    Track assignment changes over time (e.g., between companies)
    Category: mapping, company ID, assignment tracking

Data wish: Better mapping from patent assignee to company name or ticker, as well as better tracking of the assignment changes between companies.

Key features: To understand the intellectual property development in the pharmaceutical industry.

Research this facilitates: Why firms strategically choose research/patent areas; firm behavior. Currently, there is still a gap in research on firms' motivation, efficiency, and behavior.

Other requests: A list of the different data sources where students can go and merge data, if the data cannot be released publicly.

  9. [Tim Simcoe 1]

    Standard data on the ultimate patent owner (“Real Party in Interest” [RPI])
    + Improve the dynamic assignee panel
    + Adopt rules requiring assignees to reveal RPI, for public benefit.
    Category: mapping, assignment tracking, policy change

Data wish: Standardized data on ultimate owner (or what attorneys call “Real Party in Interest”).

Key features: Creating improvements to the dynamic assignee panel

Research this facilitates: Better measurement for papers with assignee effects, and improved patent-to-firm matching

Other requests: Ideally, PTO should adopt rules requiring assignees to reveal RPI (for public benefit).
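Standardizing the ultimate owner amounts to following parent links (subsidiary to parent) that are in effect on a given date until reaching an entity with no parent. A minimal sketch, assuming a hypothetical list of control events `(child, parent, effective_date)`; the entity names and dates are made up. Later events for the same child override earlier ones, but the sketch does not model an entity becoming independent again.

```python
from datetime import date

def ultimate_owner(entity: str, control: list[tuple[str, str, date]], when: date) -> str:
    """Follow parent links in effect on `when` until reaching an entity with no parent."""
    parents = {}
    for child, parent, start in sorted(control, key=lambda e: e[2]):
        if start <= when:
            parents[child] = parent  # a later event replaces the earlier parent
    seen = {entity}
    current = entity
    while current in parents:
        current = parents[current]
        if current in seen:  # guard against inconsistent (circular) control data
            raise ValueError(f"cycle in control data at {current!r}")
        seen.add(current)
    return current
```

Combined with a per-patent owner-of-record lookup, this would give a patent-level "ultimate owner on date t" panel of the kind the dynamic assignee panel aims at.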

  10. [Tim Simcoe #2]

    Data on ex post outcomes in PTAB and in courts.
    Category: new data (trial outcomes)

Data wish: Curated data on ex post outcomes in PTAB and courts.
There is lots to unpack here, but the idea is to track: Was it ever asserted?
Was it challenged at PTAB? Did a court rule on validity or infringement?
Did the owner make a public licensing commitment? See: https://patentlyo.com/patent/2021/06/contreras-shepardizing-patents.html

Key features: Systematic collection of ex post "quality" metrics

Research this facilitates: Would enable more research into relationship between prosecution and long-term indicators of "quality" as interpreted by PTAB and district courts
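One way to make this request concrete is a per-patent record with one field per question above (asserted? challenged at PTAB? court ruling on validity or infringement? public licensing commitment?). A minimal sketch; the class and field names are hypothetical, not an existing PTO schema, and the string values for court rulings are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExPostOutcomes:
    patent: str
    asserted: bool = False                    # ever asserted in litigation?
    ptab_challenged: bool = False             # any PTAB petition (e.g. IPR/PGR) filed?
    court_validity: Optional[str] = None      # e.g. "valid" / "invalid"; None if no ruling
    court_infringement: Optional[str] = None  # e.g. "infringed" / "not infringed"; None if no ruling
    licensing_commitment: bool = False        # public licensing commitment made?
    sources: list[str] = field(default_factory=list)  # provenance of each fact

    def any_ex_post_event(self) -> bool:
        """True if any of the tracked ex post events occurred."""
        return (self.asserted or self.ptab_challenged or self.licensing_commitment
                or self.court_validity is not None
                or self.court_infringement is not None)

def share_with_ex_post_events(records: list[ExPostOutcomes]) -> float:
    """Fraction of patents with at least one ex post event (0.0 for an empty list)."""
    return sum(r.any_ex_post_event() for r in records) / len(records) if records else 0.0
```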

  11. [Other 1]
    Timeline of TM registrations.
    Clustering of related marks assigned to the same entities.
    Adopt rules inviting registrants to identify clusters of related marks
    Category: mapping, cluster ID, policy

Data wish: Timeline of trademark registrations, clustered by related marks, showing when each comes into and falls out of force

Key features: Explicit rather than implicit end dates; explicit clustering of related marks; some cluster ID

Research this facilitates: Understanding the evolution of a family of marks over time, seeing gaps in registration coverage; distinguishing when similar marks appear in different fields vs. when a founding company branches out into a new field under the old mark.

Other requests: A feed of new PTO data products of all kinds, w/ links to existing products that it enhances or replaces
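With explicit start and end dates, the "seeing gaps in registration coverage" use case in this response reduces to merging intervals. A minimal sketch, assuming each mark in a cluster is given as a hypothetical `(start, end)` pair with `end=None` for a mark still in force; the function name and input format are illustrative.

```python
from datetime import date

def coverage_gaps(registrations, as_of):
    """registrations: list of (start, end) dates for the marks in one cluster;
    end=None means the mark is still in force.
    Returns (gap_start, gap_end) periods, between the first registration and
    `as_of`, during which no mark in the cluster was in force."""
    # Treat live marks as running through `as_of`, then sort by start date.
    intervals = sorted((s, e if e is not None else as_of) for s, e in registrations)
    gaps = []
    if not intervals:
        return gaps
    cover_end = intervals[0][1]
    for start, end in intervals[1:]:
        if start > cover_end:  # nothing in force between cover_end and start
            gaps.append((cover_end, start))
        cover_end = max(cover_end, end)
    if cover_end < as_of:  # whole cluster has lapsed
        gaps.append((cover_end, as_of))
    return gaps
```

For example, marks in force 1990-2000, 1995-2002, and from 2005 onward leave a single gap from 2002 to 2005.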

  12. [Other 2]
    A feed of new PTO data products, referencing related / superseded products
    Category: new data feed

  13. [Other 3: Lucy Wang]
    Hold more educational workshops (online) for interested learners
    Category: new workshops

  14. [Other 4: Xixi Hu]
    Other request: For the above: where requested data exists but can’t [yet] be compiled for lack of time, add pointers to related data sources on the PTO site for the closest existing dataset
    Category: website update


[Template]
Data wish:
Key features:
Research this facilitates:
Other requests:

Comments
Samuel Klein:

10 million patents ^2 —> about 100 trillion pairwise cosine comparisons (roughly 5 × 10^13 unique pairs). Not small. That said, cutting pre-1976 patents scopes it down a bit, and restricting to within-class comparisons cuts it substantially; maybe it could be trillions or even billions, which could be computationally tractable. Even so, it is definitely an AWS/Azure/GCloud task, nothing a university infrastructure could possibly handle, and that means expensive. [MM]

—> Some MapReduce solutions: linear?
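The scale estimates in this comment can be made explicit. With N patents, an all-pairs comparison needs N choose 2 comparisons; restricting to within-class pairs with K classes of sizes n_k gives the second line. The equal-class-size case is an assumption for illustration only.

```latex
\binom{N}{2} = \frac{N(N-1)}{2} \approx \frac{(10^{7})^{2}}{2} = 5 \times 10^{13}

\sum_{k=1}^{K} \binom{n_k}{2} \approx \sum_{k=1}^{K} \frac{n_k^{2}}{2}
  = \frac{N^{2}}{2K} \quad \text{if } n_k = N/K \text{ for all } k
```

With N = 10^7 and a hypothetical K = 100 equal-sized classes, the within-class count is about 5 × 10^11. Unequal class sizes push the total above N^2/(2K), since the sum of squared class sizes is at least N^2/K.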

Samuel Klein:

There is some question in my mind whether full mapping is the responsibility of the USPTO. But at least coding a correspondence between misspelled/nonstandardized company names would make sense. I believe that, for legal reasons, they cannot change the assignee name even if it is misspelled or not in a standard form. (BHH)