Part of the 2020 I³ Fall Workshops series.
Geography seminar: Wed. 11/11, 12 pm – 1:10 pm ET
NB: to download these presentations, use the text links labeled “slides”
Gabriele Cristelli: Linking foreign inventors to immigrant-census profiles (slides)
Collaborative notes from the session:
Digitisation coverage is low pre-1980, which limits understanding of historical/long-term innovation dynamics (+ existing data is biased toward the US because of work on USPTO)
Contribution: Extending existing historical datasets from the US, UK, France and Germany since 1900, in a ready-to-use and community-maintained format
Tesseract v5.0 for OCR and spaCy for Named Entity Recognition, using a specific model trained on manual entries.
😎 Cool aside: some UK patents also list the occupation of the filer (e.g. Pattern Maker, Shoemaker)
Multiple inventors on one patent presents a specific issue. Geographic granularity varies a lot country-to-country.
😎 Cool parallel work via Gaétan de Rassenfosse: the geocoded worldwide patent data since 1980; available in open access
Dataset not currently available, but will be released in the coming weeks. Subscribe to the newsletter for project updates.
Questions + Comments
C (Geert Boedt): The graph is based on the inventors/applicants data, which is equally available in EPO data. Great work :)
Q (Bronwyn Hall): The graph of German shares of German patents shows a break at around 1980, which is probably due to the introduction of the EPO. How should we think about EPO patents in this context - are they included if validated in Germany?
A: The best proxy for validation is the renewal fee.
Q: A number of us worry about the quality of OCR in extant datasets. Do you know what version of Tesseract Google uses, and how do you quantify "considerable" improvement? It would be a contribution alone just to post better full-text OCR.
A: Google did OCR years ago. Using their latest algorithm, results seem better than what’s in the Google Patents interface. Worth redoing this regularly. (check w/ Ian W about this!). For France, the challenge was to get all of the texts from the patent office. The plan is for us to release all of the texts and our OCR. Then it takes 1-2 months (in our practice) to redo the OCR.
Q: The chart of net migration position covered 2001-10; how is this changing over time?
A: See figures 3 to 5 in this WIPO publication.
Q: The information about citizenship for historical USPTO: does it come from the patent document itself, or do you obtain it from other sources? How do you deal with immigrants who naturalise?
A: the information about citizenship for historical USPTO is obtained from the text itself.
(Bitsy Perlman) I'm not clear if the citizenship is consistently noted on 19th C US patents, though it is noted from time to time.
(Antonin Bergeaud) It is actually an important challenge to understand changes in reporting behaviours in the patent publications (occupations, citizenship, inventor names etc.).
(Andrea Morrison) to the best of my knowledge citizenship information declines after 1930s in USPTO, and then disappears
(Bitsy Perlman) The laws in the US as to whether foreigners pay different fees, etc., vary over time.
Q (Bronwyn Hall): Dubious about the post-1980 behaviour in the German time series (dubious that the German share of patenting in Germany has risen again after 1980). In addition to the increase you see, there’s splicing of data at 1980. Is it the splicing that is causing that change? Might be useful to overlap with the de Rassenfosse et al. dataset to check that you get the same counts.
A: We do run this overlap in order to validate. The graph stops at 1990; there was some confusion over the time scale.
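(Ed. sketch) The overlap check suggested above could look roughly like this: compare yearly patent counts from two sources over the years they share, and flag years where they diverge. This is a minimal illustration, not the presenters' validation code; the function name `compare_overlap`, the 5% tolerance and the counts are all made up.

```python
def compare_overlap(counts_a, counts_b, tolerance=0.05):
    """Flag years present in both series whose counts differ by more
    than `tolerance` (relative to the larger of the two counts)."""
    flagged = {}
    for year in sorted(set(counts_a) & set(counts_b)):
        a, b = counts_a[year], counts_b[year]
        rel_diff = abs(a - b) / max(a, b, 1)  # guard against division by zero
        if rel_diff > tolerance:
            flagged[year] = (a, b, round(rel_diff, 3))
    return flagged


# Invented yearly counts for illustration only (not real patent data).
historical = {1979: 100, 1980: 120, 1981: 130}
reference = {1980: 118, 1981: 100, 1982: 140}

print(compare_overlap(historical, reference))
# {1981: (130, 100, 0.231)}
```

Only 1980 and 1981 are compared, since those are the shared years; 1980 is within tolerance and 1981 is flagged.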
Q (Tania Babina): Difficult to construct a panel that has consistent geographic units over a period of time… tracking changes in counties etc (US post-1900 is only about 13%, but pre-1900 is very non trivial). How do you address these issues?
A: That’s one of the big difficulties we face! For the US, we use a Google Maps API (specific link + version?) which does a decent job. You’ll have an indication that the street still exists. For France, we manually track changes in county names over time. For the US, counties that changed were usually in areas w/ few patents, so it didn’t impact the final dataset too much - but this is something we could improve.
Q: (Eugenie Dugoua) It seems to me if you’re tracking changes in some employment data w/ other changes, you want that data to be available by historical data for the same region [even if there were no inventors there before]. I’ve found there is no accepted way to deal with this… would be great to discuss this offline at some point.
A: (Bitsy Perlman) I think this is where I plug my county crosswalk. I've got some programs that do geocoding that compare some name matches vs. google maps that I need to clean up and share. Also better v’s for 1790-1900, by decade, if of interest.
~80% of things come up OK, but the rest require some work. Best success is by working with both — name match with cross-validation. Hand-checking 10-20% of locations.
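(Ed. sketch) One way to read "working with both": keep a location only when the name match and the geocoder agree, and queue everything else for hand-checking. This is a minimal sketch under that assumption, not Bitsy's programs; `cross_validate`, the location ids and the county labels are hypothetical placeholders.

```python
def cross_validate(name_match, geocoder):
    """Split location ids into agreed county assignments and a
    hand-check queue (disagreements or a missing result on either side)."""
    agreed, to_check = {}, []
    for loc_id in sorted(set(name_match) | set(geocoder)):
        a = name_match.get(loc_id)  # county from the name-match pass
        b = geocoder.get(loc_id)    # county from the geocoding service
        if a is not None and a == b:
            agreed[loc_id] = a
        else:
            to_check.append((loc_id, a, b))
    return agreed, to_check


# Placeholder inputs: location id -> county assigned by each method.
name_match = {"loc1": "county_A", "loc2": "county_B", "loc3": "county_C"}
geocoder = {"loc1": "county_A", "loc2": "county_X", "loc4": "county_D"}

agreed, to_check = cross_validate(name_match, geocoder)
print(agreed)    # {'loc1': 'county_A'}
print(to_check)  # [('loc2', 'county_B', 'county_X'), ('loc3', 'county_C', None), ('loc4', None, 'county_D')]
```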
(AM) About historical counties and changes in the US frontier, perhaps this work by Bazzi provides some useful information.
Identifying inventors with an immigrant visa in Switzerland. Matching to administrative records from Swiss immigration authorities (ZEMIS).
Using a fuzzy match approach adapted from Feigenbaum (2016), for when a ‘ground truth’ dataset is not available.
Switzerland presents an interesting case as it has a very high proportion of immigrant (vs emigrant) inventors.
Used to measure immigration policy shock. Different distributions for different kinds of visas. Focus on cross-border visas to study impact of freedom of movement. In areas close to the border, invention increases dramatically with the introduction of free movement.
Data is not freely available due to confidentiality agreement w/ Swiss Federal Statistical Office. Can enquire for access. (can we ask for permanent access? -Ed.)
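(Ed. sketch) To give a feel for the fuzzy-match step noted above, here is a generic illustration of linking inventor names to register entries without labelled training data. It is not the procedure from Feigenbaum (2016) or from the talk: it swaps in `difflib.SequenceMatcher` from the Python standard library as the string similarity, blocks on town of residence, and keeps a link only when the best candidate is both good and clearly ahead of the runner-up. The field names, thresholds and all names in the example are invented.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(inventors, records, min_score=0.85, min_margin=0.05):
    """Link each inventor to at most one record from the same town.
    Thresholds are illustrative assumptions, not calibrated values."""
    links = {}
    for inv_id, inv in inventors.items():
        # Blocking: only compare against records from the same town.
        scored = sorted(
            ((similarity(inv["name"], rec["name"]), rec_id)
             for rec_id, rec in records.items()
             if rec["town"] == inv["town"]),
            reverse=True,
        )
        if not scored:
            continue
        best_score, best_id = scored[0]
        runner_up = scored[1][0] if len(scored) > 1 else 0.0
        # Keep only confident, unambiguous matches.
        if best_score >= min_score and best_score - runner_up >= min_margin:
            links[inv_id] = best_id
    return links


# Invented toy data.
inventors = {
    "i1": {"name": "Anna Muller", "town": "Basel"},
    "i2": {"name": "John Smith", "town": "Geneva"},
}
records = {
    "r1": {"name": "Anna Mueller", "town": "Basel"},
    "r2": {"name": "Hans Meier", "town": "Basel"},
    "r3": {"name": "John Smith", "town": "Geneva"},
    "r4": {"name": "John Smith", "town": "Geneva"},
}

print(link(inventors, records))
# {'i1': 'r1'}
```

The second inventor is left unlinked because two records tie for best match; dropping ambiguous cases trades coverage for fewer false links.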
Questions for Gabriele:
Q: Could there be East German patentees among the "foreigners"?
Q: I have a general question regarding inventor names on patent publications. Is there variation (across countries and years) in the extent to which inventors get their names added to patent publications? In particular, when inventors work as employees of firms, do we have evidence that inventor names get systematically reported? Thanks!
What is our current understanding of how IP is decided — do rights stay w/ the firm, does the inventor get the right to put their name on the publication, is this systematized (or tracked)?
A: (Tania Babina) I do not think there is any systematic evidence on the question you raise. However, prominent economic historians argue that, at least in the US, patent assignment contracts were standard in firms by 1900, which would suggest there is little incentive for firms to underreport actual inventors on patents.
Antonin: More generally need to understand reporting requirements over time.
Adam: At least in the US, firms do have an incentive to have a correct list of names of inventors. Having the wrong names could invalidate a patent.
Bronwyn: Internationally that’s currently true, but in the 19th century, even in the US, that might not be so true. Think about the German professorial system… it could be the person that puts up the money could have their name on the patent. This is really work for a historian.
Fabian: Is it possible to research this and send it around by email? A: Yes!
Dietmar has some work that may be relevant: Institutionalized incentives for ingenuity
Felix Pöge: For Germany, inventor reporting only becomes mandatory with a law change in 1936. In the 20 years before that, where you actually have inventors listed separately from applicants, these listings are heavily biased towards large firms.
Francesco Lissoni: If you look at the number of patents that list Steve Jobs as an inventor, it stretches credibility. Younger and female inventors that appear on the publication do not appear on the patent, and that’s in the US in the past 30 years.
AJ: For perspective: not everything is patented. There’s a lot of noise there as a measure of invention. The idea that there’s another slip between cup and lip re: whether all inventors are the right names on patents is worth noting, but only part of this overall noise.
FL: Another example: comparing Pirelli tires and Michelin, graph of inventor centrality. There were 3 people on all the Pirelli patents, in Michelin no central nodes. In France, different requirements for listing inventors than in Italy.
BP: I've seen some modern patents from small startups that seem to just list everyone at the company on the patent. Not sure how risky this is.
C: (Fabian Gessler) It would be interesting to find out whether the order of inventors listed on the publication follows any kind of pattern. I see more and more papers arguing that the position indicates their role (first inventor had the idea vs. last inventor is manager), similar to scientific articles in the natural sciences