Understanding the API
Source:vignettes/articles/understanding-the-api.Rmd
Oh, the interesting things you’ll learn when you take the time to read the API’s documentation! Here are two gems gleaned from a Jupyter notebook in PatentsView’s PatentsView-Code-Snippets repo.
Fields Shorthand
The notebook starts out fairly fluffy, but things get interesting quickly. See this passage under “constructing your query”, which I don’t remember seeing anywhere else:
Some endpoints contain groups of fields representing related entities connected to one of that endpoint’s primary entity type; for example, the patent endpoint contains a field “inventors”, which contains information on all inventors associated with any given patent. The fields for related entities can be requested in the API request’s fields parameter as a group by using the group name in the fields parameter, or individually by specifying the required field as “{entity_type}.{subfield}”.
Mind blown! We can, for example, request all the nested application fields from the patent endpoint by simply requesting “application” in the fields list.
library(patentsview)
query <- qry_funs$eq(patent_id = "10568228")
shorthand_results <- search_pv(query, fields = c("application"), method = "POST")
# Now that the R package uses httr2, we can use its last_request()
# to see what was POSTed to the API
cat(httr2::last_request()$body$data)
#> {"q":{"_eq":{"patent_id":"10568228"}},"f":["application","patent_id"],"s":[],"o":{"size":1000}}
# Here we view the results
shorthand_results$data$patent$application
#> [[1]]
#> application_id application_type filing_date series_code rule_47_flag
#> 1 15/995745 15 2018-06-01 15 FALSE
#> filing_type
#> 1 15
# Now we'll try to explicitly request all the application fields and make a POST to the API
explicit_fields <- get_fields("patent", groups = "application")
explicit_fields
#> [1] "application.application_id" "application.application_type"
#> [3] "application.filing_date" "application.filing_type"
#> [5] "application.rule_47_flag" "application.series_code"
explicit_results <- search_pv(query, fields = explicit_fields, method = "POST")
# but the R package figured out that the shorthand could be used instead
# so what was POSTed to the API is the same!
cat(httr2::last_request()$body$data)
#> {"q":{"_eq":{"patent_id":"10568228"}},"f":["application","patent_id"],"s":[],"o":{"size":1000}}
# and, of course, the results from the API are the same
explicit_results$data$patent$application
#> [[1]]
#> application_id application_type filing_date series_code rule_47_flag
#> 1 15/995745 15 2018-06-01 15 FALSE
#> filing_type
#> 1 15
# (Observation reported to the API team: application_type, series_code and filing_type
# all seem to have the same values and not just in this one example.)
The motivation to adopt the API’s shorthand is that, even with a modest query, explicitly requesting all of the patent endpoint’s fields can be too much to send via a GET request (the resulting URL can exceed 4K).
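The collapsing step seen above, where explicitly requested application fields were sent as the shorthand, can be sketched in base R. This is only an illustration under stated assumptions: collapse_to_groups() is a hypothetical helper and not the patentsview package's actual implementation, and all_fields is a hand-typed, abbreviated stand-in for the patent endpoint's field list.

```r
# Hypothetical helper (not part of patentsview): replace fully qualified
# fields with the group shorthand when every field of a group is requested.
collapse_to_groups <- function(requested, all_fields) {
  # Nested fields look like "group.subfield"; top-level fields have no dot
  has_group <- grepl(".", all_fields, fixed = TRUE)
  group <- ifelse(has_group, sub("\\..*$", "", all_fields), NA_character_)

  out <- requested
  for (g in unique(group[has_group])) {
    members <- all_fields[has_group & group == g]
    # Only collapse when the whole group was asked for
    if (all(members %in% out)) {
      out <- c(setdiff(out, members), g)
    }
  }
  out
}

# Assumed, abbreviated field list: the six application fields shown above
# plus a few other patent endpoint fields used in this article
app_fields <- c(
  "application.application_id", "application.application_type",
  "application.filing_date", "application.filing_type",
  "application.rule_47_flag", "application.series_code"
)
all_fields <- c(
  "patent_id", app_fields,
  "inventors.inventor_name_first", "inventors.inventor_name_last"
)

# Every application field requested, so the group name is sent instead
full_request <- collapse_to_groups(c("patent_id", app_fields), all_fields)

# Only part of the group requested, so the fields are left as they are
partial_request <- collapse_to_groups(
  c("patent_id", "application.filing_date"), all_fields
)
```

Here full_request is c("patent_id", "application"), which is far shorter than the seven names that went in, while partial_request comes back unchanged.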
Unexpected Results
Then, as if that wasn’t enough, some non-obvious behavior appears under the second bullet point under the “Queries using related entity fields” header:
When applying multiple conditions to related-entity fields, a central entity record will be returned if any combination of its related entities satisfy those conditions.
In their example, they use George Washington as an inventor. Humorously, there are modern inventors with that name! Abraham Lincoln is also used as an inventor. Good ol’ Abe is the only US president so far to receive a patent, but his patent is too early to be in the PatentsView database, and there are no modern Abraham Lincolns to be found as inventors.
To demonstrate the API’s not-exactly-intuitive behavior, we’ll keep George as an inventor but substitute Thomas Jefferson for Abe, as there are inventors going by that famous name, though they aren’t on nickels or two dollar bills in the US.
library(dplyr)
patents_query <-
  with_qfuns(
    or(
      and(
        text_phrase(inventors.inventor_name_first = "George"),
        text_phrase(inventors.inventor_name_last = "Washington")
      ),
      and(
        text_phrase(inventors.inventor_name_first = "Thomas"),
        text_phrase(inventors.inventor_name_last = "Jefferson")
      )
    )
  )
patent_fields <- c("patent_id", "inventors.inventor_name_first", "inventors.inventor_name_last")
pat_res <- search_pv(patents_query, fields = patent_fields, endpoint = "patent")
dl <- unnest_pv_data(pat_res$data)
# We got back all the inventors on the patents that met our search criteria. To make the
# noted behavior clear, we'll filter out the inventors that didn't match on either a first
# or last name (they're coinventors that came along for the ride with the ones that did).
display_inventors <-
  dl$inventors %>%
  filter(grepl("^(George|Thomas)", inventor_name_first) | grepl("^(Washington|Jefferson)", inventor_name_last))
display_inventors
#> patent_id inventor_name_first inventor_name_last
#> 1 10374815 Thomas J. Bonola
#> 2 10374815 Lorri L Jefferson
#> 3 10568228 George Elliott Washington
#> 4 10180440 Stanley T. Jefferson
#> 5 10180440 Thomas FAY
#> 6 11032709 Thomas J. Bonola
#> 7 11032709 Lorri L Jefferson
#> 8 10664808 Joel Washington
#> 9 7598629 George E. Burke, Jr.
#> 10 7598629 Rodney B. Washington
#> 11 8717367 Thomas M. Clifton
#> 12 8717367 Bradley C. Jefferson
#> 13 7971908 Thomas Tilly
#> 14 7971908 Thomas M. DiMambro
#> 15 7971908 Alfred A. Jefferson
#> 16 7144505 George Washington
#> 17 4104193 Thomas Jefferson
#> 18 4078607 Thomas Jefferson
#> 19 6881337 George Washington
#> 20 6905071 Thomas Amundsen
#> 21 6905071 George Kolis
#> 22 6905071 Matthew Jefferson
#> 23 6218441 George Washington
#> 24 5643452 George Washington
#> 25 5645778 George Washington
#> 26 5914971 George E. Burke, Jr.
#> 27 5914971 Rodney B. Washington
#> 28 5897817 George Washington
#> 29 5736046 George Washington
#> 30 8347213 Thomas M. Clifton
#> 31 8347213 Bradley C. Jefferson
Some rows act as you’d expect, like patent 4078607’s Thomas Jefferson. In others, two different inventors combine to meet the search criteria, like 6905071’s Thomas Amundsen and Matthew Jefferson. That is probably not a match we intended.
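The two kinds of matches can be told apart locally once the inventors are unnested. The following is a minimal sketch in base R, assuming a data frame shaped like dl$inventors; the toy rows are copied from the output above for just two patents, and the names same_person, strict_ids and combined_ids are made up for this illustration.

```r
# Toy data copied from the output above (two patents only)
inventors <- data.frame(
  patent_id = c("4078607", "6905071", "6905071", "6905071"),
  inventor_name_first = c("Thomas", "Thomas", "George", "Matthew"),
  inventor_name_last = c("Jefferson", "Amundsen", "Kolis", "Jefferson"),
  stringsAsFactors = FALSE
)

# TRUE only when the first and last name match on the same row,
# in other words on the same inventor
same_person <- with(
  inventors,
  (grepl("^George", inventor_name_first) & inventor_name_last == "Washington") |
    (grepl("^Thomas", inventor_name_first) & inventor_name_last == "Jefferson")
)

# Patents with at least one inventor who satisfies a full name pair
strict_ids <- unique(inventors$patent_id[same_person])

# Patents present only because different inventors combined to match
combined_ids <- setdiff(unique(inventors$patent_id), strict_ids)
```

With these rows, strict_ids holds "4078607" and combined_ids holds "6905071". Filtering after the fact like this works, but it still downloads the unwanted patents first.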
Now we’ll hit the inventor endpoint with a similar query, as the Jupyter notebook suggests.
inventors_query <-
  with_qfuns(
    or(
      and(
        text_phrase(inventor_name_first = "George"),
        text_phrase(inventor_name_last = "Washington")
      ),
      and(
        text_phrase(inventor_name_first = "Thomas"),
        text_phrase(inventor_name_last = "Jefferson")
      )
    )
  )
inventor_fields <- c("inventor_id", "inventor_name_first", "inventor_name_last")
inventor_res <- search_pv(inventors_query, fields = inventor_fields, endpoint = "inventor")
actual_inventors <- unnest_pv_data(inventor_res$data)
actual_inventors[[1]]
#> inventor_id inventor_name_first inventor_name_last
#> 1 fl:ge_ln:washington-1 George Elliott Washington
#> 2 fl:ge_ln:washington-2 George Washington
#> 3 fl:th_ln:jefferson-1 Thomas Jefferson
Now, with actual_inventors’ inventor_ids in hand, we’ll ask the patent endpoint for their patents. The results are quite different from what the first query returned. (Each of these patents has an inventor whose full name matches one of our two famous forefathers’ names. The first query unintuitively also matched patents where the first-name and last-name matches occurred on different inventors.)
id_query <- qry_funs$eq(inventors.inventor_id = actual_inventors$inventors$inventor_id)
patent_fields <- c(
  "patent_id", "inventors.inventor_name_first", "inventors.inventor_name_last",
  "inventors.inventor_id"
)
pat_res <- search_pv(id_query, fields = patent_fields, sort = c(patent_id = "asc"))
dl <- unnest_pv_data(pat_res$data)
# we'll apply a similar name filter like we did on the first query's results
# we requested inventors.inventor_id but we also get back inventor, a HATEOAS link we don't need
display_inventors <-
  dl$inventors %>%
  filter(grepl("^(George|Thomas)", inventor_name_first) | grepl("^(Washington|Jefferson)", inventor_name_last)) %>%
  select(-inventor)
display_inventors
#> patent_id inventor_id inventor_name_first inventor_name_last
#> 1 10568228 fl:ge_ln:washington-1 George Elliott Washington
#> 2 4078607 fl:th_ln:jefferson-1 Thomas Jefferson
#> 3 4104193 fl:th_ln:jefferson-1 Thomas Jefferson
#> 4 5643452 fl:ge_ln:washington-2 George Washington
#> 5 5645778 fl:ge_ln:washington-2 George Washington
#> 6 5736046 fl:ge_ln:washington-2 George Washington
#> 7 5897817 fl:ge_ln:washington-2 George Washington
#> 8 6218441 fl:ge_ln:washington-2 George Washington
#> 9 6881337 fl:ge_ln:washington-2 George Washington
#> 10 7144505 fl:ge_ln:washington-2 George Washington
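To see how far apart the two approaches are, the patent IDs printed in the two outputs above can be compared directly. This is a minimal sketch in base R: first_ids and second_ids are names made up for this illustration, and the vectors are typed in by hand from those outputs rather than fetched from the API.

```r
# Unique patent IDs shown in the first (name-based) query's output
first_ids <- c(
  "10374815", "10568228", "10180440", "11032709", "10664808",
  "7598629", "8717367", "7971908", "7144505", "4104193",
  "4078607", "6881337", "6905071", "6218441", "5643452",
  "5645778", "5914971", "5897817", "5736046", "8347213"
)

# Patent IDs shown in the second (inventor_id-based) query's output
second_ids <- c(
  "10568228", "4078607", "4104193", "5643452", "5645778",
  "5736046", "5897817", "6218441", "6881337", "7144505"
)

# Patents that only the name-based query returned
only_in_first <- setdiff(first_ids, second_ids)

# Patents that only the id-based query returned (none expected)
only_in_second <- setdiff(second_ids, first_ids)

length(only_in_first)
#> [1] 10
length(only_in_second)
#> [1] 0
```

In a live session the same comparison could be made with unique(dl$inventors$patent_id), provided each query's dl is saved under its own name before the second query overwrites it.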
Acknowledgment
Again, credit goes to the PatentsView API team for creating the cited Jupyter notebook. This article just reworks portions of it using the R package. The repo doesn’t have a stated license, but when I checked, I was told:
For the repo license we are looking at the GNU General Public License v3 (GPL3).
That is the same license as R itself, so I don’t think we’ve violated anything. For extra fun, check out Russ’ fork, where there’s Python code for retrieving Mr. Jefferson’s patents etc. There was no reply when we asked if they’d be receptive to a PR.