Getting started

The new version of the API requires an API key, or all of your requests will be rejected. Request an API key at https://patentsview-support.atlassian.net/servicedesk/customer/portals and, once you have one, set the environment variable PATENTSVIEW_API_KEY to the value of your key so that the R package can use it.
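As a minimal sketch, the variable can be set for the current R session with base R. The key value below is a placeholder, not a real key. To make the setting persist across sessions, one common option is to put the same name and value in your ~/.Renviron file.

```r
# Placeholder value (assumption): substitute the key you received
Sys.setenv(PATENTSVIEW_API_KEY = "my-api-key")

# Confirm that the variable is visible to the current R session
key_is_set <- nzchar(Sys.getenv("PATENTSVIEW_API_KEY"))
key_is_set
#> [1] TRUE
```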

A basic example

Let’s start with a basic example of how to use the package’s primary function, search_pv():

library(patentsview)

search_pv(
  query = '{"_gte":{"patent_date":"2007-01-01"}}',
  endpoint = "patent",
  fields = c("patent_id", "patent_title", "patent_date")
)
#> $data
#> #### A list with a single data frame on patents level:
#> 
#> List of 1
#>  $ patents:'data.frame': 1000 obs. of  3 variables:
#>   ..$ patent_id   : chr [1:1000] "10045335" ...
#>   ..$ patent_title: chr [1:1000] "Method of delivering data for use by base s"..
#>   ..$ patent_date : chr [1:1000] "2018-08-07" ...
#> 
#> $query_results
#> #### Distinct entity counts across all downloadable pages of output:
#> 
#> total_hits = 5,530,246

This call to search_pv() sends our query to the patent endpoint (the default). The API has 27 endpoints, corresponding to 27 different entity types. Two of them, patent/rel_app_text and publication/rel_app_text, both return a rel_app_text entity, though the two are slightly different. Here is the list of entities the API returns: assignees, attorneys, cpc_classes, cpc_groups, cpc_subclasses, foreign_citations, g_brf_sum_texts, g_claims, g_detail_desc_texts, g_draw_desc_texts, inventors, ipcs, locations, otherreferences, pg_brf_sum_texts, pg_claims, pg_detail_desc_texts, pg_draw_desc_texts, patents, publications, rel_app_texts, us_application_citations, us_patent_citations, uspc_mainclasses, uspc_subclasses, wipo.¹ Your choice of endpoint determines which entity your query is applied to, as well as the structure of the data that is returned (more on this in the “27 endpoints for 27 entities” section). For now, let’s turn our attention to the query parameter.

Writing queries

The PatentsView query syntax is documented on their query language page. Note also the change to the Options parameter for the new version of the API mentioned on that page.² However, it can be difficult to get your query right if you’re writing it by hand (i.e., just writing the query in a string like '{"_gte":{"patent_date":"2007-01-01"}}', as we did in the example shown above). The patentsview package comes with a simple domain-specific language (DSL) to make writing queries a breeze. I recommend using the functions in this DSL for all but the most basic queries, especially if you’re encountering errors and don’t understand why. To get a feel for how it works, let’s rewrite the query shown above using one of the functions in the DSL, qry_funs$gte():

qry_funs$gte(patent_date = "2007-01-01")
#> {"_gte":{"patent_date":"2007-01-01"}}

More complex queries are also possible:

with_qfuns(
  and(
    gte(patent_date = "2007-01-01"),
    text_phrase(patent_abstract = c("computer program", "dog leash"))
  )
)
#> {"_and":[{"_gte":{"patent_date":"2007-01-01"}},{"_or":[{"_text_phrase":{"patent_abstract":"computer program"}},{"_text_phrase":{"patent_abstract":"dog leash"}}]}]}

Check out the writing queries vignette for more details on using the DSL.
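To make the structure of the query language more concrete, here is a minimal sketch that builds the same kind of JSON from nested R lists with jsonlite. It only illustrates the syntax and is not a description of how the DSL is implemented. It assumes jsonlite is installed.

```r
library(jsonlite)

# Each operator is a named list: operator -> (field -> value)
simple <- list(`_gte` = list(patent_date = "2007-01-01"))

# "_and" takes an unnamed list, which serialises to a JSON array
compound <- list(`_and` = list(
  simple,
  list(`_text_phrase` = list(patent_abstract = "dog leash"))
))

# auto_unbox = TRUE writes length-one vectors as scalars rather than arrays
simple_json <- toJSON(simple, auto_unbox = TRUE)
compound_json <- toJSON(compound, auto_unbox = TRUE)

simple_json
#> {"_gte":{"patent_date":"2007-01-01"}}
```

Seeing the nesting written out this way shows where hand-written strings tend to go wrong: a missing brace or an object where an array is expected.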

Fields

Each endpoint has a different set of fields. The new version of the API allows all fields to be queried. You can specify which fields you want using the fields argument. If you don’t specify any, you will get the primary key(s) for the specified endpoint.

# search_pv defaults the endpoint parameter to "patent" if not specified
result <- search_pv(
  query = '{"_gte":{"patent_date":"2007-01-01"}}',
  fields = c("patent_id", "patent_title")
)
result
#> $data
#> #### A list with a single data frame on patents level:
#> 
#> List of 1
#>  $ patents:'data.frame': 1000 obs. of  2 variables:
#>   ..$ patent_id   : chr [1:1000] "10045335" ...
#>   ..$ patent_title: chr [1:1000] "Method of delivering data for use by base s"..
#> 
#> $query_results
#> #### Distinct entity counts across all downloadable pages of output:
#> 
#> total_hits = 5,530,246

To list all of the fields for a given endpoint, use get_fields():

retrvble_flds <- get_fields(endpoint = "patent")
head(retrvble_flds)
#> [1] "applicants.applicant_designation"  "applicants.applicant_name_first"  
#> [3] "applicants.applicant_name_last"    "applicants.applicant_organization"
#> [5] "applicants.applicant_sequence"     "applicants.applicant_type"

Nested fields can be fully qualified, or a new API shorthand can be used in which only the group name is specified. When a group name is used, all of the group’s nested fields will be returned by the API. E.g., the new version of the API and R package will accept fields = c("applicants").
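As a small illustration of the relationship between the two forms, the group name is simply the part of a fully qualified field name before the dot. The sketch below uses base R only, with the field names copied from the get_fields() output above; the variable names are my own.

```r
# Fully qualified nested fields, copied from the get_fields() output above
flds <- c(
  "applicants.applicant_designation", "applicants.applicant_name_first",
  "applicants.applicant_name_last", "applicants.applicant_organization",
  "applicants.applicant_sequence", "applicants.applicant_type"
)

# The group is everything before the first dot
groups <- unique(sub("\\..*$", "", flds))
groups
#> [1] "applicants"

# A field list that asks for patent_id plus the whole applicants group
fields <- c("patent_id", groups)
```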

The API’s Swagger UI page also lists the fields returned by each endpoint, in the 200 Response body sections. The API’s endpoint documentation has a similar look and feel.

You can also visit an endpoint’s online documentation page to see a list of its fields (e.g., see the inventor field list table). In earlier versions of the API not all fields were queryable as they are now. The field tables for all of the endpoints can be found in the fieldsdf data frame, which you can load using data("fieldsdf") or View(patentsview::fieldsdf).

An important note: PatentsView uses disambiguated versions of assignees, inventors, and locations, instead of raw data. For example, let’s say you search for all inventors whose first name is “john.” The PatentsView API is going to return all of the inventors who have a preferred first name (as per the disambiguation results) of john, which may not necessarily be their raw first name. You could be getting back inventors whose first name appears on the patent as, say, “jonathan,” “johnn,” or even “john jay.” For more on disambiguation, see the PatentsView Inventor Disambiguation Technical Workshop website.

In the original version of the API, rawinventor_first_name and rawinventor_last_name were available from the patents, inventors and assignees endpoints. In the new version of the API these fields are no longer available.

Paginated responses

By default, search_pv() returns 1,000 records per page and only gives you the first page of results. I suggest starting with something smaller, such as size = 150, while you’re figuring out the details of your request, such as the query you want to use and the fields you want returned. Once you have those items finalized, you can use the size argument to download up to 1,000 records per page.

You can download all pages of output in one call by setting all_pages = TRUE. This will set size equal to 1,000 and loop over all pages of output:

fields <- c("patent_id", "inventors.inventor_name_last", "inventors.inventor_name_first")
search_pv(
  query = qry_funs$eq(inventors.inventor_name_last = "Chambers"),
  all_pages = TRUE, size = 1000, fields = fields
)
#> $data
#> #### A list with a single data frame (with list column(s) inside) on patents level:
#> 
#> List of 1
#>  $ patents:'data.frame': 2497 obs. of  2 variables:
#>   ..$ patent_id: chr [1:2497] "10000988" ...
#>   ..$ inventors:List of 2497
#> 
#> $query_results
#> #### Distinct entity counts across all downloadable pages of output:
#> 
#> total_hits = 2,497

See the result set paging vignette for information on custom paging.

Entity counts

The first two calls to search_pv() above each returned only 1,000 rows, yet both reported total_hits = 5,530,246. This is because the entity counts returned by the API refer to the number of distinct entities across all downloadable pages of output, not just the page that was returned. Likewise, the call that used all_pages = TRUE returned all 2,497 rows, matching its total_hits.
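The relationship between total_hits and the number of pages is simple arithmetic. Here is a quick sketch in base R, using the total_hits value reported for the “Chambers” query above and the maximum page size; the variable names are illustrative.

```r
total_hits <- 2497  # reported in $query_results for the "Chambers" query
size <- 1000        # records per page

# Number of pages needed to retrieve every hit
n_pages <- ceiling(total_hits / size)

# The final page is usually only partly full
rows_on_last_page <- total_hits - (n_pages - 1) * size

n_pages
#> [1] 3
rows_on_last_page
#> [1] 497
```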

27 endpoints for 27 entities

With the recent API change, the patent endpoint supplies the basic patent data and the other endpoints return more specific data for those patents.

get_endpoints()
#>  [1] "assignee"                       "cpc_class"                     
#>  [3] "cpc_group"                      "cpc_subclass"                  
#>  [5] "g_brf_sum_text"                 "g_claim"                       
#>  [7] "g_detail_desc_text"             "g_draw_desc_text"              
#>  [9] "inventor"                       "ipc"                           
#> [11] "location"                       "patent"                        
#> [13] "patent/attorney"                "patent/foreign_citation"       
#> [15] "patent/other_reference"         "patent/rel_app_text"           
#> [17] "patent/us_application_citation" "patent/us_patent_citation"     
#> [19] "pg_brf_sum_text"                "pg_claim"                      
#> [21] "pg_detail_desc_text"            "pg_draw_desc_text"             
#> [23] "publication"                    "publication/rel_app_text"      
#> [25] "uspc_mainclass"                 "uspc_subclass"                 
#> [27] "wipo"
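As a quick check on how endpoints map to entity names, the sketch below strips the patent/ and publication/ prefixes from the endpoint list printed above. It uses base R only and hard-codes the 27 endpoint strings rather than calling the package. Because rel_app_text appears under two endpoints, the 27 endpoints yield 26 distinct names.

```r
# The 27 endpoints, copied from the get_endpoints() output above
endpoints <- c(
  "assignee", "cpc_class", "cpc_group", "cpc_subclass",
  "g_brf_sum_text", "g_claim", "g_detail_desc_text", "g_draw_desc_text",
  "inventor", "ipc", "location", "patent",
  "patent/attorney", "patent/foreign_citation", "patent/other_reference",
  "patent/rel_app_text", "patent/us_application_citation",
  "patent/us_patent_citation", "pg_brf_sum_text", "pg_claim",
  "pg_detail_desc_text", "pg_draw_desc_text", "publication",
  "publication/rel_app_text", "uspc_mainclass", "uspc_subclass", "wipo"
)

# basename() keeps only the part after the last "/"
short_names <- basename(endpoints)

length(endpoints)
#> [1] 27
length(unique(short_names))
#> [1] 26
short_names[duplicated(short_names)]
#> [1] "rel_app_text"
```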

query <- qry_funs$eq(inventors.inventor_name_last = "Chambers")

# Here we'll request patent_id and the inventor fields from the patent endpoint
fields <- get_fields(endpoint = "patent", groups = "inventors")
fields <- c("patent_id", fields)
fields
#> [1] "patent_id"                     "inventors.inventor_id"        
#> [3] "inventors.inventor_city"       "inventors.inventor_country"   
#> [5] "inventors.inventor_name_first" "inventors.inventor_name_last" 
#> [7] "inventors.inventor_sequence"   "inventors.inventor_state"

result <- search_pv(query, endpoint = "patent", fields = fields)
result
#> $data
#> #### A list with a single data frame (with list column(s) inside) on patents level:
#> 
#> List of 1
#>  $ patents:'data.frame': 1000 obs. of  2 variables:
#>   ..$ patent_id: chr [1:1000] "10046778" ...
#>   ..$ inventors:List of 1000
#> 
#> $query_results
#> #### Distinct entity counts across all downloadable pages of output:
#> 
#> total_hits = 2,497

# Here are the inventors listed on the first patent
result$data$patents$inventors[[1]]
#>                                                                inventor
#> 1      https://search.patentsview.org/api/v1/inventor/fl:st_ln:crane-1/
#> 2   https://search.patentsview.org/api/v1/inventor/fl:mi_ln:chambers-4/
#> 3 https://search.patentsview.org/api/v1/inventor/fl:to_ln:yarrington-1/
#> 4      https://search.patentsview.org/api/v1/inventor/fl:da_ln:bardo-1/
#> 5      https://search.patentsview.org/api/v1/inventor/fl:ch_ln:pallo-1/
#> 6     https://search.patentsview.org/api/v1/inventor/fl:se_ln:gitmez-1/
#> 7     https://search.patentsview.org/api/v1/inventor/fl:ph_ln:tullai-1/
#>             inventor_id inventor_name_first inventor_name_last
#> 1      fl:st_ln:crane-1     Stephen Michael              Crane
#> 2   fl:mi_ln:chambers-4               Misty           Chambers
#> 3 fl:to_ln:yarrington-1                Todd         Yarrington
#> 4      fl:da_ln:bardo-1               David              Bardo
#> 5      fl:ch_ln:pallo-1               Chris              Pallo
#> 6     fl:se_ln:gitmez-1              Serkan             Gitmez
#> 7     fl:ph_ln:tullai-1                Phil             Tullai
#>   inventor_gender_code                 inventor_location_id inventor_city
#> 1                    M 9100070f-16c8-11ed-9b5f-1234bde3cd05          Erie
#> 2                    F 9100070f-16c8-11ed-9b5f-1234bde3cd05          Erie
#> 3                    M 9100070f-16c8-11ed-9b5f-1234bde3cd05          Erie
#> 4                    M 9100070f-16c8-11ed-9b5f-1234bde3cd05          Erie
#> 5                    M 9100070f-16c8-11ed-9b5f-1234bde3cd05          Erie
#> 6                    M 9100070f-16c8-11ed-9b5f-1234bde3cd05          Erie
#> 7                    M 9100070f-16c8-11ed-9b5f-1234bde3cd05          Erie
#>   inventor_state inventor_country inventor_sequence
#> 1             PA               US                 0
#> 2             PA               US                 5
#> 3             PA               US                 3
#> 4             PA               US                 1
#> 5             PA               US                 6
#> 6             PA               US                 2
#> 7             PA               US                 4

# Now we will see what the inventor endpoint returns for a similar query.
# We use get_fields() to get all the available fields for the inventor endpoint.
query <- qry_funs$eq(inventor_name_last = "Chambers")
fields <- get_fields(endpoint = "inventor")

search_pv(query, endpoint = "inventor", fields = fields)
#> $data
#> #### A list with a single data frame (with list column(s) inside) on inventors level:
#> 
#> List of 1
#>  $ inventors:'data.frame':   442 obs. of  16 variables:
#>   ..$ inventor_id                 : chr [1:442] "8au06rg5lq96f7pqd62sfgq8q" ...
#>   ..$ inventor_name_first         : chr [1:442] "Dwight M." ...
#>   ..$ inventor_name_last          : chr [1:442] "Chambers" ...
#>   ..$ inventor_gender_code        : chr [1:442] "M" ...
#>   ..$ inventor_lastknown_city     : chr [1:442] "Atlanta" ...
#>   ..$ inventor_lastknown_state    : chr [1:442] "GA" ...
#>   ..$ inventor_lastknown_country  : chr [1:442] "US" ...
#>   ..$ inventor_lastknown_latitude : num [1:442] 33.7 ...
#>   ..$ inventor_lastknown_longitude: num [1:442] -84.4 ...
#>   ..$ inventor_lastknown_location : chr [1:442] "https://search.patentsview.o"..
#>   ..$ inventor_num_patents        : int [1:442] 1 1 ...
#>   ..$ inventor_num_assignees      : int [1:442] 2 1 ...
#>   ..$ inventor_first_seen_date    : chr [1:442] "2023-01-03" ...
#>   ..$ inventor_last_seen_date     : chr [1:442] "2023-01-03" ...
#>   ..$ inventor_years_active       : num [1:442] 1 1 ...
#>   ..$ inventor_years              :List of 442
#> 
#> $query_results
#> #### Distinct entity counts across all downloadable pages of output:
#> 
#> total_hits = 442

Your choice of endpoint determines two things:

Which entity your query is applied to. The first call shown above used the patent endpoint, so the API searched for patents that have at least one inventor listed on them with the last name “Chambers.” The second call used the inventor endpoint to show what it returns for a similar query.
The structure of the data frame that is returned. The first call returned a data frame on the patent level, meaning that each row corresponded to a different patent. Fields that were not on the patent level (e.g., inventors.inventor_name_last) were returned in list columns that are named after the entity associated with the field (e.g., the inventors entity).³ Meanwhile, the second call gave us a data frame on the inventor level (one row for each inventor) because it used the inventor endpoint.

Most of the time you will want to use the patent endpoint. Note that you can still effectively filter on fields that are not at the patent level when using the patent endpoint (e.g., you can filter on assignee name or CPC category). This is because patents are relatively low-level entities. For higher-level entities like assignees, if you filter on a field that is not at the assignee level (e.g., inventor name), the API will return data on any assignee that has at least one inventor whose name matches your search, which is probably not what you want.

FAQs

I’m sure my query is well formatted and correct but I keep getting an error. What’s the deal?

The API query syntax guidelines do not cover all of the API’s behavior. Specifically, there are several things that you cannot do which are not documented on the API’s webpage. The writing queries vignette has more details on this. You can also try the string version of your query in the API’s Swagger UI page. Its error messages can sometimes help determine the problem.

Now that the R package uses httr2, you can call its last_request() function to see what was sent to the API. This can be useful when trying to fix an invalid request.

httr2::last_request()

Does the API have any rate limiting/throttling controls?

Yes, the API currently allows 45 calls per minute for each API key. If this limit is exceeded, the API will return an HTTP status of 429 with a Retry-After response header set to the number of seconds to wait before making subsequent requests. The R package should handle this for you. As noted in the Getting started section, you will need to request an API key and set the environment variable PATENTSVIEW_API_KEY to the value of your key.
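For planning purposes, the limit translates into a lower bound on how long a large download can take. The helper below is a hypothetical illustration in base R and is not part of the package. It ignores network latency and any waiting the API imposes on top of the limit.

```r
# Minimum number of minutes needed for n_calls requests at a given rate limit
# (hypothetical helper, not a patentsview function)
min_minutes <- function(n_calls, calls_per_minute = 45) {
  n_calls / calls_per_minute
}

# Average spacing between calls that stays within the limit, in seconds
seconds_between_calls <- 60 / 45

# For example, 90 requests cannot finish in under two minutes
min_minutes(90)
#> [1] 2
```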

How do I access the data frames inside the list columns returned by `search_pv()`?

Let’s consider the following data, in which patents are the primary entity while “application”, “assignees”, and “gov_interest_organizations” are the secondary entities (also referred to as subentities):

# Create field list
fields <- c("patent_id", "patent_date", "patent_title",
  "assignees", "application", "gov_interest_organizations")

# Pull data
res <- search_pv(
  query = qry_funs$text_any(inventors.inventor_name_last = "Smith"), 
  endpoint = "patent", 
  fields = fields
)
res$data
#> #### A list with a single data frame (with list column(s) inside) on patents level:
#> 
#> List of 1
#>  $ patents:'data.frame': 1000 obs. of  6 variables:
#>   ..$ patent_id                 : chr [1:1000] "10045399" ...
#>   ..$ patent_title              : chr [1:1000] "System and method for providi"..
#>   ..$ patent_date               : chr [1:1000] "2018-08-07" ...
#>   ..$ application               :List of 1000
#>   ..$ assignees                 :List of 1000
#>   ..$ gov_interest_organizations:List of 1000

res$data has vector columns for those fields that belong to the primary entity (e.g., res$data$patents$patent_id) and list columns for those fields that belong to any secondary entity (e.g., res$data$patents$gov_interest_organizations). You have two good ways to pull out the data frames that are nested inside these list columns:

Use tidyr::unnest. (This is probably the easier choice of the two).

library(tidyr)
#> 
#> Attaching package: 'tidyr'
#> The following object is masked from 'package:magrittr':
#> 
#>     extract

# Get assignee data:
res$data$patents %>% 
  unnest(assignees) %>%
  head()
#> # A tibble: 6 × 16
#>   patent_id patent_title            patent_date application assignee assignee_id
#>   <chr>     <chr>                   <chr>       <list>      <chr>    <chr>      
#> 1 10045399  System and method for … 2018-08-07  <df>        https:/… f0c24f2e-5…
#> 2 10045452  Electronic device stru… 2018-08-07  <df>        https:/… 86c4fec0-2…
#> 3 10045764  Minimally invasive imp… 2018-08-14  <df>        https:/… 90fc558b-6…
#> 4 10045807  Bone positioning and p… 2018-08-14  <df>        https:/… f3952062-5…
#> 5 10045844  Post-implant accommoda… 2018-08-14  <df>        https:/… e9dcc023-d…
#> 6 10045989  Quinazoline derivative… 2018-08-14  <df>        https:/… 520ef995-7…
#> # ℹ 10 more variables: assignee_type <chr>,
#> #   assignee_individual_name_first <chr>, assignee_individual_name_last <chr>,
#> #   assignee_organization <chr>, assignee_location_id <chr>,
#> #   assignee_city <chr>, assignee_state <chr>, assignee_country <chr>,
#> #   assignee_sequence <int>, gov_interest_organizations <list>
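One detail worth knowing about tidyr::unnest() is that, by default, it drops rows whose list-column entry is empty. The sketch below shows this on a small, entirely hypothetical stand-in for res$data$patents (the IDs and organization names are made up), and shows the keep_empty argument, which retains such rows with missing values instead.

```r
library(tidyr)

# Hypothetical stand-in for res$data$patents: one row per patent,
# with a list column holding one data frame per patent
patents <- tibble::tibble(
  patent_id = c("P1", "P2", "P3"),
  assignees = list(
    data.frame(assignee_organization = c("Org A", "Org B")),
    data.frame(assignee_organization = "Org C"),
    NULL  # P3 has no assignee data
  )
)

flat <- unnest(patents, assignees)                         # P3 is dropped
flat_all <- unnest(patents, assignees, keep_empty = TRUE)  # P3 kept, with NA

nrow(flat)
#> [1] 3
nrow(flat_all)
#> [1] 4
```

If you need every patent to survive the unnesting step, keep_empty = TRUE is the safer choice.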

Use patentsview::unnest_pv_data. unnest_pv_data() creates a series of data frames (one for each entity level) that are like tables in a relational database. You provide it with the data returned by search_pv() and a field that can act as a unique identifier for the primary entities:

unnest_pv_data(data = res$data, pk = "patent_id")
#> List of 4
#>  $ application               :'data.frame':  1000 obs. of  7 variables:
#>   ..$ patent_id       : chr [1:1000] "10045399" ...
#>   ..$ application_id  : chr [1:1000] "14/883264" ...
#>   ..$ application_type: chr [1:1000] "14" ...
#>   ..$ filing_date     : chr [1:1000] "2015-10-14" ...
#>   ..$ series_code     : chr [1:1000] "14" ...
#>   ..$ rule_47_flag    : logi [1:1000] FALSE ...
#>   ..$ filing_type     : chr [1:1000] "14" ...
#>  $ assignees                 :'data.frame':  981 obs. of  12 variables:
#>   ..$ patent_id                     : chr [1:981] "10045399" ...
#>   ..$ assignee                      : chr [1:981] "https://search.patentsview"..
#>   ..$ assignee_id                   : chr [1:981] "f0c24f2e-5fe5-4945-84e6-8b"..
#>   ..$ assignee_type                 : chr [1:981] "2" ...
#>   ..$ assignee_individual_name_first: chr [1:981] NA ...
#>   ..$ assignee_individual_name_last : chr [1:981] NA ...
#>   ..$ assignee_organization         : chr [1:981] "AT&T Intellectual Property"..
#>   ..$ assignee_location_id          : chr [1:981] "ec2f0cf3-16c7-11ed-9b5f-12"..
#>   ..$ assignee_city                 : chr [1:981] "Atlanta" ...
#>   ..$ assignee_state                : chr [1:981] "GA" ...
#>   ..$ assignee_country              : chr [1:981] "US" ...
#>   ..$ assignee_sequence             : int [1:981] 0 0 ...
#>  $ gov_interest_organizations:'data.frame':  45 obs. of  5 variables:
#>   ..$ patent_id     : chr [1:45] "10045989" ...
#>   ..$ fedagency_name: chr [1:45] "National Institutes of Health" ...
#>   ..$ level_one     : chr [1:45] "Department of Health and Human Services" ...
#>   ..$ level_two     : chr [1:45] "National Institutes of Health" ...
#>   ..$ level_three   : chr [1:45] NA ...
#>  $ patents                   :'data.frame':  1000 obs. of  3 variables:
#>   ..$ patent_id   : chr [1:1000] "10045399" ...
#>   ..$ patent_title: chr [1:1000] "System and method for providing integrated "..
#>   ..$ patent_date : chr [1:1000] "2018-08-07" ...

Now we are left with a series of flat data frames instead of having a single data frame with other data frames nested inside of it. These flat data frames can be joined together as needed via the primary key (patent_id) for this endpoint.
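Here is a minimal sketch of such a join using base R’s merge(). The two data frames are small hypothetical stand-ins for the patents and assignees tables returned by unnest_pv_data(); the IDs and organization names are made up.

```r
# Hypothetical stand-ins for two of the flat tables
patents <- data.frame(
  patent_id = c("P1", "P2", "P3"),
  patent_date = c("2018-08-07", "2018-08-14", "2018-08-14")
)
assignees <- data.frame(
  patent_id = c("P1", "P2", "P2"),
  assignee_organization = c("Org A", "Org B", "Org C")
)

# Inner join: only patents that have at least one assignee row
inner <- merge(patents, assignees, by = "patent_id")

# Left join: keep every patent, filling missing assignee columns with NA
left <- merge(patents, assignees, by = "patent_id", all.x = TRUE)

nrow(inner)
#> [1] 3
nrow(left)
#> [1] 4
```

The left join is usually what you want when the primary entity should not be lost just because a secondary table has no matching rows.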