Skip to contents

This is a proposed Tech Note to be submitted to rOpenSci. It’s here as an Rmd so it will be knitted by the build process but can be submitted as an md file.

The Patentsview API team has released a new version of their API, which is used by a correspondingly new version of the patentsview R package. The problem for users is that the API team has made breaking changes, existing programs will not run with the new version of the R package. Please don’t shoot the messenger!

The new version of the R package handles some of the API team’s changes where possible, however an API key is now required. The Patentsview API team plans to shutdown the original version of the API in the near future. At that time the original version of the R package will stop working.

The original version of the R package is available on CRAN with the new version available on r-universe. After the original version of the API is shutdown, the updated R package will be submitted to CRAN.

User Impacting API changes:

  1. Users will need to request an API key and set an environmental variable PATENTSVIEW_API_KEY to its value.

  2. Endpoint changes:

    • The nber_subcategories, one of the original seven endpoints, was removed
    • cpc_subsections is now cpc_group
    • The remaining five original endpoints went from plural to singular, “patents” is now “patent” for example. Interestingly, the returned data structures are still plural for the most part.
    • There are now 27 endpoints, some may need to be called to retrieve fields that are currently (soon-to-be-were) available from the original endpoints (now some endpoint’s returns are lighter, requiring additional calls to be made).
    • Now some of the endpoints return HATEOAS (Hypermedia as the Engine of Application State) links to retrieve more data (URLs for additional calls back to the API)
  3. Some fields are now nested and need to be fully qualified when used in a query, for instance, search_pv('{"cpc_current.cpc_group_id":"A01B1/00"}') when using the patent endpoint.

    In the fields parameter, nested fields can be fully qualified or a new API shorthand can be used, where group names can specified. When group names are used, all of the group’s nested fields will be returned by the API. For example, the new version of the API and R package will accept fields=c(“assignees”) when using the patent endpoint and all nested assignees’ fields will be returned by the API.

  4. Some field’s names have changed, most significantly, patent_number is now patent_id, and some fields were removed entirely, for instance, rawinventor_first_name and rawinventor_last_name.

  5. The original version of the API had queryable fields and additional fields which could be retrieved but couldn’t be part of a conditional query. That notion does not apply to the new version of the API as all fields are now queryable. You may be able to simplify your code if you found yourself doing post processing on returned data because a field you were interested in was not queryable.

  6. Now there isn’t supposed to be a difference between operators used on strings vs full text fields, as there was in the original version of the API. See the tip below the Syntax section.

  7. Result set paging has changed significantly. This would matter only if users implemented their own paging, the R package continues to handle result set paging when search_pv’s all_pages = TRUE. There is a new result set paging vignette to explain the way the API now pages, using the size and after parameters rather than using per_page and page.

  8. Result set sizes are seemingly unbounded now. The original version of the API capped result sets at 100,000 rows.

The API team also renamed the API, PatentsView’s Search API is now the PatentSearch API. Note that the R package will retain its name, continue to use library(patentsview)

Highlights of the R package:

  1. Throttling is now enforced by the API and handled by the R package (sleep as specified by the throttle response before retry)
  2. There are four new vignettes
    • There is a new “Converting an existing script” vignette
    • The rOpenSci post that announced the original version of the R package has been changed to work with the new version of the API and is now a new vignette.
    • Understanding the API, the API team’s jupyter notebook, converted to R
    • On writing custom result set paging
  3. The R package changed internally from using httr to httr2. This only affects users if they passed additional arguments (…) to search_pv(). Previously if they passed config = httr::timeout(40) they’d now pass timeout = 40 (name-value pairs of valid curl options, as found in curl::curl_options() see req_options)
  4. Now that the R package is using httr2, users can make use of its last_request() method to see what was sent to the API. This could be useful when trying to fix an invalid request. Also fun would be seeing the raw API response.
httr2::last_request()
httr2::last_response()
httr2::last_response() |> httr2::resp_body_json() 
  1. A new function was added
  2. An existing function was removed. With the API changes, there is less of a need for cast_pv_data() which was previously part of the R package. The API now returns most fields as appropriate types, boolean, numeric etc., instead of always returning strings.

Online Documentation

The API team has thoughtfully provided a Swagger UI page for the new version of the API at https://search.patentsview.org/swagger-ui/. Think of it as an online version of Postman already loaded with the API’s new endpoints and returns. The Swagger UI page documents what fields are returned by each endpoint on a successful call. (Response http code 200). You can even send in requests and see actual API responses if you enter your API key and press an endpoint’s “Try it out” and “Execute” buttons. Even error responses can be informative, usually pointing out what went wrong.

In a similar format, the updated API documentation lists what each endpoint does. Additionally, the R package’s fieldsdf data frame has been updated, now listing the new set of endpoints and fields that can be queried and/or returned. The R package’s reference pages have also been updated.

Final Thoughts

As shown in the updated Top Assignees vignette, there will be occasions now where multiple API calls are needed to retrieve the same data as in a single API call in the original version of the API and R package. Additionally, the reworked rOpenSci post explains what changes had to be made since assignee latitude and longitude are no longer available from the patent endpoint.

Issues or questions about the API itself can be raised in the API’s portal or in the API’s forum. Issues for the R package can be raised in the patentsview repo.