caRecall R Package

Photo by Artem Beliaikin from Pexels

Motivation

The caRecall R package is a wrapper for the Government of Canada Vehicle Recalls Database (VRD) API. The package was built as part of the Master of Data Science (MDS) degree at UBC and was completed with Mitch Harris and Ryan Koenig. The goal of the project was to figure out, largely on our own, how to build an R package, with the requirement that the package be an API wrapper. There was no requirement to open source the project or submit the package to CRAN (Comprehensive R Archive Network), but doing both seemed like a good opportunity to see the process through.

The VRD API was selected for the project for the following reasons:

  • The project timeline was limited to 3 weeks, so we were looking for an API with enough functionality to build a small package around, yet not so complicated that submitting the package to CRAN within the timeline would have been a struggle.
  • The VRD API had not yet been wrapped. A fair amount of time was initially spent looking for an API that met the above requirement and did not already have an R wrapper package (on CRAN or on GitHub). It was quite surprising how many public APIs had already been wrapped. Many of the wrappers had been built in the last one to three years, which may partly be because this project has been repeated in the UBC MDS program (and other programs) over the years.
  • The VRD provides data that was interesting to work with during the project.

The code is available on GitHub and the package is available on CRAN, so it can be installed in R with: install.packages('caRecall').

Package Overview

The VRD is used by Transport Canada's Defect Investigations and Recalls Division to track recalls of vehicles, tires, and child car seats. The caRecall API wrapper provides access to recall summary information, which can be searched by make, model, and year range, and to detailed recall information, which is searched by recall number.

The package focuses solely on querying data from the Vehicle Recalls Database, and its functions fall into the following categories:

  • Recall summary information
  • Recall counts
  • Detailed recall information

The recall summary functions retrieve summary data from the VRD based on user search terms: make, model, recall number, or year range. The functions also support partial or exact matching of search terms, and users can set a limit on the number of records returned. For example, the recall_by_years() function can return all recalls for model year 2000. Here the limit is set to 3000 so that it exceeds the 2,422 matching records:

recall_summary_2000 <- recall_by_years(start_year = 2000, end_year = 2000, limit = 3000)
recall_summary_2000
#> # A tibble: 2,422 x 6
#>    `Recall number` `Manufacturer N~ `Model name` `Make name`  Year `Recall date`
#>    <chr>           <chr>            <chr>        <chr>       <int> <date>       
#>  1 1993076         MERCEDES-BENZ    300          MERCEDES-B~  2000 1993-05-31   
#>  2 1999056         FIAT CHRYSLER A~ NEON         CHRYSLER     2000 1999-04-14   
#>  3 1999108         FIAT CHRYSLER A~ NEON         CHRYSLER     2000 1999-06-07   
#>  4 1999111         FLEETWOOD        TIOGA        FLEETWOOD    2000 1999-06-08   
#>  5 1999137         POLARIS          SNOWMOBILE   POLARIS      2000 1999-07-26   
#>  6 1999138         MAZDA            MPV          MAZDA        2000 1999-07-27   
#>  7 1999147         MAZDA            MPV          MAZDA        2000 1999-07-30   
#>  8 1999151         GENERAL MOTORS   S10          CHEVROLET    2000 1999-08-16   
#>  9 1999151         GENERAL MOTORS   SONOMA       GMC          2000 1999-08-16   
#> 10 1999155         GENERAL MOTORS   SUNFIRE      PONTIAC      2000 1999-08-19   
#> # ... with 2,412 more rows

The count functions take similar arguments but return only the total number of matching recalls in the database. The detailed information function, recall_details(), provides additional information for a specific recall number. This includes the category the recall falls under (for example, ‘Seats and Restraints’) and a description of the reason for the recall.

The data returned by the package can then be used to generate summary figures.
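A minimal sketch of such a summary in base R follows; it is not code from caRecall. Running recall_by_years() needs an API key, so this stand-in rebuilds nine of the ten rows printed above by hand. Row 1 is left out because its make name is truncated in the printout. With a key, recall_summary_2000 could be passed in directly, since the column names match. The count_by_make() helper is hypothetical.

The sketch also shows that rows are make/model entries rather than distinct recalls. Recall 1999151 appears twice, once for the Chevrolet S10 and once for the GMC Sonoma.

```r
# Stand-in for recall_by_years() output: nine of the rows printed above
# (a live call needs an API key). Column names match the tibble's.
recalls <- data.frame(
  `Recall number` = c("1999056", "1999108", "1999111", "1999137", "1999138",
                      "1999147", "1999151", "1999151", "1999155"),
  `Model name` = c("NEON", "NEON", "TIOGA", "SNOWMOBILE", "MPV",
                   "MPV", "S10", "SONOMA", "SUNFIRE"),
  `Make name` = c("CHRYSLER", "CHRYSLER", "FLEETWOOD", "POLARIS", "MAZDA",
                  "MAZDA", "CHEVROLET", "GMC", "PONTIAC"),
  Year = 2000L,
  `Recall date` = as.Date(c("1999-04-14", "1999-06-07", "1999-06-08",
                            "1999-07-26", "1999-07-27", "1999-07-30",
                            "1999-08-16", "1999-08-16", "1999-08-19")),
  check.names = FALSE  # keep the spaces in the column names
)

# Hypothetical helper: number of rows per make, largest first
count_by_make <- function(recalls) {
  sort(table(recalls[["Make name"]]), decreasing = TRUE)
}

make_counts <- count_by_make(recalls)
print(make_counts)

# Rows are make/model entries, not recalls: 1999151 covers two models
n_rows <- nrow(recalls)                                  # 9
n_recalls <- length(unique(recalls[["Recall number"]]))  # 8

# Bar chart of entries per make, written to a PDF in the temp directory
pdf(file.path(tempdir(), "recalls_by_make.pdf"))
barplot(make_counts, las = 2, ylab = "Number of recall entries",
        main = "Model year 2000 recall entries by make (sample)")
invisible(dev.off())
```

To draw the chart on screen instead, drop the pdf() and dev.off() lines.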

Additional details on the package are available in the vignette, which discusses how to get and use an API key with caRecall and walks through examples.

Project Description

The following sections provide a brief overview of several components of the project including:

  • Building the R package
  • Using GitHub Actions for unit testing, coverage, package checks, and automated documentation
  • Submitting the package to CRAN

Most of the references used in the project are also provided and may be useful to others looking to build an API wrapper R package.

Building caRecall

caRecall uses the httr package to call the VRD API, and the package follows the best practices for API packages described in the httr vignette. However, the bigger challenge in building caRecall was not writing the API wrapper functionality but working out how to put together the pieces of a complete package. The R Packages book, likely very familiar to anyone who has put together a package in R, served as the guideline for this. The book covers testing, documentation, organization, and use of the devtools and usethis packages. These two packages automate setting up the skeleton of an R package and include functions for checking the package and submitting it to CRAN. We referred to the book constantly, and finishing caRecall within the project timeline would likely not have been possible without it.

GitHub Actions and pkgdown

GitHub Actions was used for automated unit testing, coverage checks, package checks (using R CMD check), and building a website for the package with pkgdown. The website built for caRecall can be found here. GitHub Actions workflow templates are available at https://github.com/r-lib/actions/, and these were used and modified as required. The usethis package also has functions that populate workflow files from this repository, and the GitHub Actions with R book gives useful guidance on setting up the workflows.

GitHub Secrets were used to manage the API key required to run the various workflows in the public GitHub repository. Setting this up was quite straightforward and is covered in GitHub Actions with R. One of the bigger challenges, discussed below, was making the package run its workflows both with and without the API key.

Submitting the Package to CRAN

CRAN runs automated checks when an R package is submitted, including checking that the R code in vignettes runs and that all unit tests pass. However, there is no safe way to send an API key to CRAN for use in its check environment, which makes this somewhat complicated for an API wrapper.

Skipping tests is relatively easy. The httr vignette on managing secrets describes writing a small helper function that skips a test if the API key is not found. A more challenging task was writing a vignette R Markdown file that CRAN could run without an API key, yet that could also build the full HTML vignette for the pkgdown website. The httr vignette suggests adding a condition to the R Markdown file that skips code when the API key is not found in the environment, but this did not work for us. Instead, the method suggested in an rOpenSci post worked: precomputing the vignette satisfied both the pkgdown and CRAN requirements. I recommend this method to anyone building an API wrapper R package for submission to CRAN.
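Both workarounds can be sketched in a few lines of R. This is a minimal sketch under stated assumptions, not the caRecall code. The environment variable name VRD_API_KEY, the helper names skip_if_no_api_key() and precompute_vignette(), and the vignette file name are all hypothetical.

- **Skip helper:** follows the pattern described in the httr secrets vignette and relies on testthat::skip().
- **Precompute helper:** assumes the layout from the rOpenSci post. The real vignette source is kept as a .Rmd.orig file excluded from the package build. It is knitted locally, with the key set, into the .Rmd that CRAN and pkgdown then build without calling the API.

```r
# Minimal sketch; VRD_API_KEY and both helper names are hypothetical.

# 1. Test helper, e.g. saved as tests/testthat/helper-api.R (testthat sources
#    helper*.R files before running tests). A test that calls the live API
#    starts with skip_if_no_api_key(), so it is skipped when no key is set.
skip_if_no_api_key <- function(var = "VRD_API_KEY") {
  if (identical(Sys.getenv(var), "")) {
    testthat::skip(paste(var, "is not set; skipping test that needs the API"))
  }
  invisible(TRUE)
}

# 2. Precomputed vignette: the real source lives in vignettes/<name>.Rmd.orig
#    (assumed to be listed in .Rbuildignore). Knitting it locally, with the key
#    set, writes a plain vignettes/<name>.Rmd whose results are already
#    printed, so CRAN and pkgdown can build it without calling the API.
precompute_vignette <- function(orig, output = sub("\\.orig$", "", orig)) {
  knitr::knit(orig, output = output, quiet = TRUE)
}

# Usage, run locally rather than during R CMD check (hypothetical file name):
# precompute_vignette("vignettes/caRecall.Rmd.orig")
#
# In a test file:
# test_that("recall_by_years() returns a tibble", {
#   skip_if_no_api_key()
#   res <- recall_by_years(start_year = 2000, end_year = 2000, limit = 3000)
#   expect_s3_class(res, "tbl_df")
# })
```

The precomputed .Rmd already contains the printed results. It therefore has to be re-knitted and committed whenever the package's output changes.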

The R Packages book has an entire chapter on releasing a package, with an overview of what needs to be done, and watched for, when submitting a package to CRAN. The devtools package also has a function, devtools::release(), that runs through a checklist and then submits the package to CRAN. Running through the checklist before submission turned up several items that had to be fixed. The wait after submitting to CRAN was approximately three days. We were fortunate that the package was approved on the first try, so we did not need to fix and resubmit it.

Summary

Building caRecall and submitting the small package to CRAN was a good project and provided a complete overview of the steps involved in building an R package. Working with an API key in the various GitHub Actions workflows and submitting to CRAN posed a few challenges, but articles written by the R community helped navigate them.

While the caRecall package is not likely to grow much beyond its current functionality of wrapping the VRD API, the plan is to maintain the package, fix any bugs that come up, and keep it available on CRAN. That said, enhancement requests that would extend the package's usability are more than welcome.

Nathan Smith
Data Science | Water Resource Engineering

My interests include machine learning and using interactive applications to convey results and uncertainties.