

The rdefra package retrieves air pollution data from the Air Information Resource (UK-AIR) of the Department for Environment, Food and Rural Affairs (DEFRA) in the United Kingdom. UK-AIR does not provide a public API for programmatic access to data, so this package scrapes the HTML pages to get the relevant information.

This package follows a logic similar to other packages such as waterData and rnrfa: sites are first identified through a catalogue, data are imported via the station identification number, then data are visualised and/or used in analyses. The metadata for the monitoring stations are accessible through the function ukair_catalogue(). Missing station coordinates can be obtained using the function ukair_get_coordinates(), and time series data for different pollutants can be obtained using the function ukair_get_hourly_data().

DEFRA’s servers can handle multiple data requests, so concurrent calls can be sent using the parallel package. Although the rate limit depends on the maximum number of concurrent calls, traffic and available infrastructure, data retrieval is very efficient: multiple years of data for hundreds of sites can be downloaded in only a few minutes.
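As a rough illustration of concurrent retrieval, the sketch below sends one request per site through a small socket cluster from the parallel package. The helper fetch_sites_parallel() and its arguments fetch_fun and n_workers are hypothetical names introduced here, not part of rdefra. The default downloader is assumed to be ukair_get_hourly_data() called with a site ID and a vector of years.

```r
library(parallel)

# Hypothetical helper (not part of rdefra): download hourly data for several
# sites concurrently, one site per task.
# `fetch_fun` defaults to rdefra's downloader but can be replaced by a stub,
# for example to try the helper offline.
fetch_sites_parallel <- function(site_ids, years,
                                 fetch_fun = rdefra::ukair_get_hourly_data,
                                 n_workers = 2) {
  # Start the worker processes and make sure they are stopped on exit.
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)

  # Each worker calls fetch_fun(site_id, years = years) for its share of sites.
  results <- parLapply(cl, site_ids, fetch_fun, years = years)

  # Label each result with the site it belongs to.
  names(results) <- site_ids
  results
}
```

Calling the helper with a character vector of site IDs and a range of years returns a named list with one element per site. A suitable value for n_workers depends on the local machine and on how much load the remote server tolerates, so a small number is a cautious starting point.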

For similar functionality, see also the openair package, which relies on a local copy of the data on servers at King’s College (UK), and the ropenaq package, which provides the latest measured levels from UK-AIR (see https://uk-air.defra.gov.uk/latest/currentlevels) as well as data from other countries.

Installation

Get the released version from CRAN:

install.packages("rdefra")

Or the development version from GitHub using the package remotes:

install.packages("remotes")
remotes::install_github("ropensci/rdefra")

Load the rdefra package:

library("rdefra")

Functions

The package logic assumes that users access the UK-AIR database in the following steps:

  1. Browse the catalogue of available stations and select some stations of interest (see function ukair_catalogue()).
  2. Get missing coordinates (see function ukair_get_coordinates()).
  3. Retrieve data for the selected stations (see functions ukair_get_site_id() and ukair_get_hourly_data()).
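The three steps above can be sketched as a single function. This is a minimal sketch rather than the package's recommended usage: the wrapper get_selected_hourly_data() and its max_sites argument are hypothetical, the catalogue is assumed to expose station codes in a column named UK.AIR.ID, and ukair_get_site_id() is assumed to return NA for stations without a site ID. Calling the function requires the rdefra package and an internet connection.

```r
# Hypothetical wrapper (not part of rdefra) that chains the three steps.
get_selected_hourly_data <- function(max_sites = 5, years = 2012) {
  # Step 1: browse the catalogue and select some stations of interest.
  # Here the selection is simply the first `max_sites` rows.
  stations <- rdefra::ukair_catalogue()
  stations <- head(stations, max_sites)

  # Step 2: fill in missing coordinates where UK-AIR provides them.
  stations <- rdefra::ukair_get_coordinates(stations)

  # Step 3a: look up the site ID of each selected station
  # (assumes the station codes are in the UK.AIR.ID column).
  stations$SiteID <- rdefra::ukair_get_site_id(stations$UK.AIR.ID)
  site_ids <- stations$SiteID[!is.na(stations$SiteID)]

  # Step 3b: download hourly data for every station that has a site ID.
  lapply(site_ids, rdefra::ukair_get_hourly_data, years = years)
}
```

The function returns a list with one element per station that has a site ID, so the list can be shorter than max_sites or empty. Selecting the first rows keeps the number of scraped pages small. In practice the selection would more likely be a filter on catalogue columns of interest.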

For an in-depth description of the various functionalities and example applications, please refer to the package vignette.

Meta

  • This package and functions herein are part of an experimental open-source project. They are provided as is, without any guarantee.
  • Please report any issues or bugs.
  • License: GPL-3
  • This package was reviewed by Maëlle Salmon and Hao Zhu for submission to rOpenSci (see review here) and the Journal of Open Source Software (see review here).
  • Cite rdefra: citation(package = "rdefra")