Download, read and format STATS19 data in one function.

Usage

get_stats19(
  year = NULL,
  type = "collision",
  data_dir = get_data_directory(),
  file_name = NULL,
  format = TRUE,
  ask = FALSE,
  silent = FALSE,
  output_format = "tibble",
  ...
)

Arguments

year

A year matching the file names on the STATS19 data release page, e.g. 2020.

type

One of 'collision', 'casualty', 'vehicle'; defaults to 'collision'. This text string is used to match the file names released by the DfT.

data_dir

Parent directory for all downloaded files. Defaults to the value returned by get_data_directory(), which is tempdir() unless a download directory has been set.

file_name

The file name (DfT named) to download.

format

Switch controlling whether the data are formatted after reading. If FALSE, the raw data as read from file are returned. Default is TRUE.

ask

Should you be asked whether or not to download the files? FALSE by default.

silent

Boolean. If FALSE (default value), display useful progress messages on the screen.

output_format

A string that specifies the desired output format. The default value is "tibble". Other possible values are "data.frame", "sf" and "ppp", which return objects of class data.frame, sf::sf and spatstat.geom::ppp respectively. Any other string is ignored and a tibble is returned. See details and examples.

...

Other arguments to be passed to the format_sf() or format_ppp() functions. Read and run the examples.

Details

This function gets STATS19 data. Behind the scenes it uses dl_stats19() and the read_* functions, returning a tibble (default), data.frame, sf or ppp object, depending on the output_format parameter. The function returns data for a specific year (e.g. year = 2022).

Note: for years before 2016 the function may return data from more years than are requested due to the nature of the files hosted at data.gov.uk.
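If only one year is needed from such a multi-year result, the rows can be filtered after the call. The following is a minimal sketch on a toy data frame rather than a real download; it assumes the year is stored in a column named accident_year, as suggested by the column specification printed in the examples, and the object names are illustrative only.

```r
# Toy stand-in for a multi-year collision table (hypothetical values);
# the real data has many more columns.
collisions <- data.frame(
  accident_index = c("2014A", "2015A", "2015B", "2016A"),
  accident_year  = c(2014, 2015, 2015, 2016)
)

# Keep only the rows for the year that was actually requested.
requested_year <- 2015
collisions_one_year <- collisions[collisions$accident_year == requested_year, ]

nrow(collisions_one_year)
```

The same subsetting expression should work on a tibble or data.frame returned by get_stats19(), provided the year column has that name in the release being used.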

Because this function calls dl_stats19(), it can download many MB of data, so ensure you have sufficient disk space.

If output_format is "data.frame", "sf" or "ppp", the output is converted into a data.frame, sf or ppp object using as.data.frame(), format_sf() or format_ppp() respectively, as shown in the examples.

Examples

# \donttest{
if(curl::has_internet()) {
x = get_stats19(2022, silent = TRUE, format = TRUE)
class(x)
# data.frame output
x = get_stats19(2022, silent = TRUE, output_format = "data.frame")
class(x)

# Run tests only if endpoint is alive:
if(nrow(x) > 0) {

# sf output
x_sf = get_stats19(2022, silent = TRUE, output_format = "sf")

# sf output with lonlat coordinates
x_sf = get_stats19(2022, silent = TRUE, output_format = "sf", lonlat = TRUE)
sf::st_crs(x_sf)

if (requireNamespace("spatstat.geom", quietly = TRUE)) {
# ppp output
x_ppp = get_stats19(2022, silent = TRUE, output_format = "ppp")

# We can use the window parameter of the format_ppp() function to keep only the
# events that occurred in a specific area. For example, we can create a bounding
# box extending 5 km either side of the city centre of Leeds

leeds_window = spatstat.geom::owin(
xrange = c(425046.1, 435046.1),
yrange = c(428577.2, 438577.2)
)

leeds_ppp = get_stats19(2022, silent = TRUE, output_format = "ppp", window = leeds_window)
spatstat.geom::plot.ppp(leeds_ppp, use.marks = FALSE, clipwin = leeds_window)

# More elaborate examples are also possible, e.g. keeping only the events that
# occurred within a pre-defined polygon

# The following example requires osmdata package
# greater_london_sf_polygon = osmdata::getbb(
# "Greater London, UK",
# format_out = "sf_polygon"
# )
# spatstat works only with planar coordinates
# greater_london_sf_polygon = sf::st_transform(greater_london_sf_polygon, 27700)
# then we extract the coordinates and create the window object.
# greater_london_polygon = sf::st_coordinates(greater_london_sf_polygon)[, c(1, 2)]
# greater_london_window = spatstat.geom::owin(poly = greater_london_polygon)

# greater_london_ppp = get_stats19(2022, output_format = "ppp", window = greater_london_window)
# spatstat.geom::plot.ppp(greater_london_ppp, use.marks = FALSE, clipwin = greater_london_window)
}
}
}
#> Rows: 106004 Columns: 36
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr   (6): accident_index, accident_reference, date, local_authority_ons_dis...
#> dbl  (29): accident_year, location_easting_osgr, location_northing_osgr, lon...
#> time  (1): time
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> date and time columns present, creating formatted datetime column
#> 22 rows removed with no coordinates
#> 22 rows removed with no coordinates
#> 22 rows removed with no coordinates
#> Warning: some mark values are NA in the point pattern x
#> 22 rows removed with no coordinates
#> Warning: 105096 points were rejected as lying outside the specified window
#> Warning: some mark values are NA in the point pattern x

# }
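
The rectangular window used for Leeds in the examples can be derived from a centre point and a half-width instead of hard-coded ranges. The sketch below does this in base R. The centre coordinates are an assumption inferred from the midpoint of the ranges in the example, the values are taken to be metres in a planar coordinate system such as EPSG:27700, and the names centre and half_width are illustrative only.

```r
# Assumed centre of the window (midpoint of the ranges used in the example).
centre <- c(x = 430046.1, y = 433577.2)

# Distance from the centre to each edge of the window: 5 km.
half_width <- 5000

# Ranges of a square window extending half_width either side of the centre.
xrange <- c(centre[["x"]] - half_width, centre[["x"]] + half_width)
yrange <- c(centre[["y"]] - half_width, centre[["y"]] + half_width)

# Build the window object only if spatstat.geom is installed.
if (requireNamespace("spatstat.geom", quietly = TRUE)) {
  leeds_window <- spatstat.geom::owin(xrange = xrange, yrange = yrange)
}
```

The resulting window is 10 km wide and 10 km high, and can be passed via the window argument in the same way as in the examples.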