Download, read and format STATS19 data in one function.
Usage
get_stats19(
year = NULL,
type = "collision",
data_dir = get_data_directory(),
file_name = NULL,
format = TRUE,
ask = FALSE,
silent = FALSE,
output_format = "tibble",
...
)
Arguments
- year
A year matching file names on the STATS19 data release page e.g.
2020
- type
One of 'collision', 'casualty', 'vehicle'; defaults to 'collision'. This text string is used to match the file names released by the DfT.
- data_dir
Parent directory for all downloaded files. Defaults to tempdir().
- file_name
The file name (DfT named) to download.
- format
Switch to format the data. If FALSE, the raw data read from the file is returned. Default is TRUE.
- ask
Should you be asked whether or not to download the files? FALSE by default.
- silent
Boolean. If FALSE (default value), display useful progress messages on the screen.
- output_format
A string that specifies the desired output format. The default value is "tibble". Other possible values are "data.frame", "sf" and "ppp", which return objects of class data.frame, sf::sf and spatstat.geom::ppp, respectively. Any other string is ignored and a tibble output is returned. See details and examples.
- ...
Other arguments to be passed to the format_sf() or format_ppp() functions. Read and run the examples.
Details
This function gets STATS19 data. Behind the scenes it uses dl_stats19() and the read_* functions, returning a tibble (default), data.frame, sf or ppp object, depending on the output_format parameter.
The function returns data for a specific year (e.g. year = 2022).
Note: for years before 2016 the function may return data from more years than are requested due to the nature of the files hosted at data.gov.uk.
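If only one year is wanted from such a multi-year result, one option is to filter the returned data afterwards. The sketch below uses a small toy data frame rather than a real download, and the column name collision_year is an assumption made for illustration; check names() on the returned object before relying on it.

```r
# Toy stand-in for a multi-year result (no download needed).
# ASSUMPTION: the year is stored in a column called `collision_year`.
crashes = data.frame(
  collision_index = c("a1", "a2", "a3", "a4"),
  collision_year = c(2013, 2014, 2014, 2015)
)

# Keep only the rows for the year that was actually requested
requested_year = 2014
crashes_one_year = crashes[crashes$collision_year == requested_year, ]
nrow(crashes_one_year)
```

The same subsetting applies to a tibble or data.frame returned by get_stats19(), once the real name of the year column has been confirmed.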
As this function uses the dl_stats19() function, it can download many MB of data, so ensure you have sufficient disk space.
If output_format = "data.frame", output_format = "sf" or output_format = "ppp", then the output data is transformed into a data.frame, sf or ppp object using the as.data.frame(), format_sf() or format_ppp() functions, respectively, as shown in the examples.
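The conversion step described above can be pictured as a simple dispatch on output_format. The following is only a sketch of that idea, not the package's actual code: convert_output() is a hypothetical helper, and the toy object merely carries the class vector of a tibble so that the example runs without any download or extra package.

```r
# Hypothetical helper (not part of stats19): choose a conversion
# based on `output_format`, falling back to the input unchanged.
convert_output = function(x, output_format = "tibble", ...) {
  switch(
    output_format,
    "data.frame" = as.data.frame(x),
    # These branches are only evaluated when selected,
    # so stats19 is only needed for "sf" and "ppp"
    "sf" = stats19::format_sf(x, ...),
    "ppp" = stats19::format_ppp(x, ...),
    x # any other string: return the tibble-like input as is
  )
}

# Toy object with the class vector of a tibble (illustration only)
toy = structure(
  data.frame(collision_index = c("a1", "a2")),
  class = c("tbl_df", "tbl", "data.frame")
)

class(convert_output(toy, "data.frame"))
class(convert_output(toy, "unknown"))
```

Arguments supplied through ... (for example lonlat or window in the examples) would reach format_sf() or format_ppp() in the same way.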
Examples
# \donttest{
if(curl::has_internet()) {
  x = get_stats19(2022, silent = TRUE, format = TRUE)
  class(x)
  # data.frame output
  x = get_stats19(2022, silent = TRUE, output_format = "data.frame")
  class(x)
  # Run tests only if endpoint is alive:
  if(nrow(x) > 0) {
    # sf output
    x_sf = get_stats19(2022, silent = TRUE, output_format = "sf")
    # sf output with lonlat coordinates
    x_sf = get_stats19(2022, silent = TRUE, output_format = "sf", lonlat = TRUE)
    sf::st_crs(x_sf)
    if (requireNamespace("spatstat.geom", quietly = TRUE)) {
      # ppp output
      x_ppp = get_stats19(2022, silent = TRUE, output_format = "ppp")
      # We can use the window parameter of the format_ppp function to keep only
      # the events that occurred in a specific area. For example, we can create
      # a new bbox of 5km around the city center of Leeds
      leeds_window = spatstat.geom::owin(
        xrange = c(425046.1, 435046.1),
        yrange = c(428577.2, 438577.2)
      )
      leeds_ppp = get_stats19(2022, silent = TRUE, output_format = "ppp", window = leeds_window)
      spatstat.geom::plot.ppp(leeds_ppp, use.marks = FALSE, clipwin = leeds_window)
      # or even fancier examples where we subset all the events that occurred
      # in a pre-defined polygon area
      # The following example requires the osmdata package
      # greater_london_sf_polygon = osmdata::getbb(
      #   "Greater London, UK",
      #   format_out = "sf_polygon"
      # )
      # spatstat works only with planar coordinates
      # greater_london_sf_polygon = sf::st_transform(greater_london_sf_polygon, 27700)
      # then we extract the coordinates and create the window object.
      # greater_london_polygon = sf::st_coordinates(greater_london_sf_polygon)[, c(1, 2)]
      # greater_london_window = spatstat.geom::owin(poly = greater_london_polygon)
      # greater_london_ppp = get_stats19(2022, output_format = "ppp", window = greater_london_window)
      # spatstat.geom::plot.ppp(greater_london_ppp, use.marks = FALSE, clipwin = greater_london_window)
    }
  }
}
#> date and time columns present, creating formatted datetime column
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> date and time columns present, creating formatted datetime column
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> date and time columns present, creating formatted datetime column
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> 22 rows removed with no coordinates
#> date and time columns present, creating formatted datetime column
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> 22 rows removed with no coordinates
#> date and time columns present, creating formatted datetime column
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> 22 rows removed with no coordinates
#> date and time columns present, creating formatted datetime column
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> 22 rows removed with no coordinates
#> Warning: 105096 points were rejected as lying outside the specified window
# }