Download, read and format STATS19 data in one function.
Usage
get_stats19(
year = NULL,
type = "collision",
data_dir = get_data_directory(),
file_name = NULL,
format = TRUE,
ask = FALSE,
silent = FALSE,
output_format = "tibble",
...
)
Arguments
- year
A year matching file names on the STATS19 data release page, e.g. 2020.
- type
One of 'collision', 'casualty' or 'vehicle'; defaults to 'collision'. This text string is used to match the file names released by the DfT.
- data_dir
Parent directory for all downloaded files. Defaults to tempdir().
- file_name
The file name (DfT named) to download.
- format
Should the data be formatted? If FALSE, the raw data read from file is returned. Default is TRUE.
- ask
Should you be asked whether or not to download the files? FALSE by default.
- silent
Boolean. If FALSE (default value), display useful progress messages on the screen.
- output_format
A string that specifies the desired output format. The default value is "tibble". Other possible values are "data.frame", "sf" and "ppp", which return objects of class data.frame, sf::sf and spatstat.geom::ppp, respectively. Any other string is ignored and a tibble output is returned. See details and examples.
- ...
Other arguments to be passed to the format_sf() or format_ppp() functions. Read and run the examples.
Details
This function gets STATS19 data. Behind the scenes it uses dl_stats19() and read_* functions, returning a tibble (default), data.frame, sf or ppp object, depending on the output_format parameter.
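The steps that get_stats19() wraps can be sketched by hand as follows. This is a minimal sketch, not the function's actual source: it assumes the stats19 and curl packages are installed, and that read_collisions() and format_collisions() are the read and format functions matching type = "collision". The run_sketch flag is a name introduced here for illustration.

```r
# Only run when the packages and a connection are available
run_sketch = requireNamespace("stats19", quietly = TRUE) &&
  requireNamespace("curl", quietly = TRUE) &&
  curl::has_internet()

if (run_sketch) {
  # 1. Download the 2022 collision file into the data directory
  stats19::dl_stats19(year = 2022, type = "collision", silent = TRUE)
  # 2. Read the file without formatting (assumed signature)
  col_raw = stats19::read_collisions(year = 2022, format = FALSE, silent = TRUE)
  # 3. Format the raw data, as format = TRUE would do
  col_fmt = stats19::format_collisions(col_raw)
  class(col_fmt)
}
```

Calling the three steps separately can be useful when the raw values are needed alongside the formatted ones.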
The function returns data for a specific year (e.g. year = 2022).
Note: for years before 2016 the function may return data from more years than are requested due to the nature of the files hosted at data.gov.uk.
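If only one year is wanted from a multi-year result, the rows can be filtered after the call. The sketch below uses a toy data frame in place of real output and assumes a collision_year column, a name taken from the parser warnings in the example output; check names(x) on real data before relying on it.

```r
# Toy stand-in for a result that spans several years
# (collision_year is an assumed column name, the values are made up)
x = data.frame(
  collision_year = c(2013, 2014, 2014, 2015),
  number_of_casualties = c(1, 2, 1, 3)
)

# Keep only the rows for the year that was requested
x_2014 = x[x$collision_year == 2014, ]
nrow(x_2014)
```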
As this function uses the dl_stats19() function, it can download many MB of data, so ensure you have sufficient disk space.
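One way to keep downloads under control is to pass an explicit data_dir and inspect what it holds. In this sketch cache_dir is a hypothetical location inside tempdir(); a directory outside the session's temporary folder would be needed for files to survive between sessions. The download itself only runs if stats19 and curl are installed and a connection is available.

```r
# Hypothetical download folder; replace with a persistent path in real use
cache_dir = file.path(tempdir(), "stats19_cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

run_sketch = requireNamespace("stats19", quietly = TRUE) &&
  requireNamespace("curl", quietly = TRUE) &&
  curl::has_internet()

if (run_sketch) {
  # Files are written to cache_dir; later calls with the same data_dir
  # find them there
  x = stats19::get_stats19(2022, data_dir = cache_dir, silent = TRUE)
}

# Total size of whatever is in the folder, in MB (0 if nothing was downloaded)
files = list.files(cache_dir, full.names = TRUE)
size_mb = sum(file.size(files)) / 1024^2
size_mb
```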
If output_format = "data.frame", output_format = "sf" or output_format = "ppp", then the output data is transformed into a data.frame, sf or ppp object using the as.data.frame(), format_sf() or format_ppp() functions, respectively, as shown in the examples.
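The same conversion can also be applied by hand to a tibble result. This sketch assumes format_sf() accepts the lonlat argument used in the examples and that the stats19, sf and curl packages are installed; run_sketch is a name introduced here for illustration.

```r
run_sketch = requireNamespace("stats19", quietly = TRUE) &&
  requireNamespace("sf", quietly = TRUE) &&
  requireNamespace("curl", quietly = TRUE) &&
  curl::has_internet()

if (run_sketch) {
  x = stats19::get_stats19(2022, silent = TRUE)  # tibble (default)
  # Convert only if the download returned rows
  if (nrow(x) > 0) {
    x_df = as.data.frame(x)                      # plain data.frame
    x_sf = stats19::format_sf(x, lonlat = TRUE)  # sf object, lon/lat coordinates
    class(x_df)
    class(x_sf)
  }
}
```

Converting after the call avoids reading the file again when more than one output format is needed.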
Examples
# \donttest{
if(curl::has_internet()) {
  col = get_stats19(year = 2022, type = "collision")
  cas2 = get_stats19(year = 2022, type = "casualty")
  veh = get_stats19(year = 2022, type = "vehicle")
  class(col)
  # data.frame output
  x = get_stats19(2022, silent = TRUE, output_format = "data.frame")
  class(x)
  # Run tests only if endpoint is alive:
  if(nrow(x) > 0) {
    # sf output
    x_sf = get_stats19(2022, silent = TRUE, output_format = "sf")
    # sf output with lonlat coordinates
    x_sf = get_stats19(2022, silent = TRUE, output_format = "sf", lonlat = TRUE)
    sf::st_crs(x_sf)
    if (requireNamespace("spatstat.geom", quietly = TRUE)) {
      # ppp output
      x_ppp = get_stats19(2022, silent = TRUE, output_format = "ppp")
      # We can use the window parameter of format_ppp function to filter only the
      # events occurred in a specific area. For example we can create a new bbox
      # of 5km around the city center of Leeds
      leeds_window = spatstat.geom::owin(
        xrange = c(425046.1, 435046.1),
        yrange = c(428577.2, 438577.2)
      )
      leeds_ppp = get_stats19(2022, silent = TRUE, output_format = "ppp", window = leeds_window)
      spatstat.geom::plot.ppp(leeds_ppp, use.marks = FALSE, clipwin = leeds_window)
      # or even more fancy examples where we subset all the events occurred in a
      # pre-defined polygon area
      # The following example requires osmdata package
      # greater_london_sf_polygon = osmdata::getbb(
      #   "Greater London, UK",
      #   format_out = "sf_polygon"
      # )
      # spatstat works only with planar coordinates
      # greater_london_sf_polygon = sf::st_transform(greater_london_sf_polygon, 27700)
      # then we extract the coordinates and create the window object.
      # greater_london_polygon = sf::st_coordinates(greater_london_sf_polygon)[, c(1, 2)]
      # greater_london_window = spatstat.geom::owin(poly = greater_london_polygon)
      # greater_london_ppp = get_stats19(2022, output_format = "ppp", window = greater_london_window)
      # spatstat.geom::plot.ppp(greater_london_ppp, use.marks = FALSE, clipwin = greater_london_window)
    }
  }
}
#> Files identified: dft-road-casualty-statistics-collision-2022.csv
#> https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-2022.csv
#> Data already exists in data_dir, not downloading
#> Reading in:
#> /tmp/RtmpDKqbC5/dft-road-casualty-statistics-collision-2022.csv
#> date and time columns present, creating formatted datetime column
#> Files identified: dft-road-casualty-statistics-casualty-2022.csv
#> https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2022.csv
#> Data already exists in data_dir, not downloading
#> Warning: The following named parsers don't match the column names: accident_severity, carriageway_hazards, collision_index, collision_reference, collision_year, date, day_of_week, did_police_officer_attend_scene_of_accident, did_police_officer_attend_scene_of_collision, enhanced_collision_severity, first_road_class, first_road_number, junction_control, junction_detail, latitude, legacy_collision_severity, light_conditions, local_authority_district, local_authority_highway, local_authority_ons_district, location_easting_osgr, location_northing_osgr, longitude, lsoa_of_accident_location, lsoa_of_collision_location, number_of_casualties, number_of_vehicles, pedestrian_crossing_human_control, pedestrian_crossing_physical_facilities, police_force, road_surface_conditions, road_type, second_road_class, second_road_number, special_conditions_at_site, speed_limit, time, trunk_road_flag, urban_or_rural_area, weather_conditions, adjusted_serious, adjusted_slight, injury_based, accident_ref_no, effective_date_of_change, previously_published_value, replacement_value, variable, age_band_of_driver, age_of_driver, age_of_vehicle, dir_from_e, dir_from_n, dir_to_e, dir_to_n, driver_distance_banding, driver_home_area_type, driver_imd_decile, engine_capacity_cc, escooter_flag, first_point_of_impact, generic_make_model, hit_object_in_carriageway, hit_object_off_carriageway, journey_purpose_of_driver, junction_location, lsoa_of_driver, propulsion_code, sex_of_driver, skidding_and_overturning, towing_and_articulation, vehicle_direction_from, vehicle_direction_to, vehicle_leaving_carriageway, vehicle_left_hand_drive, vehicle_location_restricted_lane, vehicle_manoeuvre, vehicle_type
#> Warning: NAs introduced by coercion
#> Files identified: dft-road-casualty-statistics-vehicle-2022.csv
#> https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-vehicle-2022.csv
#> Data already exists in data_dir, not downloading
#> Warning: The following named parsers don't match the column names: accident_severity, carriageway_hazards, collision_index, collision_reference, collision_year, date, day_of_week, did_police_officer_attend_scene_of_accident, did_police_officer_attend_scene_of_collision, enhanced_collision_severity, first_road_class, first_road_number, junction_control, junction_detail, latitude, legacy_collision_severity, light_conditions, local_authority_district, local_authority_highway, local_authority_ons_district, location_easting_osgr, location_northing_osgr, longitude, lsoa_of_accident_location, lsoa_of_collision_location, number_of_casualties, number_of_vehicles, pedestrian_crossing_human_control, pedestrian_crossing_physical_facilities, police_force, road_surface_conditions, road_type, second_road_class, second_road_number, special_conditions_at_site, speed_limit, time, trunk_road_flag, urban_or_rural_area, weather_conditions, age_band_of_casualty, age_of_casualty, bus_or_coach_passenger, car_passenger, casualty_class, casualty_distance_banding, casualty_home_area_type, casualty_imd_decile, casualty_reference, casualty_severity, casualty_type, enhanced_casualty_severity, lsoa_of_casualty, pedestrian_location, pedestrian_movement, pedestrian_road_maintenance_worker, sex_of_casualty, adjusted_serious, adjusted_slight, injury_based, accident_ref_no, effective_date_of_change, previously_published_value, replacement_value, variable
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
#> date and time columns present, creating formatted datetime column
#> date and time columns present, creating formatted datetime column
#> 22 rows removed with no coordinates
#> date and time columns present, creating formatted datetime column
#> 22 rows removed with no coordinates
#> date and time columns present, creating formatted datetime column
#> 22 rows removed with no coordinates
#> Warning: some mark values are NA in the point pattern x
#> date and time columns present, creating formatted datetime column
#> 22 rows removed with no coordinates
#> Warning: 105096 points were rejected as lying outside the specified window
#> Warning: some mark values are NA in the point pattern x
# }