Download time series from multiple stations/variables
Stijn Van Hoey
2024-12-28
Source:vignettes/download_timeseries_batch.Rmd
Introduction
In many studies, users want to download a batch of time series that match a selection criterion. Examples are:
- downloading air pressure data for the last day for all available measurement stations.
- downloading all measured variables at a frequency of 15 minutes for a given measurement station.
This vignette explains this type of batch download, using the available functions of the wateRinfo package in combination with existing tidyverse functionality.
Download all stations for a given variable
Consider the scenario: “downloading air pressure data for the last day for all available measurement stations”. We can achieve this by downloading the information of all stations that provide air_pressure data (get_stations()) and, for each of the ts_id values in the resulting data.frame, applying the get_timeseries_tsid() function:
library(dplyr)
library(wateRinfo)

# extract the available stations for a predefined variable
variable_of_interest <- "air_pressure"
stations <- get_stations(variable_of_interest)

# download the data of a given period for each of the stations
# and add the station information to each record
air_pressure <- stations %>%
  group_by(ts_id) %>%
  do(get_timeseries_tsid(.$ts_id, period = "P1D", to = "2017-01-02")) %>%
  ungroup() %>%
  left_join(stations, by = "ts_id")
As this results in a tidy data set, we can use ggplot2 to plot the data of the individual measurement stations:
library(ggplot2)

# create a plot of the individual datasets, one panel per station
air_pressure %>%
  ggplot(aes(x = Timestamp, y = Value)) +
  geom_point() +
  xlab("1 Jan 2017") +
  facet_wrap(c("station_name", "stationparameter_name")) +
  scale_x_datetime(date_labels = "%H:%M",
                   date_breaks = "6 hours")
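The download step of this section relies on group_by() and do(). Recent dplyr versions mark do() as superseded, so the same "one download per ts_id, then combine" pattern can also be sketched with purrr. The block below is a minimal, offline illustration and not part of the wateRinfo package: fetch_ts() and the toy stations table are hypothetical stand-ins for get_timeseries_tsid() and get_stations().

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for get_timeseries_tsid(): returns a small data.frame
# with the Timestamp and Value columns used in this vignette.
fetch_ts <- function(ts_id) {
  data.frame(
    Timestamp = as.POSIXct("2017-01-01 00:00:00", tz = "UTC") + c(0, 900),
    Value = c(1013.2, 1013.4)
  )
}

# Toy station table (assumption); with wateRinfo this would come from get_stations()
stations <- data.frame(
  ts_id = c("ts_a", "ts_b"),
  station_name = c("Station A", "Station B")
)

# One call per ts_id; the list names become the ts_id column when row-binding
air_pressure <- stations$ts_id %>%
  set_names() %>%
  map(fetch_ts) %>%
  bind_rows(.id = "ts_id") %>%
  left_join(stations, by = "ts_id")
```

Note that bind_rows(.id = ...) creates a character column, so the ts_id column of the table joined afterwards needs to be of type character as well.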
Download set of variables from a station
Consider the scenario: “downloading all soil_moisture (in Dutch: ‘bodemvocht’) variables at a frequency of 15 minutes for the measurement station Liedekerke”. We can achieve this by downloading the information of all variables of the Liedekerke station (get_variables()) using the station code of the waterinfo.be interface (ME07_006), filtering on the P.15 time series and, for each of the ts_id values, applying the get_timeseries_tsid() function:
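# extract the available variables for the Liedekerke station
liedekerke_stat <- "ME07_006"
variables <- get_variables(liedekerke_stat)

# keep the soil moisture variables with a 15 minutes frequency
variables_to_download <- variables %>%
  filter(parametertype_name == "Bodemvocht") %>%
  filter(ts_name == "P.15")

# download the data of a given period for each of the variables
liedekerke <- variables_to_download %>%
  group_by(ts_id) %>%
  do(get_timeseries_tsid(.$ts_id, period = "P1M", from = "2017-01-01")) %>%
  ungroup() %>%
  left_join(variables, by = "ts_id")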
As this results in a tidy data set, we can again use ggplot2 to plot the data of the individual variables:
# create a plot of the individual variables, one panel per variable
liedekerke %>%
  ggplot(aes(x = Timestamp, y = Value)) +
  geom_line() +
  xlab("") + ylab("bodemvocht") +
  facet_wrap(c("ts_name", "stationparameter_name"), scales = "free") +
  scale_x_datetime(date_labels = "%d-%m\n%Y",
                   date_breaks = "10 days")
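When a batch covers many ts_id values, a single failing request can abort the whole download. One possible way to keep the remaining downloads is to wrap the download function with purrr::possibly(). The sketch below runs offline and makes some assumptions: fetch_ts() is a hypothetical stand-in for get_timeseries_tsid(), and the identifiers are made up for illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for a download call that fails for some identifiers
fetch_ts <- function(ts_id) {
  if (ts_id == "bad_id") stop("no data available")
  data.frame(
    Timestamp = as.POSIXct("2017-01-01 00:00:00", tz = "UTC"),
    Value = 0.25
  )
}

# Wrap the function so an error yields NULL instead of aborting the batch
safe_fetch <- possibly(fetch_ts, otherwise = NULL)

# Made-up identifiers (assumption), one of which triggers an error
ts_ids <- c("id_1", "bad_id", "id_2")

results <- ts_ids %>%
  set_names() %>%
  map(safe_fetch)

# Identifiers for which the download failed
failed <- names(keep(results, is.null))

# Drop the failed entries and combine the remaining data.frames
downloaded <- results %>%
  compact() %>%
  bind_rows(.id = "ts_id")
```

Keeping the failed identifiers in a separate vector makes it possible to retry or report them afterwards, instead of silently losing them.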