Downloads data from Environment and Climate Change Canada (ECCC) for one or
more stations. For details and units, see the glossary vignette
(vignette("glossary", package = "weathercan")) or the online glossary at
https://climate.weather.gc.ca/glossary_e.html.
Usage
weather_dl(
station_ids,
start = NULL,
end = NULL,
interval = "hour",
trim = TRUE,
format = TRUE,
string_as = NA,
time_disp = "none",
stn = NULL,
encoding = "UTF-8",
list_col = FALSE,
verbose = FALSE,
quiet = FALSE
)
Arguments
- station_ids
Numeric/Character. A vector containing the ID(s) of the station(s) you wish to download data from. See the stations data frame or the stations_search() function to find IDs.
- start
Date/Character. The start date of the data in YYYY-MM-DD format (applies to all station_ids). Defaults to start of range.
- end
Date/Character. The end date of the data in YYYY-MM-DD format (applies to all station_ids). Defaults to end of range.
- interval
Character. Interval of the data, one of "hour", "day", "month".
- trim
Logical. Trim missing values from the start and end of the weather data frame. Only applies if format = TRUE.
- format
Logical. If TRUE, formats data for immediate use. If FALSE, returns data exactly as downloaded from Environment and Climate Change Canada. Useful for dealing with changes by Environment Canada to the format of data downloads.
- string_as
Character. What value to replace character strings in a numeric measurement with. See Details.
- time_disp
Character. Either "none" (default) or "UTC". See Details.
- stn
DEFUNCT. Now use stations_dl() to update internal data and stations_meta() to check the date it was last updated.
- encoding
Character. Text encoding for download.
- list_col
Logical. Return data as a nested data set? Defaults to FALSE. Only applies if format = TRUE.
- verbose
Logical. Include progress messages.
- quiet
Logical. Suppress all messages (including messages regarding missing data, etc.).
Details
Data can be returned 'raw' (format = FALSE) or can be formatted.
Formatting transforms dates/times to date/time class, renames columns, and
converts data to numeric where possible. If character strings are contained
in traditionally numeric fields (e.g., wind speed may have values such
as "< 30"), they can be replaced with the value specified by string_as.
The default is NA. Formatting also replaces data associated with certain
flags with NA (M = Missing).
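The replacement of character strings in numeric fields can be pictured with a small base R sketch. This is only an illustration of the idea under the default string_as = NA; the helper replace_strings() and the sample values are hypothetical and are not weathercan's actual implementation.

```r
# Hypothetical raw values as they might appear in a numeric column
raw <- c("12", "< 30", "7.5")

# Illustrative helper (not part of weathercan): values that cannot be
# parsed as numbers are replaced with `string_as`
replace_strings <- function(x, string_as = NA) {
  num <- suppressWarnings(as.numeric(x))   # "< 30" fails to parse
  not_numeric <- is.na(num) & !is.na(x)    # present, but not a number
  num[not_numeric] <- string_as
  num
}

cleaned <- replace_strings(raw)
cleaned
```

Here the second element of cleaned is NA, while the other two values are kept as numbers.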
Start and end dates can be specified; if they are not, they default to the start and end of the station's range (this could result in downloading a lot of data!).
For hourly data, timezones are always "UTC", but the actual times are
either local time (default; time_disp = "none") or UTC (time_disp = "UTC").
When time_disp = "none", times reflect the local time without daylight
saving. This means that relative measures of time, such as "nighttime",
"daytime", "dawn", and "dusk", are comparable among stations in different
timezones. This is useful for comparing daily cycles. When
time_disp = "UTC", the times are transformed into the UTC timezone. Thus
midnight in Kamloops would register as 08:00:00 (Pacific Standard Time is
8 hours behind UTC). This is useful for tracking weather events through
time, but will result in odd 'daily' measures of weather (e.g., data
collected in the afternoon on Sept 1 in Kamloops will be recorded as being
collected on Sept 2 in UTC).
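The difference between the two displays can be reproduced with base R date-times. This is a sketch of the arithmetic only, not of how weather_dl() works internally; it assumes Kamloops standard time is UTC-8 and uses the fixed-offset zone "Etc/GMT+8" (the sign in that zone name is inverted by convention). The year 2016 is chosen arbitrarily.

```r
fmt <- "%Y-%m-%d %H:%M:%S"

# Midnight on Sept 1 in Kamloops, local standard time (UTC-8, no DST)
local_midnight <- as.POSIXct("2016-09-01 00:00:00", tz = "Etc/GMT+8")

# As with time_disp = "UTC": the same instant expressed in UTC
utc_clock <- format(local_midnight, fmt, tz = "UTC")
utc_clock
# "2016-09-01 08:00:00"

# As with time_disp = "none": the local clock time is kept as-is,
# even though the timezone label on the column is "UTC"
local_clock <- format(local_midnight, fmt)
local_clock
# "2016-09-01 00:00:00"

# A late-afternoon observation on Sept 1 falls on Sept 2 in UTC
afternoon <- as.POSIXct("2016-09-01 17:00:00", tz = "Etc/GMT+8")
utc_date <- as.Date(afternoon, tz = "UTC")
utc_date
# "2016-09-02"
```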
Files are downloaded from the URL stored in
getOption("weathercan.urls.weather"). To change this location, use
options(weathercan.urls.weather = "your_new_url").
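If the URL is changed only temporarily, it may be convenient to save the current value and restore it afterwards. A minimal sketch using base R's options() and getOption(); "your_new_url" is a placeholder, not a real address.

```r
# Remember the current setting (NULL if the option has not been set)
old <- getOption("weathercan.urls.weather")

# Point downloads at a different location (placeholder value)
options(weathercan.urls.weather = "your_new_url")
current <- getOption("weathercan.urls.weather")

# ... calls to weather_dl() would go here ...

# Restore the previous setting
options(weathercan.urls.weather = old)
```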
Data is downloaded from ECCC as a series of files which are then bound together. Each file corresponds to a different month, or year, depending on the interval. Metadata (station name, lat, lon, elevation, etc.) is extracted from the start of the most recent file (i.e. most recent dates) for a given station. Note that important data (i.e. station name, lat, lon) is unlikely to change between files (i.e. dates), but some data may or may not be available depending on the date of the file (e.g., station operator was added as of April 1st 2018, so will be in all data which includes dates on or after April 2018).
Examples
if (FALSE) { # check_eccc()
  # Hourly data (the default interval) for a single station
  kam <- weather_dl(station_ids = 51423,
                    start = "2016-01-01", end = "2016-02-15")

  # Look up station IDs by name
  stations_search("Kamloops A$", interval = "hour")
  stations_search("Prince George Airport", interval = "hour")

  # Hourly data for two stations in one call
  kam.pg <- weather_dl(station_ids = c(48248, 51423),
                       start = "2016-01-01", end = "2016-02-15")

  # Plot temperature over time, one line per station
  library(ggplot2)
  ggplot(data = kam.pg, aes(x = time, y = temp,
                            group = station_name,
                            colour = station_name)) +
    geom_line()
}