Evaluate metrics at multiple scales of aggregation

Usage

ww_multi_scale(
  data = NULL,
  truth,
  estimate,
  metrics = list(yardstick::rmse, yardstick::mae),
  grids = NULL,
  ...,
  na_rm = TRUE,
  aggregation_function = "mean",
  autoexpand_grid = TRUE,
  progress = TRUE
)

Arguments

data: Either: a point geometry sf object containing the columns specified by the truth and estimate arguments; a SpatRaster from the terra package containing layers specified by the truth and estimate arguments; or NULL if truth and estimate are SpatRaster objects.
truth, estimate: If data is an sf object, the names (optionally unquoted) for the columns in data containing the true and predicted values, respectively. If data is a SpatRaster object, either (quoted) layer names or indices which will select the true and predicted layers, respectively, via terra::subset() If data is NULL, SpatRaster objects with a single layer containing the true and predicted values, respectively.
metrics: Either a yardstick::metric_set() object, or a list of functions which will be used to construct a yardstick::metric_set() object specifying the performance metrics to evaluate at each scale.
grids: Optionally, a list of pre-computed sf or sfc objects specifying polygon boundaries to use for assessments.
...: Arguments passed to sf::st_make_grid(). You almost certainly should provide these arguments as lists. For instance, passing n = list(c(1, 2)) will create a single 1x2 grid; passing n = c(1, 2) will create a 1x1 grid and a 2x2 grid.
na_rm: Boolean: Should polygons with NA values be removed before calculating metrics? Note that this does not impact how values are aggregated to polygons: if you want to remove NA values before aggregating, provide a function to aggregation_function which will remove NA values.
aggregation_function: The function to use to aggregate predictions and true values at various scales, by default mean(). For the sf method, you can pass any function which takes a single vector and returns a scalar. For raster methods, any function accepted by exactextractr::exact_extract() (note that built-in function names must be quoted). Note that this function does not pay attention to the value of na_rm; any NA handling you want to do during aggregation should be handled by this function directly.
autoexpand_grid: Boolean: if data is in geographic coordinates and grids aren't provided, the grids generated by sf::st_make_grid() may not contain all observations. If TRUE, this function will automatically expand generated grids by a tiny factor to attempt to capture all observations.
progress: Boolean: if data is NULL, should aggregation via exactextractr::exact_extract() show a progress bar? Separate progress bars will be shown for each time truth and estimate are aggregated.

Value

A tibble with six columns: .metric, with the name of the metric that the row describes; .estimator, with the name of the estimator used, .estimate, with the output of the metric function; .grid_args, with the arguments passed to sf::st_make_grid() via ... (if any), .grid, containing the grids used to aggregate predictions, as well as the aggregated values of truth and estimate as well as the count of non-NA values for each, and .notes, which (if data is an sf object) will indicate any observations which were not used in a given assessment.

Raster inputs

If data is NULL, then truth and estimate should both be SpatRaster objects, as created via terra::rast(). These rasters will then be aggregated to each grid using exactextractr::exact_extract(). If data is a SpatRaster object, then truth and estimate should be indices to select the appropriate layers of the raster via terra::subset().

Grids are calculated using the bounding box of truth, under the assumption that you may have extrapolated into regions which do not have matching "true" values. This function does not check that truth and estimate overlap at all, or that they are at all contained within the grid.

Creating grid blocks

The grid blocks can be controlled by passing arguments to sf::st_make_grid() via .... Some particularly useful arguments include:

cellsize: Target cellsize, expressed as the "diameter" (shortest straight-line distance between opposing sides; two times the apothem) of each block, in map units.
n: The number of grid blocks in the x and y direction (columns, rows).
square: A logical value indicating whether to create square (TRUE) or hexagonal (FALSE) cells.

If both cellsize and n are provided, then the number of blocks requested by n of sizes specified by cellsize will be returned, likely not lining up with the bounding box of data. If only cellsize is provided, this function will return as many blocks of size cellsize as fit inside the bounding box of data. If only n is provided, then cellsize will be automatically adjusted to create the requested number of cells.

Grids are created by mapping over each argument passed via ... simultaneously, in a similar manner to mapply() or purrr::pmap(). This means that, for example, passing n = list(c(1, 2)) will create a single 1x2 grid, while passing n = c(1, 2) will create a 1x1 grid and a 2x2 grid. It also means that arguments will be recycled using R's standard vector recycling rules, so that passing n = c(1, 2) and square = FALSE will create two separate grids of hexagons.

This function can be used for geographic or projected coordinate reference systems and expects 2D data.

References

Riemann, R., Wilson, B. T., Lister, A., and Parks, S. (2010). "An effective assessment protocol for continuous geospatial datasets of forest characteristics using USFS Forest Inventory and Analysis (FIA) data." Remote Sensing of Environment 114(10), pp 2337-2352, doi: 10.1016/j.rse.2010.05.010 .

Examples

data(ames, package = "modeldata")
ames_sf <- sf::st_as_sf(ames, coords = c("Longitude", "Latitude"), crs = 4326)
ames_model <- lm(Sale_Price ~ Lot_Area, data = ames_sf)
ames_sf$predictions <- predict(ames_model, ames_sf)

ww_multi_scale(
  ames_sf,
  Sale_Price,
  predictions,
  n = list(
    c(10, 10),
    c(1, 1)
  ),
  square = FALSE
)
#> # A tibble: 4 × 6
#>   .metric .estimator .estimate .grid_args       .grid          .notes          
#>   <chr>   <chr>          <dbl> <list>           <list>         <list>          
#> 1 rmse    standard      82076. <tibble [1 × 2]> <sf [103 × 5]> <tibble [0 × 2]>
#> 2 mae     standard      60467. <tibble [1 × 2]> <sf [103 × 5]> <tibble [0 × 2]>
#> 3 rmse    standard      27862. <tibble [1 × 2]> <sf [5 × 5]>   <tibble [0 × 2]>
#> 4 mae     standard      23267. <tibble [1 × 2]> <sf [5 × 5]>   <tibble [0 × 2]>

# or, mostly equivalently
# (there will be a slight difference due to `autoexpand_grid = TRUE`)
grids <- list(
  sf::st_make_grid(ames_sf, n = c(10, 10), square = FALSE),
  sf::st_make_grid(ames_sf, n = c(1, 1), square = FALSE)
)
ww_multi_scale(ames_sf, Sale_Price, predictions, grids = grids)
#> # A tibble: 4 × 6
#>   .metric .estimator .estimate .grid_args       .grid          .notes          
#>   <chr>   <chr>          <dbl> <list>           <list>         <list>          
#> 1 rmse    standard      77853. <tibble [0 × 0]> <sf [103 × 5]> <tibble [0 × 2]>
#> 2 mae     standard      56796. <tibble [0 × 0]> <sf [103 × 5]> <tibble [0 × 2]>
#> 3 rmse    standard      27789. <tibble [0 × 0]> <sf [5 × 5]>   <tibble [0 × 2]>
#> 4 mae     standard      23173. <tibble [0 × 0]> <sf [5 × 5]>   <tibble [0 × 2]>