Skip to contents

The rsi package aims to help you download data from STAC APIs, to compute spectral indices from data, and to handle a handful of other spatial data wrangling tasks. This vignette walks through the main features of the package. Let’s start off by loading the package:

Downloading data

In order to download data from a STAC API, we’re going to need to specify both a spatial and temporal area of interest. To define our spatial area of interest, we’ll first mock up a small bounding box in northern Massachusetts (in a planar CRS appropriate for the region):

our_aoi <- sf::st_bbox(
  c(xmin = 200000, ymin = 900000, xmax = 200100, ymax = 900100),
  crs = 26986
)
our_aoi <- sf::st_as_sf(sf::st_as_sfc(our_aoi))
sf::st_area(our_aoi)
#> 10000 [m^2]
plot(sf::st_geometry(our_aoi))

This is the area we’re going to download and process data for.

Out of the box, rsi includes a number of functions to make it easy to download the most popular data sets from public STAC APIs. For instance, the get_landsat_imagery() has a number of default arguments for downloading Landsat 8 and 9 imagery from Microsoft’s Planetary Computer. We’ll pass our spatial area of interest along with a timeframe to this function, in order to get a cloud-masked composite image of all Landsat acquisitions for our spatiotemporal window. Let’s grab all the imagery from September 2023:

our_imagery <- get_landsat_imagery(
  our_aoi,
  "2023-09-01",
  "2023-09-30",
  output_filename = tempfile(fileext = ".tif")
)
#> ext 0: 200000 200090 900010 900100 
#> ext 1: 200000 200090 900010 900100 
#> ext 2: 200000 200090 900010 900100 
#> ext 3: 200000 200090 900010 900100 
#> ext 4: 200000 200090 900010 900100 
#> ext 5: 200000 200090 900010 900100 
#> ext 0: 200000 200090 900010 900100 
#> ext 1: 200000 200090 900010 900100 
#> ext 2: 200000 200090 900010 900100 
#> ext 3: 200000 200090 900010 900100 
#> ext 4: 200000 200090 900010 900100 
#> ext 5: 200000 200090 900010 900100 
#> ext 0: 200000 200090 900010 900100 
#> ext 1: 200000 200090 900010 900100 
#> ext 2: 200000 200090 900010 900100 
#> ext 3: 200000 200090 900010 900100 
#> ext 4: 200000 200090 900010 900100 
#> ext 5: 200000 200090 900010 900100 
#> ext 0: 200000 200090 900010 900100 
#> ext 1: 200000 200090 900010 900100 
#> ext 2: 200000 200090 900010 900100 
#> ext 3: 200000 200090 900010 900100 
#> ext 4: 200000 200090 900010 900100 
#> ext 5: 200000 200090 900010 900100 
#> ext 0: 200000 200090 900010 900100 
#> ext 1: 200000 200090 900010 900100 
#> ext 2: 200000 200090 900010 900100 
#> ext 3: 200000 200090 900010 900100 
#> ext 4: 200000 200090 900010 900100 
#> ext 5: 200000 200090 900010 900100 
#> ext 0: 200000 200090 900010 900100 
#> ext 1: 200000 200090 900010 900100 
#> ext 2: 200000 200090 900010 900100 
#> ext 3: 200000 200090 900010 900100 
#> ext 4: 200000 200090 900010 900100 
#> ext 5: 200000 200090 900010 900100 
#> ext 0: 200000 200090 900010 900100 
#> ext 1: 200000 200090 900010 900100 
#> ext 2: 200000 200090 900010 900100 
#> ext 3: 200000 200090 900010 900100 
#> ext 4: 200000 200090 900010 900100 
#> ext 5: 200000 200090 900010 900100 
#> ext 0: 200000 200090 900010 900100 
#> ext 1: 200000 200090 900010 900100 
#> ext 2: 200000 200090 900010 900100 
#> ext 3: 200000 200090 900010 900100 
#> ext 4: 200000 200090 900010 900100 
#> ext 5: 200000 200090 900010 900100
our_imagery
#> [1] "/tmp/RtmpMs7oes/file1743ff1029d.tif"

If we wanted to, we could use future::plan() here to specify a parallelization methodology to speed up downloads by using multiple threads. We could also use progressr::handlers() to get a progress bar to report our download status.

By default, get_landsat_imagery() will download a composite of all bands available in Landsat 8 and 9 imagery for our timeframe:

terra::rast(our_imagery) |>
  terra::plot()

Notice that this composite has been cloud-masked (using the QA pixel band) and rescaled (using the scale and offset specified in metadata provided by the STAC endpoint) automatically. You can control these behaviors via function arguments.

We’re also able to use rsi to download other data sets – for instance, we could also grab a DEM for this area from Planetary Computer, using the get_dem() function:

our_dem <- get_dem(our_aoi)
terra::rast(our_dem) |>
  terra::plot()

Under the hood, both of these functions (and their friends, get_sentinel2_imagery() and get_sentinel1_imagery()) are powered by a lower-level get_stac_data() function, which should theoretically work with any imagery provided by any STAC API, anywhere. These functions simply provide user-friendly defaults to make it faster to get the data you care about.

Finding spectral indices

In addition to these STAC-focused data-downloading functions, rsi also has an interface to the Awesome Spectral Indices project via the spectral_indices() function:

spectral_indices() |>
  head()
#> # A tibble: 6 × 9
#>   application_domain bands     contributor    date_of_addition formula long_name
#>   <chr>              <list>    <chr>          <chr>            <chr>   <chr>    
#> 1 vegetation         <chr [2]> https://githu… 2021-11-17       (N - 0… Aerosol …
#> 2 vegetation         <chr [2]> https://githu… 2021-11-17       (N - 0… Aerosol …
#> 3 water              <chr [6]> https://githu… 2022-09-22       (B + G… Augmente…
#> 4 vegetation         <chr [2]> https://githu… 2021-09-20       (1 / G… Anthocya…
#> 5 vegetation         <chr [3]> https://githu… 2022-04-08       N * ((… Anthocya…
#> 6 vegetation         <chr [4]> https://githu… 2021-05-11       (N - (… Atmosphe…
#> # ℹ 3 more variables: platforms <list>, reference <chr>, short_name <chr>

This function attempts to grab the newest version of the spectral indices JSON file from the ASI repo, and then stores that data in a cache folder on your computer. If the downloading fails, the package will fall back (with a warning) to use your possibly outdated cache instead; if you don’t have a cache and can’t download the files, the package will instead (with a different warning) resort to using a packaged version of the indices file. This ensures that you’re always getting the latest and greatest version of the ASI list possible, but that the package can still be used without an internet connection.

There are also functions in rsi to sort through the ASI list of indices. For instance, the filter_platforms() function can be used to, well, filter the list to only indices that can be calculated from a given platform. For instance, to filter to only indices that can be calculated using data from Landsat’s Operational Land Imager:

filter_platforms(platforms = "Landsat-OLI") |>
  head()
#> # A tibble: 6 × 9
#>   application_domain bands     contributor    date_of_addition formula long_name
#>   <chr>              <list>    <chr>          <chr>            <chr>   <chr>    
#> 1 vegetation         <chr [2]> https://githu… 2021-11-17       (N - 0… Aerosol …
#> 2 vegetation         <chr [2]> https://githu… 2021-11-17       (N - 0… Aerosol …
#> 3 water              <chr [6]> https://githu… 2022-09-22       (B + G… Augmente…
#> 4 vegetation         <chr [4]> https://githu… 2021-05-11       (N - (… Atmosphe…
#> 5 vegetation         <chr [4]> https://githu… 2021-05-14       sla * … Adjusted…
#> 6 vegetation         <chr [2]> https://githu… 2022-04-08       (N * (… Advanced…
#> # ℹ 3 more variables: platforms <list>, reference <chr>, short_name <chr>

There’s an equivalent function to filter indices based upon the bands that require. For instance, we can filter the list to only indices that use the red and blue band of images:

filter_bands(bands = c("R", "B"))
#> # A tibble: 2 × 9
#>   application_domain bands     contributor    date_of_addition formula long_name
#>   <chr>              <list>    <chr>          <chr>            <chr>   <chr>    
#> 1 vegetation         <chr [2]> https://githu… 2022-04-08       (R - B… Kawashim…
#> 2 vegetation         <chr [2]> https://githu… 2022-04-08       (R^2.0… Modified…
#> # ℹ 3 more variables: platforms <list>, reference <chr>, short_name <chr>

Arguments to these functions let you control whether you’re looking for indices that match all platforms and bands you’ve specified, or any of them.

Calculating indices

But rsi doesn’t simply make these formulas available in R, it also helps you compute these indices from imagery, via the calculate_indices() function. This function takes your imagery and a subset of spectral_indices() as arguments, and creates a raster containing all of those indices as an output. We can use filter_bands() to quickly get a list of the indices we can compute from our Landsat imagery, and then calculate_indices() to compute all 128 of those indices from our images:

our_indices <- calculate_indices(
  our_imagery,
  filter_bands(bands = names(terra::rast(our_imagery))),
  "our_indices.tif"
)
terra::rast(our_indices) |>
  terra::plot()

Note that calculate_indices() is evaluating the formulas in the formula column of the spectral indices data frame as if they were code. These formulas are evaluated inside of a very limited environment, which doesn’t have access to the global environment or most R fixtures, which does reduce the amount of harm malicious code could do:

evil_index <- spectral_indices()[1, ]
evil_index$formula <- "base::system('echo OHNO')"
try(
  calculate_indices(
    our_imagery,
    evil_index,
    tempfile(fileext = ".tif")
  )
)
#> Error in base::system : could not find function "::"

But it’s worth scanning your formulas before running calculate_indices(), just to make sure you aren’t going to be accidentally running something surprising!

Stacking rasters

Last but not least, rsi also provides a way to combine disparate data sets covering the same geographic region into a single VRT, quickly creating a file that you can treat as a single raster without taking up much additional storage space. This is a great way to create predictor bricks from your indices and downloaded data which you can then use for model fitting and prediction:

combined_layers <- stack_rasters(
  c(our_imagery, our_dem, our_indices),
  tempfile(fileext = ".vrt")
)

terra::rast(combined_layers) |>
  terra::plot()