edge_dist returns edge-lists defined by a spatial distance within the user-defined threshold. The function expects a data.table with relocation data, individual identifiers and a threshold argument. The threshold argument specifies the distance criterion between points that defines an edge. Relocation data should be in two columns representing the X and Y coordinates, or in a geometry column prepared by the helper function get_geometry().

Usage

edge_dist(
  DT = NULL,
  threshold,
  id = NULL,
  coords = NULL,
  timegroup,
  crs = NULL,
  splitBy = NULL,
  geometry = "geometry",
  returnDist = FALSE,
  fillNA = TRUE
)

Arguments

DT

input data.table

threshold

distance for grouping points, either a numeric or a units object, in the units of the crs, the coordinates or the geometry

id

character string of ID column name

coords

character vector of X coordinate and Y coordinate column names. Note: the order is assumed to be the X column name followed by the Y column name

timegroup

timegroup field in the DT within which the output will be calculated

crs

numeric or character defining the coordinate reference system to be passed to sf::st_crs. For example, either crs = "EPSG:32736" or crs = 32736. Used only if coords are provided, see details under Interface

splitBy

(optional) character string or vector of grouping column name(s) upon which the output will be calculated

geometry

simple feature geometry list column name, generated by get_geometry(). Default 'geometry', see details under Interface

returnDist

logical indicating if the distance between individuals should be returned. If FALSE (default), only individual columns (and timegroup, splitBy columns if provided) are returned. If TRUE, a column "distance" is also returned indicating the distance between individuals in the units of the crs, or if crs = NULL no units are set.

fillNA

logical indicating if NAs should be returned for individuals that were not within the threshold distance of any other. If TRUE, NAs are returned. If FALSE, only edges between individuals within the threshold distance are returned.
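
To make the effect of fillNA and returnDist concrete, the sketch below builds a small hand-made edge-list in the same column layout as the output described here. The values are invented for illustration and are not computed by spatsoc. Dropping the rows where ID2 is NA should leave the rows that fillNA = FALSE is described as returning.

```r
library(data.table)

# Hypothetical edge-list in the layout returned with returnDist = TRUE and
# fillNA = TRUE (values invented for illustration, not real output)
edges <- data.table(
  timegroup = c(1L, 1L, 1L, 1L),
  ID1 = c("A", "B", "G", "C"),
  ID2 = c(NA, "G", "B", NA),
  distance = c(NA, 5.8, 5.8, NA)
)

# Keep only the rows that describe an actual edge
edges_only <- edges[!is.na(ID2)]

# Individuals with no neighbour within the threshold
isolated <- edges[is.na(ID2), unique(ID1)]
```

Keeping the NA rows can be useful when the individuals without neighbours need to be counted, for example as isolated nodes in a network.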

Value

edge_dist returns a data.table with columns ID1, ID2, timegroup (if supplied) and any columns provided in splitBy. If 'returnDist' is TRUE, column 'distance' is returned indicating the distance between ID1 and ID2.

The ID1 and ID2 columns represent the edges defined by the spatial (and temporal with group_times) thresholds.

Note: unlike many other functions in spatsoc (e.g. group_pts), the output of edge_dist must be assigned to an object. See details in the FAQ.
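
The difference between updating by reference and returning a new object can be sketched with plain data.table operations. This is an analogy rather than spatsoc code: the toy table and column names are hypothetical, `:=` stands in for functions that modify DT in place, and the aggregated result stands in for the edge-list that edge_dist returns.

```r
library(data.table)

# Hypothetical toy data
DT <- data.table(ID = c("A", "B", "C"), X = c(0, 3, 10), Y = c(0, 4, 10))

# Updated by reference: DT gains a column without any assignment
DT[, n_rows := .N]

# Returned as a new object: the result is lost unless it is assigned
counts <- DT[, .(n = .N), by = ID]
```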

Details

The DT must be a data.table. If your data is a data.frame, you can convert it by reference using data.table::setDT().
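
A minimal sketch of the conversion, assuming a small hypothetical data.frame of relocations (the column names and values are placeholders):

```r
library(data.table)

# Hypothetical relocation data stored as a data.frame
df <- data.frame(
  ID = c("A", "B"),
  X = c(715851.4, 715822.8),
  Y = c(5505340, 5505289)
)

# Convert in place: no copy is made and no reassignment is needed
setDT(df)

# Confirm the class
is.data.table(df)
```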

The id, timegroup (and optional splitBy) arguments expect the names of columns in DT which correspond to the individual identifier, the timegroup (generated by group_times) and any additional grouping columns.

The threshold provided should match the units of the coordinates. The threshold can be provided with units specified using the units package (e.g. threshold = units::set_units(10, m)), in which case it will be checked against the units of the coordinates using the crs. If units are not specified, the threshold is assumed to be in the units of the coordinates.
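
As a sketch of what a units-aware threshold looks like, the snippet below uses only the units package. It does not reproduce the check that edge_dist performs against the crs. It only shows that a units object carries its unit with it and can be converted explicitly.

```r
library(units)

# A threshold of 10 metres, written as in the text above
threshold_m <- set_units(10, m)

# The same distance expressed in kilometres
threshold_km <- set_units(threshold_m, km)

# Drop the units to recover the bare number in metres
threshold_plain <- drop_units(threshold_m)
```

Passing threshold_m rather than a bare 10 makes the intended unit explicit, so it can be checked against the units of the coordinates as described above.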

The timegroup argument is required to define the temporal groups within which edges are calculated. The intended framework is to group rows temporally with group_times then spatially with edge_dist. If you have already calculated temporal groups without group_times, you can pass this column to the timegroup argument. Note that each individual is expected to be observed only once per timegroup. Caution: accidentally including huge numbers of rows within timegroups can overload your machine, since all pairwise distances are calculated within each timegroup.
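
Both expectations can be checked before calling edge_dist. The sketch below uses a small hypothetical table (the column names ID and timegroup follow the examples on this page) to flag individuals that appear more than once in a timegroup and to count the pairwise distances implied by each timegroup, which is n * (n - 1) / 2 for n rows.

```r
library(data.table)

# Hypothetical relocations: individual "A" appears twice in timegroup 1
DT <- data.table(
  ID = c("A", "A", "B", "C", "A", "B"),
  timegroup = c(1L, 1L, 1L, 1L, 2L, 2L)
)

# Individuals observed more than once within a timegroup
duplicates <- DT[, .N, by = .(ID, timegroup)][N > 1]

# Rows per timegroup and the number of pairwise distances that implies
pairs <- DT[, .(n_rows = .N, n_pairs = .N * (.N - 1) / 2), by = timegroup]
```

Here duplicates holds one row (individual A in timegroup 1), and pairs reports 6 pairwise distances for timegroup 1 and 1 for timegroup 2. Because the pair count grows quadratically with the number of rows, an unexpectedly large timegroup is worth investigating before running edge_dist.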

The splitBy argument offers further control over grouping. If within your DT, you have multiple populations, subgroups or other distinct parts, you can provide the name of the column which identifies them to splitBy. edge_dist will only consider rows within each splitBy subgroup.
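
A sketch of a call using splitBy, mirroring the calls in the Examples section and assuming the population column of the example data identifies the subgroups. The crs value is the one used in the examples on this page.

```r
library(data.table)
library(spatsoc)

# Read the example data and prepare timegroups, as in the Examples section
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))
DT[, datetime := as.POSIXct(datetime, tz = "UTC")]
group_times(DT, datetime = "datetime", threshold = "20 minutes")

# Edges are only calculated among rows that share a population value
edges_by_pop <- edge_dist(
  DT,
  threshold = 100,
  id = "ID",
  coords = c("X", "Y"),
  timegroup = "timegroup",
  crs = 32736,
  splitBy = "population"
)
```

As described under Value, the splitBy column should be returned alongside ID1, ID2 and timegroup.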

See below under "Interface" for details on providing coordinates and under "Distance function" for details on the underlying distance function used.

Interface

Two interfaces are available for providing coordinates:

  1. Provide coords and crs. The coords argument expects the names of the X and Y coordinate columns. The crs argument expects a character string or numeric defining the coordinate reference system to be passed to sf::st_crs. For example, for UTM zone 36S (EPSG 32736), the crs argument is crs = "EPSG:32736" or crs = 32736. See https://spatialreference.org for a list of EPSG codes.

  2. (New!) Provide geometry. The geometry argument allows the user to supply a geometry column that represents the coordinates as a simple feature geometry list column. This interface expects the user to prepare their input DT with get_geometry(). To use this interface, leave the coords and crs arguments NULL, and the default argument for geometry ('geometry') will be used directly.

Distance function

The underlying distance function used depends on the crs of the coordinates or geometry provided.

Note: in both cases, if the coordinates are NA then the result will be NA.

Examples

# Load data.table and spatsoc
library(data.table)
library(spatsoc)

# Read example data
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))

# Cast the character column to POSIXct
DT[, datetime := as.POSIXct(datetime, tz = 'UTC')]
#>            ID        X       Y            datetime population
#>        <char>    <num>   <num>              <POSc>      <int>
#>     1:      A 715851.4 5505340 2016-11-01 00:00:54          1
#>     2:      A 715822.8 5505289 2016-11-01 02:01:22          1
#>     3:      A 715872.9 5505252 2016-11-01 04:01:24          1
#>     4:      A 715820.5 5505231 2016-11-01 06:01:05          1
#>     5:      A 715830.6 5505227 2016-11-01 08:01:11          1
#>    ---                                                       
#> 14293:      J 700616.5 5509069 2017-02-28 14:00:54          1
#> 14294:      J 700622.6 5509065 2017-02-28 16:00:11          1
#> 14295:      J 700657.5 5509277 2017-02-28 18:00:55          1
#> 14296:      J 700610.3 5509269 2017-02-28 20:00:48          1
#> 14297:      J 700744.0 5508782 2017-02-28 22:00:39          1

# Temporal grouping
group_times(DT, datetime = 'datetime', threshold = '20 minutes')
#>            ID        X       Y            datetime population minutes timegroup
#>        <char>    <num>   <num>              <POSc>      <int>   <int>     <int>
#>     1:      A 715851.4 5505340 2016-11-01 00:00:54          1       0         1
#>     2:      A 715822.8 5505289 2016-11-01 02:01:22          1       0         2
#>     3:      A 715872.9 5505252 2016-11-01 04:01:24          1       0         3
#>     4:      A 715820.5 5505231 2016-11-01 06:01:05          1       0         4
#>     5:      A 715830.6 5505227 2016-11-01 08:01:11          1       0         5
#>    ---                                                                         
#> 14293:      J 700616.5 5509069 2017-02-28 14:00:54          1       0      1393
#> 14294:      J 700622.6 5509065 2017-02-28 16:00:11          1       0      1394
#> 14295:      J 700657.5 5509277 2017-02-28 18:00:55          1       0      1440
#> 14296:      J 700610.3 5509269 2017-02-28 20:00:48          1       0      1395
#> 14297:      J 700744.0 5508782 2017-02-28 22:00:39          1       0      1396

# Edge-list generation
edges <- edge_dist(
  DT,
  threshold = 100,
  id = 'ID',
  coords = c('X', 'Y'),
  timegroup = 'timegroup',
  crs = 32736,
  returnDist = TRUE,
  fillNA = TRUE
)

# Or, using the new geometry interface
get_geometry(DT, coords = c('X', 'Y'), crs = 32736)
#>            ID        X       Y            datetime population minutes timegroup
#>        <char>    <num>   <num>              <POSc>      <int>   <int>     <int>
#>     1:      A 715851.4 5505340 2016-11-01 00:00:54          1       0         1
#>     2:      A 715822.8 5505289 2016-11-01 02:01:22          1       0         2
#>     3:      A 715872.9 5505252 2016-11-01 04:01:24          1       0         3
#>     4:      A 715820.5 5505231 2016-11-01 06:01:05          1       0         4
#>     5:      A 715830.6 5505227 2016-11-01 08:01:11          1       0         5
#>    ---                                                                         
#> 14293:      J 700616.5 5509069 2017-02-28 14:00:54          1       0      1393
#> 14294:      J 700622.6 5509065 2017-02-28 16:00:11          1       0      1394
#> 14295:      J 700657.5 5509277 2017-02-28 18:00:55          1       0      1440
#> 14296:      J 700610.3 5509269 2017-02-28 20:00:48          1       0      1395
#> 14297:      J 700744.0 5508782 2017-02-28 22:00:39          1       0      1396
#>                        geometry
#>                     <sfc_POINT>
#>     1: POINT (715851.4 5505340)
#>     2: POINT (715822.8 5505289)
#>     3: POINT (715872.9 5505252)
#>     4: POINT (715820.5 5505231)
#>     5: POINT (715830.6 5505227)
#>    ---                         
#> 14293: POINT (700616.5 5509069)
#> 14294: POINT (700622.6 5509065)
#> 14295: POINT (700657.5 5509277)
#> 14296: POINT (700610.3 5509269)
#> 14297:   POINT (700744 5508782)
edge_dist(DT, threshold = 100, id = 'ID', timegroup = 'timegroup', returnDist = TRUE)
#> Key: <timegroup, ID1>
#>        timegroup    ID1    ID2  distance
#>            <int> <char> <char>     <num>
#>     1:         1      A   <NA>        NA
#>     2:         1      B      G  5.782904
#>     3:         1      C   <NA>        NA
#>     4:         1      D   <NA>        NA
#>     5:         1      E      H 65.061671
#>    ---                                  
#> 22985:      1440      G   <NA>        NA
#> 22986:      1440      H   <NA>        NA
#> 22987:      1440      I      C  2.831071
#> 22988:      1440      I      F  7.512922
#> 22989:      1440      J   <NA>        NA