edge_dist
returns edge lists defined by a spatial distance within the
user defined threshold. The function expects a data.table
with
relocation data, individual identifiers and a threshold argument. The
threshold argument is used to specify the criteria for distance between
points which defines a group. Relocation data should be in two columns
representing the X and Y coordinates.
Usage
edge_dist(
DT = NULL,
threshold,
id = NULL,
coords = NULL,
timegroup,
splitBy = NULL,
returnDist = FALSE,
fillNA = TRUE
)
Arguments
- DT
input data.table
- threshold
distance for grouping points, in the units of the coordinates
- id
character string of ID column name
- coords
character vector of X coordinate and Y coordinate column names. Note: the order is assumed X followed by Y column names.
- timegroup
timegroup field in the DT within which the grouping will be calculated
- splitBy
(optional) character string or vector of grouping column name(s) upon which the grouping will be calculated
- returnDist
boolean indicating if the distance between individuals should be returned. If FALSE (default), only ID1, ID2 columns (and timegroup, splitBy columns if provided) are returned. If TRUE, another column "distance" is returned indicating the distance between ID1 and ID2.
- fillNA
boolean indicating if NAs should be returned for individuals that were not within the threshold distance of any other. If TRUE, NAs are returned. If FALSE, only edges between individuals within the threshold distance are returned.
Value
edge_dist
returns a data.table
with columns ID1, ID2,
timegroup (if supplied) and any columns provided in splitBy. If
'returnDist' is TRUE, column 'distance' is returned indicating the distance
between ID1 and ID2.
The ID1 and ID2 columns represent the edges defined by the spatial (and
temporal with group_times
) thresholds.
Details
The DT
must be a data.table
. If your data is a
data.frame
, you can convert it by reference using
data.table::setDT
.
The id
, coords
timegroup
(and optional splitBy
)
arguments expect the names of a column in DT
which correspond to the
individual identifier, X and Y coordinates, timegroup (generated by
group_times
) and additional grouping columns.
If provided, the threshold
must be provided in the units of the coordinates and must be larger than 0.
If the threshold
is NULL, the distance to all other individuals will be returned. The coordinates must be planar
coordinates (e.g.: UTM). In the case of UTM, a threshold
= 50 would
indicate a 50m distance threshold.
The timegroup
argument is required to define the temporal groups
within which edges are calculated. The intended framework is to group rows
temporally with group_times
then spatially with edge_dist
.
If you have already calculated temporal groups without
group_times
, you can pass this column to the timegroup
argument. Note that the expectation is that each individual will be observed
only once per timegroup. Caution that accidentally including huge numbers of
rows within timegroups can overload your machine since all pairwise distances
are calculated within each timegroup.
The splitBy
argument offers further control over grouping. If within
your DT
, you have multiple populations, subgroups or other distinct
parts, you can provide the name of the column which identifies them to
splitBy
. edge_dist
will only consider rows within each
splitBy
subgroup.
See also
Other Edge-list generation:
edge_nn()
Examples
# Load data.table
library(data.table)
# Read example data
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))
# Cast the character column to POSIXct
DT[, datetime := as.POSIXct(datetime, tz = 'UTC')]
#> ID X Y datetime population
#> <char> <num> <num> <POSc> <int>
#> 1: A 715851.4 5505340 2016-11-01 00:00:54 1
#> 2: A 715822.8 5505289 2016-11-01 02:01:22 1
#> 3: A 715872.9 5505252 2016-11-01 04:01:24 1
#> 4: A 715820.5 5505231 2016-11-01 06:01:05 1
#> 5: A 715830.6 5505227 2016-11-01 08:01:11 1
#> ---
#> 14293: J 700616.5 5509069 2017-02-28 14:00:54 1
#> 14294: J 700622.6 5509065 2017-02-28 16:00:11 1
#> 14295: J 700657.5 5509277 2017-02-28 18:00:55 1
#> 14296: J 700610.3 5509269 2017-02-28 20:00:48 1
#> 14297: J 700744.0 5508782 2017-02-28 22:00:39 1
# Temporal grouping
group_times(DT, datetime = 'datetime', threshold = '20 minutes')
#> ID X Y datetime population minutes timegroup
#> <char> <num> <num> <POSc> <int> <int> <int>
#> 1: A 715851.4 5505340 2016-11-01 00:00:54 1 0 1
#> 2: A 715822.8 5505289 2016-11-01 02:01:22 1 0 2
#> 3: A 715872.9 5505252 2016-11-01 04:01:24 1 0 3
#> 4: A 715820.5 5505231 2016-11-01 06:01:05 1 0 4
#> 5: A 715830.6 5505227 2016-11-01 08:01:11 1 0 5
#> ---
#> 14293: J 700616.5 5509069 2017-02-28 14:00:54 1 0 1393
#> 14294: J 700622.6 5509065 2017-02-28 16:00:11 1 0 1394
#> 14295: J 700657.5 5509277 2017-02-28 18:00:55 1 0 1440
#> 14296: J 700610.3 5509269 2017-02-28 20:00:48 1 0 1395
#> 14297: J 700744.0 5508782 2017-02-28 22:00:39 1 0 1396
# Edge list generation
edges <- edge_dist(
DT,
threshold = 100,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup',
returnDist = TRUE,
fillNA = TRUE
)