
edge_nn returns edge lists defined by the nearest neighbour. The function expects a data.table with relocation data, individual identifiers and, optionally, a threshold argument. The threshold argument specifies the maximum distance allowed between an individual and its nearest neighbour. Relocation data should be in two columns representing the X and Y coordinates.

Usage

edge_nn(
  DT = NULL,
  id = NULL,
  coords = NULL,
  timegroup,
  splitBy = NULL,
  threshold = NULL,
  returnDist = FALSE
)

Arguments

DT

input data.table

id

character string of ID column name

coords

character vector of X coordinate and Y coordinate column names. Note: the order is assumed X followed by Y column names.

timegroup

timegroup field in the DT within which the grouping will be calculated

splitBy

(optional) character string or vector of grouping column name(s) upon which the grouping will be calculated

threshold

(optional) spatial distance threshold to set maximum distance between an individual and their neighbour.

returnDist

boolean indicating if the distance between individuals should be returned. If FALSE (default), only ID, NN columns (and timegroup, splitBy columns if provided) are returned. If TRUE, another column "distance" is returned indicating the distance between ID and NN.

Value

edge_nn returns a data.table with three columns: timegroup, ID and NN. If 'returnDist' is TRUE, column 'distance' is returned indicating the distance between ID and NN.

The ID and NN columns represent the edges defined by the nearest neighbours (and temporal thresholds with group_times).

If an individual was alone in a timegroup or splitBy, or did not have any neighbours within the threshold distance, they are assigned NA for nearest neighbour.
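As a small illustration of the NA convention described above, the sketch below works on a hand-made edge list. The `edges` table and its values are hypothetical, shaped like edge_nn output but not produced by it.

```r
library(data.table)

# Hypothetical edge list shaped like edge_nn output (values invented for illustration)
edges <- data.table(
  timegroup = c(1L, 1L, 1L, 2L, 2L),
  ID = c("A", "B", "C", "A", "B"),
  NN = c("B", "A", NA, NA, NA)
)

# Keep only rows where a nearest neighbour was found
edges_found <- edges[!is.na(NN)]

# Count individuals without a nearest neighbour in each timegroup
n_alone <- edges[is.na(NN), .N, by = timegroup]
```

Whether to drop or keep the NA rows depends on the analysis: they record that an individual was observed but had no neighbour, which may itself be informative.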

Details

The DT must be a data.table. If your data is a data.frame, you can convert it by reference using data.table::setDT.
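A minimal sketch of the conversion mentioned above, assuming relocations held in a plain data.frame. The object `df` and its values are hypothetical.

```r
library(data.table)

# Hypothetical relocation data stored as a plain data.frame
df <- data.frame(
  ID = c("A", "B"),
  X = c(715851.4, 715822.8),
  Y = c(5505340, 5505289)
)

# setDT converts in place (by reference), so no assignment is needed
setDT(df)

# Confirm the object is now a data.table
is.data.table(df)
```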

The id, coords, timegroup (and optional splitBy) arguments expect the names of columns in DT which correspond to the individual identifier, X and Y coordinates, timegroup (generated by group_times) and additional grouping columns.

If provided, the threshold must be in the units of the coordinates and must be larger than 0. The coordinates must be planar coordinates (e.g. UTM). In the case of UTM, threshold = 50 would indicate a 50 m distance threshold.
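To make the role of the threshold concrete, here is a base R sketch of a nearest neighbour search with a maximum distance for a single timegroup. It is an illustration only, not spatsoc's implementation: the coordinates are invented, and treating a neighbour exactly at the threshold as within range (`<=`) is an assumption.

```r
# Hypothetical planar coordinates (metres) for three individuals in one timegroup
ids <- c("A", "B", "C")
xy <- cbind(X = c(0, 30, 500), Y = c(0, 40, 500))

# All pairwise Euclidean distances; an individual is not its own neighbour
d <- as.matrix(dist(xy))
diag(d) <- Inf

# Index of the nearest neighbour of each individual and the distance to it
nn_idx <- apply(d, 1, which.min)
nn_dist <- d[cbind(seq_along(ids), nn_idx)]

# Apply a 100 m threshold: neighbours farther than this become NA
# (assumption: a distance equal to the threshold counts as within range)
threshold <- 100
nn <- ifelse(nn_dist <= threshold, ids[nn_idx], NA_character_)
```

Here A and B are 50 m apart and are each other's nearest neighbour, while C has no neighbour within 100 m and is assigned NA.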

The timegroup argument is required to define the temporal groups within which edge nearest neighbours are calculated. The intended framework is to group rows temporally with group_times then spatially with edge_nn. If you have already calculated temporal groups without group_times, you can pass this column to the timegroup argument. Note that each individual is expected to be observed only once per timegroup. Be careful: accidentally including huge numbers of rows within timegroups can overload your machine, since all pairwise distances are calculated within each timegroup.
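Before calling edge_nn, it may be worth checking both points from the paragraph above: that no individual appears more than once in a timegroup, and that no timegroup is unexpectedly large. The sketch below uses a hypothetical table with only ID and timegroup columns.

```r
library(data.table)

# Hypothetical relocations; individual A appears twice in timegroup 1
DT <- data.table(
  ID = c("A", "A", "B", "A", "B"),
  timegroup = c(1L, 1L, 1L, 2L, 2L)
)

# Count rows per individual per timegroup and keep the violations
dups <- DT[, .N, by = .(ID, timegroup)][N > 1]

# Largest timegroup size, a rough guide to the pairwise distance cost
max_rows <- DT[, .N, by = timegroup][, max(N)]
```

An empty `dups` table means the one-observation-per-timegroup expectation holds. Since the number of pairwise distances grows with the square of the rows in a timegroup, a large `max_rows` is a warning sign.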

The splitBy argument offers further control over grouping. If within your DT, you have multiple populations, subgroups or other distinct parts, you can provide the name of the column which identifies them to splitBy. edge_nn will only consider rows within each splitBy subgroup.

See also

Other Edge-list generation: edge_dist()

Examples

# Load data.table
library(data.table)

# Read example data
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))

# Select only individuals A, B, C for this example
DT <- DT[ID %in% c('A', 'B', 'C')]

# Cast the character column to POSIXct
DT[, datetime := as.POSIXct(datetime, tz = 'UTC')]
#>           ID        X       Y            datetime population
#>       <char>    <num>   <num>              <POSc>      <int>
#>    1:      A 715851.4 5505340 2016-11-01 00:00:54          1
#>    2:      A 715822.8 5505289 2016-11-01 02:01:22          1
#>    3:      A 715872.9 5505252 2016-11-01 04:01:24          1
#>    4:      A 715820.5 5505231 2016-11-01 06:01:05          1
#>    5:      A 715830.6 5505227 2016-11-01 08:01:11          1
#>   ---                                                       
#> 4265:      C 702093.6 5510180 2017-02-28 14:00:44          1
#> 4266:      C 702086.0 5510183 2017-02-28 16:00:42          1
#> 4267:      C 702961.8 5509447 2017-02-28 18:00:53          1
#> 4268:      C 703130.4 5509528 2017-02-28 20:00:54          1
#> 4269:      C 702872.3 5508531 2017-02-28 22:00:18          1

# Temporal grouping
group_times(DT, datetime = 'datetime', threshold = '20 minutes')
#>           ID        X       Y            datetime population minutes timegroup
#>       <char>    <num>   <num>              <POSc>      <int>   <int>     <int>
#>    1:      A 715851.4 5505340 2016-11-01 00:00:54          1       0         1
#>    2:      A 715822.8 5505289 2016-11-01 02:01:22          1       0         2
#>    3:      A 715872.9 5505252 2016-11-01 04:01:24          1       0         3
#>    4:      A 715820.5 5505231 2016-11-01 06:01:05          1       0         4
#>    5:      A 715830.6 5505227 2016-11-01 08:01:11          1       0         5
#>   ---                                                                         
#> 4265:      C 702093.6 5510180 2017-02-28 14:00:44          1       0      1393
#> 4266:      C 702086.0 5510183 2017-02-28 16:00:42          1       0      1394
#> 4267:      C 702961.8 5509447 2017-02-28 18:00:53          1       0      1440
#> 4268:      C 703130.4 5509528 2017-02-28 20:00:54          1       0      1395
#> 4269:      C 702872.3 5508531 2017-02-28 22:00:18          1       0      1396

# Edge list generation
edges <- edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
        timegroup = 'timegroup')

# Edge list generation using maximum distance threshold
edges <- edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
        timegroup = 'timegroup', threshold = 100)

# Edge list generation, returning distance between nearest neighbours
edge_nn(DT, id = 'ID', coords = c('X', 'Y'),
        timegroup = 'timegroup', threshold = 100,
        returnDist = TRUE)
#>       timegroup     ID     NN distance
#>           <int> <char> <char>    <num>
#>    1:         1      A   <NA>       NA
#>    2:         1      B   <NA>       NA
#>    3:         1      C   <NA>       NA
#>    4:         2      A   <NA>       NA
#>    5:         2      B   <NA>       NA
#>   ---                                 
#> 4265:      1438      C   <NA>       NA
#> 4266:      1439      B   <NA>       NA
#> 4267:      1439      C   <NA>       NA
#> 4268:      1440      B   <NA>       NA
#> 4269:      1440      C   <NA>       NA