Using edge list generating functions and dyad_id
Source:vignettes/using-edge-and-dyad.Rmd
spatsoc can be used in social network analysis to generate edge lists from GPS relocation data. Edge lists are generated using either the edge_dist or the edge_nn function.
Note: The grouping functions and their application in social network analysis are further described in the vignette Using spatsoc in social network analysis - grouping functions.
Generate edge lists
spatsoc provides users with one temporal grouping function (group_times) and two edge list generating functions (edge_dist, edge_nn) to generate edge lists from GPS relocations. Users can define edges by the spatial proximity between individuals (with edge_dist), by nearest neighbour (with edge_nn), or by nearest neighbour with a maximum distance (with edge_nn and its threshold argument). The edge lists can be used directly by the animal social network package asnipe to generate networks.
1. Load packages and prepare data
spatsoc expects a data.table for all DT arguments and date time columns to be formatted as POSIXct.
## Load packages
library(spatsoc)
library(data.table)
## Read data as a data.table
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))
## Cast datetime column to POSIXct
DT[, datetime := as.POSIXct(datetime)]
Next, we will group relocations temporally with group_times and generate edge lists with one of edge_dist or edge_nn. Note: these are mutually exclusive, so select only one edge list generating function at a time.
2. a) edge_dist
Distance based edge lists are generated by treating relocations in each timegroup as edges if they are within the spatial distance defined by the user with the threshold argument. Relevant temporal and spatial distance thresholds depend on the species and study system. In this case, relocations within 5 minutes and 100 meters are considered edges.
This is the non-chain rule implementation similar to group_pts. Edges are defined by the distance threshold and NAs are returned for individuals within each timegroup if they are not within the threshold distance of any other individual (if fillNA is TRUE).
Optionally, edge_dist can return the distances between individuals (less than the threshold) in a column named ‘distance’ with argument returnDist = TRUE.
# Temporal groups
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
#> ID X Y datetime population minutes timegroup
#> <char> <num> <num> <POSc> <int> <int> <int>
#> 1: A 715851.4 5505340 2016-11-01 00:00:54 1 0 1
#> 2: A 715822.8 5505289 2016-11-01 02:01:22 1 0 2
#> 3: A 715872.9 5505252 2016-11-01 04:01:24 1 0 3
#> 4: A 715820.5 5505231 2016-11-01 06:01:05 1 0 4
#> 5: A 715830.6 5505227 2016-11-01 08:01:11 1 0 5
#> ---
#> 14293: J 700616.5 5509069 2017-02-28 14:00:54 1 0 1393
#> 14294: J 700622.6 5509065 2017-02-28 16:00:11 1 0 1394
#> 14295: J 700657.5 5509277 2017-02-28 18:00:55 1 0 1449
#> 14296: J 700610.3 5509269 2017-02-28 20:00:48 1 0 1395
#> 14297: J 700744.0 5508782 2017-02-28 22:00:39 1 0 1396
# Edge list generation
edges <- edge_dist(
DT,
threshold = 100,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup',
returnDist = TRUE,
fillNA = TRUE
)
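To make the distance rule concrete, the following is a minimal base R sketch of the idea for a single timegroup. It is not the spatsoc implementation: the toy coordinates, the object names (pts, toy_edges, close_pairs) and the use of a strict "less than" comparison are assumptions made for illustration only.

```r
# Toy relocations for one timegroup (hypothetical coordinates, in metres)
pts <- data.frame(
  ID = c("A", "B", "C", "D", "E"),
  X = c(0, 30, 200, 206, 1000),
  Y = c(0, 40, 0, 8, 1000),
  stringsAsFactors = FALSE
)
threshold <- 100

# Pairwise Euclidean distances between all individuals
d <- as.matrix(dist(pts[, c("X", "Y")]))

# Ordered pairs closer than the threshold, excluding self-pairs
close_pairs <- which(d < threshold & row(d) != col(d), arr.ind = TRUE)
toy_edges <- data.frame(
  ID1 = pts$ID[close_pairs[, "row"]],
  ID2 = pts$ID[close_pairs[, "col"]],
  distance = d[close_pairs],
  stringsAsFactors = FALSE
)

# Mimic the fillNA idea: individuals with no neighbour get an NA row
alone <- setdiff(pts$ID, toy_edges$ID1)
toy_edges <- rbind(
  toy_edges,
  data.frame(
    ID1 = alone,
    ID2 = rep(NA_character_, length(alone)),
    distance = rep(NA_real_, length(alone)),
    stringsAsFactors = FALSE
  )
)
toy_edges <- toy_edges[order(toy_edges$ID1, toy_edges$ID2), ]
print(toy_edges)
```

In this toy timegroup, A and B are 50 m apart and C and D are 10 m apart, so each pair appears in both directions, while E is alone and is kept as a single NA row.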
2. b) edge_nn
Nearest neighbour based edge lists connect each individual to their nearest neighbour. edge_nn can be used to generate edge lists defined either by nearest neighbour or by nearest neighbour with a maximum distance. As with grouping functions and edge_dist, temporal and spatial thresholds depend on species and study system.
NAs are returned for the nearest neighbour if an individual was alone in a timegroup (and/or splitBy) or if the distance between an individual and its nearest neighbour is greater than the threshold.
Optionally, edge_nn can return the distances between individuals (less than the threshold) in a column named ‘distance’ with argument returnDist = TRUE.
# Temporal groups
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
# Edge list generation
edges <- edge_nn(
DT,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup'
)
# Edge list generation using maximum distance threshold
edges <- edge_nn(
DT,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup',
threshold = 100
)
# Edge list generation using maximum distance threshold, returning distances
edges <- edge_nn(
DT,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup',
threshold = 100,
returnDist = TRUE
)
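The nearest neighbour rule, with and without a maximum distance, can be sketched in base R for a single timegroup. This is an illustration under stated assumptions, not the spatsoc implementation: the coordinates and the object names (pts, toy_nn, toy_nn_max) are hypothetical.

```r
# Toy relocations for one timegroup (hypothetical coordinates, in metres)
pts <- data.frame(
  ID = c("A", "B", "C", "D"),
  X = c(0, 30, 200, 500),
  Y = c(0, 40, 0, 0),
  stringsAsFactors = FALSE
)
threshold <- 100

# Pairwise distances; an individual cannot be its own nearest neighbour
d <- as.matrix(dist(pts[, c("X", "Y")]))
diag(d) <- Inf

# Index of, and distance to, the nearest neighbour of each individual
nn_index <- apply(d, 1, which.min)
nn_dist <- d[cbind(seq_len(nrow(d)), nn_index)]

toy_nn <- data.frame(
  ID = pts$ID,
  NN = pts$ID[nn_index],
  distance = nn_dist,
  stringsAsFactors = FALSE
)

# With a maximum distance, neighbours farther than the threshold become NA
too_far <- toy_nn$distance > threshold
toy_nn_max <- toy_nn
toy_nn_max$NN[too_far] <- NA_character_
toy_nn_max$distance[too_far] <- NA_real_
print(toy_nn_max)
```

Note that nearest neighbour edges are not necessarily reciprocal: in this toy data C's nearest neighbour is B, but B's nearest neighbour is A.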
Dyads
3. dyad_id
The function dyad_id can be used to generate a unique, undirected dyad identifier for edge lists.
# In this case, using the edges generated in 2. a) edge_dist
dyad_id(edges, id1 = 'ID1', id2 = 'ID2')
#> Key: <timegroup, ID1>
#> timegroup ID1 ID2 distance dyadID
#> <int> <char> <char> <num> <char>
#> 1: 1 A <NA> NA <NA>
#> 2: 1 B G 5.782904 B-G
#> 3: 1 C <NA> NA <NA>
#> 4: 1 D <NA> NA <NA>
#> 5: 1 E H 65.061671 E-H
#> ---
#> 22942: 1457 G <NA> NA <NA>
#> 22943: 1458 H <NA> NA <NA>
#> 22944: 1459 I <NA> NA <NA>
#> 22945: 1460 J <NA> NA <NA>
#> 22946: 1461 J <NA> NA <NA>
Once we have generated dyad IDs, we can measure consecutive relocations, the start and end of each run of relocations, etc. Note: since each edge is duplicated (A-B and B-A), you will need to either keep the unique combinations of timegroup and dyadID or divide counts by 2.
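As a small illustration of why an undirected identifier helps with the duplicated edges, here is a base R sketch on a toy edge list. It assumes the identifier is built by ordering the two IDs alphabetically and joining them with a hyphen, which matches the dyadID values printed above but is not necessarily how dyad_id constructs it internally. The object names are hypothetical.

```r
# Toy edge list with each edge present in both directions
toy_edges <- data.frame(
  timegroup = c(1, 1, 2, 2, 2),
  ID1 = c("A", "B", "B", "A", "C"),
  ID2 = c("B", "A", "A", "B", NA),
  stringsAsFactors = FALSE
)

# Undirected identifier: order the two IDs alphabetically, then paste them
first <- pmin(toy_edges$ID1, toy_edges$ID2)
second <- pmax(toy_edges$ID1, toy_edges$ID2)
toy_edges$dyadID <- ifelse(
  is.na(toy_edges$ID2),
  NA_character_,
  paste(first, second, sep = "-")
)

# Option 1: keep one row per timegroup and dyad
has_dyad <- toy_edges[!is.na(toy_edges$dyadID), ]
toy_dyads <- has_dyad[!duplicated(has_dyad[, c("timegroup", "dyadID")]), ]
n_unique <- nrow(toy_dyads)

# Option 2: count the duplicated rows and divide by 2
n_halved <- nrow(has_dyad) / 2
print(c(n_unique = n_unique, n_halved = n_halved))
```

Both options give the same number of observations for the A-B dyad; the first also leaves a table with one row per timegroup and dyad, which is convenient for the statistics in the next section.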
4. Dyad stats
# Get the unique dyads by timegroup
# NOTE: we are explicitly selecting only where dyadID is not NA
dyads <- unique(edges[!is.na(dyadID)], by = c('timegroup', 'dyadID'))
# NOTE: if we wanted to also include where dyadID is NA, we should do it explicitly
# dyadNN <- unique(edges[!is.na(dyadID)], by = c('timegroup', 'dyadID'))
# Get where dyadID is NA
# dyadNA <- edges[is.na(dyadID)]
# Combine with the rows where dyadID is NA
# dyads <- rbindlist(list(dyadNN, dyadNA))
# Set the order of the rows
setorder(dyads, timegroup)
## Count number of timegroups dyads are observed together
dyads[, nObs := .N, by = .(dyadID)]
## Count consecutive relocations together
# Shift the timegroup within dyadIDs
dyads[, shifttimegrp := shift(timegroup, 1), by = dyadID]
# Difference between consecutive timegroups for each dyadID
# where difftimegrp == 1, the dyads remained together in consecutive timegroups
dyads[, difftimegrp := timegroup - shifttimegrp]
# Run id of diff timegroups
dyads[, runid := rleid(difftimegrp), by = dyadID]
# N consecutive observations of dyadIDs
dyads[, runCount := fifelse(difftimegrp == 1, .N, NA_integer_), by = .(runid, dyadID)]
## Start and end of consecutive relocations for each dyad
# Don't consider runs that are not more than one relocation
dyads[runCount > 1, start := timegroup == min(timegroup), by = .(runid, dyadID)]
dyads[runCount > 1, end := timegroup == max(timegroup), by = .(runid, dyadID)]
## Example output
dyads[dyadID == 'B-H',
.(timegroup, nObs, shifttimegrp, difftimegrp, runid, runCount, start, end)]
#> timegroup nObs shifttimegrp difftimegrp runid runCount start end
#> <int> <int> <int> <int> <int> <int> <lgcl> <lgcl>
#> 1: 1340 29 NA NA 1 NA NA NA
#> 2: 1341 29 1340 1 2 3 TRUE FALSE
#> 3: 1342 29 1341 1 2 3 FALSE FALSE
#> 4: 1343 29 1342 1 2 3 FALSE TRUE
#> 5: 1346 29 1343 3 3 NA NA NA
#> 6: 1347 29 1346 1 4 3 TRUE FALSE
#> 7: 1348 29 1347 1 4 3 FALSE FALSE
#> 8: 1349 29 1348 1 4 3 FALSE TRUE
#> 9: 1351 29 1349 2 5 NA NA NA
#> 10: 1356 29 1351 5 6 NA NA NA
#> 11: 1357 29 1356 1 7 2 TRUE FALSE
#> 12: 1358 29 1357 1 7 2 FALSE TRUE
#> 13: 1360 29 1358 2 8 NA NA NA
#> 14: 1361 29 1360 1 9 1 NA NA
#> 15: 1364 29 1361 3 10 NA NA NA
#> 16: 1383 29 1364 19 11 NA NA NA
#> 17: 1384 29 1383 1 12 7 TRUE FALSE
#> 18: 1385 29 1384 1 12 7 FALSE FALSE
#> 19: 1386 29 1385 1 12 7 FALSE FALSE
#> 20: 1387 29 1386 1 12 7 FALSE FALSE
#> 21: 1388 29 1387 1 12 7 FALSE FALSE
#> 22: 1389 29 1388 1 12 7 FALSE FALSE
#> 23: 1390 29 1389 1 12 7 FALSE TRUE
#> 24: 1392 29 1390 2 13 NA NA NA
#> 25: 1393 29 1392 1 14 3 TRUE FALSE
#> 26: 1394 29 1393 1 14 3 FALSE FALSE
#> 27: 1395 29 1394 1 14 3 FALSE TRUE
#> 28: 1445 29 1395 50 15 NA NA NA
#> 29: 1446 29 1445 1 16 1 NA NA
#> timegroup nObs shifttimegrp difftimegrp runid runCount start end
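The run logic can also be checked by hand on a short vector of timegroups. The following base R sketch is a variant, not the code above: it starts a new run whenever the gap to the previous timegroup is not exactly 1, so the first relocation of a run is counted as part of it. By contrast, runCount above counts the rows where difftimegrp == 1, which is why timegroups 1340 to 1343 show a runCount of 3. The vector and object names are hypothetical.

```r
# Timegroups in which one toy dyad was observed together
timegroup <- c(10, 11, 12, 15, 16, 20)

# A new run starts wherever the gap to the previous timegroup is not 1
new_run <- c(TRUE, diff(timegroup) != 1)
run_id <- cumsum(new_run)

# Number of relocations in each run, repeated for every row of the run
run_length <- as.integer(ave(timegroup, run_id, FUN = length))

# Start and end of runs longer than one relocation
run_start <- new_run & run_length > 1
run_end <- c(diff(run_id) != 0, TRUE) & run_length > 1

print(data.frame(timegroup, run_id, run_length, run_start, run_end))
```

Here timegroups 10 to 12 form a run of three relocations, 15 to 16 a run of two, and 20 is a single observation that is flagged as neither a start nor an end.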