group_pts groups rows into spatial groups. The function expects a
data.table with relocation data, individual identifiers and a
threshold argument. The threshold argument specifies the distance criterion
between points that defines a group. Relocation data should be
in two columns representing the X and Y coordinates, or in a
geometry column prepared by the helper function
get_geometry().
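The core idea of grouping points by a distance threshold can be illustrated with a minimal base-R sketch. It assumes projected X/Y coordinates with no missing values, a single timegroup, and an inclusive threshold (points at most `threshold` apart are linked). Linked points are then merged into connected components, so chains of nearby points share one group. The function name `sketch_group_pts` is hypothetical, and this is an illustration rather than spatsoc's internal implementation.

```r
# Hypothetical sketch: spatial groups for the rows of a single timegroup.
# Assumptions: projected coordinates (Euclidean distance), no NA coordinates,
# and an inclusive threshold. Groups are connected components of the graph
# in which two rows are linked when their distance is <= threshold.
sketch_group_pts <- function(x, y, threshold) {
  n <- length(x)
  # Full matrix of pairwise Euclidean distances
  d <- as.matrix(stats::dist(cbind(x, y)))
  adjacent <- d <= threshold
  group <- rep(NA_integer_, n)
  next_id <- 1L
  for (start in seq_len(n)) {
    if (!is.na(group[start])) next
    # Breadth-first search from a row that has no group yet
    queue <- start
    group[start] <- next_id
    while (length(queue) > 0) {
      i <- queue[1]
      queue <- queue[-1]
      neighbours <- which(adjacent[i, ] & is.na(group))
      group[neighbours] <- next_id
      queue <- c(queue, neighbours)
    }
    next_id <- next_id + 1L
  }
  group
}

# A-B are 3 units apart, B-C are 4 units apart, D is far from all of them
sketch_group_pts(x = c(0, 3, 7, 100), y = c(0, 0, 0, 100), threshold = 5)
# returns 1 1 1 2
```

In this sketch, A and C share group 1 even though they are 7 units apart, because each is within 5 units of B.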
Usage
group_pts(
DT = NULL,
threshold = NULL,
id = NULL,
coords = NULL,
timegroup,
crs = NULL,
splitBy = NULL,
geometry = "geometry"
)
Arguments
- DT
input data.table
- threshold
distance for grouping points, either numeric or units, in the units of the crs (i.e. of the coordinates or geometry)
- id
character string of ID column name
- coords
character vector of X coordinate and Y coordinate column names. Note: the order is assumed X followed by Y column names
- timegroup
name of the timegroup column in DT (e.g. generated by group_times) within which the spatial groups are calculated
- crs
numeric or character defining the coordinate reference system to be passed to sf::st_crs. For example, either
crs = "EPSG:32736" or crs = 32736. Used only if coords are provided; see details under Interface
- splitBy
(optional) character string or vector of grouping column name(s) upon which the output will be calculated
- geometry
simple feature geometry list column name, generated by
get_geometry(). Default 'geometry', see details under Interface
Value
group_pts returns the input DT appended with a
group column.
This column represents the spatiotemporal group. As with the other
grouping functions, the actual value of group is arbitrary and
represents the identity of a given group where 1 or more individuals are
assigned to a group. If the data were reordered, the group values may
change, but the contents of each group would not.
A message is returned when a column named group already exists in
the input DT, because it will be overwritten.
See details for appending outputs using modify-by-reference in the FAQ.
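Because group values are arbitrary labels, two results can only be compared by the partition of rows they define, not by the raw numbers. A minimal base-R sketch of such a check follows. It assumes both label vectors are aligned row by row (for example, after sorting both tables by the same key); `same_groups` is a hypothetical helper and not part of spatsoc.

```r
# Hypothetical helper: TRUE if two label vectors define the same partition
# of rows, regardless of which numbers are used as labels.
same_groups <- function(a, b) {
  stopifnot(length(a) == length(b))
  # Each label in a must map to exactly one label in b, and vice versa
  pairs <- unique(data.frame(a = a, b = b))
  !anyDuplicated(pairs$a) && !anyDuplicated(pairs$b)
}

same_groups(c(1, 1, 2, 3), c(7, 7, 4, 9))  # TRUE: same contents, new labels
same_groups(c(1, 1, 2, 3), c(7, 4, 4, 9))  # FALSE: rows 2 and 3 regrouped
```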
Details
The DT must be a data.table. If your data is a
data.frame, you can convert it by reference using
data.table::setDT() or by reassigning using
data.table::data.table().
The id, timegroup (and optional splitBy)
arguments expect the names of columns in DT which correspond to the
individual identifier, the timegroup (generated by
group_times) and any additional grouping columns, respectively.
The threshold provided should match the units of the coordinates. The
threshold can be provided with units specified using the units package (e.g.
threshold = units::set_units(10, m)), which will be checked against the
units of the coordinates using the crs. If units are not specified, the
threshold is assumed to be in the units of the coordinates.
The timegroup argument is required to define the temporal groups
within which spatial groups are calculated. The intended framework is to
group rows temporally with group_times then spatially with
group_pts (or group_lines, group_polys).
If you have already calculated temporal groups without
group_times, you can pass this column to the timegroup
argument. Note that each individual is expected to be observed
only once per timegroup. Be cautious: accidentally including very large numbers of
rows within a timegroup can overload your machine, since all pairwise distances
are calculated within each timegroup.
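Both expectations in the paragraph above can be checked before grouping. The following is a minimal base-R sketch, assuming the input has an ID column and a timegroup column (it uses only `[[`, so it works on a data.frame or a data.table); `check_timegroups` is a hypothetical helper, not a spatsoc function. It reports rows where an individual appears more than once in a timegroup, and the largest number of pairwise distances, n(n - 1)/2, that any single timegroup would require.

```r
# Hypothetical pre-flight checks before spatial grouping (base R only).
check_timegroups <- function(df, id = "ID", timegroup = "timegroup") {
  # Rows where the same individual appears more than once in a timegroup
  dup_rows <- which(duplicated(data.frame(df[[id]], df[[timegroup]])))
  # Pairwise distances needed per timegroup: n * (n - 1) / 2
  n_per_tg <- as.numeric(table(df[[timegroup]]))
  n_pairs <- n_per_tg * (n_per_tg - 1) / 2
  list(
    duplicated_rows = dup_rows,
    max_pairs = if (length(n_pairs)) max(n_pairs) else 0
  )
}

df <- data.frame(
  ID = c("A", "B", "C", "A", "A"),
  timegroup = c(1, 1, 1, 2, 2)
)
check_timegroups(df)
# duplicated_rows is 5 (A is observed twice in timegroup 2)
# max_pairs is 3 (timegroup 1 has 3 rows)
```

Because the number of pairs grows quadratically, a single timegroup of 1000 rows already implies 499500 distances.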
The splitBy argument offers further control over grouping. If your DT
contains multiple populations, subgroups or other distinct
parts, you can provide the name of the column which identifies them to
splitBy. The grouping performed by group_pts will then only consider
rows within each splitBy subgroup.
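Conceptually, splitBy narrows the subsets within which distances are compared: two rows can only share a group when they have the same timegroup and the same value in every splitBy column. The base-R sketch below builds these subsets with `interaction()`; the data and the name `subset_key` are illustrative assumptions, and this is not how spatsoc implements the split internally.

```r
# Rows that can be grouped together must share timegroup AND population
df <- data.frame(
  ID = c("A", "B", "C", "D"),
  timegroup = c(1, 1, 1, 1),
  population = c(1, 1, 2, 2)
)
# One factor level per (timegroup, population) combination actually present
subset_key <- interaction(df$timegroup, df$population, drop = TRUE)
# Row indices of each subset; pairwise distances are only needed within each
split(seq_len(nrow(df)), subset_key)
# $`1.1` holds rows 1 2 (A, B); $`1.2` holds rows 3 4 (C, D)
```

Here A and C are never compared, even though they share a timegroup.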
See below under "Interface" for details on providing coordinates and under "Distance function" for details on the underlying distance function used.
Interface
Two interfaces are available for providing coordinates:
1. Provide coords and crs. The coords argument expects the names of the X and Y coordinate columns. The crs argument expects a character string or numeric defining the coordinate reference system to be passed to sf::st_crs. For example, for UTM zone 36S (EPSG 32736), the crs argument is crs = "EPSG:32736" or crs = 32736. See https://spatialreference.org for a list of EPSG codes.
2. (New!) Provide geometry. The geometry argument allows the user to supply a geometry column that represents the coordinates as a simple feature geometry list column. This interface expects the user to prepare their input DT with get_geometry(). To use this interface, leave the coords and crs arguments NULL, and the default argument for geometry ('geometry') will be used directly.
Distance function
The underlying distance function used depends on the crs of the coordinates or geometry provided.
If the crs is longlat degrees (as determined by sf::st_is_longlat()), the distance function is sf::st_distance(), which passes to s2::s2_distance() if sf::sf_use_s2() is TRUE and to lwgeom::st_geod_distance() if sf::sf_use_s2() is FALSE. The distance returned has units set according to the crs.
If the crs is not longlat degrees (e.g. NULL, NA_crs_, or projected), the distance function used is stats::dist(), maintaining expected behaviour from previous versions. The distance returned does not have units set.
Note: in both cases, if the coordinates are NA then the result will be NA.
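To illustrate the difference between the two cases, the following base-R sketch contrasts the unitless Euclidean distance returned by stats::dist() for projected coordinates with a great-circle distance for longitude/latitude. The haversine formula, the function name `haversine_m`, and the mean Earth radius of 6371008.8 m are assumptions chosen for illustration only. s2::s2_distance() and lwgeom::st_geod_distance() use their own spherical and ellipsoidal models, so their results may differ slightly.

```r
# Projected coordinates (e.g. UTM metres): plain Euclidean distance, no units
xy <- rbind(c(0, 0), c(3, 4))
as.numeric(stats::dist(xy))  # 5

# Longitude/latitude in degrees: a spherical (haversine) approximation.
# The radius is an assumption, not the value used by s2 or lwgeom.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing a slightly above 1
  2 * radius * asin(sqrt(pmin(1, a)))
}
haversine_m(0, 0, 0, 1)  # about 111195 m for one degree of latitude

# A missing coordinate propagates to NA
haversine_m(NA, 0, 0, 1)
```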
See also
group_times
Other Spatial grouping:
group_lines(),
group_polys()
Examples
# Load data.table
library(data.table)
# Read example data
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))
# Select only individuals A, B, C for this example
DT <- DT[ID %in% c('A', 'B', 'C')]
# Cast the character column to POSIXct
DT[, datetime := as.POSIXct(datetime, tz = 'UTC')]
#> ID X Y datetime population
#> <char> <num> <num> <POSc> <int>
#> 1: A 715851.4 5505340 2016-11-01 00:00:54 1
#> 2: A 715822.8 5505289 2016-11-01 02:01:22 1
#> 3: A 715872.9 5505252 2016-11-01 04:01:24 1
#> 4: A 715820.5 5505231 2016-11-01 06:01:05 1
#> 5: A 715830.6 5505227 2016-11-01 08:01:11 1
#> ---
#> 4265: C 702093.6 5510180 2017-02-28 14:00:44 1
#> 4266: C 702086.0 5510183 2017-02-28 16:00:42 1
#> 4267: C 702961.8 5509447 2017-02-28 18:00:53 1
#> 4268: C 703130.4 5509528 2017-02-28 20:00:54 1
#> 4269: C 702872.3 5508531 2017-02-28 22:00:18 1
# Temporal grouping
group_times(DT, datetime = 'datetime', threshold = '20 minutes')
#> ID X Y datetime population minutes timegroup
#> <char> <num> <num> <POSc> <int> <int> <int>
#> 1: A 715851.4 5505340 2016-11-01 00:00:54 1 0 1
#> 2: A 715822.8 5505289 2016-11-01 02:01:22 1 0 2
#> 3: A 715872.9 5505252 2016-11-01 04:01:24 1 0 3
#> 4: A 715820.5 5505231 2016-11-01 06:01:05 1 0 4
#> 5: A 715830.6 5505227 2016-11-01 08:01:11 1 0 5
#> ---
#> 4265: C 702093.6 5510180 2017-02-28 14:00:44 1 0 1393
#> 4266: C 702086.0 5510183 2017-02-28 16:00:42 1 0 1394
#> 4267: C 702961.8 5509447 2017-02-28 18:00:53 1 0 1440
#> 4268: C 703130.4 5509528 2017-02-28 20:00:54 1 0 1395
#> 4269: C 702872.3 5508531 2017-02-28 22:00:18 1 0 1396
# Spatial grouping with timegroup
group_pts(DT, threshold = 5, id = 'ID',
coords = c('X', 'Y'), timegroup = 'timegroup')
#> ID X Y datetime population minutes timegroup
#> <char> <num> <num> <POSc> <int> <int> <int>
#> 1: A 715851.4 5505340 2016-11-01 00:00:54 1 0 1
#> 2: A 715822.8 5505289 2016-11-01 02:01:22 1 0 2
#> 3: A 715872.9 5505252 2016-11-01 04:01:24 1 0 3
#> 4: A 715820.5 5505231 2016-11-01 06:01:05 1 0 4
#> 5: A 715830.6 5505227 2016-11-01 08:01:11 1 0 5
#> ---
#> 4265: C 702093.6 5510180 2017-02-28 14:00:44 1 0 1393
#> 4266: C 702086.0 5510183 2017-02-28 16:00:42 1 0 1394
#> 4267: C 702961.8 5509447 2017-02-28 18:00:53 1 0 1440
#> 4268: C 703130.4 5509528 2017-02-28 20:00:54 1 0 1395
#> 4269: C 702872.3 5508531 2017-02-28 22:00:18 1 0 1396
#> group
#> <int>
#> 1: 1
#> 2: 2
#> 3: 3
#> 4: 4
#> 5: 5
#> ---
#> 4265: 4228
#> 4266: 4229
#> 4267: 4230
#> 4268: 4231
#> 4269: 4232
# Spatial grouping with timegroup and splitBy on population
group_pts(DT, threshold = 5, id = 'ID', coords = c('X', 'Y'),
timegroup = 'timegroup', splitBy = 'population')
#> group column will be overwritten by this function
#> ID X Y datetime population minutes timegroup
#> <char> <num> <num> <POSc> <int> <int> <int>
#> 1: A 715851.4 5505340 2016-11-01 00:00:54 1 0 1
#> 2: A 715822.8 5505289 2016-11-01 02:01:22 1 0 2
#> 3: A 715872.9 5505252 2016-11-01 04:01:24 1 0 3
#> 4: A 715820.5 5505231 2016-11-01 06:01:05 1 0 4
#> 5: A 715830.6 5505227 2016-11-01 08:01:11 1 0 5
#> ---
#> 4265: C 702093.6 5510180 2017-02-28 14:00:44 1 0 1393
#> 4266: C 702086.0 5510183 2017-02-28 16:00:42 1 0 1394
#> 4267: C 702961.8 5509447 2017-02-28 18:00:53 1 0 1440
#> 4268: C 703130.4 5509528 2017-02-28 20:00:54 1 0 1395
#> 4269: C 702872.3 5508531 2017-02-28 22:00:18 1 0 1396
#> group
#> <int>
#> 1: 1
#> 2: 2
#> 3: 3
#> 4: 4
#> 5: 5
#> ---
#> 4265: 4228
#> 4266: 4229
#> 4267: 4230
#> 4268: 4231
#> 4269: 4232
# Or, using the new geometry interface
get_geometry(DT, coords = c('X', 'Y'), crs = 32736)
#> ID X Y datetime population minutes timegroup
#> <char> <num> <num> <POSc> <int> <int> <int>
#> 1: A 715851.4 5505340 2016-11-01 00:00:54 1 0 1
#> 2: A 715822.8 5505289 2016-11-01 02:01:22 1 0 2
#> 3: A 715872.9 5505252 2016-11-01 04:01:24 1 0 3
#> 4: A 715820.5 5505231 2016-11-01 06:01:05 1 0 4
#> 5: A 715830.6 5505227 2016-11-01 08:01:11 1 0 5
#> ---
#> 4265: C 702093.6 5510180 2017-02-28 14:00:44 1 0 1393
#> 4266: C 702086.0 5510183 2017-02-28 16:00:42 1 0 1394
#> 4267: C 702961.8 5509447 2017-02-28 18:00:53 1 0 1440
#> 4268: C 703130.4 5509528 2017-02-28 20:00:54 1 0 1395
#> 4269: C 702872.3 5508531 2017-02-28 22:00:18 1 0 1396
#> group geometry
#> <int> <sfc_POINT>
#> 1: 1 POINT (715851.4 5505340)
#> 2: 2 POINT (715822.8 5505289)
#> 3: 3 POINT (715872.9 5505252)
#> 4: 4 POINT (715820.5 5505231)
#> 5: 5 POINT (715830.6 5505227)
#> ---
#> 4265: 4228 POINT (702093.6 5510180)
#> 4266: 4229 POINT (702086 5510183)
#> 4267: 4230 POINT (702961.8 5509447)
#> 4268: 4231 POINT (703130.4 5509528)
#> 4269: 4232 POINT (702872.3 5508531)
group_pts(DT, threshold = 50, id = 'ID', timegroup = 'timegroup')
#> group column will be overwritten by this function
#> ID X Y datetime population minutes timegroup
#> <char> <num> <num> <POSc> <int> <int> <int>
#> 1: A 715851.4 5505340 2016-11-01 00:00:54 1 0 1
#> 2: A 715822.8 5505289 2016-11-01 02:01:22 1 0 2
#> 3: A 715872.9 5505252 2016-11-01 04:01:24 1 0 3
#> 4: A 715820.5 5505231 2016-11-01 06:01:05 1 0 4
#> 5: A 715830.6 5505227 2016-11-01 08:01:11 1 0 5
#> ---
#> 4265: C 702093.6 5510180 2017-02-28 14:00:44 1 0 1393
#> 4266: C 702086.0 5510183 2017-02-28 16:00:42 1 0 1394
#> 4267: C 702961.8 5509447 2017-02-28 18:00:53 1 0 1440
#> 4268: C 703130.4 5509528 2017-02-28 20:00:54 1 0 1395
#> 4269: C 702872.3 5508531 2017-02-28 22:00:18 1 0 1396
#> geometry group
#> <sfc_POINT> <int>
#> 1: POINT (715851.4 5505340) 1
#> 2: POINT (715822.8 5505289) 2
#> 3: POINT (715872.9 5505252) 3
#> 4: POINT (715820.5 5505231) 4
#> 5: POINT (715830.6 5505227) 5
#> ---
#> 4265: POINT (702093.6 5510180) 1393
#> 4266: POINT (702086 5510183) 1394
#> 4267: POINT (702961.8 5509447) 3719
#> 4268: POINT (703130.4 5509528) 1395
#> 4269: POINT (702872.3 5508531) 3720
