Run a randomization analysis for one or more biodiversity metrics
Source: R/cpr_rand_test.R
The observed value of the biodiversity metric(s) will be calculated for the input community data, then compared against a set of random communities. Various statistics are calculated from the comparison (see Value below).
Arguments
- comm
Dataframe, tibble, or matrix; input community data with sites (communities) as rows and species as columns. Either presence-absence data (values only 0s or 1s) or abundance data (values >= 0) accepted, but calculations do not use abundance-weighting, so results from abundance data will be the same as if converted to presence-absence before analysis.
- phy
List of class phylo; input phylogeny.
- null_model
Character vector of length 1 or object of class commsim; either the name of the model to use for generating random communities (null model), or a custom null model. For a full list of available predefined null models, see the help file of vegan::commsim(), or run vegan::make.commsim(). An object of class commsim can be generated with vegan::commsim() (see Examples in cpr_rand_comm()).
- n_reps
Numeric vector of length 1; number of random communities to replicate.
- n_iterations
Numeric vector of length 1; number of iterations to use for sequential null models; ignored for non-sequential models.
- thin
Numeric vector of length 1; thinning parameter used by some null models in vegan (e.g., quasiswap); ignored for other models.
- metrics
Character vector; names of biodiversity metrics to calculate. May include one or more of:
pd, rpd, pe, rpe (case-sensitive).
- site_col
Character vector of length 1; name of column in comm that contains the site names; only used if comm is a tibble (object of class tbl_df).
- tbl_out
Logical vector of length 1; should the output be returned as a tibble? If FALSE, will return a dataframe. Defaults to TRUE if comm is a tibble.
- quiet
Logical vector of length 1; if TRUE, suppress all warnings and messages that would be emitted by this function.
Value
Dataframe. For each of the biodiversity metrics, the following 9 columns will be produced:
- *_obs: Observed value
- *_obs_c_lower: Count of times observed value was lower than random values
- *_obs_c_upper: Count of times observed value was higher than random values
- *_obs_p_lower: Proportion (0 to 1) of times observed value was lower than random values
- *_obs_p_upper: Proportion (0 to 1) of times observed value was higher than random values
- *_obs_q: Count of the non-NA random values used for comparison
- *_obs_z: Standard effect size (z-score)
- *_rand_mean: Mean of the random values
- *_rand_sd: Standard deviation of the random values
So if you included pd in metrics, the output columns would include pd_obs, pd_obs_c_lower, etc.
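As a rough illustration of how these columns relate to one another, the sketch below computes them in base R for a single site. The values of obs and rand are made up, and the strict comparisons (ties counted as neither higher nor lower) are an assumption, not a statement about the internals of cpr_rand_test().

```r
# Hypothetical observed metric and random values for one site (made-up numbers)
obs <- 0.30
rand <- c(0.45, 0.50, 0.42, NA, 0.48, 0.47)

# Only non-NA random values are used for comparison
rand_ok <- rand[!is.na(rand)]

stats <- list(
  obs = obs,
  obs_c_lower = sum(obs < rand_ok), # times observed was lower than random
  obs_c_upper = sum(obs > rand_ok), # times observed was higher than random
  obs_q = length(rand_ok),          # number of non-NA random values
  rand_mean = mean(rand_ok),
  rand_sd = sd(rand_ok)
)

# Counts expressed relative to the number of comparisons
stats$obs_p_lower <- stats$obs_c_lower / stats$obs_q
stats$obs_p_upper <- stats$obs_c_upper / stats$obs_q

# Standard effect size: distance of observed from the random mean, in SD units
stats$obs_z <- (stats$obs - stats$rand_mean) / stats$rand_sd

str(stats)
```

The first output table in the Examples section is consistent with this z-score formula: for clump1, (0.3018868 - 0.4622642) / 0.03799700 is approximately -4.22.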
Details
The biodiversity metrics (metrics) available for analysis include:
- pd: Phylogenetic diversity (Faith 1992)
- rpd: Relative phylogenetic diversity (Mishler et al. 2014)
- pe: Phylogenetic endemism (Rosauer et al. 2009)
- rpe: Relative phylogenetic endemism (Mishler et al. 2014)
(pe and rpe are needed for CANAPE with cpr_classify_endem())
The choice of a randomization algorithm (null_model) is not trivial, and may strongly affect results. cpr_rand_test() uses null models provided by vegan; for a complete list, see the help file of vegan::commsim() or run vegan::make.commsim(). One frequently used null model is swap (Gotelli & Entsminger 2003), which randomizes the community matrix while preserving column and row sums (marginal sums). For a review of various null models, see Strona et al. (2018); swap is an "FF" model in the sense of Strona et al. (2018).
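A minimal sketch of how the predefined null models might be listed and inspected, assuming the vegan package is installed. The variable names models and swap_model are only for illustration.

```r
# Requires the vegan package
library(vegan)

# Called without arguments, make.commsim() returns the names of the
# predefined null models
models <- make.commsim()
print(models)

# Inspect one model: the commsim object records whether the model is
# binary (presence-absence) and whether it is sequential
swap_model <- make.commsim("swap")
swap_model$binary
swap_model$isSeq
```

Whether a model is sequential matters for cpr_rand_test() because n_iterations is ignored for non-sequential models.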
Instead of using one of the predefined null models in vegan::commsim(), it is also possible to define a custom null model; see Examples in cpr_rand_comm().
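For orientation, a custom null model could be sketched along the following lines, assuming the vegan package is installed. The names randomizer, custom_model, and "custom_shuffle" are hypothetical, as is the small test matrix. The randomization simply shuffles all cells and is meant only to show the mechanics, not to recommend a model.

```r
library(vegan)

# A simple randomization function: shuffle all cells of the matrix, which
# preserves only the total number of occurrences (matrix fill).
# It must return an array with one randomized matrix per replicate.
randomizer <- function(x, n, ...) {
  array(replicate(n, sample(x)), c(dim(x), n))
}

# Wrap the function in an object of class commsim
custom_model <- commsim(
  "custom_shuffle",
  fun = randomizer,
  binary = TRUE,
  isSeq = FALSE,
  mode = "integer"
)

# Quick check on a made-up 5 x 5 presence-absence matrix
set.seed(1)
m <- matrix(
  rbinom(25, 1, 0.5),
  nrow = 5,
  dimnames = list(paste0("site", 1:5), paste0("sp", 1:5))
)
sims <- simulate(nullmodel(m, custom_model), nsim = 3)
dim(sims)
```

An object like custom_model is what the null_model argument accepts in place of a model name.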
Note that the pre-defined models in vegan include binary models (designed for presence-absence data) and quantitative models (designed for abundance data). Although the binary models will accept abundance data, they treat it as binary and always return a binary (presence-absence) matrix. The PD and PE calculations in canaper are not abundance-weighted, so they return the same result regardless of whether the input is presence-absence or abundance. In that sense, binary null models are appropriate for cpr_rand_test(). The quantitative models could also be used for abundance data, but the output will be treated as binary anyway when calculating PD and PE. The effects of using binary vs. quantitative null models for cpr_rand_test() have not been investigated.
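To make "treated as binary" concrete, the following base R sketch converts a made-up abundance matrix to presence-absence. The matrix is deliberately tiny and only illustrates the conversion; it is too small to be valid input for cpr_rand_test().

```r
# Made-up abundance matrix: 2 sites x 3 species
abund <- matrix(
  c(3, 0, 12,
    0, 1, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("site1", "site2"), c("sp1", "sp2", "sp3"))
)

# Any abundance > 0 counts as present (1); 0 stays absent (0)
pres_abs <- (abund > 0) * 1L
pres_abs
```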
A minimum of 5 species and 5 sites is required as input; fewer than that is likely to cause some randomization algorithms (e.g., swap) to enter an infinite loop. Besides, inference on very small numbers of species and/or sites is generally not recommended.
The following rules apply to comm input:
- If dataframe or matrix, must include row names (site names) and column names (species names).
- If tibble, a single column (default, site) must be included with site names, and other columns must correspond to species names.
- Column names cannot start with a number and must be unique.
- Row names (site names) must be unique.
- Values (other than site names) should only include integers >= 0; non-integer input will be converted to integer.
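The rules above, together with the minimum of 5 species and 5 sites, can be checked before running an analysis. The helper check_comm() below is hypothetical (it is not part of canaper) and covers only matrix or dataframe input; it is a sketch in base R, not a reproduction of the checks that cpr_rand_test() itself performs.

```r
# Hypothetical helper (not part of canaper): returns a named logical vector,
# one element per rule, for matrix or dataframe input
check_comm <- function(comm) {
  vals <- as.matrix(comm)
  sites <- rownames(vals)
  species <- colnames(vals)
  c(
    enough_sites_species = nrow(vals) >= 5 && ncol(vals) >= 5,
    has_unique_sites = !is.null(sites) && anyDuplicated(sites) == 0,
    has_unique_species = !is.null(species) && anyDuplicated(species) == 0,
    species_not_numeric_start = !any(grepl("^[0-9]", species)),
    values_non_negative = is.numeric(vals) && all(vals >= 0, na.rm = TRUE)
  )
}

# Made-up 5 x 5 community matrix that satisfies the rules
comm <- matrix(
  rep(c(1L, 0L, 2L, 0L, 1L), times = 5),
  nrow = 5,
  dimnames = list(paste0("site", 1:5), paste0("sp", 1:5))
)
check_comm(comm)

# Same data, but one species name starts with a number
bad <- comm
colnames(bad)[1] <- "1sp"
check_comm(bad)
```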
The results are identical regardless of whether the input for comm is abundance or presence-absence data (i.e., abundance weighting is not used).
References
Faith DP (1992) Conservation evaluation and phylogenetic diversity. Biological Conservation, 61:1–10. doi:10.1016/0006-3207(92)91201-3
Gotelli, N.J. and Entsminger, N.J. (2003). Swap algorithms in null model analysis. Ecology 84, 532–535.
Mishler, B., Knerr, N., González-Orozco, C. et al. (2014) Phylogenetic measures of biodiversity and neo- and paleo-endemism in Australian Acacia. Nat Commun, 5: 4473. doi:10.1038/ncomms5473
Rosauer, D., Laffan, S.W., Crisp, M.D., Donnellan, S.C. and Cook, L.G. (2009) Phylogenetic endemism: a new approach for identifying geographical concentrations of evolutionary history. Molecular Ecology, 18: 4061-4072. doi:10.1111/j.1365-294X.2009.04311.x
Strona, G., Ulrich, W. and Gotelli, N.J. (2018), Bi-dimensional null model analysis of presence-absence binary matrices. Ecology, 99: 103-115. doi:10.1002/ecy.2043
Examples
# \donttest{
set.seed(12345)
data(phylocom)
# Returns a dataframe by default
cpr_rand_test(
  phylocom$comm, phylocom$phy,
  null_model = "curveball", metrics = "pd", n_reps = 10
)
#> Warning: Abundance data detected. Results will be the same as if using presence/absence data (no abundance weighting is used).
#> Warning: Dropping tips from the tree because they are not present in the community data:
#> sp16, sp23, sp27, sp28, sp30, sp31, sp32
#>            pd_obs pd_rand_mean pd_rand_sd  pd_obs_z pd_obs_c_upper
#> clump1  0.3018868    0.4622642 0.03799700 -4.220790              0
#> clump2a 0.3207547    0.4679245 0.03182166 -4.624831              0
#> clump2b 0.3396226    0.4698113 0.03009682 -4.325662              0
#> clump4  0.4150943    0.4698113 0.03138358 -1.743490              0
#> even    0.5660377    0.4773585 0.02819687  3.145003             10
#> random  0.5094340    0.4622642 0.02223606  2.121320             10
#>         pd_obs_c_lower pd_obs_q pd_obs_p_upper pd_obs_p_lower
#> clump1              10       10              0              1
#> clump2a             10       10              0              1
#> clump2b             10       10              0              1
#> clump4              10       10              0              1
#> even                 0       10              1              0
#> random               0       10              1              0
# Tibbles may be preferable because of the large number of columns
cpr_rand_test(
  phylocom$comm, phylocom$phy,
  null_model = "curveball", tbl_out = TRUE, n_reps = 10
)
#> Warning: Abundance data detected. Results will be the same as if using presence/absence data (no abundance weighting is used).
#> Warning: Dropping tips from the tree because they are not present in the community data:
#> sp16, sp23, sp27, sp28, sp30, sp31, sp32
#> # A tibble: 6 × 55
#>   site    pd_obs pd_rand_mean pd_rand_sd pd_obs_z pd_obs_c_upper pd_obs_c_lower
#>   <chr>    <dbl>        <dbl>      <dbl>    <dbl>          <dbl>          <dbl>
#> 1 clump1   0.302        0.472     0.0308   -5.51               0             10
#> 2 clump2a  0.321        0.464     0.0270   -5.32               0             10
#> 3 clump2b  0.340        0.458     0.0418   -2.85               0             10
#> 4 clump4   0.415        0.475     0.0425   -1.42               0              8
#> 5 even     0.566        0.474     0.0288    3.22              10              0
#> 6 random   0.509        0.472     0.0436    0.866              7              1
#> # ℹ 48 more variables: pd_obs_q <dbl>, pd_obs_p_upper <dbl>,
#> # pd_obs_p_lower <dbl>, pd_alt_obs <dbl>, pd_alt_rand_mean <dbl>,
#> # pd_alt_rand_sd <dbl>, pd_alt_obs_z <dbl>, pd_alt_obs_c_upper <dbl>,
#> # pd_alt_obs_c_lower <dbl>, pd_alt_obs_q <dbl>, pd_alt_obs_p_upper <dbl>,
#> # pd_alt_obs_p_lower <dbl>, rpd_obs <dbl>, rpd_rand_mean <dbl>,
#> # rpd_rand_sd <dbl>, rpd_obs_z <dbl>, rpd_obs_c_upper <dbl>,
#> # rpd_obs_c_lower <dbl>, rpd_obs_q <dbl>, rpd_obs_p_upper <dbl>, …
# }