Removes or flags duplicated records based on species name and coordinates, plus any user-defined additional columns. True (specimen) duplicates and repeated records of the same species at the same coordinates can make up the bulk of records in a biological collection database, but are undesirable for many analyses. Both can be flagged with this function; identifying true specimen duplicates requires enough additional information, supplied via the additions argument.
Usage
cc_dupl(
x,
lon = "decimallongitude",
lat = "decimallatitude",
species = "species",
additions = NULL,
value = "clean",
verbose = TRUE
)
Arguments
- x
data.frame. Containing geographical coordinates and species names.
- lon
character string. The column with the longitude coordinates. Default = “decimallongitude”.
- lat
character string. The column with the latitude coordinates. Default = “decimallatitude”.
- species
character string. The column with the species name. Default = “species”.
- additions
character vector. Additional columns to include in the test for duplication, for example collector name and collector number, as in the examples below.
- value
character string. Defines the output value, either “clean” or “flagged”. See the Value section.
- verbose
logical. If TRUE, reports the name of the test and the number of records flagged.
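To illustrate how the additions argument changes what counts as a duplicate, here is a minimal sketch in base R. It does not call cc_dupl and is not claimed to be the package's implementation; the toy data frame `recs` and the helper names `key_basic` and `key_ext` are hypothetical, and `duplicated()` stands in for the duplicate test.

```r
# Hypothetical toy data: rows 1 and 2 share species and coordinates
# but differ in collector number.
recs <- data.frame(
  species = c("a", "a", "b"),
  decimallongitude = c(1, 1, 2),
  decimallatitude = c(5, 5, 6),
  collector.number = c(1001, 354, 1001)
)

# Key on species and coordinates only: row 2 repeats row 1.
key_basic <- c("species", "decimallongitude", "decimallatitude")
dup_basic <- duplicated(recs[, key_basic])

# Extend the key with an additional column: no row repeats another.
key_ext <- c(key_basic, "collector.number")
dup_ext <- duplicated(recs[, key_ext])

print(dup_basic)
print(dup_ext)
```

With the extended key, two records count as duplicates only if they also agree on every additional column, which is what separates true specimen duplicates from repeated records of one species at one locality.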
Value
Depending on the ‘value’ argument, either a data.frame containing the records considered correct by the test (“clean”) or a logical vector (“flagged”), with TRUE = test passed and FALSE = test failed/potentially problematic. Default = “clean”.
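The two output modes are related: the “clean” data.frame corresponds to the input subset by the “flagged” vector. The sketch below shows that relationship in base R only. It assumes the first occurrence of each combination is kept and later repeats are flagged; `recs`, `flags` and `clean` are hypothetical names, and cc_dupl itself is not called.

```r
# Hypothetical toy records; rows 1 and 2 share species and coordinates.
recs <- data.frame(
  species = c("a", "a", "b"),
  decimallongitude = c(1, 1, 2),
  decimallatitude = c(5, 5, 6)
)

# "flagged"-style result: TRUE = passed, FALSE = repeat of an earlier row.
# Assumption: the first occurrence of each combination is kept.
flags <- !duplicated(recs)

# "clean"-style result: only the records that passed the test.
clean <- recs[flags, ]

print(flags)
print(clean)
```

Working with the logical vector is convenient when the flags from several tests need to be combined before any records are dropped.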
Examples
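library(CoordinateCleaner)

# Simulated records: coordinates are drawn at random, so the records
# flagged will differ between runs.
x <- data.frame(species = letters[1:10],
                decimallongitude = sample(x = 0:10, size = 100, replace = TRUE),
                decimallatitude = sample(x = 0:10, size = 100, replace = TRUE),
                collector = "Bonpl",
                collector.number = c(1001, 354),
                collection = rep(c("K", "WAG", "FR", "P", "S"), 20))

# Return a logical vector instead of the cleaned data.frame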
cc_dupl(x, value = "flagged")
#> Testing duplicates
#> Flagged 6 records.
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [25] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [37] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
#> [49] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [61] TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
#> [73] TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
#> [85] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [97] TRUE TRUE TRUE TRUE
cc_dupl(x, additions = c("collector", "collector.number"))
#> Testing duplicates
#> Removed 6 records.
#> species decimallongitude decimallatitude collector collector.number
#> 1 a 3 9 Bonpl 1001
#> 2 b 1 8 Bonpl 354
#> 3 c 1 8 Bonpl 1001
#> 4 d 6 3 Bonpl 354
#> 5 e 2 1 Bonpl 1001
#> 6 f 8 8 Bonpl 354
#> 7 g 9 0 Bonpl 1001
#> 8 h 6 4 Bonpl 354
#> 9 i 6 6 Bonpl 1001
#> 10 j 3 2 Bonpl 354
#> 11 a 9 7 Bonpl 1001
#> 12 b 10 4 Bonpl 354
#> 13 c 9 5 Bonpl 1001
#> 14 d 6 5 Bonpl 354
#> 15 e 3 4 Bonpl 1001
#> 16 f 4 0 Bonpl 354
#> 17 g 8 4 Bonpl 1001
#> 18 h 5 8 Bonpl 354
#> 19 i 3 1 Bonpl 1001
#> 20 j 1 4 Bonpl 354
#> 21 a 7 10 Bonpl 1001
#> 22 b 9 8 Bonpl 354
#> 23 c 8 5 Bonpl 1001
#> 24 d 6 8 Bonpl 354
#> 25 e 1 5 Bonpl 1001
#> 26 f 2 5 Bonpl 354
#> 27 g 0 6 Bonpl 1001
#> 28 h 3 0 Bonpl 354
#> 29 i 4 3 Bonpl 1001
#> 30 j 7 10 Bonpl 354
#> 31 a 0 10 Bonpl 1001
#> 32 b 2 2 Bonpl 354
#> 33 c 10 10 Bonpl 1001
#> 34 d 5 5 Bonpl 354
#> 35 e 0 9 Bonpl 1001
#> 36 f 8 7 Bonpl 354
#> 37 g 10 9 Bonpl 1001
#> 38 h 4 5 Bonpl 354
#> 39 i 0 1 Bonpl 1001
#> 40 j 4 5 Bonpl 354
#> 41 a 1 8 Bonpl 1001
#> 42 b 9 2 Bonpl 354
#> 43 c 3 7 Bonpl 1001
#> 44 d 9 0 Bonpl 354
#> 45 e 10 7 Bonpl 1001
#> 46 f 2 6 Bonpl 354
#> 48 h 0 2 Bonpl 354
#> 49 i 7 2 Bonpl 1001
#> 50 j 6 3 Bonpl 354
#> 51 a 4 7 Bonpl 1001
#> 52 b 7 9 Bonpl 354
#> 53 c 7 6 Bonpl 1001
#> 54 d 7 5 Bonpl 354
#> 55 e 2 0 Bonpl 1001
#> 56 f 10 1 Bonpl 354
#> 57 g 6 6 Bonpl 1001
#> 58 h 4 6 Bonpl 354
#> 59 i 7 6 Bonpl 1001
#> 60 j 7 9 Bonpl 354
#> 61 a 1 5 Bonpl 1001
#> 62 b 7 1 Bonpl 354
#> 64 d 10 0 Bonpl 354
#> 65 e 6 6 Bonpl 1001
#> 66 f 3 2 Bonpl 354
#> 67 g 7 4 Bonpl 1001
#> 69 i 9 3 Bonpl 1001
#> 70 j 2 4 Bonpl 354
#> 71 a 3 7 Bonpl 1001
#> 72 b 8 9 Bonpl 354
#> 73 c 2 0 Bonpl 1001
#> 74 d 8 3 Bonpl 354
#> 75 e 5 9 Bonpl 1001
#> 77 g 1 0 Bonpl 1001
#> 79 i 3 6 Bonpl 1001
#> 80 j 0 7 Bonpl 354
#> 81 a 7 3 Bonpl 1001
#> 82 b 3 1 Bonpl 354
#> 83 c 9 10 Bonpl 1001
#> 84 d 3 5 Bonpl 354
#> 86 f 9 3 Bonpl 354
#> 87 g 7 8 Bonpl 1001
#> 88 h 1 7 Bonpl 354
#> 89 i 1 3 Bonpl 1001
#> 90 j 9 8 Bonpl 354
#> 91 a 9 10 Bonpl 1001
#> 92 b 4 2 Bonpl 354
#> 93 c 5 4 Bonpl 1001
#> 94 d 0 1 Bonpl 354
#> 95 e 2 3 Bonpl 1001
#> 96 f 0 2 Bonpl 354
#> 97 g 5 8 Bonpl 1001
#> 98 h 4 2 Bonpl 354
#> 99 i 4 10 Bonpl 1001
#> 100 j 6 1 Bonpl 354
#> collection
#> 1 K
#> 2 WAG
#> 3 FR
#> 4 P
#> 5 S
#> 6 K
#> 7 WAG
#> 8 FR
#> 9 P
#> 10 S
#> 11 K
#> 12 WAG
#> 13 FR
#> 14 P
#> 15 S
#> 16 K
#> 17 WAG
#> 18 FR
#> 19 P
#> 20 S
#> 21 K
#> 22 WAG
#> 23 FR
#> 24 P
#> 25 S
#> 26 K
#> 27 WAG
#> 28 FR
#> 29 P
#> 30 S
#> 31 K
#> 32 WAG
#> 33 FR
#> 34 P
#> 35 S
#> 36 K
#> 37 WAG
#> 38 FR
#> 39 P
#> 40 S
#> 41 K
#> 42 WAG
#> 43 FR
#> 44 P
#> 45 S
#> 46 K
#> 48 FR
#> 49 P
#> 50 S
#> 51 K
#> 52 WAG
#> 53 FR
#> 54 P
#> 55 S
#> 56 K
#> 57 WAG
#> 58 FR
#> 59 P
#> 60 S
#> 61 K
#> 62 WAG
#> 64 P
#> 65 S
#> 66 K
#> 67 WAG
#> 69 P
#> 70 S
#> 71 K
#> 72 WAG
#> 73 FR
#> 74 P
#> 75 S
#> 77 WAG
#> 79 P
#> 80 S
#> 81 K
#> 82 WAG
#> 83 FR
#> 84 P
#> 86 K
#> 87 WAG
#> 88 FR
#> 89 P
#> 90 S
#> 91 K
#> 92 WAG
#> 93 FR
#> 94 P
#> 95 S
#> 96 K
#> 97 WAG
#> 98 FR
#> 99 P
#> 100 S