rgnparser: Parse Scientific Names
An R interface to gnparser
at https://github.com/gnames/gnparser
Installation
install.packages("rgnparser")
# OR
remotes::install_github("ropensci/rgnparser")
Install gnparser
The command line tool written in Go, gnparser, is required to use this package.
If you want to install gnparser on your own, instructions can be found at the gnparser repo.
There is a helper function in rgnparser for downloading and installing gnparser on major operating systems (macOS, Windows, Linux):
rgnparser::install_gnparser()
It installs the latest gnparser version by default, but you can specify which version to install. You can also install gnparser outside of R yourself (see above).
gn_version
gnparser version
gn_version()
#> $version
#> [1] "v1.0.0"
#>
#> $build
#> [1] "2021-01-19_14:45:28UTC"
gn_parse_tidy
output a data.frame with more minimal information
x <- c("Quadrella steyermarkii (Standl.) Iltis & Cornejo",
"Parus major Linnaeus, 1788", "Helianthus annuus var. texanus")
gn_parse_tidy(x)
#> # A tibble: 3 x 9
#> id verbatim cardinality canonicalstem canonicalsimple canonicalfull
#> <chr> <chr> <dbl> <chr> <chr> <chr>
#> 1 3e33… Quadrel… 2 Quadrella st… Quadrella stey… Quadrella st…
#> 2 e4e1… Parus m… 2 Parus maior Parus major Parus major
#> 3 e571… Heliant… 3 Helianthus a… Helianthus ann… Helianthus a…
#> # … with 3 more variables: authorship <chr>, year <dbl>, quality <dbl>
It’s pretty fast, thanks to gnparser of course
n <- 10000L
# get random scientific names from taxize
spp <- taxize::names_list(rank = "species", size = n)
timed <- system.time(gn_parse_tidy(spp))
timed
#> user system elapsed
#> 1.225 0.113 0.555
Just 0.555 sec for 10000 names
gn_parse
output a list of lists with more detailed information
x <- c("Quadrella steyermarkii (Standl.) Iltis & Cornejo",
"Parus major Linnaeus, 1788", "Helianthus annuus var. texanus")
gn_parse(x)
#> [[1]]
#> [[1]]$parsed
#> [1] TRUE
#>
#> [[1]]$quality
#> [1] 3
#>
#> [[1]]$qualityWarnings
#> quality warning
#> 1 3 HTML tags or entities in the name
#>
#> [[1]]$verbatim
#> [1] "Quadrella steyermarkii (Standl.) Iltis & Cornejo"
#>
#> [[1]]$normalized
...