rgnparser: Parse Scientific Names
An R interface to gnparser (https://github.com/gnames/gnparser)
install.packages("rgnparser")
# OR
remotes::install_github("ropensci/rgnparser")
This package requires gnparser, a command line tool written in Go.
To install gnparser yourself, follow the instructions in the gnparser repo (https://github.com/gnames/gnparser).
There is a helper function in rgnparser for downloading and installing gnparser on major operating systems (macOS, Windows, Linux):
rgnparser::install_gnparser()
It installs the latest gnparser version by default, but you can specify which version to install. You can also install gnparser outside of R yourself (see above).
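Before running the installer, it can be handy to check whether a gnparser executable is already visible to R. The following is a minimal sketch using only base R. It assumes the binary is named `gnparser` and is discoverable on the `PATH`; rgnparser may also look in other locations, so a miss here is not conclusive. The variable name `gnparser_path` is illustrative.

```r
# Look up the gnparser executable on the PATH (base R only).
# Sys.which() returns "" when the program is not found.
gnparser_path <- unname(Sys.which("gnparser"))

if (nzchar(gnparser_path)) {
  message("gnparser found at: ", gnparser_path)
} else {
  message("gnparser not found on PATH; consider rgnparser::install_gnparser()")
}
```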
gnparser version
library(rgnparser)
gn_version()
#> $version
#> [1] "v1.0.0"
#>
#> $build
#> [1] "2021-01-19_14:45:28UTC"
Output a data.frame with a minimal set of fields:
x <- c("Quadrella steyermarkii (Standl.) Iltis & Cornejo",
  "Parus major Linnaeus, 1788", "Helianthus annuus var. texanus")
gn_parse_tidy(x)
#> # A tibble: 3 x 9
#>   id    verbatim cardinality canonicalstem canonicalsimple canonicalfull
#>   <chr> <chr>          <dbl> <chr>         <chr>           <chr>
#> 1 3e33… Quadrel…           2 Quadrella st… Quadrella stey… Quadrella st…
#> 2 e4e1… Parus m…           2 Parus maior   Parus major     Parus major
#> 3 e571… Heliant…           3 Helianthus a… Helianthus ann… Helianthus a…
#> # … with 3 more variables: authorship <chr>, year <dbl>, quality <dbl>
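Because the result is a regular data frame, ordinary subsetting applies. As a hedged sketch, the hypothetical helper `keep_by_cardinality()` below (not part of rgnparser) keeps rows whose `cardinality` column matches a given value, for example 3 for a name with an infraspecific epithet such as the third row above. It assumes only the `cardinality` column name shown in the output.

```r
# Hypothetical helper (not part of rgnparser): keep rows of a
# gn_parse_tidy()-style data frame with the requested cardinality.
keep_by_cardinality <- function(parsed, cardinality = 3) {
  stopifnot(is.data.frame(parsed), "cardinality" %in% names(parsed))
  parsed[parsed$cardinality %in% cardinality, , drop = FALSE]
}

# Usage, assuming rgnparser and gnparser are installed:
# keep_by_cardinality(gn_parse_tidy(x), cardinality = 3)
```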
Parsing is fast, thanks to gnparser:
n <- 10000L
# get random scientific names from taxize
spp <- taxize::names_list(rank = "species", size = n)
timed <- system.time(gn_parse_tidy(spp))
timed
#>    user  system elapsed
#>   1.225   0.113   0.555
That is 0.555 seconds of elapsed time for 10,000 names.
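To put the timing in terms of throughput, the elapsed time can be converted to names per second. This small sketch reuses the numbers printed above rather than re-running the benchmark; timings on other machines will differ, and the variable names `elapsed` and `names_per_sec` are illustrative.

```r
n <- 10000L
elapsed <- 0.555  # seconds, copied from the timing output above

# Names parsed per second of wall-clock time
names_per_sec <- n / elapsed
round(names_per_sec)
#> [1] 18018
```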
Output a list of lists with more detailed information:
x <- c("Quadrella steyermarkii (Standl.) Iltis & Cornejo",
  "Parus major Linnaeus, 1788", "Helianthus annuus var. texanus")
gn_parse(x)
#> [[1]]
#> [[1]]$parsed
#> [1] TRUE
#>
#> [[1]]$quality
#> [1] 3
#>
#> [[1]]$qualityWarnings
#>   quality                           warning
#> 1       3 HTML tags or entities in the name
#>
#> [[1]]$verbatim
#> [1] "Quadrella steyermarkii (Standl.) Iltis & Cornejo"
#>
#> [[1]]$normalized
...
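Individual fields can be pulled out of the list of lists with base R. The sketch below uses a hypothetical helper, `pluck_field()` (not part of rgnparser), and assumes each element carries the field names shown above, such as `verbatim` and `quality`. Fields that are missing from an element come back as `NA`.

```r
# Hypothetical helper (not part of rgnparser): collect one scalar
# field, as character, from every element of a gn_parse()-style list.
pluck_field <- function(parsed, field) {
  vapply(parsed, function(rec) {
    value <- rec[[field]]
    # Return NA when the field is absent; otherwise take the first value
    if (is.null(value)) NA_character_ else as.character(value)[1]
  }, character(1))
}

# Usage, assuming rgnparser and gnparser are installed:
# pluck_field(gn_parse(x), "verbatim")
```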