rgnparser: Parse Scientific Names

An R interface to gnparser at https://github.com/gnames/gnparser

Installation

install.packages("rgnparser")
# OR
remotes::install_github("ropensci/rgnparser")

Install gnparser

The command line tool written in Go, gnparser, is required to use this package.

If you want to install gnparser on your own, instructions can be found at the gnparser repo.

There is a helper function in rgnparser for downloading and installing gnparser on major operating systems (macOS, Windows, Linux):

rgnparser::install_gnparser()

It installs the latest gnparser version by default, but you can specify which version to install. You can also install gnparser outside of R yourself (see above).

gn_version

gnparser version

gn_version()
#> $version
#> [1] "v1.0.0"
#> 
#> $build
#> [1] "2021-01-19_14:45:28UTC"

gn_parse_tidy

output a data.frame with more minimal information

x <- c("Quadrella steyermarkii (Standl.) Iltis &amp; Cornejo",
  "Parus major Linnaeus, 1788", "Helianthus annuus var. texanus")
gn_parse_tidy(x)
#> # A tibble: 3 x 9
#>   id    verbatim cardinality canonicalstem canonicalsimple canonicalfull
#>   <chr> <chr>          <dbl> <chr>         <chr>           <chr>        
#> 1 3e33… Quadrel…           2 Quadrella st… Quadrella stey… Quadrella st…
#> 2 e4e1… Parus m…           2 Parus maior   Parus major     Parus major  
#> 3 e571… Heliant…           3 Helianthus a… Helianthus ann… Helianthus a…
#> # … with 3 more variables: authorship <chr>, year <dbl>, quality <dbl>

It’s pretty fast, thanks to gnparser of course

n <- 10000L
# get random scientific names from taxize
spp <- taxize::names_list(rank = "species", size = n)
timed <- system.time(gn_parse_tidy(spp))
timed
#>    user  system elapsed 
#>   1.225   0.113   0.555

Just 0.555 sec for 10000 names

gn_parse

output a list of lists with more detailed information

x <- c("Quadrella steyermarkii (Standl.) Iltis &amp; Cornejo",
  "Parus major Linnaeus, 1788", "Helianthus annuus var. texanus")
gn_parse(x)
#> [[1]]
#> [[1]]$parsed
#> [1] TRUE
#> 
#> [[1]]$quality
#> [1] 3
#> 
#> [[1]]$qualityWarnings
#>   quality                           warning
#> 1       3 HTML tags or entities in the name
#> 
#> [[1]]$verbatim
#> [1] "Quadrella steyermarkii (Standl.) Iltis & Cornejo"
#> 
#> [[1]]$normalized
...