rgnparser: Parse Scientific Names

An R interface to gnparser at https://gitlab.com/gogna/gnparser

Installation

install.packages("rgnparser")
# OR
remotes::install_github("ropensci/rgnparser")

Install gnparser

The command line tool written in Go, gnparser, is required to use this package.

If you want to install gnparser on your own, instructions can be found at the gnparser repo.

There is a helper function in rgnparser for downloading and installing gnparser on major operating systems (macOS, Windows, Linux):

rgnparser::install_gnparser()

It installs the latest gnparser version by default, but you can specify which version to install. You can also install gnparser outside of R yourself (see above).

gn_version

gnparser version

gn_version()
#> $version
#> [1] "v0.14.1"
#> 
#> $build
#> [1] "2020-05-07_12:35:40UTC"

gn_debug

debug output

gn_debug("Poa")
#> <rgnparser debug>
#> 
#> *** Complete Syntax Tree ***
#> SciName "Poa"
#>  Name "Poa"
#>   SingleName "Poa"
#>    NameUninomial "Poa"
#>     Uninomial "Poa"
#>      UninomialWord "Poa"
#>       CapWord "Poa"
#>        CapWord1 "Poa"
#>         NameUpperChar "P"
#>          UpperChar "P"
#>           UpperASCII "P"
#>         NameLowerChar "o"
#>          LowerChar "o"
#>           LowerASCII "o"
#>         NameLowerChar "a"
#>          LowerChar "a"
#>           LowerASCII "a"
#> 
#> *** Output Syntax Tree ***
#> SciName "Poa"
#>  Name "Poa"
#>   SingleName "Poa"
#>    Uninomial "Poa"
#>     UninomialWord "Poa"
#>  Tail ""

gn_parse_tidy

output a data.frame with more minimal information

x <- c("Quadrella steyermarkii (Standl.) Iltis &amp; Cornejo",
  "Parus major Linnaeus, 1788", "Helianthus annuus var. texanus")
gn_parse_tidy(x)
#> # A tibble: 3 x 9
#>   id    verbatim cardinality canonicalfull canonicalsimple canonicalstem
#>   <chr> <chr>          <dbl> <chr>         <chr>           <chr>        
#> 1 e571… Heliant…           3 Helianthus a… Helianthus ann… Helianthus a…
#> 2 fbd1… Quadrel…           2 Quadrella st… Quadrella stey… Quadrella st…
#> 3 e4e1… Parus m…           2 Parus major   Parus major     Parus maior  
#> # … with 3 more variables: authorship <chr>, year <dbl>, quality <dbl>

It’s pretty fast, thanks to gnparser of course

n <- 10000L
# get random scientific names from taxize
spp <- taxize::names_list(rank = "species", size = n)
timed <- system.time(gn_parse_tidy(spp))
timed
#>    user  system elapsed 
#>   0.971   0.150   0.446

Just 0.446 sec for 10000 names

gn_parse

output a list of lists with more detailed information

x <- c("Quadrella steyermarkii (Standl.) Iltis &amp; Cornejo",
  "Parus major Linnaeus, 1788", "Helianthus annuus var. texanus")
gn_parse(x)
#> [[1]]
#> [[1]]$parsed
#> [1] TRUE
#> 
#> [[1]]$quality
#> [1] 1
#> 
#> [[1]]$verbatim
#> [1] "Helianthus annuus var. texanus"
#> 
#> [[1]]$normalized
#> [1] "Helianthus annuus var. texanus"
#> 
#> [[1]]$cardinality
#> [1] 3
...