spplit - connect species occurrence data to literature

Possible workflow:

  • get species occurrences
  • get species list
  • get BHL metadata
  • get BHL OCR page content -> to corpus
  • (optionally: visualize OCR text with matches)
  • save corpus


Install the development versions of rgbif and spocc first, then install spplit:

devtools::install_github(c("ropensci/rgbif", "ropensci/spocc"))

Example - connect iDigBio species occurrence data to BHL

For access to Biodiversity Heritage Library (BHL) data, you’ll need an API key from them. To get one, fill out the brief form at http://www.biodiversitylibrary.org/getapikey.aspx; they’ll ask for your name and email address.
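
One way to keep the key out of your scripts is to read it from an environment variable. The following is a minimal sketch in base R, not part of spplit: the helper `get_bhl_key()` and the variable name `BHL_KEY` are assumptions made for illustration, and the key value is a placeholder.

```r
# Hypothetical helper: fetch the BHL API key from an environment variable.
# The variable name "BHL_KEY" is an assumption, not something spplit defines here.
get_bhl_key <- function(envvar = "BHL_KEY") {
  key <- Sys.getenv(envvar, unset = "")
  if (!nzchar(key)) {
    stop("No BHL API key found; set ", envvar, " (e.g. in ~/.Renviron)",
         call. = FALSE)
  }
  key
}

# Set the key for the current session only (placeholder, not a real key)
Sys.setenv(BHL_KEY = "your-key-here")
key <- get_bhl_key()
```

For a persistent setup, the same variable could be placed in `~/.Renviron` instead of calling `Sys.setenv()` in each session.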

After getting the text, you could do one of several things:

a) Save text to disk (or any database, etc.)
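
As a rough illustration of saving to disk, the sketch below writes each page of text to its own file using only base R. The `corpus` vector is made-up stand-in data and `save_corpus()` is a hypothetical helper, not a spplit function.

```r
# Made-up stand-in for OCR text, keyed by a page identifier
corpus <- c(
  page_001 = "Ursus americanus was observed near the river.",
  page_002 = "No bears were recorded in the second survey."
)

# Hypothetical helper: write each page to <dir>/<name>.txt and return the paths
save_corpus <- function(corpus, dir) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  paths <- file.path(dir, paste0(names(corpus), ".txt"))
  for (i in seq_along(corpus)) {
    writeLines(corpus[[i]], paths[i])
  }
  invisible(paths)
}

# Write into a temporary directory so the example leaves no files behind
out_dir <- file.path(tempdir(), "bhl_corpus")
paths <- save_corpus(corpus, out_dir)
```

One file per page keeps the text easy to inspect by hand; a single RDS file or a database table would be equally reasonable choices.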

b) Mine the text
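
A very simple form of text mining is counting how often each species name appears on each page. The sketch below does this with base R only; the `pages` and `species` values are made up, and `count_matches()` is a hypothetical helper. It matches names literally, so OCR errors in the text would be missed.

```r
# Made-up OCR text and species list for illustration
pages <- c(
  p1 = "Ursus americanus and Ursus arctos occur here. Ursus americanus is common.",
  p2 = "Only Canis lupus was seen."
)
species <- c("Ursus americanus", "Ursus arctos", "Canis lupus")

# Hypothetical helper: count literal occurrences of each name on each page.
# Returns an integer matrix with one row per page and one column per species.
count_matches <- function(pages, species) {
  counts <- matrix(0L, nrow = length(pages), ncol = length(species),
                   dimnames = list(names(pages), species))
  for (sp in species) {
    # gregexpr() returns match positions per page, or -1 when there is no match
    hits <- gregexpr(sp, pages, fixed = TRUE)
    counts[, sp] <- vapply(hits, function(h) sum(h > 0), integer(1))
  }
  counts
}

counts <- count_matches(pages, species)
```

The resulting matrix can be fed into whatever analysis comes next, for example keeping only pages where at least one name was found with `counts[rowSums(counts) > 0, , drop = FALSE]`.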


There’s a tool for visualizing results from OCR. It’s still a work in progress.