Skip to contents

Taxonomic databases interface

Supported data sources and database structure

All are using SQLite as the database

  • NCBI: text files are provided by NCBI, which we stitch into a sqlite db

  • ITIS: they provide a sqlite dump, which we use here

  • The PlantList: created from stitching together csv files. this source is no longer updated as far as we can tell. they say they've moved focus to the World Flora Online

  • Catalogue of Life: created from Darwin Core Archive dump. Using the latest monthly edition via http://www.catalogueoflife.org/DCA_Export/archive.php

  • GBIF: created from Darwin Core Archive dump. right now we only have the taxonomy table (called gbif), but will add the other tables in the darwin core archive later

  • Wikidata: aggregated taxonomy of Open Tree of Life, GLoBI and Wikidata. On Zenodo, created by Joritt Poelen of GLOBI.

  • World Flora Online: http://www.worldfloraonline.org/

Update schedule for databases

  • NCBI: since db_download_ncbi creates the database when the function is called, it's updated whenever you run the function

  • ITIS: since ITIS provides the sqlite database as a download, you can delete the old file and run db_download_itis to get a new dump; they I think update the dumps every month or so

  • The PlantList: no longer updated, so you shouldn't need to download this after the first download

  • Catalogue of Life: a GitHub Actions job runs once a day at 00:00 UTC, building the lastest COL data into a SQLite database thats hosted on Amazon S3

  • GBIF: a GitHub Actions job runs once a day at 00:00 UTC, building the lastest COL data into a SQLite database thats hosted on Amazon S3

  • Wikidata: last updated April 6, 2018. Scripts are available to update the data if you prefer to do it yourself.

  • World Flora Online: since db_download_wfo creates the database when the function is called, it's updated whenever you run the function

  • NCBI: ftp://ftp.ncbi.nih.gov/pub/taxonomy/

  • ITIS: https://www.itis.gov/downloads/index.html

  • The PlantList - http://www.theplantlist.org/

  • Catalogue of Life: via http://www.catalogueoflife.org/content/annual-checklist-archive

  • GBIF: http://rs.gbif.org/datasets/backbone/

  • Wikidata: https://zenodo.org/record/1213477

  • World Flora Online: http://www.worldfloraonline.org/

Examples

if (FALSE) { # \dontrun{
library(dplyr)

# data source: NCBI
db_download_ncbi()
src <- src_ncbi()
df <- tbl(src, "names")
filter(df, name_class == "scientific name")

# data source: ITIS
## download ITIS database
db_download_itis()
## connect to the ITIS database
src <- src_itis()
## use SQL syntax
sql_collect(src, "select * from hierarchy limit 5")
### or pipe the src to sql_collect
src %>% sql_collect("select * from hierarchy limit 5")
## use dplyr verbs
src %>%
  tbl("hierarchy") %>%
  filter(ChildrenCount > 1000)
## or create tbl object for repeated use
hiers <- src %>% tbl("hierarchy")
hiers %>% select(TSN, level)

# data source: The PlantList
## download tpl datababase
db_download_tpl()
## connecto the tpl database
src <- src_tpl()
## do queries
tpl <- tbl(src, "tpl")
filter(tpl, Family == "Pinaceae")

# data source: Catalogue of Life
## download col datababase
db_download_col()
## connec to the col database
src <- src_col()
## do queries
names <- tbl(src, "taxa")
select(names, taxonID, scientificName)

# data source: GBIF
## download gbif datababase
db_download_gbif()
## connecto the gbif database
src <- src_gbif()
## do queries
df <- tbl(src, "gbif")
select(df, taxonID, scientificName)

# data source: Wikidata
db_download_wikidata()
src <- src_wikidata()
df <- tbl(src, "wikidata")
filter(df, rank_id == "Q7432")

# data source: World Flora Online
db_download_wfo()
src <- src_wfo()
df <- tbl(src, "wfo")
filter(df, taxonID == "wfo-0000000010")
} # }