Downloading A Long Species List

It is now possible to download up to 100,000 names on GBIF.

One good reason to download data using a long list of names, would be if your group of interest is non-monophyletic. It is required to set up your GBIF credentials to make downloads from GBIF. I suggest that you follow this short tutorial before continuing.

This requires the latest version of rgbif.

library(dplyr)
library(readr)  
library(rgbif) 

# 60,000 tree names file from BGCI
tree_file <- "https://gist.githubusercontent.com/jhnwllr/bd61bcd56d76beeacd03ea9ace0a31fd/raw/089d4c3a88b358719845a1394c9f88f9a2025e20/tree_names.tsv"

long_checklist <- readr::read_tsv(tree_file)

# match the names 
gbif_taxon_keys <- long_checklist %>% 
head(1000) %>% # only first 1000 names 
name_backbone_checklist() %>% # match to backbone 
filter(!matchType == "NONE") %>% # get matched names
pull(usageKey) 

# gbif_taxon_keys should be a long vector like this c(2977832,2977901,2977966,2977835,2977863)

# download the data
occ_download(
pred_in("taxonKey", gbif_taxon_keys), # important to use pred_in
pred("hasCoordinate", TRUE),
pred("hasGeospatialIssue", FALSE),
format = "SIMPLE_CSV"
)

If your request can easily be summarized into higher taxon groups, it still makes more sense to download just that taxon group. For example, if you just want to download all dragonflies, all mammals, or all vascular plants. These requests don’t require anything special.

John Waller

2021-12-20

About

Community

Resources