
Connect to the remote GBIF data directly. This can be much faster than downloading for one-off use, or when using the package from a server in the same region as the data. See Details.


gbif_remote(
  version = gbif_version(),
  bucket = gbif_default_bucket(),
  to_duckdb = FALSE,
  safe = TRUE,
  unset_aws = getOption("gbif_unset_aws", TRUE),
  endpoint_override = Sys.getenv("AWS_S3_ENDPOINT", ""),
  ...
)



version: GBIF snapshot date.


bucket: GBIF bucket name (including region). A default can also be set using the option gbif_default_bucket; see options.


to_duckdb: Return a remote duckdb connection instead of an arrow connection?


safe: logical, default TRUE. Should the columns mediatype and issue be excluded? The varchar datatype on these columns substantially slows down queries.


unset_aws: Unset AWS credentials? GBIF is provided in a public bucket, so credentials are not needed, but having AWS_ACCESS_KEY_ID or other AWS environment variables set can cause the connection to fail. By default, any such environment variables are unset for the duration of the R session. This behavior can also be turned off globally by setting the option gbif_unset_aws to FALSE (e.g. to use an alternative network endpoint).
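For instance, the unsetting behavior can be disabled before connecting. This is a minimal sketch; the endpoint host shown is hypothetical, and gbif_unset_aws is the option named above:

```r
# Keep AWS environment variables intact, e.g. when pointing gbif_remote()
# at an alternative S3-compatible endpoint.
options(gbif_unset_aws = FALSE)

# Hypothetical endpoint host; substitute your own.
Sys.setenv(AWS_S3_ENDPOINT = "s3.example.org")
```

With these set, gbif_remote() will leave any configured credentials in place and route requests through the override endpoint.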


endpoint_override: optional parameter passed to arrow::s3_bucket().


...: additional parameters passed to arrow::s3_bucket().


An arrow Dataset query by default, or a remote tibble tbl_sql class object when to_duckdb = TRUE. In either case, call dplyr::collect() on the final result to force evaluation and bring the resulting data into memory in R.


Query performance improves dramatically in queries that return only a subset of columns. Use explicit select() calls to return only the columns you need.
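A minimal sketch of this pattern, assuming columns such as countrycode, year, and species from the GBIF occurrence schema; the query is wrapped in a function because it requires network access to the bucket:

```r
library(dplyr)
library(gbifdb)

# Restrict columns with select() before collect(), so only the needed
# columns are scanned. Column names here are assumptions based on the
# GBIF occurrence schema.
recent_us_species <- function() {
  gbif <- gbif_remote()
  gbif |>
    filter(countrycode == "US", year >= 2010) |>
    select(species, year) |>
    collect()   # forces evaluation; returns an in-memory tibble
}
# recent_us_species()  # requires network access to the GBIF bucket
```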

A summary of this GBIF data, along with column meanings, can be found at


if (FALSE) { # interactive()

gbif <- gbif_remote()

}