nodbi
is an R package that provides a single interface for several NoSQL databases and databases with JSON functionality, with the same function parameters and return values across all database backends. Last updated 2023-05-12.
Currently, nodbi
supports the following database backends:
- MongoDB
- SQLite
- Elasticsearch
- CouchDB
- PostgreSQL (since nodbi v0.6.0)
- DuckDB (since nodbi v0.9.0)
for an R
object of any of these data types:
- data.frame
- list
- JSON string
- a file name or URL with NDJSON records*
and for executing the following operations:
- List
- Exists
- Create
- Get
- Query**
- Update**
- Delete
across all database backends. Limitations: * Only http(s)
and only for docdb_create
. **For Elasticsearch, only simple queries (e.g. equality for a single field) and for CouchDB only root fields. For capabilities to query across any of the database backends, see section walk-through below and see the canonical testing: core-nodbi.R. See benchmark below for speed comparisons of database backends.
Install
CRAN version
install.packages("nodbi")
Development version
remotes::install_github("ropensci/nodbi")
Load package from library
API overview
Parameters for docdb_*()
functions are the same across all database backends. See walk-through below.
Purpose | Function call |
---|---|
Create database connection (see below) | src <- nodbi::src_{mongo, sqlite, couchdb, elastic}(<see below for parameters>) |
Load my_data (a data frame, a list, a JSON string, or a file name or url with NDJSON records) into database, container my_container
|
nodbi::docdb_create(src = src, key = "my_container", value = my_data) |
Get all documents back into a data frame | nodbi::docdb_get(src = src, key = "my_container") |
Get documents selected with query (as MongoDB-compatible JSON) into a data frame | nodbi::docdb_query(src = src, key = "my_container", query = '{"age": 20}') |
Get selected fields (in MongoDB compatible JSON) from documents selected query | nodbi::docdb_query(src = src, key = "my_container", query = '{"age": 20}', fields = '{"name": 1, "_id": 0, "age": 1}') |
Update (patch) selected documents with new data in data frame, list or JSON string from my_data
|
nodbi::docdb_update(src = src, key = "my_container", value = my_data, query = '{"age": 20}') |
Check if container exists | nodbi::docdb_exists(src = src, key = "my_container") |
List all containers in database | nodbi::docdb_list(src = src) |
Delete document(s) in container | nodbi::docdb_delete(src = src, key = "my_container", query = '{"age": 20}') |
Delete container | nodbi::docdb_delete(src = src, key = "my_container") |
Close and remove database connection | rm(src) |
Database connections
Overview on parameters and aspects that are specific to the database backend. These are only needed once: for src_*()
to create a connection object. The connection object is subsequently used with docdb_*
functions.
DuckDB
# install with
install.packages('duckdb', repos = c('https://duckdb.r-universe.dev', 'https://cloud.r-project.org'))
# connect
src <- nodbi::src_duckdb(dbdir = ":memory:", ...)
# remember to disconnect and shutdown DuckDB
DBI::dbDisconnect(src$con, shutdown = TRUE)
See also https://CRAN.R-project.org/package=duckdb and https://duckdb.org/docs/extensions/json.
MongoDB
MongoDB but none of the other databases require to specify the container already in the src_*()
function. “Container” refers to a MongoDB collection. See https://jeroen.github.io/mongolite/.
src <- nodbi::src_mongo(
collection = "my_container", db = "my_database",
url = "mongodb://localhost", ...)
SQLite
The functionality to process JSON is based on the SQLite extension JSON1, available in RSQLite. “Container” refers to an SQLite table.
src <- nodbi::src_sqlite(dbname = ":memory:", ...)
CouchDB
“Container” refers to a CouchDB database. Function docdb_update.src_couchdb
uses jqr to implement patching JSON, in analogy to functions available for the other databases.
src <- nodbi::src_couchdb(
host = "127.0.0.1", port = 5984L, path = NULL,
transport = "http", user = NULL, pwd = NULL, headers = NULL)
Elasticsearch
“Container” refers to an Elasticsearch index. Only lowercase is accepted for container names. Opensearch can equally be used.
src <- nodbi::src_elastic(
host = "127.0.0.1", port = 9200L, path = NULL,
transport_schema = "http", user = NULL, pwd = NULL, force = FALSE, ...)
PostgreSQL
“Container” refers to an PostgreSQL table. With PostgreSQL, the order of variables in data frames returned by docdb_get()
and docdb_query()
can differ from the order in which they were in docdb_create()
.
src <- nodbi::src_postgres(
dbname = "my_database", host = "127.0.0.1", port = 5432L, ...)
Walk-through
This example is meant to show how functional nodbi
is at this time.
# load nodbi
library(nodbi)
# connect database backend; this
# example fully works with any of
src <- src_postgres()
src <- src_mongo()
src <- src_sqlite()
src <- src_duckdb()
#
# parts of the example do not yet work with
# these database backends, see *notes* below
src <- src_elastic()
src <- src_couchdb()
# check if container already exists
docdb_exists(src, key = "my_container")
# load data (here data frame, alternatively list or JSON)
# into the container "my_container" specified in "key" parameter
docdb_create(src, key = "my_container", value = mtcars)
# [1] 32
# load additionally 98 NDJSON records
docdb_create(src, key = "my_container", "http://httpbin.org/stream/98")
# Note: container 'my_container' already exists
# [1] 98
# load additionally contacts JSON data, from package nodbi
docdb_create(src, key = "my_container", contacts)
# Note: container 'my_container' already exists
# [1] 5
# get all documents, irrespective of schema
dplyr::tibble(docdb_get(src, "my_container"))
# A tibble: 135 × 27
# `_id` isActive balance age eyeColor name email about registered tags friends url
# <chr> <lgl> <chr> <int> <chr> <chr> <chr> <chr> <chr> <list> <list> <chr>
# 1 5cd678530d… TRUE $2,412… 20 blue Kris… kris… Sint… 2017-07-1… <chr> <df> NA
# 2 5cd678531b… FALSE $3,400… 20 brown Rae … raec… Nisi… 2018-12-1… <chr> <df> NA
# 3 5cd6785325… TRUE $1,161… 22 brown Pace… pace… Eius… 2018-08-1… <chr> <df> NA
# 4 5cd6785335… FALSE $2,579… 30 brown Will… will… Null… 2018-02-1… <chr> <df> NA
# 5 5cd67853f8… FALSE $3,808… 23 green Lacy… lacy… Sunt… 2014-08-0… <chr> <df> NA
# 6 6529d28a-c… NA NA NA NA NA NA NA NA <NULL> <NULL> http…
# 7 6529d2a8-c… NA NA NA NA NA NA NA NA <NULL> <NULL> http…
# 8 6529d2b2-c… NA NA NA NA NA NA NA NA <NULL> <NULL> http…
# 9 6529d2c6-c… NA NA NA NA NA NA NA NA <NULL> <NULL> http…
# 10 6529d2d0-c… NA NA NA NA NA NA NA NA <NULL> <NULL> http…
# # … with 125 more rows, and 15 more variables: args <named list>, headers <df[,4]>,
# # origin <chr>, id <int>, mpg <dbl>, cyl <dbl>, disp <dbl>, hp <dbl>, drat <dbl>, wt <dbl>,
# # qsec <dbl>, vs <dbl>, am <dbl>, gear <dbl>, carb <dbl>
# query some documents
# *note*: such complex queries do not yet work with src_elasticsearch()
docdb_query(src, "my_container", query = '{"mpg": {"$gte": 30}}')
# _id mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Fiat 128 32 4 79 66 4.1 2.2 19 1 1 4 1
# 2 Honda Civic 30 4 76 52 4.9 1.6 19 1 1 4 2
# 3 Lotus Europa 30 4 95 113 3.8 1.5 17 1 1 5 2
# 4 Toyota Corolla 34 4 71 65 4.2 1.8 20 1 1 4 1
# query some fields from some documents; 'query' is a mandatory
# parameter and is used here in its position in the signature
# *note*: such complex queries do not yet work with src_elasticsearch()
docdb_query(src, "my_container", '{"mpg": {"$gte": 30}}', fields = '{"wt": 1, "mpg": 1}')
# wt mpg
# 1 2.2 32
# 2 1.6 30
# 3 1.5 30
# 4 1.8 34
# query some subitem fields from some documents
# *note*: such complex queries do not yet work with src_couchdb() or src_elasticsearch()
str(docdb_query(
src, key = "my_container",
query = '{"$or": [{"age": {"$gt": 21}},
{"friends.name": {"$regex": "^B[a-z]{3,9}.*"}}]}',
fields = '{"age": 1, "friends.name": 1}'))
# 'data.frame': 3 obs. of 2 variables:
# $ age : int 23 30 22
# $ friends:'data.frame': 3 obs. of 1 variable:
# ..$ name:List of 3
# .. ..$ : chr "Wooten Goodwin" "Brandie Woodward" "Angelique Britt"
# .. ..$ : chr "Coleen Dunn" "Doris Phillips" "Concetta Turner"
# .. ..$ : chr "Baird Keller" "Francesca Reese" "Dona Bartlett"
# such queries can also be used for updating (patching) selected documents
# with a new 'value'(s) from a JSON string, a data frame or a list
docdb_update(src, "my_container", value = '{"vs": 9, "xy": [1, 2]}', query = '{"carb": 3}')
# [1] 3
# *note*: such queries do not yet work with src_elasticsearch()
docdb_query(src, "my_container", '{"carb": {"$in": [1,3]}}', fields = '{"vs": 1}')[[1]]
# [1] 1 1 1 1 9 9 9 1 1 1
docdb_get(src, "my_container")[ , c(1, 27, 28)]
# _id carb xy
# ...
# 126 Merc 280C 4 NULL
# 127 Merc 450SE 3 1, 2
# 128 Merc 450SL 3 1, 2
# 129 Merc 450SLC 3 1, 2
# 130 Pontiac Firebird 2 NULL
# ...
# use with dplyr
# *note* that dplyr includes a (deprecated) function src_sqlite
# which would mask nodbi's src_sqlite, so it is excluded here
library("dplyr", exclude = c("src_sqlite", "src_postgres"))
#
docdb_get(src, "my_container") %>%
group_by(gear) %>%
summarise(mean_mpg = mean(mpg))
# # A tibble: 4 × 2
# gear mean_mpg
# <dbl> <dbl>
# 1 3 16.1
# 2 4 24.5
# 3 5 21.4
# 4 NA NA
# delete documents; query is optional parameter and has to be
# specified for deleting documents instead of deleting the container
# *note*: such complex queries do not yet work with src_couchdb() or src_elasticsearch()
# *note*: with src_duckdb(), cannot use $or, can only use $and so far
# in query where one element has a dot path but the other hasn't
docdb_delete(src, "my_container", query = '{"$or": {"gear": 5, "age": {"$gte": 22}}}')
# TRUE
nrow(docdb_get(src, "my_container"))
# [1] 127
# delete container from database
docdb_delete(src, "my_container")
# [1] TRUE
#
# shutdown
DBI::dbDisconnect(src$con, shutdown = TRUE); rm(src)
Benchmark
library("nodbi")
srcMongo <- src_mongo()
srcSqlite <- src_sqlite()
srcPostgres <- src_postgres()
srcDuckdb <- src_duckdb()
srcElastic <- src_elastic()
srcCouchdb <- src_couchdb(
user = Sys.getenv("COUCHDB_TEST_USER"),
pwd = Sys.getenv("COUCHDB_TEST_PWD"))
key <- "test"
query <- '{"clarity": "SI1"}'
fields <- '{"cut": 1, "_id": 1, "clarity": "1"}'
value <- '{"clarity": "XYZ", "new": ["ABC", "DEF"]}'
data <- as.data.frame(diamonds)[1:2000, ]
ndjs <- tempfile()
jsonlite::stream_out(iris, con = file(ndjs), verbose = FALSE)
testFunction <- function(src, key, value, query, fields) {
on.exit(docdb_delete(src, key))
suppressMessages(docdb_create(src, key, data))
suppressMessages(docdb_create(src, key, ndjs))
# Elasticsearch needs a delay to process the data
if (inherits(src, "src_elastic")) Sys.sleep(1)
head(docdb_get(src, key))
docdb_query(src, key, query = query, fields = fields)
docdb_update(src, key, value = value, query = query)
}
# 2023-05-18 with 2015 mobile hardware, no database optimisations
rbenchmark::benchmark(
MongoDB = testFunction(src = srcMongo, key, value, query, fields),
SQLite = testFunction(src = srcSqlite, key, value, query, fields),
Elastic = testFunction(src = srcElastic, key, value, query, fields),
CouchDB = testFunction(src = srcCouchdb, key, value, query, fields),
PostgreSQL = testFunction(src = srcPostgres, key, value, query, fields),
DuckDB = testFunction(src = srcDuckdb, key, value, query, fields),
replications = 10L,
columns = c('test', 'replications', 'elapsed')
)
# test replications elapsed
# 4 CouchDB 10 265.0
# 3 Elastic 10 55.6 # 10s to be subtracted
# 5 PostgreSQL 10 4.3
# 6 DuckDB 10 3.9
# 2 SQLite 10 3.7
# 1 MongoDB 10 3.5
Testing
# 2023-05-18 timing not much relevant
testthat::test_local()
# ✔ | F W S OK | Context
# ✔ | 1 95 | couchdb [104.3s]
# ──────────────────────────────────────────────────────────────────────────────────────────────────────
# Skip (core-nodbi.R:246:3): docdb_update
# Reason: bulk updates not yet implemented
# ──────────────────────────────────────────────────────────────────────────────────────────────────────
# ✔ | 122 | duckdb [11.4s]
# ✔ | 2 66 | elastic [96.6s]
# ──────────────────────────────────────────────────────────────────────────────────────────────────────
# Skip (core-nodbi.R:178:3): docdb_query
# Reason: queries need to be translated into elastic syntax
#
# Skip (core-nodbi.R:246:3): docdb_update
# Reason: bulk updates not yet implemented
# ──────────────────────────────────────────────────────────────────────────────────────────────────────
# ✔ | 121 | mongodb [9.2s]
# ✔ | 125 | postgres [50.2s]
# ✔ | 124 | sqlite [43.0s]
#
# ══ Results ═══════════════════════════════════════════════════════════════════════════════════════════
# Duration: 315.6 s
#
# ── Skipped tests ────────────────────────────────────────────────────────────────────────────────────
# • bulk updates not yet implemented (2)
# • queries need to be translated into elastic syntax (1)
#
# [ FAIL 0 | WARN 0 | SKIP 3 | PASS 653 ]
Notes
- Please report any issues or bugs.
- License: MIT
- Get citation information for
nodbi
in R doingcitation(package = 'nodbi')
- Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
- Support for redis has been removed since version 0.5, because no way was found to query and update specific documents in a container.