The goal of arkdb is to provide a convenient way to move data from large compressed text files (tsv, csv, etc.) into any DBI-compliant database connection (e.g. MySQL, Postgres, SQLite; see DBI), and to move tables out of such databases into text files. The key feature of arkdb is that data are moved between databases and text files in chunks of a fixed size, allowing the package functions to work with tables that would be much too large to read into memory all at once.
You can install arkdb from GitHub with:
    # install.packages("devtools")
    devtools::install_github("cboettig/arkdb")
Consider the nycflights database in SQLite:
    library(arkdb)  # provides ark() and unark(), used below

    tmp <- tempdir() # Or can be your working directory, "."
    db <- dbplyr::nycflights13_sqlite(tmp)
    #> Caching nycflights db at /tmp/Rtmp3Ebd1H/nycflights13.sqlite
    #> Creating table: airlines
    #> Creating table: airports
    #> Creating table: flights
    #> Creating table: planes
    #> Creating table: weather
Create an archive of the database:
    dir <- fs::dir_create(fs::path(tmp, "nycflights"))
    ark(db, dir, lines = 50000)
    #> Exporting airlines in 50000 line chunks:
    #> ...Done! (in 0.008990765 secs)
    #> Exporting airports in 50000 line chunks:
    #> ...Done! (in 0.03013897 secs)
    #> Exporting flights in 50000 line chunks:
    #> ...Done! (in 8.753164 secs)
    #> Exporting planes in 50000 line chunks:
    #> ...Done! (in 0.02437472 secs)
    #> Exporting weather in 50000 line chunks:
    #> ...Done! (in 0.6173553 secs)
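To illustrate the idea behind chunked export, here is a minimal sketch written directly against DBI. It is not arkdb's actual implementation, which may differ in its details. The helper name `export_in_chunks()` is hypothetical, and the example assumes the DBI and RSQLite packages are installed.

```r
library(DBI)

# Hypothetical helper (not part of arkdb): stream one table into a
# bzip2-compressed TSV file, holding at most `lines` rows in memory.
export_in_chunks <- function(con, table, path, lines = 50000L) {
  out <- bzfile(path, open = "w")
  on.exit(close(out), add = TRUE)

  query <- paste("SELECT * FROM", dbQuoteIdentifier(con, table))
  res <- dbSendQuery(con, query)
  on.exit(dbClearResult(res), add = TRUE)

  first <- TRUE
  total <- 0L
  while (!dbHasCompleted(res)) {
    chunk <- dbFetch(res, n = lines)   # fetch the next block of rows only
    if (nrow(chunk) == 0L) break
    # Write the header with the first chunk, then append rows only.
    write.table(chunk, out, sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = first)
    first <- FALSE
    total <- total + nrow(chunk)
  }
  invisible(total)
}

# Usage on a small in-memory database, with a deliberately tiny chunk size.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)
path <- tempfile(fileext = ".tsv.bz2")
n_written <- export_in_chunks(con, "mtcars", path, lines = 10L)
dbDisconnect(con)
```

Because only one chunk is held in memory at a time, peak memory use depends on `lines` rather than on the size of the table.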
Import a list of compressed tabular files (here, the *.tsv.bz2 files created above) into a local SQLite database:
    files <- fs::dir_ls(dir)
    new_db <- DBI::dbConnect(RSQLite::SQLite(), fs::path(tmp, "local.sqlite"))
    unark(files, new_db, lines = 50000)
    #> Importing /tmp/Rtmp3Ebd1H/nycflights/airlines.tsv.bz2 in 50000 line chunks:
    #> ...Done! (in 0.01028419 secs)
    #> Importing /tmp/Rtmp3Ebd1H/nycflights/airports.tsv.bz2 in 50000 line chunks:
    #> ...Done! (in 0.01863098 secs)
    #> Importing /tmp/Rtmp3Ebd1H/nycflights/flights.tsv.bz2 in 50000 line chunks:
    #> ...Done! (in 5.179139 secs)
    #> Importing /tmp/Rtmp3Ebd1H/nycflights/planes.tsv.bz2 in 50000 line chunks:
    #> ...Done! (in 0.0276711 secs)
    #> Importing /tmp/Rtmp3Ebd1H/nycflights/weather.tsv.bz2 in 50000 line chunks:
    #> ...Done! (in 0.202945 secs)
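The import direction can be sketched in the same spirit: read a fixed number of lines from the compressed file, parse them, and append them to the table. Again, this is only an illustration under stated assumptions, not arkdb's own code. The helper name `import_in_chunks()` is hypothetical, and the sketch assumes a tab-separated file with a header row and no embedded newlines or quoted tabs in the fields.

```r
library(DBI)

# Hypothetical helper (not part of arkdb): append a compressed TSV file
# to a database table, parsing at most `lines` rows at a time.
import_in_chunks <- function(path, con, table, lines = 50000L) {
  # In read mode, file() detects bzip2/gzip/xz compression automatically.
  input <- file(path, open = "r")
  on.exit(close(input), add = TRUE)

  header <- readLines(input, n = 1L)
  total <- 0L
  repeat {
    body <- readLines(input, n = lines)   # next block of raw lines
    if (length(body) == 0L) break
    # Re-attach the header so each chunk parses with the same column names.
    chunk <- read.delim(text = c(header, body), stringsAsFactors = FALSE)
    dbWriteTable(con, table, chunk, append = TRUE)
    total <- total + nrow(chunk)
  }
  invisible(total)
}

# Usage: write a small compressed file, then load it 10 rows at a time.
path <- tempfile(fileext = ".tsv.bz2")
out <- bzfile(path, open = "w")
write.table(mtcars, out, sep = "\t", quote = FALSE, row.names = FALSE)
close(out)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
n_read <- import_in_chunks(path, con, "mtcars", lines = 10L)
n_rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM mtcars")$n
dbDisconnect(con)
```

One caveat of this simple approach is that column types are inferred separately for each chunk. SQLite tolerates this, but a stricter database backend may need explicit column types.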
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.