
Triples can be added one at a time via SPARQL queries, but Virtuoso's bulk importer is by far the fastest way to load large sets of triples into the database.

Usage

vos_import(
  con,
  files = NULL,
  wd = ".",
  glob = "*",
  graph = "rdflib",
  n_cores = 1L
)

Arguments

con

An ODBC connection to Virtuoso, from vos_connect().

files

Paths to the files to be imported.

wd

Alternatively, specify a directory and globbing pattern to import. Note that in this case, wd must be in (or a subdirectory of) the AllowedDirs list of the virtuoso.ini file created by vos_configure(). By default, this includes the working directory where you called vos_start() or vos_configure().

glob

A wildcard (globbing) pattern, e.g. `"*.nq"`.

graph

Name (technically URI) for a graph in the database. Can leave as default. If a graph is already specified by the import file (e.g. in nquads), that will be used instead.

n_cores

The number of cores available for parallel loading. Particularly useful when importing a large number of files.
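
Before importing by directory, it can help to check which files a given wd and glob would select. The following is a minimal sketch using only base R; `preview_import_files()` is a hypothetical helper, not part of the virtuoso package, and `Sys.glob()` only approximates how the Virtuoso loader itself matches the pattern.

```r
# Hypothetical helper (not part of the virtuoso package): list the files that
# a directory plus globbing pattern would select, using base R only.
preview_import_files <- function(wd = ".", glob = "*") {
  matches <- Sys.glob(file.path(wd, glob))
  # Keep files only; a bare "*" can also match subdirectories.
  matches[!dir.exists(matches)]
}

# Demonstration in a temporary directory with made-up file names.
demo_dir <- file.path(tempdir(), "vos_glob_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.nq", "b.nq", "notes.txt"))))

nq_files <- preview_import_files(demo_dir, "*.nq")
basename(nq_files)

# With a running server and a connection `con`, the matching import call would
# look like the line below. Remember that wd must be covered by AllowedDirs,
# which a temporary directory usually is not.
# vos_import(con, wd = "path/to/allowed/dir", glob = "*.nq")
```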

Value

(Invisibly) returns the status table of the bulk loader, indicating file loading time or errors.

Details

The bulk importer loads all files matching a pattern in a given directory. If given a list of files, these are temporarily symlinked (or copied, on Windows machines) into a subdirectory of the Virtuoso app cache directory, and the entire subdirectory is loaded (filtered by the globbing pattern). If files is not specified, the loader is called directly on the specified directory and pattern. This is particularly useful for loading large numbers of files.

Note that Virtuoso recommends breaking large files into multiple smaller ones, which can improve loading time, particularly when using multiple cores.
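
One way to follow this advice for line-oriented formats is to split the file by lines. The sketch below assumes an uncompressed N-Triples or N-Quads file, where each line is a complete statement; it would not be safe for Turtle, TriG, or RDF/XML. `split_rdf_lines()` is a hypothetical helper, not part of the virtuoso package, and it reads the whole file into memory, so very large files may call for a streaming approach instead.

```r
# Hypothetical helper (not part of the virtuoso package): split a line-oriented
# RDF file (.nt or .nq) into chunks of at most `lines_per_chunk` lines each.
split_rdf_lines <- function(path, out_dir, lines_per_chunk = 100000L) {
  lines <- readLines(path, warn = FALSE)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  if (length(lines) == 0L) return(character(0))

  ext  <- tools::file_ext(path)
  stem <- tools::file_path_sans_ext(basename(path))

  # Assign each line to a chunk number: 1, 1, ..., 2, 2, ...
  chunk_id <- ceiling(seq_along(lines) / lines_per_chunk)
  chunks   <- split(lines, chunk_id)

  out <- file.path(
    out_dir,
    sprintf("%s_%04d.%s", stem, as.integer(names(chunks)), ext)
  )
  for (i in seq_along(chunks)) writeLines(chunks[[i]], out[i])
  out
}

# Demonstration with five made-up quads split into chunks of two lines.
src <- file.path(tempdir(), "demo.nq")
writeLines(
  sprintf(
    '<http://example.org/s%d> <http://example.org/p> "v%d" <http://example.org/g> .',
    1:5, 1:5
  ),
  src
)
parts <- split_rdf_lines(src, file.path(tempdir(), "demo_chunks"), lines_per_chunk = 2L)
basename(parts)
```

The resulting chunk files could then be passed as files, or loaded by directory and pattern, optionally with n_cores greater than 1.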

Virtuoso Bulk Importer recognizes the following file formats:

  • .grdf

  • .nq

  • .owl

  • .nt

  • .rdf

  • .trig

  • .ttl

  • .xml

Any of these can optionally be gzipped (with a .gz extension).
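
As a quick pre-flight check, file names can be tested against the extensions listed above. This is a sketch in base R only; `is_bulk_loadable()` is a hypothetical helper, not part of the virtuoso package, and it assumes extensions are matched case-sensitively, which may not reflect what the loader actually does.

```r
# Extensions from the list above; each may carry an extra ".gz" suffix.
recognized_ext <- c("grdf", "nq", "owl", "nt", "rdf", "trig", "ttl", "xml")

# Hypothetical helper (not part of the virtuoso package): TRUE for file names
# that end in a recognized extension, optionally followed by ".gz".
is_bulk_loadable <- function(files) {
  pattern <- sprintf("\\.(%s)(\\.gz)?$", paste(recognized_ext, collapse = "|"))
  grepl(pattern, files)
}

# Made-up file names for illustration.
candidates <- c("people.nq", "dump.ttl.gz", "notes.txt", "data.json.gz")
loadable <- is_bulk_loadable(candidates)
candidates[loadable]
```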

Examples


vos_status()
#> virtuoso isn't running.

# \donttest{
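if (has_virtuoso()) {
  # Start the server and open an ODBC connection
  vos_start()
  con <- vos_connect()

  # Import the example N-Quads file shipped with the package
  example <- system.file("extdata", "person.nq", package = "virtuoso")
  vos_import(con, example)
}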
#> Warning: Exiting, virtuoso template not found... is virtuoso installed?
#> PROCESS 'virtuoso-t', running, pid 2758.
#> Server is now starting up, this may take a few seconds...
#> virtuoso isn't running.
#> Warning: could not automatically locate virtodbc.so
#> Error in base::tryCatch(...): ! ODBC failed with error 00000 from [unixODBC][Driver Manager].
#>  Can't open lib 'virtodbc.so' : file not found
#>  From nanodbc/nanodbc.cpp:1150.
# }