Introduction: Virtuoso Installation and Configuration
Carl Boettiger
2024-09-28
Source:vignettes/installation.Rmd
installation.Rmd
Virtuoso
is a high-performance “universal server” that can act as both a
relational database (supporting standard SQL queries) and an RDF
triplestore, (supporting SPARQL queries).
Virtuoso supports communication over the standard ODBC interface, and so
R users can potentially connect to Virtuoso merely by installing the
server and using the odbc
R package. However, installation
can present a few gotchas to users unfamiliar with Virtuoso. This
package seeks to streamline the process of installing, managing, and
querying a Virtuoso server. While the package can be also be used merely
to provide a standard DBI
connection to an RDBS, e.g. as a
dplyr
back-end, Virtuoso’s popularity and performance is
particularly notable with respect to RDF data and SPARQL queries, so
most examples focus on those use cases.
Installation
The virtuoso
package provides installation helpers for
both Mac OSX and Windows users through the function
vos_install()
. At the time of writing, the Mac OS X
installer uses Homebrew to install the Virtuoso Open Source server
(similar to the hugo
installer in RStudio’s
blogdown
). On Windows, vos_install()
downloads
and executes the Windows self-extracting archive (.exe
file), which will open a standard installation dialog in interactive
mode, or be run automatically if not run in an interactive session. No
automated installer is provided for Linux systems; Linux users are
encouraged to simply install the appropriate binaries for their
distribution (e.g. apt-get install -y virtuoso-opensource
on Debian/Ubuntu systems).
Configuration
Virtuoso Open Source configuration is controlled by a
virtouso.ini
file, which sets, among other things, which
directories can be accessed for tasks such as bulk import, as well as
performance tweaks such as available memory. Unfortunately, the Virtuoso
server process (virtuoso-t
application) cannot start
without a path to an appropriate config file, and the installers
(e.g. on both Windows and Linux) frequently install an example
virtuoso.ini
to a location which can be hard to find and
for which users do not have permission to edit directly. Moreover, the
file format is not always intuitive to edit. The virtuoso
package thus helps locate this file and provides a helper function,
vos_configure()
, to create and modify this configuration
file. Because reasonable defaults are also provided by this function,
users should usually not need to call this function manually.
vos_configure()
is called automatically from
vos_start()
if the path to a virtuoso.ini
file
is not passed to vos_start()
.
In addition to configuring Virtuoso’s settings through a
virtuoso.ini
file, the other common barrier is setting up
the driver for the ODBC connection. Some installers (Mac, Linux) do not
automatically add the appropriate driver to an active
odbcinst.ini
file with a predictable Driver Server Name,
which we need to know to initiate the ODBC connection. An internal
helper function handles identifying drivers and establishing the
appropriate odcinst.ini
automatically when necessary.
Server management
Lastly, Virtuoso Open Source is often run as a system service,
starting when the operating system starts. This is often undesirable, as
the casual laptop user does not want the service running all the time,
and can be difficult to control for users unfamiliar with managing such
background services on their operating systems. Instead of this
behavior, the virtuoso
package provides an explicit
interface to control the external server. The server only starts when
created by vos_start()
, and ends automatically when the R
process ends, or can be killed, paused, or resumed at any time from R
(e.g. via vos_kill()
). Helper utilities can also query the
status and logs of the server from R. As with most database servers,
data persists to disk, at an appropriate location for the OS determined
by rappdirs
package, and a helper utility,
vos_delete_db()
can remove this persistent storage
location.
Users can also connect directly to any existing (local or remote)
Virtuoso instance by passing the appropriate information to
vos_connect()
, which can be convenient for queries.
Note that he Virtuoso back-end provided by the R package
rdflib
can also connect to any Virtuoso server created by
the virtuoso
R package, though queries loading and queries
through the redland
libraries used by rdflib
will generally be slower than direct calls over ODBC via the
virtuoso
package functions, often dramatically so for large
triplestores.