Drugs Databases Parser • dbparser

Overview

dbparser is an rOpenSci peer-reviewed R package that parses and integrates major pharmacological databases into standardized, analysis-ready R objects called dvobjects (drugverse objects).

Pharmacological databases use incompatible formats and structures, forcing researchers to write custom parsing scripts, a process that can consume 60–80% of analysis time. dbparser eliminates this bottleneck with unified parsing functions, chainable merge operations, and a consistent output structure that enables reproducible, cross-database analyses.

With recent updates, dbparser has evolved into an integration engine, allowing you to merge mechanistic data (DrugBank) with real-world phenotypic data (OnSIDES) and drug-drug interaction risks (TWOSIDES).

Installation

# From CRAN (stable)
install.packages("dbparser")

# From GitHub (development)
# install.packages("pak")
pak::pak("ropensci/dbparser")

Supported Databases

DrugBank (The Mechanistic Hub)

DrugBank is a comprehensive database containing detailed drug, pharmacological, and target information. As both a bioinformatics and a cheminformatics resource, DrugBank combines detailed drug data (chemical, pharmacological, pharmaceutical) with comprehensive drug target information (sequence, structure, pathway). More information is available on the DrugBank website.

Parser: parseDrugBank()
Input: Full XML database (downloading requires a free DrugBank account; approval may take a couple of days)
Tested versions: 5.1.0 through 5.1.12
Alternative: Use dbdataset for pre-parsed data without downloading the XML (available on GitHub only, because it exceeds the CRAN package size limit)
Tutorial: DrugBank Parsing Vignette

If you find errors with any DrugBank version, please submit an issue on the dbparser GitHub issue tracker.

OnSIDES (Adverse Drug Events)

OnSIDES provides adverse drug events extracted from thousands of FDA drug labels using machine learning.

Parser: parseOnSIDES()
Input: Directory containing OnSIDES CSV files

TWOSIDES (Drug-Drug Interactions)

TWOSIDES provides data on adverse events arising when two drugs are taken together.

Parser: parseTWOSIDES()
Input: TWOSIDES.csv.gz file

Quick Start

Parse a Single Database

library(dbparser)

# Parse DrugBank
drugbank_db <- parseDrugBank("data/drugbank.xml")

# Parse OnSIDES
onsides_db <- parseOnSIDES("data/onsides/")

# Parse TWOSIDES
twosides_db <- parseTWOSIDES("data/TWOSIDES.csv.gz")

Integration Pipeline

The power of dbparser lies in its ability to chain parsers and mergers together. Here is how you can build a complete pharmacovigilance dataset:

library(dbparser)
library(dplyr)

# 1. Parse the raw databases
drugbank_db <- parseDrugBank("data/drugbank.xml")
onsides_db  <- parseOnSIDES("data/onsides/")
twosides_db <- parseTWOSIDES("data/TWOSIDES.csv.gz")

# 2. Build the Integrated Knowledge Graph
#    DrugBank serves as the hub. Chain the merges.
final_db <- drugbank_db %>%
  merge_drugbank_onsides(onsides_db) %>%
  merge_drugbank_twosides(twosides_db)

# 3. Analyze Results
head(final_db$integrated_data$drug_drug_interactions)

For a detailed case study, see the Integrated Pharmacovigilance Vignette.

The dvobject Structure

A dvobject is a unified, compressed format for pharmacological data: an R list that preserves complex relational hierarchies while enabling consistent access patterns.

For a single database (e.g., DrugBank):

drugs: list of data frames containing drug information (synonyms, classifications, etc.); the only mandatory component
salts: data frame of drug salt information
products: data frame of commercially available drug products worldwide
references: data frame of articles, links, and textbooks about drugs or CETT data
cett: list of data frames containing information on targets, enzymes, carriers, and transporters (CETT)
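To make the single-database layout concrete, here is a minimal sketch that builds a small, hand-made list with the same five top-level components and inspects it with base R. It does not call dbparser. The inner table and column names (`general_information`, `synonyms`, `drugbank_id`, and so on) and all values are hypothetical placeholders; on a real object returned by parseDrugBank(), use names() or str() to see the names it actually contains.

```r
# Hypothetical, hand-built stand-in for a single-database dvobject.
# Top-level names follow the component list above; inner table names,
# column names, and values are placeholders, not dbparser's exact schema.
toy_db <- list(
  drugs = list(
    general_information = data.frame(
      drugbank_id = c("DB_TOY_1", "DB_TOY_2"),
      name        = c("Drug A", "Drug B")
    ),
    synonyms = data.frame(
      drugbank_id = c("DB_TOY_1", "DB_TOY_2", "DB_TOY_2"),
      synonym     = c("Alpha", "Beta", "Beta-2")
    )
  ),
  salts      = data.frame(drugbank_id = character(0), salt_name = character(0)),
  products   = data.frame(drugbank_id = "DB_TOY_2", product_name = "Product B"),
  references = data.frame(drugbank_id = "DB_TOY_1", ref_type = "article"),
  cett       = list(
    targets = data.frame(drugbank_id = "DB_TOY_1", target_name = "Target X")
  )
)

# List the top-level components
print(names(toy_db))

# `drugs` is the only mandatory component, so check for it first
stopifnot("drugs" %in% names(toy_db))

# Each element of `drugs` is a data frame keyed by drug ID
print(sapply(toy_db$drugs, is.data.frame))

# Example access pattern: all synonyms recorded for one drug
print(subset(toy_db$drugs$synonyms, drugbank_id == "DB_TOY_2"))
```

Because every component is an ordinary data frame or a list of data frames, the same base R or dplyr tools apply to a real dvobject without special accessors.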

For a merged database (Integrated Pharmacovigilance):

When databases are merged using merge_drugbank_onsides() or merge_drugbank_twosides(), the dvobject becomes a nested structure:

drugbank: The mechanistic hub
onsides: Side-effect data (from FDA labels)
twosides: Drug-drug interaction data
integrated_data: Enriched tables bridging databases (e.g., linking DrugBank IDs to OnSIDES adverse events)
metadata: Detailed provenance for all contained datasets
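Because a merged dvobject nests several databases, it can help to list every table it contains before analysis. The sketch below defines a small recursive helper, `count_rows()`, which is hypothetical and not part of dbparser, and runs it on a hand-built toy object shaped like the merged layout above. The inner table names (`adverse_reactions`, `interactions`, `drug_adverse_events`) and all values are placeholders, not dbparser's actual schema. Assuming the object produced by the integration pipeline is a nested list of data frames, `count_rows(final_db)` would give a similar overview of real data.

```r
# Hypothetical helper (not part of dbparser): walk a nested list and return
# the row count of every data frame it contains, named by its path,
# e.g. "integrated_data/drug_adverse_events".
count_rows <- function(x, path = "") {
  # Check data frames first: a data frame is also a list
  if (is.data.frame(x)) {
    return(setNames(nrow(x), path))
  }
  # Skip non-tabular leaves such as metadata strings
  if (!is.list(x)) {
    return(integer(0))
  }
  nms <- names(x)
  if (is.null(nms)) nms <- as.character(seq_along(x))
  res <- integer(0)
  for (i in seq_along(x)) {
    child_path <- if (nzchar(path)) paste(path, nms[i], sep = "/") else nms[i]
    res <- c(res, count_rows(x[[i]], child_path))
  }
  res
}

# Toy object mimicking the merged layout; every name and value is a placeholder
toy_merged <- list(
  drugbank = list(
    drugs = list(
      general_information = data.frame(drugbank_id = c("DB_TOY_1", "DB_TOY_2"))
    )
  ),
  onsides  = list(adverse_reactions = data.frame(ingredient = "Drug A", effect = "Effect X")),
  twosides = list(interactions = data.frame(drug_1 = "Drug A", drug_2 = "Drug B", effect = "Effect Y")),
  integrated_data = list(
    drug_adverse_events = data.frame(drugbank_id = "DB_TOY_1", effect = "Effect X")
  ),
  metadata = list(drugbank = "toy version string")
)

print(count_rows(toy_merged))
```

The metadata entry is skipped because it holds no data frames, so the result lists only the tabular parts of the object.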

Research Impact

dbparser has been used in 10+ peer-reviewed publications, including:

Domain	Journal	Reference
Alzheimer’s Drug Repurposing	Scientific Reports	Parolo et al. (2023)
COVID-19 Therapeutics	Pharmaceutics	Pérez-Moraga et al. (2021)
Pan-Cancer Biomarkers	Briefings in Bioinformatics	Mercatelli et al. (2022)
Pathway Modeling	Computer Methods and Programs in Biomedicine	Hammoud et al. (2025)
Clinical Trial Analysis	Frontiers in Pharmacology	Namiot et al. (2023)

📊 50,000+ CRAN downloads | Featured in the CRAN Epidemiology Task View

For the full list, see our JOSS paper.

Ecosystem

Package	Description	Links
dbdataset	Pre-parsed DrugBank datasets ready for analysis	GitHub
covid19dbcand	COVID-19 drug candidate datasets	GitHub
periscope2	Shiny framework for interactive dashboards	CRAN

Citation

If you use dbparser in published research, please cite our JOSS paper:

Ali et al., (2026). dbparser: An R Package for Parsing and Integrating
Pharmacological Databases. Journal of Open Source Software, 11(118),
9950, https://doi.org/10.21105/joss.09950

Alternatively, retrieve the citation from within R:

citation("dbparser")

If you find dbparser useful, consider ⭐ starring the GitHub repository and sharing it with colleagues.

Enterprise Support

dbparser is maintained by Interstellar Consultation Services, which provides custom database integrations, enterprise support, training, and deployment assistance.

📧 info@interstellar-egypt.com

Contributing

We welcome contributions! Please review our Contributing Guide.

Please note that the dbparser project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
