Introduction
The main purpose of the dbparser package is to parse the DrugBank database, which is downloadable in XML format from this link. The parsed data can then be explored and analyzed as desired by the user. In this tutorial, we will see how to use dbparser along with dplyr, canvasXpress, and other libraries to do a simple drug analysis.
Loading and Parsing the Data
Before running the code, we assume the following:
- the user has already downloaded the DrugBank XML database file, following the Read Me instructions or the note above,
- the user has saved the downloaded database in the working directory C:\,
- the user has named the downloaded XML file drugbank.xml.
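Now we can load the drugs info, drug groups info, and drug target actions info. The code below reads pre-parsed copies of these tables that are stored as RDS files in the dbparser package.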
## load dbparser and the supporting packages
suppressPackageStartupMessages({
  library(tidyr)
  library(dplyr)
  library(canvasXpress)
  library(tibble)
  library(dbparser)
})
## load drugs data
drugs <- readRDS(system.file("drugs.RDS", package = "dbparser"))
## load drug groups data
drug_groups <- readRDS(system.file("drug_groups.RDS", package = "dbparser"))
## load drug targets actions data
drug_targets_actions <- readRDS(system.file("targets_actions.RDS", package = "dbparser"))
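system.file() returns an empty string when the requested file cannot be found, in which case readRDS() fails with a less obvious error. The following is a minimal sketch of a guard around that call. It assumes the bundled file name used above, and whether that file ships with your installed dbparser version is also an assumption. load_bundled_rds is a hypothetical helper written for illustration, not part of dbparser.

```r
## hypothetical helper, not part of dbparser: read an RDS file bundled with a
## package, or return NULL with a message when the file is not available
load_bundled_rds <- function(file_name, package) {
  path <- system.file(file_name, package = package)
  if (!nzchar(path)) {
    message("'", file_name, "' was not found in package '", package, "'")
    return(NULL)
  }
  readRDS(path)
}

## assumed file name, taken from the loading code above
drugs <- load_bundled_rds("drugs.RDS", package = "dbparser")
```

A NULL result signals that the table has to be obtained another way, for example by parsing the downloaded drugbank.xml file.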
Exploring the data
The following example takes a quick look at a few aspects of the parsed data. First, we look at the proportions of biotech and small-molecule drugs in the data.
## view proportions of the different drug types (biotech vs. small molecule)
type_stat <- drugs %>%
  select(type) %>%
  group_by(type) %>%
  summarise(count = n()) %>%
  column_to_rownames("type")
canvasXpress(
  data = type_stat,
  graphOrientation = "vertical",
  graphType = "Bar",
  showSampleNames = FALSE,
  title = "Drug Type Distribution",
  xAxisTitle = "Count"
)
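The chart above plots raw counts. If actual proportions are wanted, they can be derived from the same summary. The following is a minimal sketch on made-up data: toy_drugs and its IDs are hypothetical stand-ins for the drugs table, and only the type column name is taken from the code above.

```r
library(dplyr)

## made-up stand-in for the drugs table; only the type column matters here
toy_drugs <- data.frame(
  drugbank_id = c("ID_A", "ID_B", "ID_C", "ID_D"),
  type = c("biotech", "small molecule", "small molecule", "small molecule")
)

type_share <- toy_drugs %>%
  ## count() is shorthand for group_by(type) followed by summarise(count = n())
  count(type, name = "count") %>%
  ## share of each drug type in the whole table
  mutate(proportion = count / sum(count))
```

Here type_share holds one row per drug type, with its count and its share of all rows.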
Below, we view the different drug_groups in the data and how prevalent they are.
## view proportions of the different drug types for each drug group
type_stat <- drugs %>%
  full_join(drug_groups, by = c("drugbank_id")) %>%
  select(type, group) %>%
  group_by(type, group) %>%
  ## drop the grouping after counting; this also avoids the regrouping message
  summarise(count = n(), .groups = "drop") %>%
  pivot_wider(names_from = group, values_from = count) %>%
  column_to_rownames("type")
canvasXpress(
  data = type_stat,
  graphType = "Stacked",
  legendColumns = 2,
  legendPosition = "bottom",
  title = "Drug Type Distribution per Drug Group",
  xAxisTitle = "Quantity",
  xAxis2Show = TRUE,
  xAxisShow = FALSE,
  smpTitle = "Drug Group"
)
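The reshaping step above turns one row per (type, group) pair into one column per group. Any pair that never occurs in the data becomes NA in the wide table unless a fill value is given. The following is a minimal sketch on made-up counts: toy_counts is hypothetical data, and filling with 0 is an assumption about what is wanted for plotting.

```r
library(dplyr)
library(tidyr)

## made-up counts in long format: one row per (type, group) pair
toy_counts <- data.frame(
  type = c("biotech", "biotech", "small molecule"),
  group = c("approved", "withdrawn", "approved"),
  count = c(2L, 1L, 5L)
)

toy_wide <- toy_counts %>%
  pivot_wider(
    names_from = group,
    values_from = count,
    ## pairs that never occur become 0 instead of NA
    values_fill = 0L
  )
```

In toy_wide, the small molecule row gets 0 in the withdrawn column because that pair is absent from toy_counts.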
Finally, we look at the drug_targets_actions to observe their proportions as well.
## get counts of the different target actions in the data
targetActionCounts <-
  drug_targets_actions %>%
  group_by(action) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  ## name the ranking column explicitly so top_n() does not have to guess it
  top_n(10, count) %>%
  column_to_rownames("action")
## draw a bar chart of the 10 most frequent target actions in the data
canvasXpress(
  data = targetActionCounts,
  graphType = "Bar",
  legendColumns = 2,
  legendPosition = "bottom",
  title = "Target Actions Distribution",
  showSampleNames = FALSE,
  xAxis2Show = TRUE,
  xAxisShow = FALSE
)
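One detail of the top-10 selection is worth knowing: top_n() keeps tied rows, so asking for 10 rows can return more than 10. The dplyr documentation marks top_n() as superseded by slice_max(), which makes this choice explicit through its with_ties argument. The following is a minimal sketch on made-up counts, where toy_actions is hypothetical data.

```r
library(dplyr)

## made-up action counts with a tie for second place
toy_actions <- data.frame(
  action = c("inhibitor", "agonist", "antagonist", "binder"),
  count = c(5L, 3L, 3L, 1L)
)

## default behaviour: tied rows are kept, so n = 2 yields three rows here
with_ties <- toy_actions %>% slice_max(count, n = 2)

## strict cut-off: exactly two rows, the tie is broken by row order
no_ties <- toy_actions %>% slice_max(count, n = 2, with_ties = FALSE)
```

Which variant fits depends on whether the chart should show exactly ten bars or every action that shares the tenth-highest count.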