
Accessing Non-Integrated Datasets
Helen Miller
2022-06-15
Source: vignettes/Non_Integrated_Datasets.Rmd
Many studies include data from assays which have not been integrated
into the DataSpace. Some of these are available as “Non-Integrated
Datasets,” which can be downloaded from the app as a zip file.
DataSpaceR provides an interface for accessing
non-integrated data from studies where it is available.
Viewing available non-integrated data
Methods on the DataSpace Study object allow you to see what non-integrated data may be available before downloading it. We will be using HVTN 505 as an example.
library(DataSpaceR)
con <- connectDS()
vtn505 <- con$getStudy("vtn505")
vtn505
#> <DataSpaceStudy>
#> Study: vtn505
#> URL: https://dataspace.cavd.org/CAVD/vtn505
#> Available datasets:
#> - Binding Ab multiplex assay
#> - Demographics
#> - Intracellular Cytokine Staining
#> - Neutralizing antibody
#> Available non-integrated datasets:
#> - ADCP
#> - Demographics (Supplemental)
#> - Fc Array
The print method on the study object lists the available
non-integrated datasets. The availableDatasets property
shows more information about each dataset, with the
integrated field indicating whether the data is integrated.
The value for n will be NA for non-integrated
data until the dataset has been loaded.
knitr::kable(vtn505$availableDatasets)

| name | label | n | integrated |
|---|---|---|---|
| BAMA | Binding Ab multiplex assay | 10260 | TRUE |
| Demographics | Demographics | 2504 | TRUE |
| ICS | Intracellular Cytokine Staining | 22684 | TRUE |
| NAb | Neutralizing antibody | 628 | TRUE |
| ADCP | ADCP | NA | FALSE |
| DEM SUPP | Demographics (Supplemental) | NA | FALSE |
| Fc Array | Fc Array | NA | FALSE |
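
To pick out just the non-integrated entries, the integrated column can be used as a filter. The sketch below is a minimal illustration that rebuilds the table above as a plain data.frame so it runs without a DataSpace connection; the variable names available and non_integrated are illustrative, and the assumption is that vtn505$availableDatasets can be subset the same way.

```r
# Local copy of the table above, standing in for vtn505$availableDatasets
# so that the snippet runs without a DataSpace connection.
available <- data.frame(
  name = c("BAMA", "Demographics", "ICS", "NAb", "ADCP", "DEM SUPP", "Fc Array"),
  label = c(
    "Binding Ab multiplex assay", "Demographics",
    "Intracellular Cytokine Staining", "Neutralizing antibody",
    "ADCP", "Demographics (Supplemental)", "Fc Array"
  ),
  n = c(10260L, 2504L, 22684L, 628L, NA, NA, NA),
  integrated = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only the rows whose integrated flag is FALSE.
non_integrated <- available[!available$integrated, ]

# Names that could be passed to getDataset for non-integrated data.
non_integrated$name
```

With a live connection, the same subsetting expression should apply to vtn505$availableDatasets directly.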
Loading non-integrated data
Non-integrated datasets can be loaded with getDataset,
just like integrated data. This unzips the non-integrated data to a temp
directory and loads it into the environment.
adcp <- vtn505$getDataset("ADCP")
dim(adcp)
#> [1] 378 11
colnames(adcp)
#> [1] "study_prot" "participant_id" "study_day"
#> [4] "lab_code" "specimen_type" "antigen"
#> [7] "percent_cv" "avg_phagocytosis_score" "positivity_threshold"
#> [10] "response" "assay_identifier"
You can also view the file format info using
getDatasetDescription. For non-integrated data, this will
open a PDF in your computer’s default PDF viewer.
vtn505$getDatasetDescription("ADCP")
Non-integrated data is downloaded to a temp directory by default.
There are a couple of ways to override this if desired. One is to
specify outputDir when calling getDataset or
getDatasetDescription.
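
As a rough sketch of this first option, the helper below passes outputDir through to getDataset. The helper name fetch_to_dir and the step that creates the directory beforehand are illustrative assumptions and not part of DataSpaceR; only the outputDir argument comes from the description above.

```r
# Hypothetical helper (not part of DataSpaceR): download one dataset into a
# chosen directory by passing outputDir to the study's getDataset method.
fetch_to_dir <- function(study, dataset_name, dir) {
  # Assumption: create the target directory up front in case it is missing.
  if (!dir.exists(dir)) {
    dir.create(dir, recursive = TRUE)
  }
  study$getDataset(dataset_name, outputDir = dir)
}

# Usage with the study object from earlier (needs a DataSpace connection):
# adcp <- fetch_to_dir(vtn505, "ADCP", file.path("data", "vtn505"))
```

Wrapping the call this way keeps the download location explicit at each call site without changing the default for the whole study object.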
If you will be accessing the data at another time and don’t want to
have to re-download it, you can change the default directory for the
whole study object with setDataDir.
vtn505$dataDir
#> [1] "/tmp/RtmpoDO8Tc"
vtn505$setDataDir(".")
vtn505$dataDir
#> [1] "/home/jmtaylor/Projects/DataSpaceR/vignettes"
If the dataset already exists in the specified dataDir
or outputDir, it will not be downloaded again. This can be
overridden with reload=TRUE, which forces a
re-download.
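
The pieces above can be combined into one step that loads every non-integrated dataset for a study while keeping the files in a persistent directory. This is a minimal sketch, assuming a study object that exposes availableDatasets, setDataDir, and getDataset as described in this vignette; the helper name load_non_integrated and its arguments are hypothetical.

```r
# Hypothetical helper (not part of DataSpaceR): load every non-integrated
# dataset of a study, keeping the downloaded files in cache_dir.
load_non_integrated <- function(study, cache_dir, reload = FALSE) {
  # Assumption: create the cache directory up front in case it is missing.
  if (!dir.exists(cache_dir)) {
    dir.create(cache_dir, recursive = TRUE)
  }
  # Point the study at the persistent directory instead of the temp default.
  study$setDataDir(cache_dir)

  # Select the names of datasets flagged as not integrated.
  datasets <- study$availableDatasets
  dataset_names <- as.character(datasets$name[!datasets$integrated])

  # reload = TRUE forces a fresh download even if the files already exist.
  loaded <- lapply(dataset_names, function(nm) {
    study$getDataset(nm, reload = reload)
  })
  names(loaded) <- dataset_names
  loaded
}

# Usage with the study object from earlier (needs a DataSpace connection):
# non_integrated <- load_non_integrated(vtn505, "vtn505_data")
# dim(non_integrated[["ADCP"]])
```

Leaving reload at FALSE means a second call should reuse files already present in cache_dir, as described above.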
Session information
sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#>
#> locale:
#> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
#> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
#> [7] LC_PAPER=en_US.utf8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] data.table_1.14.2 DataSpaceR_0.7.5 knitr_1.37
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.8 digest_0.6.29 assertthat_0.2.1 R6_2.5.1
#> [5] jsonlite_1.8.0 magrittr_2.0.2 evaluate_0.15 highr_0.9
#> [9] httr_1.4.2 stringi_1.7.6 curl_4.3.2 tools_4.1.2
#> [13] stringr_1.4.0 Rlabkey_2.8.3 xfun_0.29 compiler_4.1.2