Accessing Non-Integrated Datasets

Many studies include data from assays which have not been integrated into the DataSpace. Some of these are available as “Non-Integrated Datasets,” which can be downloaded from the app as a zip file. DataSpaceR provides an interface for accessing non-integrated data from studies where it is available.

Viewing available non-integrated data

Methods on the DataSpace Study object allow you to see what non-integrated data may be available before downloading it. We will be using HVTN 505 as an example.

library(DataSpaceR)
con <- connectDS()
vtn505 <- con$getStudy("vtn505")
vtn505
#> <DataSpaceStudy>
#>   Study: vtn505
#>   URL: https://dataspace.cavd.org/CAVD/vtn505
#>   Available datasets:
#>     - Binding Ab multiplex assay
#>     - Demographics
#>     - Intracellular Cytokine Staining
#>     - Neutralizing antibody
#>   Available non-integrated datasets:
#>     - ADCP
#>     - Demographics (Supplemental)
#>     - Fc Array

The print method on the study object lists the available non-integrated datasets. The availableDatasets property gives more detail about each dataset: the integrated field indicates whether the data is integrated, and n is NA for non-integrated data until the dataset has been loaded.

knitr::kable(vtn505$availableDatasets)

name	label	n	integrated
BAMA	Binding Ab multiplex assay	10260	TRUE
Demographics	Demographics	2504	TRUE
ICS	Intracellular Cytokine Staining	22684	TRUE
NAb	Neutralizing antibody	628	TRUE
ADCP	ADCP	NA	FALSE
DEM SUPP	Demographics (Supplemental)	NA	FALSE
Fc Array	Fc Array	NA	FALSE
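
To pick out only the non-integrated entries programmatically, the integrated column can be used as a filter. The following is a minimal sketch rather than part of the DataSpaceR API: listNonIntegrated is a hypothetical helper name, and it assumes the table has a name column and a logical integrated column, as in the output above.

```r
# Hypothetical helper (not part of DataSpaceR): return the names of the
# datasets flagged as non-integrated.
# `datasets` is assumed to look like study$availableDatasets, i.e. a table
# with a `name` column and a logical `integrated` column.
listNonIntegrated <- function(datasets) {
  # Coerce to a plain data.frame so base subsetting behaves the same
  # whether the input is a data.frame or a data.table.
  datasets <- as.data.frame(datasets)
  datasets$name[!datasets$integrated]
}

# Usage against a live connection (requires DataSpace credentials):
# library(DataSpaceR)
# con <- connectDS()
# vtn505 <- con$getStudy("vtn505")
# listNonIntegrated(vtn505$availableDatasets)
```

The returned names are the values to pass to getDataset.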

Loading non-integrated data

Non-integrated datasets can be loaded with getDataset, just like integrated data. This unzips the non-integrated data to a temp directory and loads it into the environment.

adcp <- vtn505$getDataset("ADCP")
dim(adcp)
#> [1] 378  11
colnames(adcp)
#>  [1] "study_prot"             "participant_id"         "study_day"             
#>  [4] "lab_code"               "specimen_type"          "antigen"               
#>  [7] "percent_cv"             "avg_phagocytosis_score" "positivity_threshold"  
#> [10] "response"               "assay_identifier"

You can also view the file format info using getDatasetDescription. For non-integrated data, this will open a PDF in your computer’s default PDF viewer.

vtn505$getDatasetDescription("ADCP")

Non-integrated data is downloaded to a temp directory by default. There are a couple of ways to override this if desired. One is to specify outputDir when calling getDataset or getDatasetDescription.

If you will be accessing the data at another time and don’t want to have to re-download it, you can change the default directory for the whole study object with setDataDir.

vtn505$dataDir
#> [1] "/tmp/RtmpoDO8Tc"
vtn505$setDataDir(".")
vtn505$dataDir
#> [1] "/home/jmtaylor/Projects/DataSpaceR/vignettes"

If the dataset already exists in the specified dataDir or outputDir, it will not be downloaded again. This can be overridden with reload=TRUE, which forces a re-download.
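
The outputDir and reload options described above can be combined in a small wrapper. This is a sketch under stated assumptions, not a DataSpaceR function: getCachedDataset and its refresh argument are hypothetical names, and the only package behavior it relies on is that getDataset accepts outputDir and reload as described in this section.

```r
# Hypothetical wrapper (not part of DataSpaceR): load a dataset into a
# persistent directory so later sessions can reuse the downloaded files.
#   study       - a study object, e.g. the result of con$getStudy("vtn505")
#   datasetName - dataset name as listed in availableDatasets, e.g. "ADCP"
#   dir         - directory in which to keep the downloaded files
#   refresh     - set to TRUE to force a re-download (passed on as `reload`)
getCachedDataset <- function(study, datasetName, dir, refresh = FALSE) {
  # Create the target directory if it does not exist yet.
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  study$getDataset(datasetName, outputDir = dir, reload = refresh)
}

# Usage against a live connection (requires DataSpace credentials):
# library(DataSpaceR)
# con <- connectDS()
# vtn505 <- con$getStudy("vtn505")
# adcp <- getCachedDataset(vtn505, "ADCP", "data/vtn505")
# adcp <- getCachedDataset(vtn505, "ADCP", "data/vtn505", refresh = TRUE)
```

Passing the directory per call leaves the study's dataDir unchanged. Use setDataDir instead when every dataset for the study should share one location.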

Session information

sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
#>  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
#>  [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
#>  [7] LC_PAPER=en_US.utf8       LC_NAME=C                
#>  [9] LC_ADDRESS=C              LC_TELEPHONE=C           
#> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.14.2 DataSpaceR_0.7.5  knitr_1.37       
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8       digest_0.6.29    assertthat_0.2.1 R6_2.5.1        
#>  [5] jsonlite_1.8.0   magrittr_2.0.2   evaluate_0.15    highr_0.9       
#>  [9] httr_1.4.2       stringi_1.7.6    curl_4.3.2       tools_4.1.2     
#> [13] stringr_1.4.0    Rlabkey_2.8.3    xfun_0.29        compiler_4.1.2

Helen Miller

2022-06-15
