This is the most convenient function for downloading common TCGA
datasets. It is similar to the getFirehoseData function in the
RTCGAToolbox package.
Usage
getTCGAdata(
  project = NULL,
  clinical = TRUE,
  download = FALSE,
  forceDownload = FALSE,
  destdir = tempdir(),
  mRNASeq = FALSE,
  mRNAArray = FALSE,
  mRNASeqType = "normalized",
  miRNASeq = FALSE,
  exonRNASeq = FALSE,
  RPPAArray = FALSE,
  ReplicateBaseNormalization = FALSE,
  Methylation = FALSE,
  MethylationType = c("27K", "450K"),
  GeneMutation = FALSE,
  SomaticMutation = FALSE,
  GisticCopyNumber = FALSE,
  Gistic2Threshold = TRUE,
  CopyNumberSegment = FALSE,
  RemoveGermlineCNV = TRUE,
  ...
)
Arguments
- project
Character vector of one or more TCGA project IDs provided by Xena. Default is NULL. To see all available project IDs, use availTCGA("ProjectID").
- clinical
Logical. If TRUE, download clinical information. Default is TRUE.
- download
Logical. If TRUE, download the data; otherwise return a result list that includes dataset information. Default is FALSE. Leave this as FALSE if you want to check what would be downloaded, or if you want to use other functions provided by UCSCXenaTools to filter the resulting datasets before downloading.
- forceDownload
Logical. If TRUE, download files even if they already exist. Default is FALSE.
- destdir
Location in which to store downloaded data. Default is the system temporary directory.
- mRNASeq
Logical. If TRUE, download mRNASeq data. Default is FALSE.
- mRNAArray
Logical. If TRUE, download mRNA microarray data. Default is FALSE.
- mRNASeqType
Character vector. One, two, or three of c("normalized", "pancan normalized", "percentile"). Default is "normalized".
- miRNASeq
Logical. If TRUE, download miRNASeq data. Default is FALSE.
- exonRNASeq
Logical. If TRUE, download exon RNASeq data. Default is FALSE.
- RPPAArray
Logical. If TRUE, download RPPA data. Default is FALSE.
- ReplicateBaseNormalization
Logical. If TRUE, download RPPA data processed by Replicate Base Normalization (RBN). Default is FALSE.
- Methylation
Logical. If TRUE, download DNA methylation data. Default is FALSE.
- MethylationType
Character vector. One or both of c("27K", "450K").
- GeneMutation
Logical. If TRUE, download gene mutation data. Default is FALSE.
- SomaticMutation
Logical. If TRUE, download somatic mutation data. Default is FALSE.
- GisticCopyNumber
Logical. If TRUE, download Gistic2 copy number data. Default is FALSE.
- Gistic2Threshold
Logical. If TRUE, download thresholded Gistic2 data. Default is TRUE.
- CopyNumberSegment
Logical. If TRUE, download copy number segment data. Default is FALSE.
- RemoveGermlineCNV
Logical. If TRUE, download copy number segment data from which germline copy number variation has been removed. Default is TRUE.
- ...
Other arguments passed to the download.file function.
Value
If download = TRUE, returns the data.frame produced by XenaDownload;
otherwise returns a list containing a XenaHub object and dataset information.
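The dry-run return value can be inspected and narrowed before anything is fetched. The following is a minimal sketch, assuming the list elements are named `Xena` and `DataInfo` (as the printed example output suggests) and that the gene-level RNA-seq dataset name ends in `HiSeqV2`. The regular expression passed to `XenaFilter` is an illustrative assumption, not a documented convention.

```r
library(UCSCXenaTools)

# Dry run: download = FALSE (the default) only describes the matching datasets.
res <- getTCGAdata(project = "LUAD", mRNASeq = TRUE, miRNASeq = TRUE)

names(res)     # element names of the returned list
res$DataInfo   # one row of metadata per matching dataset

# Keep only datasets whose name ends in "HiSeqV2" (assumed naming pattern).
hub <- XenaFilter(res$Xena, filterDatasets = "HiSeqV2$")
datasets(hub)

# Fetching files needs network access, so these steps are left commented out:
# query <- XenaQuery(hub)
# dl <- XenaDownload(query, destdir = tempdir())
```

Filtering the XenaHub object first keeps the later download limited to the files that are actually needed.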
Details
TCGA common datasets are frequently used for biological analysis.
To make these data easier to obtain, this function provides simple
options for choosing datasets and download behavior. All available information
about TCGA datasets can be accessed via availTCGA()
and checked with showTCGA().
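As a rough illustration of the two helper functions mentioned here, the sketch below lists the project IDs and then looks up what is available for a single project. It assumes that `availTCGA("ProjectID")` returns a character vector and that `showTCGA()` accepts a project ID and returns a table; check the help pages of both functions if your installed version behaves differently.

```r
library(UCSCXenaTools)

# List the TCGA project IDs known to the package.
ids <- availTCGA("ProjectID")
"LUAD" %in% ids

# Look up the data types and file types recorded for one project.
luad <- showTCGA("LUAD")
head(luad)
```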
Author
Shixiang Wang w_shixiang@163.com
Examples
###### get data, but not download
# 1 choose the project and data types you want to download
getTCGAdata(project = "LUAD", mRNASeq = TRUE, mRNAArray = TRUE,
            mRNASeqType = "normalized", miRNASeq = TRUE, exonRNASeq = TRUE,
            RPPAArray = TRUE, Methylation = TRUE, MethylationType = "450K",
            GeneMutation = TRUE, SomaticMutation = TRUE)
#> $Xena
#> class: XenaHub
#> hosts():
#> https://tcga.xenahubs.net
#> cohorts() (1 total):
#> TCGA Lung Adenocarcinoma (LUAD)
#> datasets() (7 total):
#> TCGA.LUAD.sampleMap/HumanMethylation450
#> TCGA.LUAD.sampleMap/HiSeqV2
#> TCGA.LUAD.sampleMap/miRNA_HiSeq_gene
#> ...
#> TCGA.LUAD.sampleMap/AgilentG4502A_07_3
#> TCGA.LUAD.sampleMap/HiSeqV2_exon
#>
#> $DataInfo
#> # A tibble: 7 × 20
#> XenaHosts XenaHostNames XenaCohorts XenaDatasets SampleCount DataSubtype Label
#> <chr> <chr> <chr> <chr> <int> <chr> <chr>
#> 1 https://… tcgaHub TCGA Lung … TCGA.LUAD.s… 492 DNA methyl… Meth…
#> 2 https://… tcgaHub TCGA Lung … TCGA.LUAD.s… 576 gene expre… Illu…
#> 3 https://… tcgaHub TCGA Lung … TCGA.LUAD.s… 495 miRNA matu… Illu…
#> 4 https://… tcgaHub TCGA Lung … TCGA.LUAD.s… 706 phenotype Phen…
#> 5 https://… tcgaHub TCGA Lung … TCGA.LUAD.s… NA protein ex… RPPA
#> 6 https://… tcgaHub TCGA Lung … TCGA.LUAD.s… 33 gene expre… Agil…
#> 7 https://… tcgaHub TCGA Lung … TCGA.LUAD.s… 576 exon expre… Illu…
#> # ℹ 13 more variables: Type <chr>, AnatomicalOrigin <chr>, SampleType <chr>,
#> # Tags <chr>, ProbeMap <chr>, LongTitle <chr>, Citation <chr>, Version <chr>,
#> # Unit <chr>, Platform <chr>, ProjectID <chr>, DataType <chr>, FileType <chr>
#>
# 2 only choose 'LUAD' and its clinical data
getTCGAdata(project = "LUAD")
#> $Xena
#> class: XenaHub
#> hosts():
#> https://tcga.xenahubs.net
#> cohorts() (1 total):
#> TCGA Lung Adenocarcinoma (LUAD)
#> datasets() (1 total):
#> TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
#>
#> $DataInfo
#> # A tibble: 1 × 20
#> XenaHosts XenaHostNames XenaCohorts XenaDatasets SampleCount DataSubtype Label
#> <chr> <chr> <chr> <chr> <int> <chr> <chr>
#> 1 https://… tcgaHub TCGA Lung … TCGA.LUAD.s… 706 phenotype Phen…
#> # ℹ 13 more variables: Type <chr>, AnatomicalOrigin <chr>, SampleType <chr>,
#> # Tags <chr>, ProbeMap <chr>, LongTitle <chr>, Citation <chr>, Version <chr>,
#> # Unit <chr>, Platform <chr>, ProjectID <chr>, DataType <chr>, FileType <chr>
#>
if (FALSE) { # \dontrun{
###### download datasets
# 3 download clinical datasets of LUAD and LUSC
getTCGAdata(project = c("LUAD", "LUSC"), clinical = TRUE, download = TRUE)
# 4 download clinical, RPPA and gene mutation datasets of LUAD and LUSC
# getTCGAdata(project = c("LUAD", "LUSC"), clinical = TRUE, RPPAArray = TRUE, GeneMutation = TRUE)
} # }