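Published: March 20, 2019
TCGAbiolinks provides functions for downloading data from the GDC and preparing it for downstream analysis.
This section first explains the available download methods and the SummarizedExperiment object, which is the default data structure used in TCGAbiolinks, and then walks through several examples.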
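Data download: differences between the two methods
TCGAbiolinks can download GDC data using two methods:
- client: downloads the data with the GDC client tool. It may be slower than the api method.
- api: downloads the data through the GDC API (POST method) as a compressed tar.gz file. If the files are too large or too numerous, the tar.gz file becomes too big and the download is more likely to fail. To work around this, the files.per.chunk argument splits the download into smaller chunks: for example, with files.per.chunk = 10, each tar.gz contains only 10 files.
Data preparation: the SummarizedExperiment object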
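With the SummarizedExperiment package, three main pieces of data can be extracted from a SummarizedExperiment object:
- colData(data): the sample information matrix, including clinical data and tumor subtype information taken from the corresponding TCGA papers
- assay(data): the assay matrix, for example the expression value of each gene in each sample
- rowRanges(data): the feature (usually gene) information, including feature metadata such as the genomic range of each gene
Summarized Experiment: annotation information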
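The GDCprepare function has an argument named summarizedExperiment that determines whether the output is a Summarized Experiment (the default) or a data frame. To create a Summarized Experiment object, the data is annotated with the latest genome annotation files: 1) legacy data (aligned to hg19) is annotated with GRCh37.p13; 2) harmonized data (aligned to hg38) is annotated with GRCh38.p7 (May 2017).
Unfortunately, when an annotation such as GRCh38.p7 is updated, gene symbols may be changed or removed and gene coordinates may change, which can cause some TCGA data to be lost. For example, a gene that has been removed from the annotation can no longer be mapped, so its information is missing from the SummarizedExperiment.
If you set summarizedExperiment to FALSE, you get the unmodified data and have to annotate it yourself.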
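The loss described above can be pictured with a small base R sketch. It is only an illustration under stated assumptions: the gene identifiers, values and coordinates are made up, and the inner join with merge() stands in for the mapping step rather than reproducing what TCGAbiolinks does internally.

```r
# Toy expression table, standing in for the plain data frame you would
# annotate yourself (all values are made up)
expr <- data.frame(
  gene_id = c("ENSG_A", "ENSG_B", "ENSG_C"),
  sample1 = c(10, 20, 30),
  sample2 = c(5, 0, 12),
  stringsAsFactors = FALSE
)

# Toy annotation: ENSG_B is absent, as if it had been removed in a newer release
annotation <- data.frame(
  gene_id  = c("ENSG_A", "ENSG_C"),
  seqnames = c("chr1", "chrX"),
  start    = c(100L, 500L),
  end      = c(200L, 900L),
  stringsAsFactors = FALSE
)

# Inner join: only genes present in the annotation survive
annotated <- merge(expr, annotation, by = "gene_id")

# Genes that could not be mapped and are therefore lost
lost <- setdiff(expr$gene_id, annotation$gene_id)

print(annotated)
print(lost)
```

Keeping the list of unmapped identifiers makes it easy to check how much information an annotation update costs before moving on to downstream analysis.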
In addition, no update is available for DNA methylation data, but the latest metadata can be found here: http://zwdzwd.github.io/InfiniumAnnotation
GDCdownload function
Argument | Description |
---|---|
query | A query produced by the GDCquery function |
token.file | Token file to download controlled data (only for method = “client”) |
method | Uses the API (POST method) or gdc client tool. Options “api”, “client”. API is faster, but the data might get corrupted in the download, and it might need to be executed again |
directory | Directory/Folder where the data will be saved. Default: GDCdata |
files.per.chunk | This will make the API method only download n (files.per.chunk) files at a time. This may reduce the download problems when the data size is too large. Expects an integer (example: files.per.chunk = 6) |
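The effect of files.per.chunk in the table above can be sketched in base R. This is not the TCGAbiolinks implementation; the file names are hypothetical and the snippet only shows how a list of files falls into groups of at most n files.

```r
# Hypothetical file names standing in for the files of a query
files <- sprintf("file_%02d.txt", 1:25)
files.per.chunk <- 10

# Assign each file to a chunk: files 1-10 -> chunk 1, 11-20 -> chunk 2, 21-25 -> chunk 3
chunk.id <- ceiling(seq_along(files) / files.per.chunk)
chunks <- split(files, chunk.id)

# Each list element would correspond to one smaller download
print(lengths(chunks))
```

Smaller chunks mean more requests, but a failed request only affects the files in that chunk.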
GDCprepare function
Argument | Description |
---|---|
query | A query produced by the GDCquery function |
save | Save result as RData object? |
save.filename | Name of the file to be saved; if empty, a name will be created automatically |
directory | Directory/Folder where the data was downloaded. Default: GDCdata |
summarizedExperiment | Create a summarizedExperiment? Default TRUE (if possible) |
remove.files.prepared | Remove the files read? Default: FALSE. This argument is considered only if the save argument is set to TRUE |
add.gistic2.mut | If a list of genes (gene symbol) is given, columns with gistic2 results from GDAC firehose (hg19) and a column indicating if there is or not mutation in that gene (hg38) (TRUE or FALSE - use the MAF file for more information) will be added to the sample matrix in the summarized Experiment object. |
mut.pipeline | If add.gistic2.mut is not NULL this field will be taken in consideration. Four separate variant calling pipelines are implemented for GDC data harmonization. Options: muse, varscan2, somaticsniper, MuTect2. For more information: https://gdc-docs.nci.nih.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/ |
mutant_variant_classification | List of variant classifications used to decide whether a sample is considered mutant. Default: “Frame_Shift_Del”, “Frame_Shift_Ins”, “Missense_Mutation”, “Nonsense_Mutation”, “Splice_Site”, “In_Frame_Del”, “In_Frame_Ins”, “Translation_Start_Site”, “Nonstop_Mutation” |
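The save and save.filename arguments write the prepared result to disk. The sketch below assumes the file is an ordinary RData file of the kind written by base R save(); the table only calls it an RData object and does not say which object name is stored inside, so the snippet uses a toy data frame and shows how to discover the stored names when loading.

```r
# Stand-in for a prepared object (hypothetical toy data)
expdat <- data.frame(gene = c("geneA", "geneB"), value = c(1.5, 2.5))

# Write it to an .rda file in a temporary directory
rda.file <- file.path(tempdir(), "exp.rda")
save(expdat, file = rda.file)

# Load into a separate environment; load() returns the names of the restored objects
env <- new.env()
loaded.names <- load(rda.file, envir = env)
print(loaded.names)

# Retrieve the object by the name found in the file
restored <- get(loaded.names[1], envir = env)
```

Loading into a fresh environment avoids silently overwriting an object with the same name in the current session.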
In this example, we use the GDC API method to download gene expression data for two samples from the legacy database (data aligned to the hg19 reference genome) and display the data and metadata of the resulting object.
library(TCGAbiolinks)
library(SummarizedExperiment)
library(DT) # provides datatable()
query <- GDCquery(project = "TCGA-GBM",
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq",
                  file.type = "normalized_results",
                  experimental.strategy = "RNA-Seq",
                  barcode = c("TCGA-14-0736-02A-01R-2005-01", "TCGA-06-0211-02A-02R-2005-01"),
                  legacy = TRUE)
GDCdownload(query, method = "api", files.per.chunk = 10)
data <- GDCprepare(query)
# Gene expression aligned against hg19.
datatable(as.data.frame(colData(data)),
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
patient | barcode | sample | shortLetterCode | definition | classification_of_tumor | last_known_disease_status | updated_datetime.x | primary_diagnosis | tumor_stage | age_at_diagnosis | vital_status | morphology | days_to_death | days_to_last_known_disease_status | created_datetime.x | state.x | days_to_recurrence | diagnosis_id | tumor_grade | treatments | tissue_or_organ_of_origin | days_to_birth | progression_or_recurrence | prior_malignancy | site_of_resection_or_biopsy | days_to_last_follow_up | cigarettes_per_day | weight | updated_datetime.y | alcohol_history | alcohol_intensity | bmi | years_smoked | created_datetime.y | state.y | exposure_id | height | updated_datetime | created_datetime | gender | year_of_birth | state | race | demographic_id | ethnicity | year_of_death | bcr_patient_barcode | dbgap_accession_number | disease_type | released | state.1 | primary_site | project_id | name | subtype_patient | subtype_Tissue.source.site | subtype_Study | subtype_BCR | subtype_Whole.exome | subtype_Whole.genome | subtype_RNAseq | subtype_SNP6 | subtype_U133a | subtype_HM450 | subtype_HM27 | subtype_RPPA | subtype_Histology | subtype_Grade | subtype_Age..years.at.diagnosis. | subtype_Gender | subtype_Survival..months. | subtype_Vital.status..1.dead. | subtype_Karnofsky.Performance.Score | subtype_Mutation.Count | subtype_Percent.aneuploidy | subtype_IDH.status | subtype_X1p.19q.codeletion | subtype_IDH.codel.subtype | subtype_MGMT.promoter.status | subtype_Chr.7.gain.Chr.10.loss | subtype_Chr.19.20.co.gain | subtype_TERT.promoter.status | subtype_TERT.expression..log2. | subtype_TERT.expression.status | subtype_ATRX.status | subtype_DAXX.status | subtype_Telomere.Maintenance | subtype_BRAF.V600E.status | subtype_BRAF.KIAA1549.fusion | subtype_ABSOLUTE.purity | subtype_ABSOLUTE.ploidy | subtype_ESTIMATE.stromal.score | subtype_ESTIMATE.immune.score | subtype_ESTIMATE.combined.score | subtype_Original.Subtype | subtype_Transcriptome.Subtype | subtype_Pan.Glioma.RNA.Expression.Cluster | subtype_IDH.specific.RNA.Expression.Cluster | subtype_Pan.Glioma.DNA.Methylation.Cluster | subtype_IDH.specific.DNA.Methylation.Cluster | subtype_Supervised.DNA.Methylation.Cluster | subtype_Random.Forest.Sturm.Cluster | subtype_RPPA.cluster | subtype_Telomere.length.estimate.in.blood.normal..Kb. | subtype_Telomere.length.estimate.in.tumor..Kb. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TCGA-14-0736 | TCGA-14-0736-02A-01R-2005-01 | TCGA-14-0736-02A | TR | Recurrent Solid Tumor | not reported | not reported | 2017-03-04T16:44:35.784223-06:00 | c71.9 | not reported | 18219 | dead | 9440/3 | 460 | live | c2cca6a5-69cd-5e31-b0e9-fa80d38e1375 | not reported | [object Object] | c71.9 | -18219 | not reported | not reported | c71.9 | 460 | 2017-03-04T16:37:25.850486-06:00 | live | 93758177-ffbb-567a-8ae4-04ad1d48cb45 | 2017-03-04T16:37:28.862150-06:00 | male | 1950 | live | black or african american | e597a52f-1c28-5248-a751-22d810af3013 | not reported | 2000 | TCGA-14-0736 | Glioblastoma Multiforme | true | legacy | Brain | TCGA-GBM | Glioblastoma Multiforme | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TCGA-06-0211 | TCGA-06-0211-02A-02R-2005-01 | TCGA-06-0211-02A | TR | Recurrent Solid Tumor | not reported | not reported | 2017-03-04T16:44:35.784223-06:00 | c71.9 | not reported | 17515 | dead | 9440/3 | 360 | live | cec1bfa2-94cf-5bf5-ad44-d1aad1c78586 | not reported | [object Object] | c71.9 | -17515 | not reported | not reported | c71.9 | 360 | 2017-03-04T16:37:25.850486-06:00 | live | 68b17566-c576-5237-badd-6e1979a8f86f | 2017-03-04T16:37:28.862150-06:00 | male | 1950 | live | white | fe9a665e-6e67-5c8c-ac4f-b316f44db506 | not hispanic or latino | 1997 | TCGA-06-0211 | Glioblastoma Multiforme | true | legacy | Brain | TCGA-GBM | Glioblastoma Multiforme |
# Only the first 5 rows, to make rendering faster
datatable(assay(data)[1:5,],
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = TRUE)
| | TCGA-14-0736-02A-01R-2005-01 | TCGA-06-0211-02A-02R-2005-01 |
---|---|---|
A1BG | 798.1285 | 402.2892 |
A2M | 11681.6573 | 33835.5229 |
NAT1 | 164.5998 | 70.9712 |
NAT2 | 5.637 | 4.2689 |
RP11-986E7.7 | 206399.0981 | 45121.6649 |
rowRanges(data)
## GRanges object with 100 ranges and 3 metadata columns:
## seqnames ranges strand | gene_id
## <Rle> <IRanges> <Rle> | <character>
## A1BG chr19 58856544-58864865 - | A1BG
## A2M chr12 9220260-9268825 - | A2M
## NAT1 chr8 18027986-18081198 + | NAT1
## NAT2 chr8 18248755-18258728 + | NAT2
## RP11-986E7.7 chr14 95058395-95090983 + | RP11-986E7.7
## ... ... ... ... . ...
## ADORA1 chr1 203059782-203136533 + | ADORA1
## ADORA2A chr22 24813847-24838328 + | ADORA2A
## ADORA2B chr17 15848231-15879060 + | ADORA2B
## ADORA3 chr1 112025970-112106584 - | ADORA3
## ADPRH chr3 119298115-119308792 + | ADPRH
## entrezgene ensembl_gene_id
## <numeric> <character>
## A1BG 1 ENSG00000121410
## A2M 2 ENSG00000175899
## NAT1 9 ENSG00000171428
## NAT2 10 ENSG00000156006
## RP11-986E7.7 12 ENSG00000273259
## ... ... ...
## ADORA1 134 ENSG00000163485
## ADORA2A 135 ENSG00000128271
## ADORA2B 136 ENSG00000170425
## ADORA3 140 ENSG00000121933
## ADPRH 141 ENSG00000144843
## -------
## seqinfo: 24 sequences from an unspecified genome; no seqlengths
In this example, we download gene expression data for two samples from the harmonized database (data aligned to the hg38 reference genome) and display the data and metadata of the resulting object.
library(TCGAbiolinks)
library(SummarizedExperiment)
library(DT) # provides datatable()
# Gene expression aligned against hg38
query <- GDCquery(project = "TCGA-GBM",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - FPKM-UQ",
                  barcode = c("TCGA-14-0736-02A-01R-2005-01", "TCGA-06-0211-02A-02R-2005-01"))
GDCdownload(query)
data <- GDCprepare(query)
datatable(as.data.frame(colData(data)),
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
patient | barcode | sample | shortLetterCode | definition | classification_of_tumor | last_known_disease_status | updated_datetime.x | primary_diagnosis | tumor_stage | age_at_diagnosis | vital_status | morphology | days_to_death | days_to_last_known_disease_status | created_datetime.x | state.x | days_to_recurrence | diagnosis_id | tumor_grade | treatments | tissue_or_organ_of_origin | days_to_birth | progression_or_recurrence | prior_malignancy | site_of_resection_or_biopsy | days_to_last_follow_up | cigarettes_per_day | weight | updated_datetime.y | alcohol_history | alcohol_intensity | bmi | years_smoked | created_datetime.y | state.y | exposure_id | height | updated_datetime | created_datetime | gender | year_of_birth | state | race | demographic_id | ethnicity | year_of_death | bcr_patient_barcode | dbgap_accession_number | disease_type | released | state.1 | primary_site | project_id | name | subtype_patient | subtype_Tissue.source.site | subtype_Study | subtype_BCR | subtype_Whole.exome | subtype_Whole.genome | subtype_RNAseq | subtype_SNP6 | subtype_U133a | subtype_HM450 | subtype_HM27 | subtype_RPPA | subtype_Histology | subtype_Grade | subtype_Age..years.at.diagnosis. | subtype_Gender | subtype_Survival..months. | subtype_Vital.status..1.dead. | subtype_Karnofsky.Performance.Score | subtype_Mutation.Count | subtype_Percent.aneuploidy | subtype_IDH.status | subtype_X1p.19q.codeletion | subtype_IDH.codel.subtype | subtype_MGMT.promoter.status | subtype_Chr.7.gain.Chr.10.loss | subtype_Chr.19.20.co.gain | subtype_TERT.promoter.status | subtype_TERT.expression..log2. | subtype_TERT.expression.status | subtype_ATRX.status | subtype_DAXX.status | subtype_Telomere.Maintenance | subtype_BRAF.V600E.status | subtype_BRAF.KIAA1549.fusion | subtype_ABSOLUTE.purity | subtype_ABSOLUTE.ploidy | subtype_ESTIMATE.stromal.score | subtype_ESTIMATE.immune.score | subtype_ESTIMATE.combined.score | subtype_Original.Subtype | subtype_Transcriptome.Subtype | subtype_Pan.Glioma.RNA.Expression.Cluster | subtype_IDH.specific.RNA.Expression.Cluster | subtype_Pan.Glioma.DNA.Methylation.Cluster | subtype_IDH.specific.DNA.Methylation.Cluster | subtype_Supervised.DNA.Methylation.Cluster | subtype_Random.Forest.Sturm.Cluster | subtype_RPPA.cluster | subtype_Telomere.length.estimate.in.blood.normal..Kb. | subtype_Telomere.length.estimate.in.tumor..Kb. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TCGA-14-0736 | TCGA-14-0736-02A-01R-2005-01 | TCGA-14-0736-02A | TR | Recurrent Solid Tumor | not reported | not reported | 2017-03-04T16:44:35.784223-06:00 | c71.9 | not reported | 18219 | dead | 9440/3 | 460 | live | c2cca6a5-69cd-5e31-b0e9-fa80d38e1375 | not reported | [object Object] | c71.9 | -18219 | not reported | not reported | c71.9 | 460 | 2017-03-04T16:37:25.850486-06:00 | live | 93758177-ffbb-567a-8ae4-04ad1d48cb45 | 2017-03-04T16:37:28.862150-06:00 | male | 1950 | live | black or african american | e597a52f-1c28-5248-a751-22d810af3013 | not reported | 2000 | TCGA-14-0736 | Glioblastoma Multiforme | true | legacy | Brain | TCGA-GBM | Glioblastoma Multiforme | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TCGA-06-0211 | TCGA-06-0211-02A-02R-2005-01 | TCGA-06-0211-02A | TR | Recurrent Solid Tumor | not reported | not reported | 2017-03-04T16:44:35.784223-06:00 | c71.9 | not reported | 17515 | dead | 9440/3 | 360 | live | cec1bfa2-94cf-5bf5-ad44-d1aad1c78586 | not reported | [object Object] | c71.9 | -17515 | not reported | not reported | c71.9 | 360 | 2017-03-04T16:37:25.850486-06:00 | live | 68b17566-c576-5237-badd-6e1979a8f86f | 2017-03-04T16:37:28.862150-06:00 | male | 1950 | live | white | fe9a665e-6e67-5c8c-ac4f-b316f44db506 | not hispanic or latino | 1997 | TCGA-06-0211 | Glioblastoma Multiforme | true | legacy | Brain | TCGA-GBM | Glioblastoma Multiforme |
datatable(assay(data)[1:5,],
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = TRUE)
| | TCGA-14-0736-02A-01R-2005-01 | TCGA-06-0211-02A-02R-2005-01 |
---|---|---|
ENSG00000000003 | 1027433.82686 | 424382.673556 |
ENSG00000000005 | 22979.3388709 | 2481.99005963 |
ENSG00000000419 | 810469.937335 | 738697.921797 |
ENSG00000000457 | 46320.6174017 | 39986.1652223 |
ENSG00000000460 | 26077.4119828 | 31642.5655792 |
rowRanges(data)
## GRanges object with 100 ranges and 3 metadata columns:
## seqnames ranges strand | ensembl_gene_id
## <Rle> <IRanges> <Rle> | <character>
## ENSG00000000003 chrX 100627109-100639991 - | ENSG00000000003
## ENSG00000000005 chrX 100584802-100599885 + | ENSG00000000005
## ENSG00000000419 chr20 50934867-50958555 - | ENSG00000000419
## ENSG00000000457 chr1 169849631-169894267 - | ENSG00000000457
## ENSG00000000460 chr1 169662007-169854080 + | ENSG00000000460
## ... ... ... ... . ...
## ENSG00000005421 chr7 95297676-95324707 - | ENSG00000005421
## ENSG00000005436 chr2 75652000-75710989 - | ENSG00000005436
## ENSG00000005448 chr2 74421678-74425755 + | ENSG00000005448
## ENSG00000005469 chr7 87345681-87399795 + | ENSG00000005469
## ENSG00000005471 chr7 87401697-87480435 - | ENSG00000005471
## external_gene_name original_ensembl_gene_id
## <character> <character>
## ENSG00000000003 TSPAN6 ENSG00000000003.13
## ENSG00000000005 TNMD ENSG00000000005.5
## ENSG00000000419 DPM1 ENSG00000000419.11
## ENSG00000000457 SCYL3 ENSG00000000457.12
## ENSG00000000460 C1orf112 ENSG00000000460.15
## ... ... ...
## ENSG00000005421 PON1 ENSG00000005421.7
## ENSG00000005436 GCFC2 ENSG00000005436.12
## ENSG00000005448 WDR54 ENSG00000005448.15
## ENSG00000005469 CROT ENSG00000005469.10
## ENSG00000005471 ABCB4 ENSG00000005471.14
## -------
## seqinfo: 24 sequences from an unspecified genome; no seqlengths
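Both examples above rely on colData, assay and rowRanges. For readers who want to try these accessors without downloading anything, here is a minimal sketch on a toy object. All gene names, sample names, values and coordinates are made up for illustration; only the SummarizedExperiment constructor and accessors themselves are real.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges, IRanges and S4Vectors

# Made-up expression values: 3 genes x 2 samples
counts <- matrix(c(10, 20, 30, 5, 0, 12), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("sample1", "sample2")))

# Made-up genomic coordinates, one range per gene
ranges <- GRanges(seqnames = c("chr1", "chr2", "chrX"),
                  ranges = IRanges(start = c(100, 300, 500),
                                   end = c(200, 400, 900)),
                  strand = c("+", "-", "+"))
names(ranges) <- rownames(counts)

# Made-up sample annotation, one row per column of `counts`
samples <- DataFrame(definition = c("Primary", "Recurrent"),
                     row.names = colnames(counts))

se <- SummarizedExperiment(assays = list(counts = counts),
                           rowRanges = ranges,
                           colData = samples)

colData(se)    # sample information
assay(se)      # gene-by-sample matrix
rowRanges(se)  # genomic ranges of the features
```

Subsetting the object, for example se[1:2, ], keeps the three parts in sync, which is the main reason to prefer it over separate matrices and tables.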
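GDCprepare: outputs
This function is still under development and does not work for all cases. See the tables below for details. In addition, examples of data query, download and preparation can be found in this Gist.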
Data.category | Data.type | Workflow Type | Status |
---|---|---|---|
Transcriptome Profiling | Gene Expression Quantification | HTSeq - Counts | Data frame or SE (losing 5% of information when mapping to genomic regions) |
Transcriptome Profiling | Gene Expression Quantification | HTSeq - FPKM-UQ | Returning only a (losing 5% of information when mapping to genomic regions) |
Transcriptome Profiling | Gene Expression Quantification | HTSeq - FPKM | Returning only a (losing 5% of information when mapping to genomic regions) |
Transcriptome Profiling | Isoform Expression Quantification | Not needed | |
Transcriptome Profiling | miRNA Expression Quantification | Not needed | Returning only a dataframe for the moment |
Copy number variation | Copy Number Segment | Not needed | Returning only a dataframe for the moment |
Copy number variation | Masked Copy Number Segment | Not needed | Returning only a dataframe for the moment |
Copy number variation | Gene Level Copy Number Scores | Not needed | Returning only a dataframe for the moment |
Simple Nucleotide Variation | |||
Raw Sequencing Data | |||
Biospecimen | Slide Image | ||
Biospecimen | Biospecimen Supplement | ||
Clinical |

| Data.category | Data.type | Platform | file.type | Status |
|---|---|---|---|---|
| Transcriptome Profiling | | | | |
| Copy number variation | - | Affymetrix SNP Array 6.0 | nocnv_hg18.seg | Working |
| | - | Affymetrix SNP Array 6.0 | hg18.seg | Working |
| | - | Affymetrix SNP Array 6.0 | nocnv_hg19.seg | Working |
| | - | Affymetrix SNP Array 6.0 | hg19.seg | Working |
| | - | Illumina HiSeq | Several | Working |
| Simple Nucleotide Variation | Simple somatic mutation | | | |
| Raw Sequencing Data | | | | |
| Biospecimen | | | | |
| Clinical | | | | |
| Protein expression | | MDA RPPA Core | - | Working |
| Gene expression | Gene expression quantification | Illumina HiSeq | normalized_results | Working |
| | | Illumina HiSeq | results | Working |
| | | HT_HG-U133A | - | Working |
| | | AgilentG4502A_07_2 | - | Data frame only |
| | | AgilentG4502A_07_1 | - | Data frame only |
| | | HuEx-1_0-st-v2 | FIRMA.txt | Not Preparing |
| | | | gene.txt | Not Preparing |
| | Isoform expression quantification | | | |
| | miRNA gene quantification | | | |
| | Exon junction quantification | | | |
| | Exon quantification | | | |
| | miRNA isoform quantification | | | |
| DNA methylation | | Illumina Human Methylation 450 | Not used | Working |
| | | Illumina Human Methylation 27 | Not used | Working |
| | | Illumina DNA Methylation OMA003 CPI | Not used | Working |
| | | Illumina DNA Methylation OMA002 CPI | Not used | Working |
| | | Illumina Hi Seq | | Not working |
| Raw Microarray Data | | | | |
| Structural Rearrangement | | | | |
| Other | | | | |
#-------------------------------------------------------
# Example: download idat files from TCGA projects
#-------------------------------------------------------
library(TCGAbiolinks)
library(SummarizedExperiment)
# Keep only the TCGA projects
projects <- TCGAbiolinks::getGDCprojects()$project_id
projects <- projects[grepl("^TCGA", projects, perl = TRUE)]
match.file.cases.all <- NULL
for (proj in projects) {
  print(proj)
  query <- GDCquery(project = proj,
                    data.category = "Raw microarray data",
                    data.type = "Raw intensities",
                    experimental.strategy = "Methylation array",
                    legacy = TRUE,
                    file.type = ".idat",
                    platform = "Illumina Human Methylation 450")
  match.file.cases <- getResults(query, cols = c("cases", "file_name"))
  match.file.cases$project <- proj
  match.file.cases.all <- rbind(match.file.cases.all, match.file.cases)
  # Try the API first and fall back to the GDC client if it fails
  tryCatch(GDCdownload(query, method = "api", files.per.chunk = 20),
           error = function(e) GDCdownload(query, method = "client"))
}
# This will create a map between idat file name, cases (barcode) and project
readr::write_tsv(match.file.cases.all, file = "idat_filename_case.txt")
# Move all idat files into the current folder (base R file.rename)
for (file in dir(".", pattern = "\\.idat$", recursive = TRUE)) {
  file.rename(file, basename(file))
}
library(TCGAbiolinks)
library(SummarizedExperiment)
query_meth.hg19 <- GDCquery(project= "TCGA-LGG",
data.category = "DNA methylation",
platform = "Illumina Human Methylation 450",
barcode = c("TCGA-HT-8111-01A-11D-2399-05","TCGA-HT-A5R5-01A-11D-A28N-05"),
legacy = TRUE)
GDCdownload(query_meth.hg19)
data.hg19 <- GDCprepare(query_meth.hg19)
library(TCGAbiolinks)
library(SummarizedExperiment)
query <- GDCquery(project = "TCGA-GBM",
data.category = "Protein expression",
legacy = TRUE,
barcode = c("TCGA-OX-A56R-01A-21-A44T-20","TCGA-08-0357-01A-21-1898-20"))
GDCdownload(query)
data <- GDCprepare(query, save = TRUE,
save.filename = "gbmProteinExpression.rda",
remove.files.prepared = TRUE)
library(TCGAbiolinks)
library(SummarizedExperiment)
# Aligned against Hg19
query.exp.hg19 <- GDCquery(project = "TCGA-GBM",
data.category = "Gene expression",
data.type = "Gene expression quantification",
platform = "Illumina HiSeq",
file.type = "normalized_results",
experimental.strategy = "RNA-Seq",
barcode = c("TCGA-14-0736-02A-01R-2005-01", "TCGA-06-0211-02A-02R-2005-01"),
legacy = TRUE)
GDCdownload(query.exp.hg19)
data <- GDCprepare(query.exp.hg19)
library(TCGAbiolinks)
library(SummarizedExperiment)
query <- GDCquery(project = "TCGA-ACC",
data.category = "Copy Number Variation",
data.type = "Copy Number Segment",
barcode = c( "TCGA-OR-A5KU-01A-11D-A29H-01", "TCGA-OR-A5JK-01A-11D-A29H-01"))
GDCdownload(query)
data <- GDCprepare(query)
library(TCGAbiolinks)
library(SummarizedExperiment)
query <- GDCquery(project = "TCGA-ACC",
data.category = "Copy Number Variation",
data.type = "Gene Level Copy Number Scores",
access="open")
GDCdownload(query)
data <- GDCprepare(query)
library(TCGAbiolinks)
library(SummarizedExperiment)
# mRNA pipeline: https://gdc-docs.nci.nih.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/
query.exp.hg38 <- GDCquery(project = "TCGA-GBM",
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "HTSeq - FPKM-UQ",
barcode = c("TCGA-14-0736-02A-01R-2005-01", "TCGA-06-0211-02A-02R-2005-01"))
GDCdownload(query.exp.hg38)
expdat <- GDCprepare(query = query.exp.hg38,
save = TRUE,
save.filename = "exp.rda")
library(TCGAbiolinks)
library(SummarizedExperiment)
query.mirna <- GDCquery(project = "TARGET-AML",
experimental.strategy = "miRNA-Seq",
data.category = "Transcriptome Profiling",
barcode = c("TARGET-20-PATDNN","TARGET-20-PAPUNR"),
data.type = "miRNA Expression Quantification")
GDCdownload(query.mirna)
mirna <- GDCprepare(query = query.mirna,
save = TRUE,
save.filename = "mirna.rda")
query.isoform <- GDCquery(project = "TARGET-AML",
experimental.strategy = "miRNA-Seq",
data.category = "Transcriptome Profiling",
barcode = c("TARGET-20-PATDNN","TARGET-20-PAPUNR"),
data.type = "Isoform Expression Quantification")
GDCdownload(query.isoform)
isoform <- GDCprepare(query = query.isoform,
save = TRUE,
save.filename = "mirna-isoform.rda")
library(TCGAbiolinks)
library(SummarizedExperiment)
#--------------------------------------
# DNA methylation data
#--------------------------------------
# DNA methylation aligned to hg38
query_met.hg38 <- GDCquery(project= "TCGA-LGG",
data.category = "DNA Methylation",
platform = "Illumina Human Methylation 450",
barcode = c("TCGA-HT-8111-01A-11D-2399-05","TCGA-HT-A5R5-01A-11D-A28N-05"))
GDCdownload(query_met.hg38)
data.hg38 <- GDCprepare(query_met.hg38)
# Using sesame http://bioconductor.org/packages/sesame/
# Please cite 10.1093/nar/gky691 and doi: 10.1093/nar/gkt090.
library(TCGAbiolinks)
library(SummarizedExperiment)
proj <- "TCGA-ACC"
query <- GDCquery(project = proj,
data.category = "Raw microarray data",
data.type = "Raw intensities",
experimental.strategy = "Methylation array",
legacy = TRUE,
barcode = c("TCGA-OR-A5JT","TCGA-OR-A5LG","TCGA-OR-A5JX"),
file.type = ".idat",
platform = "Illumina Human Methylation 450")
tryCatch(GDCdownload(query, method = "api", files.per.chunk = 20),
error = function(e) GDCdownload(query, method = "client"))
betas <- GDCprepare(query)
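Written by AlphaJP.
Licensed under the Creative Commons Attribution 4.0 International License.
Unless marked as reposted or attributed to another source, articles on this site are original works or translations by the site; please credit the author when reposting.
Original link: https://blog.computsystmed.com/archives/translation-tcgabiolinks-downloading-and-preparing-files-for-analysis
Last updated: 2019-05-25 17:13:23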