Published: March 20, 2019
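TCGAbiolinks provides several functions for searching the GDC database. This section first introduces the two available sources of GDC data, the Harmonized database and the Legacy Archive, and then walks through examples of how to access and use them.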
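TCGAbiolinks can download GDC data from either of two GDC sources:
Harmonized database: data harmonized against GRCh38 (hg38) with the GDC Bioinformatics Pipelines, which standardize the submitted biospecimen and clinical data.
Legacy Archive: an unmodified copy of the data previously hosted by CGHub and the TCGA Data Portal, using GRCh37 (hg19) and GRCh36 (hg18) as references.
A TCGA barcode is composed of a series of metadata identifiers, and each barcode refers to a single TCGA data element. The aliquot barcode contains the largest number of identifiers.
For more information on how the identifiers make up a barcode, see the TCGA wiki.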
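To make the barcode structure concrete, here is a small base-R sketch that splits an aliquot barcode into its fields. It is not part of TCGAbiolinks: the helper name parse_tcga_barcode and the field names are our own, and it assumes the standard seven-part layout Project-TSS-Participant-SampleVial-PortionAnalyte-Plate-Center. The example barcode is one returned by the TCGA-COAD query later in this article.

```r
# Hypothetical helper (not a TCGAbiolinks function): split a TCGA aliquot
# barcode into its identifier fields, assuming the standard 7-part layout.
parse_tcga_barcode <- function(barcode) {
  parts <- strsplit(barcode, "-", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 7)
  list(
    project     = parts[1],                # e.g. "TCGA"
    tss         = parts[2],                # tissue source site
    participant = parts[3],                # patient identifier
    sample      = substr(parts[4], 1, 2),  # sample type code, e.g. "01" = primary solid tumor
    vial        = substr(parts[4], 3, 3),
    portion     = substr(parts[5], 1, 2),
    analyte     = substr(parts[5], 3, 3),  # e.g. "D" = DNA, "R" = RNA
    plate       = parts[6],
    center      = parts[7],
    patient     = paste(parts[1:3], collapse = "-")  # the first 12 characters
  )
}

b <- parse_tcga_barcode("TCGA-AA-3712-01A-21D-1721-05")
str(b)
```

The patient field is the same string as substr(barcode, 1, 12), which is the trick the TCGA-COAD example relies on to match patients across data types.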
GDC data can be searched easily with the GDCquery function, which accepts the following arguments:
Argument | Description |
---|---|
project | A list of valid projects (see the table below) |
data.category | A valid data category (see the list with TCGAbiolinks:::getProjectSummary(project)) |
data.type | A data type to filter the files to download |
workflow.type | GDC workflow type |
legacy | Search in the Legacy Archive (TRUE) or the Harmonized database (FALSE) |
access | Filter by access type. Possible values: controlled, open |
platform | Examples: CGH-1x1M_G4447A, IlluminaGA_RNASeqV2, AgilentG4502A_07, IlluminaGA_mRNA_DGE, Human1MDuo, HumanMethylation450, HG-CGH-415K_G4124A, IlluminaGA_miRNASeq, HumanHap550, IlluminaHiSeq_miRNASeq, ABI, H-miRNA_8x15K, HG-CGH-244A, SOLiD_DNASeq, IlluminaDNAMethylation_OMA003_CPI, IlluminaGA_DNASeq_automated, IlluminaDNAMethylation_OMA002_CPI, HG-U133_Plus_2, HuEx-1_0-st-v2, Mixed_DNASeq, H-miRNA_8x15Kv2, IlluminaGA_DNASeq_curated, MDA_RPPA_Core, IlluminaHiSeq_TotalRNASeqV2, HT_HG-U133A, IlluminaHiSeq_DNASeq_automated, diagnostic_images, microsat_i, IlluminaHiSeq_RNASeq, SOLiD_DNASeq_curated, IlluminaHiSeq_DNASeqC, Mixed_DNASeq_curated, IlluminaGA_RNASeq, IlluminaGA_DNASeq_Cont_automated, IlluminaGA_DNASeq, IlluminaHiSeq_WGBS, pathology_reports, IlluminaHiSeq_DNASeq_Cont_automated, Genome_Wide_SNP_6, bio, tissue_images, Mixed_DNASeq_automated, HumanMethylation27, Mixed_DNASeq_Cont_curated, IlluminaHiSeq_RNASeqV2, Mixed_DNASeq_Cont |
file.type | Used in the legacy database for some platforms to define which file types to use |
barcode | A list of barcodes to filter the files to download |
experimental.strategy | Filter by experimental strategy. Harmonized: WXS, RNA-Seq, miRNA-Seq, Genotyping Array. Legacy: WXS, RNA-Seq, miRNA-Seq, Genotyping Array, DNA-Seq, Methylation array, Protein expression array, CGH array, VALIDATION, Gene expression array, WGS, MSI-Mono-Dinucleotide Assay, miRNA expression array, Mixed strategies, AMPLICON, Exon array, Total RNA-Seq, Capillary sequencing, Bisulfite-Seq |
sample.type | A sample type to filter the files to download (see the table below) |
The valid values for project are listed below:
| dbgap_accession_number | disease_type | releasable | released | state | primary_site | project_id | id | name | tumor |
|---|---|---|---|---|---|---|---|---|---|
| | Epithelial Neoplasms, NOS,Adenomas and Adenocarcinomas | false | true | open | Thyroid gland | TCGA-THCA | TCGA-THCA | Thyroid Carcinoma | THCA |
| phs000465 | Acute Myeloid Leukemia | false | true | open | Blood | TARGET-AML | TARGET-AML | Acute Myeloid Leukemia | AML |
| phs000467 | Neuroblastoma | false | true | open | Nervous System | TARGET-NBL | TARGET-NBL | Neuroblastoma | NBL |
| | Gliomas | false | true | open | Brain | TCGA-LGG | TCGA-LGG | Brain Lower Grade Glioma | LGG |
| | Cystic, Mucinous and Serous Neoplasms,Adenomas and Adenocarcinomas,Complex Epithelial Neoplasms,Squamous Cell Neoplasms | false | true | open | Cervix uteri | TCGA-CESC | TCGA-CESC | Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma | CESC |
| | Lipomatous Neoplasms,Soft Tissue Tumors and Sarcomas, NOS,Fibromatous Neoplasms,Myomatous Neoplasms,Nerve Sheath Tumors,Synovial-like Neoplasms | false | true | open | Retroperitoneum and peritoneum,Bones, joints and articular cartilage of limbs,Other and unspecified parts of tongue,Stomach,Other and unspecified male genital organs,Colon,Connective, subcutaneous and other soft tissues,Meninges,Ovary,Corpus uteri,Peripheral nerves and autonomic nervous system,Uterus, NOS,Kidney | TCGA-SARC | TCGA-SARC | Sarcoma | SARC |
| | Adenomas and Adenocarcinomas | false | true | open | Adrenal gland | TCGA-ACC | TCGA-ACC | Adrenocortical Carcinoma | ACC |
| phs000468 | Osteosarcoma | false | true | open | Bone | TARGET-OS | TARGET-OS | Osteosarcoma | OS |
| | Cystic, Mucinous and Serous Neoplasms,Adenomas and Adenocarcinomas | false | true | open | Rectosigmoid junction,Unknown,Rectum,Colon,Connective, subcutaneous and other soft tissues | TCGA-READ | TCGA-READ | Rectum Adenocarcinoma | READ |
| phs000470 | Rhabdoid Tumor | false | true | open | Kidney | TARGET-RT | TARGET-RT | Rhabdoid Tumor | RT |
| | Adenomas and Adenocarcinomas | false | true | open | Liver and intrahepatic bile ducts | TCGA-LIHC | TCGA-LIHC | Liver Hepatocellular Carcinoma | LIHC |
| | Adenomas and Adenocarcinomas | false | true | open | Kidney | TCGA-KICH | TCGA-KICH | Kidney Chromophobe | KICH |
| | Thymic Epithelial Neoplasms | false | true | open | Heart, mediastinum, and pleura,Thymus | TCGA-THYM | TCGA-THYM | Thymoma | THYM |
| | Cystic, Mucinous and Serous Neoplasms,Adenomas and Adenocarcinomas | false | true | open | Stomach | TCGA-STAD | TCGA-STAD | Stomach Adenocarcinoma | STAD |
| | Squamous Cell Neoplasms | false | true | open | Bronchus and lung | TCGA-LUSC | TCGA-LUSC | Lung Squamous Cell Carcinoma | LUSC |
| | Mesothelial Neoplasms | false | true | open | Heart, mediastinum, and pleura,Bronchus and lung | TCGA-MESO | TCGA-MESO | Mesothelioma | MESO |
| | Cystic, Mucinous and Serous Neoplasms,Epithelial Neoplasms, NOS,Adenomas and Adenocarcinomas,Ductal and Lobular Neoplasms | false | true | open | Pancreas | TCGA-PAAD | TCGA-PAAD | Pancreatic Adenocarcinoma | PAAD |
| | Transitional Cell Papillomas and Carcinomas,Epithelial Neoplasms, NOS,Adenomas and Adenocarcinomas,Squamous Cell Neoplasms | false | true | open | Bladder | TCGA-BLCA | TCGA-BLCA | Bladder Urothelial Carcinoma | BLCA |
| phs000466 | Clear Cell Sarcoma of the Kidney | false | true | open | Kidney | TARGET-CCSK | TARGET-CCSK | Clear Cell Sarcoma of the Kidney | CCSK |
| | Squamous Cell Neoplasms | false | true | open | Other and ill-defined sites in lip, oral cavity and pharynx,Palate,Other and unspecified parts of tongue,Hypopharynx,Lip,Tonsil,Gum,Larynx,Oropharynx,Floor of mouth,Bones, joints and articular cartilage of other and unspecified sites,Other and unspecified parts of mouth,Base of tongue | TCGA-HNSC | TCGA-HNSC | Head and Neck Squamous Cell Carcinoma | HNSC |
| | Adenomas and Adenocarcinomas | false | true | open | Kidney | TCGA-KIRC | TCGA-KIRC | Kidney Renal Clear Cell Carcinoma | KIRC |
| | Not Reported,Gliomas | false | true | open | Brain | TCGA-GBM | TCGA-GBM | Glioblastoma Multiforme | GBM |
| | Nevi and Melanomas | false | true | open | Skin | TCGA-SKCM | TCGA-SKCM | Skin Cutaneous Melanoma | SKCM |
| phs001374 | Epithelial Neoplasms, NOS,Squamous Cell Neoplasms | true | true | open | Bronchus and lung | VAREPOP-APOLLO | VAREPOP-APOLLO | VA Research Precision Oncology Program | APOLLO |
| | Adenomas and Adenocarcinomas | false | true | open | Other and unspecified parts of biliary tract,Gallbladder,Liver and intrahepatic bile ducts | TCGA-CHOL | TCGA-CHOL | Cholangiocarcinoma | CHOL |
| | Not Reported,Cystic, Mucinous and Serous Neoplasms,Epithelial Neoplasms, NOS,Adenomas and Adenocarcinomas | false | true | open | Corpus uteri,Uterus, NOS | TCGA-UCEC | TCGA-UCEC | Uterine Corpus Endometrial Carcinoma | UCEC |
| | Cystic, Mucinous and Serous Neoplasms,Adenomas and Adenocarcinomas,Squamous Cell Neoplasms | false | true | open | Esophagus,Stomach | TCGA-ESCA | TCGA-ESCA | Esophageal Carcinoma | ESCA |
| | Cystic, Mucinous and Serous Neoplasms,Epithelial Neoplasms, NOS,Adenomas and Adenocarcinomas,Complex Epithelial Neoplasms | false | true | open | Rectosigmoid junction,Colon | TCGA-COAD | TCGA-COAD | Colon Adenocarcinoma | COAD |
| | Adnexal and Skin Appendage Neoplasms,Basal Cell Neoplasms,Adenomas and Adenocarcinomas,Cystic, Mucinous and Serous Neoplasms,Epithelial Neoplasms, NOS,Squamous Cell Neoplasms,Fibroepithelial Neoplasms,Ductal and Lobular Neoplasms,Complex Epithelial Neoplasms | false | true | open | Breast | TCGA-BRCA | TCGA-BRCA | Breast Invasive Carcinoma | BRCA |
| | Not Reported,Cystic, Mucinous and Serous Neoplasms | false | true | open | Ovary | TCGA-OV | TCGA-OV | Ovarian Serous Cystadenocarcinoma | OV |
| | Myeloid Leukemias | false | true | open | Hematopoietic and reticuloendothelial systems | TCGA-LAML | TCGA-LAML | Acute Myeloid Leukemia | LAML |
| | Not Reported,Mature B-Cell Lymphomas | false | true | open | Heart, mediastinum, and pleura,Testis,Stomach,Lymph nodes,Bones, joints and articular cartilage of other and unspecified sites,Brain,Thyroid gland,Small intestine,Colon,Connective, subcutaneous and other soft tissues,Other and unspecified major salivary glands,Retroperitoneum and peritoneum,Hematopoietic and reticuloendothelial systems,Breast | TCGA-DLBC | TCGA-DLBC | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | DLBC |
| | Nevi and Melanomas | false | true | open | Eye and adnexa | TCGA-UVM | TCGA-UVM | Uveal Melanoma | UVM |
| | Cystic, Mucinous and Serous Neoplasms,Adenomas and Adenocarcinomas,Ductal and Lobular Neoplasms | false | true | open | Prostate gland | TCGA-PRAD | TCGA-PRAD | Prostate Adenocarcinoma | PRAD |
| phs001179 | Germ Cell Neoplasms,Acinar Cell Neoplasms,Miscellaneous Tumors,Thymic Epithelial Neoplasms,Gliomas,Basal Cell Neoplasms,Complex Mixed and Stromal Neoplasms,Ductal and Lobular Neoplasms,Neuroepitheliomatous Neoplasms,Complex Epithelial Neoplasms,Adnexal and Skin Appendage Neoplasms,Mesothelial Neoplasms,Mucoepidermoid Neoplasms,Not Reported,Adenomas and Adenocarcinomas,Cystic, Mucinous and Serous Neoplasms,Specialized Gonadal Neoplasms,Epithelial Neoplasms, NOS,Squamous Cell Neoplasms,Transitional Cell Papillomas and Carcinomas,Paragangliomas and Glomus Tumors,Nevi and Melanomas,Meningiomas | false | true | open | Testis,Gallbladder,Unknown,Other and unspecified parts of biliary tract,Adrenal gland,Thyroid gland,Spinal cord, cranial nerves, and other parts of central nervous system,Peripheral nerves and autonomic nervous system,Stomach,Cervix uteri,Bladder,Small intestine,Breast,Prostate gland,Other and ill-defined sites,Other and unspecified major salivary glands,Rectum,Retroperitoneum and peritoneum,Pancreas,Heart, mediastinum, and pleura,Other and ill-defined digestive organs,Bronchus and lung,Liver and intrahepatic bile ducts,Other and unspecified female genital organs,Thymus,Penis,Nasopharynx,Ovary,Uterus, NOS,Vulva,Other and unspecified urinary organs,Trachea,Ureter,Other endocrine glands and related structures,Not Reported,Colon,Anus and anal canal,Vagina,Skin,Esophagus,Eye and adnexa,Kidney | FM-AD | FM-AD | Foundation Medicine Adult Cancer Clinical Dataset (FM-AD) | AD |
| | Germ Cell Neoplasms | false | true | open | Testis | TCGA-TGCT | TCGA-TGCT | Testicular Germ Cell Tumors | TGCT |
| phs000471 | High-Risk Wilms Tumor | false | true | open | Kidney | TARGET-WT | TARGET-WT | High-Risk Wilms Tumor | WT |
| phs001444 | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | false | true | open | Lymph Nodes | NCICCR-DLBCL | NCICCR-DLBCL | Genomic Variation in Diffuse Large B Cell Lymphomas | DLBCL |
| | Cystic, Mucinous and Serous Neoplasms,Acinar Cell Neoplasms,Adenomas and Adenocarcinomas | false | true | open | Bronchus and lung | TCGA-LUAD | TCGA-LUAD | Lung Adenocarcinoma | LUAD |
| phs001184 | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | false | true | open | Lymph Nodes | CTSP-DLBCL1 | CTSP-DLBCL1 | CTSP Diffuse Large B-Cell Lymphoma (DLBCL) CALGB 50303 | DLBCL1 |
| | Adenomas and Adenocarcinomas | false | true | open | Kidney | TCGA-KIRP | TCGA-KIRP | Kidney Renal Papillary Cell Carcinoma | KIRP |
| | Paragangliomas and Glomus Tumors | false | true | open | Heart, mediastinum, and pleura,Other endocrine glands and related structures,Adrenal gland,Connective, subcutaneous and other soft tissues,Other and ill-defined sites,Spinal cord, cranial nerves, and other parts of central nervous system,Retroperitoneum and peritoneum | TCGA-PCPG | TCGA-PCPG | Pheochromocytoma and Paraganglioma | PCPG |
| | Complex Mixed and Stromal Neoplasms | false | true | open | Uterus, NOS | TCGA-UCS | TCGA-UCS | Uterine Carcinosarcoma | UCS |
| | Myeloid Leukemias,Lymphoid Leukemias | false | true | open | Hematopoietic and reticuloendothelial systems | TARGET-ALL-P3 | TARGET-ALL-P3 | Acute Lymphoblastic Leukemia - Phase III | ALL |
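Because the project argument accepts a vector, the table above can be used to assemble a set of projects programmatically. A minimal base-R sketch, assuming you have the table as a data frame (here only a few rows are copied by hand, and only the project_id and primary_site columns). Note that primary_site can hold several comma-separated sites (TCGA-SARC, for instance, also lists Kidney), so a substring match like the one below would pick such projects up as well.

```r
# A few rows copied from the project table above (project_id and primary_site only)
projects <- data.frame(
  project_id   = c("TCGA-KICH", "TCGA-KIRC", "TCGA-KIRP", "TARGET-WT", "TCGA-LUAD"),
  primary_site = c("Kidney", "Kidney", "Kidney", "Kidney", "Bronchus and lung"),
  stringsAsFactors = FALSE
)

# primary_site may list several comma-separated sites, so use a fixed
# substring match; then keep only TCGA projects
is_kidney <- grepl("Kidney", projects$primary_site, fixed = TRUE)
is_tcga   <- startsWith(projects$project_id, "TCGA-")
kidney_tcga <- projects$project_id[is_kidney & is_tcga]
print(kidney_tcga)

# The resulting vector can then be passed to the project argument, e.g.
# query <- GDCquery(project = kidney_tcga, data.category = "Transcriptome Profiling")
```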
The valid values for sample.type are:
tissue.code | shortLetterCode | tissue.definition |
---|---|---|
01 | TP | Primary solid Tumor |
02 | TR | Recurrent Solid Tumor |
03 | TB | Primary Blood Derived Cancer - Peripheral Blood |
04 | TRBM | Recurrent Blood Derived Cancer - Bone Marrow |
05 | TAP | Additional - New Primary |
06 | TM | Metastatic |
07 | TAM | Additional Metastatic |
08 | THOC | Human Tumor Original Cells |
09 | TBM | Primary Blood Derived Cancer - Bone Marrow |
10 | NB | Blood Derived Normal |
11 | NT | Solid Tissue Normal |
12 | NBC | Buccal Cell Normal |
13 | NEBV | EBV Immortalized Normal |
14 | NBM | Bone Marrow Normal |
20 | CELLC | Control Analyte |
40 | TRB | Recurrent Blood Derived Cancer - Peripheral Blood |
50 | CELL | Cell Lines |
60 | XP | Primary Xenograft Tissue |
61 | XCL | Cell Line Derived Xenograft Tissue |
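The sample type is encoded in characters 14-15 of a barcode, so this table can also be used to label barcodes you already have. A minimal base-R sketch: the lookup vector copies only a few rows of the table, and sample_type_of is a hypothetical helper, not a TCGAbiolinks function. The barcodes are taken from query results shown later in this article.

```r
# Subset of the tissue.code -> shortLetterCode table above
sample_codes <- c("01" = "TP", "02" = "TR", "03" = "TB", "06" = "TM",
                  "10" = "NB", "11" = "NT", "14" = "NBM")

# Hypothetical helper: map barcodes to short letter codes via characters 14-15
sample_type_of <- function(barcodes) {
  code <- substr(barcodes, 14, 15)
  unname(sample_codes[code])  # NA for codes missing from the lookup
}

types <- sample_type_of(c("TCGA-06-0171-02A-11D-2004-05",   # recurrent tumor
                          "TCGA-AA-3712-11A-01D-1721-05",   # solid tissue normal
                          "TCGA-AA-3712-01A-21D-1721-05"))  # primary tumor
print(types)
```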
The values for the other search arguments (data.category, data.type, workflow.type, platform, file.type) are listed below. Note that these tables are not exhaustive.
Harmonized database (legacy = FALSE):
```r
library(DT)  # provides datatable()
datatable(readr::read_csv("https://docs.google.com/spreadsheets/d/1f98kFdj9mxVDc1dv4xTZdx8iWgUiDYO-qiFJINvmTZs/export?format=csv&gid=2046985454"),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 40),
          rownames = FALSE)
```
```
## Parsed with column specification:
## cols(
##   Data.category = col_character(),
##   Data.type = col_character(),
##   `Workflow Type` = col_character(),
##   Platform = col_character()
## )
```
Data.category | Data.type | Workflow Type | Platform |
---|---|---|---|
Transcriptome Profiling | Gene Expression Quantification | HTSeq - Counts | |
Transcriptome Profiling | Gene Expression Quantification | HTSeq - FPKM | |
Transcriptome Profiling | Gene Expression Quantification | HTSeq - FPKM-UQ | |
Transcriptome Profiling | Isoform Expression Quantification | - | |
Transcriptome Profiling | miRNA Expression Quantification | - | |
Copy number variation | Copy Number Segment | | |
Copy number variation | Masked Copy Number Segment | | |
Copy number variation | Gene Level Copy Number Scores | | |
Simple Nucleotide Variation | - | | |
Raw Sequencing Data | - | | |
Biospecimen | - | | |
Clinical | - | | |
DNA Methylation | Methylation Beta Value | | Illumina Human Methylation 450 |
DNA Methylation | Methylation Beta Value | | Illumina Human Methylation 27 |
Legacy Archive (legacy = TRUE):
```r
library(DT)  # provides datatable()
datatable(readr::read_csv("https://docs.google.com/spreadsheets/d/1f98kFdj9mxVDc1dv4xTZdx8iWgUiDYO-qiFJINvmTZs/export?format=csv&gid=1817673686"),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 40),
          rownames = FALSE)
```
```
## Parsed with column specification:
## cols(
##   Data.category = col_character(),
##   Data.type = col_character(),
##   Platform = col_character(),
##   file.type = col_character()
## )
```
Data.category | Data.type | Platform | file.type |
---|---|---|---|
Biospecimen | | | |
Clinical | | | |
Copy number variation | - | Affymetrix SNP Array 6.0 | nocnv_hg18.seg |
Copy number variation | - | Affymetrix SNP Array 6.0 | hg18.seg |
Copy number variation | - | Affymetrix SNP Array 6.0 | nocnv_hg19.seg |
Copy number variation | - | Affymetrix SNP Array 6.0 | hg19.seg |
Copy number variation | - | Illumina HiSeq | - |
DNA methylation | | Illumina Human Methylation 450 | Not used |
DNA methylation | | Illumina Human Methylation 27 | Not used |
DNA methylation | | Illumina DNA Methylation OMA003 CPI | Not used |
DNA methylation | | Illumina DNA Methylation OMA002 CPI | Not used |
DNA methylation | | Illumina Hi Seq | |
DNA methylation | Bisulfite sequence alignment | | |
DNA methylation | Methylation percentage | | |
DNA methylation | Aligned reads | | |
Gene expression | Gene expression quantification | Illumina HiSeq | normalized_results |
Gene expression | Gene expression quantification | Illumina HiSeq | results |
Gene expression | Gene expression quantification | HT_HG-U133A | - |
Gene expression | Gene expression quantification | AgilentG4502A_07_2 | - |
Gene expression | Gene expression quantification | AgilentG4502A_07_1 | - |
Gene expression | Gene expression quantification | HuEx-1_0-st-v2 | FIRMA.txt |
Gene expression | Gene expression quantification | | gene.txt |
Gene expression | Isoform expression quantification | - | - |
Gene expression | miRNA gene quantification | - | hg19.mirna |
Gene expression | miRNA gene quantification | | hg19.mirbase20 |
Gene expression | miRNA gene quantification | | mirna |
Gene expression | Exon junction quantification | - | - |
Gene expression | Exon quantification | - | - |
Gene expression | miRNA isoform quantification | - | hg19.isoform |
Gene expression | miRNA isoform quantification | - | isoform |
Other | | | |
Protein expression | | MDA RPPA Core | - |
Raw microarray data | Raw intensities | Illumina Human Methylation 450 | idat |
Raw Microarray Data | Raw intensities | Illumina Human Methylation 27 | idat |
Raw sequencing data | | | |
Simple nucleotide variation | Simple somatic mutation | | |
Structural Rearrangement | | | |
In this example, we access the Harmonized database (legacy = FALSE) and search for all DNA methylation data from recurrent glioblastoma multiforme (GBM) and low-grade glioma (LGG) samples.
```r
# Load the required packages; the later examples assume they are loaded
library(TCGAbiolinks)
library(SummarizedExperiment)
library(DT)  # provides datatable()
# Build the query
query <- GDCquery(project = c("TCGA-GBM", "TCGA-LGG"),
                  data.category = "DNA Methylation",
                  legacy = FALSE,
                  platform = c("Illumina Human Methylation 450"),
                  sample.type = "Recurrent Solid Tumor")
# Show the query results
datatable(getResults(query),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
```
data_release | data_type | updated_datetime | file_name | submitter_id | file_id | file_size | cases | id | created_datetime | md5sum | data_format | access | platform | state | version | data_category | type | experimental_strategy | project | analysis_id | analysis_updated_datetime | analysis_created_datetime | analysis_submitter_id | analysis_state | analysis_workflow_link | analysis_workflow_type | tissue.definition |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12.0 - 15.0 | Methylation Beta Value | 2018-11-30T04:41:54.596454+00:00 | jhu-usc.edu_GBM.HumanMethylation450.2.lvl-3.TCGA-06-0171-02A-11D-2004-05.gdc_hg38.txt | 5978b8ef-dc9a-4a00-9c0e-ec1772bce4cc-beta-value | 9d5e1554-95cd-4ced-9b51-19e0b42d4b31 | 141286194 | TCGA-06-0171-02A-11D-2004-05 | 9d5e1554-95cd-4ced-9b51-19e0b42d4b31 | 2016-10-27T21:58:12.297090-05:00 | 6955a67ab70c840a668b49a42d4dae71 | TXT | open | Illumina Human Methylation 450 | released | 1 | DNA Methylation | methylation_beta_value | Methylation Array | TCGA-GBM | a22168b5-32a5-48e2-b603-f26c4ad16e95 | 2018-09-06T13:49:07.196637-05:00 | 2016-10-27T21:58:12.297090-05:00 | 5978b8ef-dc9a-4a00-9c0e-ec1772bce4cc-workflow | released | https://github.com/NCI-GDC/met-liftover-tool | Liftover | Recurrent Solid Tumor |
Only part of the results is shown here.
In this example, we access the Harmonized database (legacy = FALSE) and search for Colon Adenocarcinoma (TCGA-COAD) patients that have both DNA methylation and gene expression data.
```r
query.met <- GDCquery(project = "TCGA-COAD",
                      data.category = "DNA Methylation",
                      legacy = FALSE,
                      platform = c("Illumina Human Methylation 450"))
query.exp <- GDCquery(project = "TCGA-COAD",
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - FPKM-UQ")
# Get all patients that have both DNA methylation and gene expression data.
# The first 12 characters of a barcode identify the patient.
common.patients <- intersect(substr(getResults(query.met, cols = "cases"), 1, 12),
                             substr(getResults(query.exp, cols = "cases"), 1, 12))
# Only select the first 5 patients
query.met <- GDCquery(project = "TCGA-COAD",
                      data.category = "DNA Methylation",
                      legacy = FALSE,
                      platform = c("Illumina Human Methylation 450"),
                      barcode = common.patients[1:5])
query.exp <- GDCquery(project = "TCGA-COAD",
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - FPKM-UQ",
                      barcode = common.patients[1:5])
datatable(getResults(query.met, cols = c("data_type", "cases")),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
```
data_type | cases |
---|---|
Methylation Beta Value | TCGA-AA-3712-01A-21D-1721-05 |
Methylation Beta Value | TCGA-AA-3712-11A-01D-1721-05 |
Methylation Beta Value | TCGA-CK-6747-01A-11D-1837-05 |
Methylation Beta Value | TCGA-AA-3502-11A-01D-1407-05 |
Methylation Beta Value | TCGA-AA-3502-01A-01D-1407-05 |
Methylation Beta Value | TCGA-D5-6536-01A-11D-1721-05 |
Methylation Beta Value | TCGA-CM-6676-01A-11D-1837-05 |
```r
datatable(getResults(query.exp, cols = c("data_type", "cases")),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
```
data_type | cases |
---|---|
Gene Expression Quantification | TCGA-AA-3712-11A-01R-1723-07 |
Gene Expression Quantification | TCGA-AA-3712-01A-21R-1723-07 |
Gene Expression Quantification | TCGA-CK-6747-01A-11R-1839-07 |
Gene Expression Quantification | TCGA-AA-3502-01A-01R-1410-07 |
Gene Expression Quantification | TCGA-D5-6536-01A-11R-1723-07 |
Gene Expression Quantification | TCGA-CM-6676-01A-11R-1839-07 |
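Why substr(..., 1, 12)? The first 12 characters of a barcode identify the patient, so intersecting them matches patients rather than individual aliquots. The base-R sketch below reuses the case barcodes printed in the two tables above (copied by hand, so it runs offline). It shows that all five patients are shared, and that the methylation query returns seven files because two patients also have a solid tissue normal sample (code 11). Matching on the first 16 characters (which keep the sample type and vial) is stricter.

```r
met_cases <- c("TCGA-AA-3712-01A-21D-1721-05", "TCGA-AA-3712-11A-01D-1721-05",
               "TCGA-CK-6747-01A-11D-1837-05", "TCGA-AA-3502-11A-01D-1407-05",
               "TCGA-AA-3502-01A-01D-1407-05", "TCGA-D5-6536-01A-11D-1721-05",
               "TCGA-CM-6676-01A-11D-1837-05")
exp_cases <- c("TCGA-AA-3712-11A-01R-1723-07", "TCGA-AA-3712-01A-21R-1723-07",
               "TCGA-CK-6747-01A-11R-1839-07", "TCGA-AA-3502-01A-01R-1410-07",
               "TCGA-D5-6536-01A-11R-1723-07", "TCGA-CM-6676-01A-11R-1839-07")

# Patient-level match: first 12 characters, as in the query code above
common <- intersect(substr(met_cases, 1, 12), substr(exp_cases, 1, 12))
print(common)

# Sample-level match: first 16 characters (patient + sample type + vial)
common_samples <- intersect(substr(met_cases, 1, 16), substr(exp_cases, 1, 16))
print(common_samples)
```

Here the normal sample TCGA-AA-3502-11A has methylation data but no expression file, so it drops out at the sample level. If only tumor samples are wanted, the sample.type argument of GDCquery is another way to exclude normal tissue.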
This example shows how to search for raw sequencing data (controlled access) for breast cancer (TCGA-BRCA) and check the file names and their associated barcodes.
```r
query <- GDCquery(project = c("TCGA-BRCA"),
                  data.category = "Sequencing Reads",
                  sample.type = "Primary solid Tumor")
# Only the first 5 rows, to make rendering faster
datatable(getResults(query, rows = 1:5, cols = c("file_name", "cases")),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
```
file_name | cases |
---|---|
TCGA-A7-A26E-01A-11R-A168-13_mirna_gdc_realn.bam | TCGA-A7-A26E-01A-11R-A168-13 |
TCGA-E2-A156-01A-11D-A12B-09_IlluminaGA-DNASeq_exome_gdc_realn.bam | TCGA-E2-A156-01A-11D-A12B-09 |
c399b9e0c9d7b4320262377b2e901557_gdc_realn.bam | TCGA-A7-A5ZX-01A-12D-A29N-09 |
TCGA-E2-A1LS-01A-12R-A156-13_mirna_gdc_realn.bam | TCGA-E2-A1LS-01A-12R-A156-13 |
45ea8a7da00be45d52b0cc712c7c771c_gdc_realn.bam | TCGA-D8-A1JK-01A-11D-A13L-09 |
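This example shows how to search the Legacy Archive (legacy = TRUE) for DNA methylation data of glioblastoma multiforme (GBM) and low-grade glioma (LGG) generated on the Illumina Human Methylation 450 and Illumina Human Methylation 27 platforms.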
```r
query <- GDCquery(project = c("TCGA-GBM", "TCGA-LGG"),
                  legacy = TRUE,
                  data.category = "DNA methylation",
                  platform = c("Illumina Human Methylation 450", "Illumina Human Methylation 27"))
datatable(getResults(query, rows = 1:100),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
```
| data_release | data_type | tags | file_name | submitter_id | file_id | file_size | cases | state_comment | id | md5sum | updated_datetime | data_format | access | platform | state | version | data_category | type | experimental_strategy | project | code | center_name | center_short_name | center_center_id | center_namespace | center_center_type | tissue.definition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Methylation beta value | meth | jhu-usc.edu_GBM.HumanMethylation450.8.lvl-3.TCGA-76-6661-01B-11D-1844-05.txt | | 8f3bf221-d738-4850-aa00-ca1e0d10a7e7 | 21285109 | TCGA-76-6661-01B-11D-1844-05 | | 8f3bf221-d738-4850-aa00-ca1e0d10a7e7 | b6662864029f2b2f272569128a13371f | 2017-03-05T18:38:56.646240-06:00 | TXT | open | Illumina Human Methylation 450 | live | | DNA methylation | file | Methylation array | TCGA-GBM | 05 | Johns Hopkins / University of Southern California | JHU_USC | 7ef3885b-37ce-5e16-8ba3-9d75b6690008 | jhu-usc.edu | CGCC | Primary solid Tumor |
Only part of the results is shown here.
The following queries search the Legacy Archive for bisulfite sequencing (Bisulfite-Seq) data from lung adenocarcinoma (TCGA-LUAD):
```r
# Methylation percentage files
query <- GDCquery(project = c("TCGA-LUAD"),
                  legacy = TRUE,
                  data.type = "Methylation percentage",
                  experimental.strategy = "Bisulfite-Seq")
# VCF - controlled data
query <- GDCquery(project = c("TCGA-LUAD"),
                  legacy = TRUE,
                  data.type = "Bisulfite sequence alignment",
                  experimental.strategy = "Bisulfite-Seq")
# WGBS BAM files - controlled data
query <- GDCquery(project = c("TCGA-LUAD"),
                  legacy = TRUE,
                  data.type = "Aligned reads",
                  data.category = "Raw sequencing data",
                  experimental.strategy = "Bisulfite-Seq")
```
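This example shows how to search the Legacy Archive for normalized gene expression data (RSEM normalized_results files) of glioblastoma multiforme (GBM). For more details, see the rnaseqV2 page of the TCGA wiki.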
```r
# Gene expression aligned against hg19.
query.exp.hg19 <- GDCquery(project = "TCGA-GBM",
                           data.category = "Gene expression",
                           data.type = "Gene expression quantification",
                           platform = "Illumina HiSeq",
                           file.type = "normalized_results",
                           experimental.strategy = "RNA-Seq",
                           barcode = c("TCGA-14-0736-02A-01R-2005-01", "TCGA-06-0211-02A-02R-2005-01"),
                           legacy = TRUE)
datatable(getResults(query.exp.hg19),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
```
| data_release | data_type | tags | file_name | submitter_id | file_id | file_size | cases | state_comment | id | md5sum | updated_datetime | data_format | access | platform | state | version | data_category | type | experimental_strategy | project | code | center_name | center_short_name | center_center_id | center_namespace | center_center_type | tissue.definition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Gene expression quantification | normalized,gene,v2 | unc.edu.b469eb7c-723f-4870-b4e4-ebfaae7a118b.1536566.rsem.genes.normalized_results | | 217d72e9-4d6f-409d-911c-0a70b17a0adc | 437283 | TCGA-14-0736-02A-01R-2005-01 | | 217d72e9-4d6f-409d-911c-0a70b17a0adc | beda9f89f08fc6a892a72e8b704fdbd9 | 2017-03-05T11:34:30.601697-06:00 | TXT | open | Illumina HiSeq | live | | Gene expression | file | RNA-Seq | TCGA-GBM | 07 | University of North Carolina | UNC | ee7a85b3-8177-5d60-a10c-51180eb9009c | unc.edu | CGCC | Recurrent Solid Tumor |
| | Gene expression quantification | normalized,gene,v2 | unc.edu.152afe8c-f67c-4d7c-93ac-e1b7edd56c54.1544649.rsem.genes.normalized_results | | 973ce0ac-f613-4b99-b2ab-3e2d5548f05f | 436272 | TCGA-06-0211-02A-02R-2005-01 | | 973ce0ac-f613-4b99-b2ab-3e2d5548f05f | 84478e78d95e1155019ccb7e0e0fea2f | 2017-03-05T18:20:31.987895-06:00 | TXT | open | Illumina HiSeq | live | | Gene expression | file | RNA-Seq | TCGA-GBM | 07 | University of North Carolina | UNC | ee7a85b3-8177-5d60-a10c-51180eb9009c | unc.edu | CGCC | Recurrent Solid Tumor |
To get a manifest file from a query object, use the getManifest function. If its save argument is set to TRUE, a txt file is created. This file can be used with the GDC client Data Transfer Tool (DTT) or its GUI version, ddt-ui.
```r
getManifest(query.exp.hg19, save = FALSE)
```
```
##                                      id
## 40 217d72e9-4d6f-409d-911c-0a70b17a0adc
## 97 973ce0ac-f613-4b99-b2ab-3e2d5548f05f
##                                                                             filename
## 40 unc.edu.b469eb7c-723f-4870-b4e4-ebfaae7a118b.1536566.rsem.genes.normalized_results
## 97 unc.edu.152afe8c-f67c-4d7c-93ac-e1b7edd56c54.1544649.rsem.genes.normalized_results
##                                 md5   size state
## 40 beda9f89f08fc6a892a72e8b704fdbd9 437283  live
## 97 84478e78d95e1155019ccb7e0e0fea2f 436272  live
```
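The manifest is a small table of file IDs, names, MD5 checksums, sizes and states. To show what such a file looks like on disk, the base-R sketch below writes the two rows printed above as a tab-separated file and reads them back. The tab-separated layout mirrors the columns shown above, and the file name manifest.txt is our own choice, not necessarily the name getManifest(save = TRUE) uses.

```r
# The two manifest rows printed above, rebuilt as a data frame
manifest <- data.frame(
  id       = c("217d72e9-4d6f-409d-911c-0a70b17a0adc",
               "973ce0ac-f613-4b99-b2ab-3e2d5548f05f"),
  filename = c("unc.edu.b469eb7c-723f-4870-b4e4-ebfaae7a118b.1536566.rsem.genes.normalized_results",
               "unc.edu.152afe8c-f67c-4d7c-93ac-e1b7edd56c54.1544649.rsem.genes.normalized_results"),
  md5      = c("beda9f89f08fc6a892a72e8b704fdbd9", "84478e78d95e1155019ccb7e0e0fea2f"),
  size     = c(437283, 436272),
  state    = c("live", "live"),
  stringsAsFactors = FALSE
)

# Write a tab-separated manifest to a temporary directory and read it back
path <- file.path(tempdir(), "manifest.txt")
write.table(manifest, path, sep = "\t", quote = FALSE, row.names = FALSE)
back <- read.delim(path, stringsAsFactors = FALSE)

cat(sprintf("%d files, %.1f KiB in total\n", nrow(back), sum(back$size) / 1024))
```

A saved manifest can then be handed to the command-line client, e.g. gdc-client download -m manifest.txt (controlled-access files additionally require a token file passed with -t).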
ATAC-seq data are currently available from the GDC publication page. The list of files is shown below:
```r
datatable(getResults(TCGAbiolinks:::GDCquery_ATAC_seq())[, c("file_name", "file_size")],
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
```
file_name | file_size |
---|---|
TCGA-ATAC_PanCancer_PeakSet.txt | 37522221 |
TCGA-ATAC_DataS1_DonorsAndStats_v4.xlsx | 251795 |
MESO_bigWigs.tgz | 1507969705 |
TCGA-ATAC_DataS5_GWAS_v2.xlsx | 999661 |
COAD_bigWigs.tgz | 8939070313 |
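The file_size column is in bytes, and some of these archives are large, so it can help to convert the sizes before calling GDCdownload. A small base-R sketch applied to the sizes listed above (human_size is a hypothetical helper, not part of TCGAbiolinks; it uses 1024-based units):

```r
# Hypothetical helper: format byte counts with binary (1024-based) units
human_size <- function(bytes) {
  units <- c("B", "KiB", "MiB", "GiB", "TiB")
  exponent <- floor(log(pmax(bytes, 1), base = 1024))  # pmax avoids log(0)
  exponent <- pmin(exponent, length(units) - 1)
  sprintf("%.1f %s", bytes / 1024^exponent, units[exponent + 1])
}

# Sizes copied from the ATAC-seq table above
atac_sizes <- c(
  "TCGA-ATAC_PanCancer_PeakSet.txt"         = 37522221,
  "TCGA-ATAC_DataS1_DonorsAndStats_v4.xlsx" = 251795,
  "MESO_bigWigs.tgz"                        = 1507969705,
  "TCGA-ATAC_DataS5_GWAS_v2.xlsx"           = 999661,
  "COAD_bigWigs.tgz"                        = 8939070313
)
print(data.frame(file = names(atac_sizes), size = human_size(atac_sizes)))
cat("Total:", human_size(sum(atac_sizes)), "\n")
```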
You can also filter the file list with GDCquery_ATAC_seq and then save the filtered files locally with GDCdownload.
```r
# ATAC-seq files in RDS format
query <- TCGAbiolinks:::GDCquery_ATAC_seq(file.type = "rds")
GDCdownload(query, method = "client")
# ATAC-seq bigWig archives
query <- TCGAbiolinks:::GDCquery_ATAC_seq(file.type = "bigWigs")
GDCdownload(query, method = "client")
```
Retrieve the number of files for each combination of data_category, data_type, experimental_strategy and platform, much like the summary at https://portal.gdc.cancer.gov/exploration:
```r
tab <- getSampleFilesSummary("TCGA-ACC")
datatable(tab,
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
```
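Written by 石九流.
Licensed under the Creative Commons Attribution 4.0 International License.
Unless marked as reposted or credited to another source, articles on this site are original works or translations; please credit the author before reposting.
Original link: https://blog.computsystmed.com/archives/translation-tcgabiolinks-searching-gdc-database
Last updated: 2019-05-25 17:07:26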