Translation - TCGAbiolinks 5 - Mutation data

TCGAbiolinks: Searching, downloading and visualizing mutation data

Published: March 20, 2019

Before running the code, load the required packages:

library(TCGAbiolinks)
library(SummarizedExperiment)
library(DT) # provides datatable(), used below to render tables

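1. Search and download

TCGAbiolinks provides functions for downloading mutation data from the GDC. There are two options:

  1. GDCquery_Maf: downloads MAF files aligned against the hg38 genome
  2. GDCquery, GDCdownload and GDCprepare: download MAF files aligned against the hg19 genome

1.1. Mutation data (hg38)

This example downloads the MAF (mutation annotation format) file produced by the muse variant calling pipeline. The pipeline options are: muse, varscan2, somaticsniper, mutect. For more information, see the GDC documentation.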

maf <- GDCquery_Maf("CHOL", pipelines = "muse")

# Show only the first 2 rows to keep rendering fast
datatable(maf[1:2,],
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
          rownames = FALSE)
Output (first two rows, one line per field; a single value applies to both rows; fields empty in both rows are omitted):

Field | Row 1 | Row 2
Hugo_Symbol | FMN2 | PAX3
Entrez_Gene_Id | 56776 | 5077
Center | WUGSC
NCBI_Build | GRCh38
Chromosome | chr1 | chr2
Start_Position | 240211162 | 222297158
End_Position | 240211162 | 222297158
Strand | +
Variant_Classification | Nonsense_Mutation | Missense_Mutation
Variant_Type | SNP
Reference_Allele | T | G
Tumor_Seq_Allele1 | T | G
Tumor_Seq_Allele2 | A | T
dbSNP_RS | (empty) | novel
Tumor_Sample_Barcode | TCGA-4G-AAZT-01A-11D-A417-09
Matched_Norm_Sample_Barcode | TCGA-4G-AAZT-10A-01D-A41A-09
Mutation_Status | Somatic
Sequencer | Illumina HiSeq 2000
Tumor_Sample_UUID | 24c3dc90-d1f2-4256-9909-0d0c939c178f
Matched_Norm_Sample_UUID | e75e6102-170d-49e4-89e6-7687cad1f6b6
HGVSc | c.3992T>A | c.141C>A
HGVSp | p.Leu1331Ter | p.Asn47Lys
HGVSp_Short | p.L1331* | p.N47K
Transcript_ID | ENST00000319653 | ENST00000350526
Exon_Number | 6/18 | 2/8
t_depth | 58 | 52
t_ref_count | 34 | 19
t_alt_count | 24 | 32
n_depth | 26 | 29
all_effects (row 1) | FMN2,stop_gained,p.L1331*,ENST00000319653,NM_020066.4&NM_001305424.1;FMN2,downstream_gene_variant,,ENST00000447095,
all_effects (row 2) | PAX3,missense_variant,p.N47K,ENST00000350526,NM_181457.3;PAX3,missense_variant,p.N47K,ENST00000392069,NM_181459.3;PAX3,missense_variant,p.N47K,ENST00000344493,NM_181461.3;PAX3,missense_variant,p.N47K,ENST00000392070,NM_181458.3;PAX3,missense_variant,p.N47K,ENST00000336840,NM_181460.3;PAX3,missense_variant,p.N47K,ENST00000409551,NM_001127366.2;PAX3,missense_variant,p.N47K,ENST00000409828,NM_000438.5;PAX3,missense_variant,p.N47K,ENST00000258387,NM_013942.4;CCDC140,upstream_gene_variant,,ENST00000295226,NM_153038.1
Allele | A | T
Gene | ENSG00000155816 | ENSG00000135903
Feature | ENST00000319653 | ENST00000350526
Feature_type | Transcript
One_Consequence | stop_gained | missense_variant
Consequence | stop_gained | missense_variant
cDNA_position | 4222/6434 | 278/3610
CDS_position | 3992/5169 | 141/1440
Protein_position | 1331/1722 | 47/479
Amino_acids | L/* | N/K
Codons | tTa/tAa | aaC/aaA
ALLELE_NUM | 1
TRANSCRIPT_STRAND | 1 | -1
SYMBOL | FMN2 | PAX3
SYMBOL_SOURCE | HGNC
HGNC_ID | HGNC:14074 | HGNC:8617
BIOTYPE | protein_coding
CANONICAL | YES | (empty)
CCDS | CCDS31069.2 | CCDS42826.1
ENSP | ENSP00000318884 | ENSP00000343052
SWISSPROT | Q9NZ56 | P23760
TREMBL | (empty) | A0A024R470
UNIPARC | (empty) | UPI0000131369
RefSeq | NM_020066.4;NM_001305424.1 | NM_181457.3
SIFT | (empty) | deleterious(0)
PolyPhen | (empty) | possibly_damaging(0.813)
EXON | 6/18 | 2/8
DOMAINS (row 1) | Pfam_domain:PF02181;SMART_domains:SM00498;Superfamily_domains:SSF101447
DOMAINS (row 2) | Pfam_domain:PF00292;Prints_domain:PR00027;SMART_domains:SM00351;PROSITE_profiles:PS51057;Superfamily_domains:SSF46689
IMPACT | HIGH | MODERATE
PICK | 1 | (empty)
VARIANT_CLASS | SNV
TSL | 5
MINIMISED | 1
FILTER | PASS
CONTEXT | GGAATTATTTT | CTGCCGTTGAT
src_vcf_id | 263c128d-cf0a-4a8b-bafa-fa84d9baeb2c
tumor_bam_uuid | 5a30d2bd-9cab-44ef-9071-0b34d386a9c0
normal_bam_uuid | 2c662e9d-0c78-4ec4-bc57-3f9573ffc678
case_id | b10c64c2-7fd2-4210-b975-034affb14b57
COSMIC | COSM4571189 | (empty)
MC3_Overlap | True
GDC_Validation_Status | Unknown
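
Once a MAF table is loaded, a common first step is to tally the mutations by type. The following is a minimal sketch in base R, assuming the object returned by GDCquery_Maf is a data frame with the standard MAF column names shown in the output above. To run without a download, it uses a small stand-in data frame (maf_toy, a name made up for this example) holding a few fields from the two rows shown.

```r
# Toy stand-in for the downloaded MAF; values copied from the two rows shown above.
maf_toy <- data.frame(
  Hugo_Symbol            = c("FMN2", "PAX3"),
  Chromosome             = c("chr1", "chr2"),
  Start_Position         = c(240211162, 222297158),
  Variant_Classification = c("Nonsense_Mutation", "Missense_Mutation"),
  Variant_Type           = c("SNP", "SNP"),
  Reference_Allele       = c("T", "G"),
  Tumor_Seq_Allele2      = c("A", "T"),
  stringsAsFactors = FALSE
)

# Count mutations per classification
class_counts <- table(maf_toy$Variant_Classification)
print(class_counts)

# Keep only the nonsense (stop-gained) mutations
nonsense <- subset(maf_toy, Variant_Classification == "Nonsense_Mutation")
print(nonsense$Hugo_Symbol)
```

With the real download, replacing maf_toy by maf should give the same kind of summary, provided the column names match.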

1.2. Mutation data (hg19)

This example downloads MAF (mutation annotation format) files aligned against hg19 (the legacy TCGA MAF files).

query.maf.hg19 <- GDCquery(project = "TCGA-CHOL", 
                           data.category = "Simple nucleotide variation", 
                           data.type = "Simple somatic mutation",
                           access = "open", 
                           legacy = TRUE)
# Check which MAF files are available
datatable(dplyr::select(getResults(query.maf.hg19),-contains("cases")),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 10), 
          rownames = FALSE)
Output (5 files):

Same value in every row: data_type = Simple somatic mutation; data_format = MAF; access = open; state = live; data_category = Simple nucleotide variation; type = file; experimental_strategy = DNA-Seq; project = TCGA-CHOL; center_center_type = GSC; tissue.definition = Primary solid Tumor.
Empty in every row: data_release, submitter_id, state_comment, version.
created_datetime is set only for file 5: 2016-06-13T17:02:09.527369-05:00.

# | file_name | file_id (same as id) | file_size | md5sum | updated_datetime | platform | tags
1 | hgsc.bcm.edu_CHOL.IlluminaGA_DNASeq.1.somatic.maf | a8532d87-1eae-4289-8aea-3255d7b313cf | 2482745 | 8db4269d8aba6d8d397e2761e24e8e6e | 2017-03-05T09:28:11.866514-06:00 | Mixed platforms | snv,somatic
2 | bcgsc.ca_CHOL.IlluminaHiSeq_DNASeq.1.somatic.maf | 0d2e60c5-dd32-4a19-b600-7e76496f4f94 | 1012118 | 47268aa46006c53013466f740a3e1462 | 2017-03-05T10:25:29.699247-06:00 | Illumina HiSeq | snv,somatic
3 | ucsc.edu_CHOL.IlluminaGA_DNASeq_automated.Level_2.1.0.0.somatic.maf | e45ec3d9-adcc-43db-a71c-9edaf7d11c86 | 785408 | b44be3f2e6a994be766cc881a3143b2b | 2017-03-05T18:49:13.665667-06:00 | Illumina GA | somatic,snv
4 | hgsc.bcm.edu_CHOL.IlluminaGA_DNASeq.1.somatic.maf | 448661d5-a89a-480e-adfd-1cce8eb74e70 | 2052077 | ee6d4a3810593268b8038dfb13999ddd | 2017-03-05T00:22:24.678827-06:00 | Illumina GA | snv,somatic
5 | gsc_CHOL_pairs.aggregated.capture.tcga.uuid.automated.somatic.maf | 2d9ed46f-36a5-4f87-9304-74ce626ae96d | 4274149 | 9288f4c155d47f4cc090eee3312e09c2 | 2017-03-05T12:45:35.461959-06:00 | Illumina GA | snv,somatic

# | code | center_name | center_short_name | center_center_id | center_namespace
1 | 10 | Baylor College of Medicine | BCM | d3b8c887-498b-5490-903e-760403c68307 | hgsc.bcm.edu
2 | 34 | Canada's Michael Smith Genome Sciences Centre | BCGSC | 380301b3-6f8d-581d-a81f-f4dd462df12b | bcgsc.ca
3 | 25 | University of California, Santa Cruz | UCSC | 79cc1498-5d7f-5eae-b631-e74b78c13581 | ucsc.edu
4 | 10 | Baylor College of Medicine | BCM | d3b8c887-498b-5490-903e-760403c68307 | hgsc.bcm.edu
5 | 08 | Broad Institute of MIT and Harvard | BI | 61d634b8-e8dd-58bf-9a65-1233dc7c8c6a | broad.mit.edu
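
The file.type argument in the next query takes one of the file_name values listed above. Rather than copying the name by hand, it can be looked up in the results table. The following is a minimal sketch in base R; results_toy is a made-up stand-in for getResults(query.maf.hg19), containing three of the rows listed above so the example runs without a download.

```r
# Toy stand-in for getResults(query.maf.hg19); values copied from the listing above.
results_toy <- data.frame(
  file_name = c(
    "hgsc.bcm.edu_CHOL.IlluminaGA_DNASeq.1.somatic.maf",
    "bcgsc.ca_CHOL.IlluminaHiSeq_DNASeq.1.somatic.maf",
    "ucsc.edu_CHOL.IlluminaGA_DNASeq_automated.Level_2.1.0.0.somatic.maf"
  ),
  platform          = c("Mixed platforms", "Illumina HiSeq", "Illumina GA"),
  center_short_name = c("BCM", "BCGSC", "UCSC"),
  stringsAsFactors = FALSE
)

# Pick the file produced by one center (here BCGSC)
chosen <- results_toy$file_name[results_toy$center_short_name == "BCGSC"]
stopifnot(length(chosen) == 1)  # guard against zero or ambiguous matches
print(chosen)
```

Note that in the full listing BCM appears twice, so filtering on the center alone would not be enough there; the length check catches that case.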
query.maf.hg19 <- GDCquery(project = "TCGA-CHOL", 
                           data.category = "Simple nucleotide variation", 
                           data.type = "Simple somatic mutation",
                           access = "open", 
                           file.type = "bcgsc.ca_CHOL.IlluminaHiSeq_DNASeq.1.somatic.maf",
                           legacy = TRUE)
GDCdownload(query.maf.hg19)
maf <- GDCprepare(query.maf.hg19)
# Show only the first 2 rows to keep rendering fast
datatable(maf[1:2,],
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
          rownames = FALSE)

2. Visualize the data

To visualize the data, you can use the Bioconductor package maftools. For details, see its vignette.

library(maftools)
library(dplyr)
maf <- GDCquery_Maf("CHOL", pipelines = "muse") %>% read.maf

datatable(getSampleSummary(maf),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
          rownames = FALSE)
plotmafSummary(maf = maf, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE)

oncoplot(maf = maf, top = 10, removeNonMutated = TRUE)
# Compute the transition/transversion summary without plotting it yet
titv.res <- titv(maf = maf, plot = FALSE, useSyn = TRUE)
# Plot the Ti/Tv summary
plotTiTv(res = titv.res)
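
The Ti/Tv summary splits single-nucleotide changes into transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse). The following is a minimal sketch of that classification in base R. It only illustrates the idea and is not how maftools implements it; classify_titv is a hypothetical helper written for this example.

```r
# Hypothetical helper (not part of maftools): label single-base changes as
# transitions ("Ti") or transversions ("Tv"). Vectorized over ref and alt.
classify_titv <- function(ref, alt) {
  purines     <- c("A", "G")
  pyrimidines <- c("C", "T")
  same_class <- (ref %in% purines & alt %in% purines) |
                (ref %in% pyrimidines & alt %in% pyrimidines)
  # No change at all is not a mutation, so return NA for it
  ifelse(ref == alt, NA_character_, ifelse(same_class, "Ti", "Tv"))
}

# The two CHOL changes shown in section 1.1 (T>A and G>T),
# plus two made-up changes (C>T and A>G) for contrast
ref <- c("T", "G", "C", "A")
alt <- c("A", "T", "T", "G")
calls <- classify_titv(ref, alt)

print(data.frame(ref, alt, calls))
print(table(calls))
```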

Updated: 2019-05-25 17:37:29

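Written by AlphaJP.
Licensed under the Creative Commons Attribution 4.0 International License.
Unless marked as reposted or attributed to another source, articles on this site are original or translated by this site; please credit the source when reposting.
Original link: https://blog.computsystmed.com/archives/translation-tcgabiolinks-mutation-data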