翻译 - TCGAbiolinks 8 - Case Studies

TCGAbiolinks:实例探究

在运行程序前,我们需要导入必要的包:

library(TCGAbiolinks)

library(SummarizedExperiment)

1. 介绍

在这里,我们展示了TCGAbiolinks的完整分析流程。分别在四个案例中展示相应的代码,这些案例是:

  1. Expression pipeline (BRCA)
  2. Expression pipeline (GBM)
  3. Integration of DNA methylation and RNA expression pipeline (COAD)
  4. ELMER pipeline (KIRC)

2. 案例研究 1:泛癌下游分析BRCA

在这个案例,我们对乳腺浸润癌(TCGA_BRCA)开展了分析,分析的内容包括:

  1. 计算差异表达基因;
  2. 差异基因的富集分析;
  3. 差异基因的生存分析;
  4. 差异基因的PPI网络分析

2.1. 数据下载及预处理

library(SummarizedExperiment)
library(TCGAbiolinks)
query.exp <- GDCquery(project = "TCGA-BRCA",
                      legacy = TRUE,
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq",
                      file.type = "results",
                      experimental.strategy = "RNA-Seq",
                      sample.type = c("Primary solid Tumor","Solid Tissue Normal"))
GDCdownload(query.exp)
brca.exp <- GDCprepare(query = query.exp, save = TRUE, save.filename = "brcaExp.rda")

# get subtype information
dataSubt <- TCGAquery_subtype(tumor = "BRCA")

# get clinical data
dataClin <- GDCquery_clinic(project = "TCGA-BRCA","clinical") 

# Which samples are primary solid tumor
dataSmTP <- TCGAquery_SampleTypes(getResults(query.exp,cols="cases"),"TP") 
# which samples are solid tissue normal
dataSmNT <- TCGAquery_SampleTypes(getResults(query.exp,cols="cases"),"NT")

2.2. 基因差异表达分析

使用TCGAnalyze_DEA,我们比较了114个正常样本和1097个BRCA样品,并鉴定了3,390个差异表达基因(DEGs, log fold change >=1 and FDR < 1%)。

dataPrep <- TCGAanalyze_Preprocessing(object = brca.exp, cor.cut = 0.6)                      

dataNorm <- TCGAanalyze_Normalization(tabDF = dataPrep,
                                      geneInfo = geneInfo,
                                      method = "gcContent")                

dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm,
                                  method = "quantile", 
                                  qnt.cut =  0.25)   

dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[,dataSmNT],
                            mat2 = dataFilt[,dataSmTP],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.01 ,
                            logFC.cut = 1,
                            method = "glmLRT")  

2.3. 差异基因富集分析

为了理解这些差异表达基因的潜在的生物过程,我们使用TCGAnalyze_EA_complete函数进行了富集分析。

ansEA <- TCGAanalyze_EAcomplete(TFname="DEA genes Normal Vs Tumor",
                                RegulonList = rownames(dataDEGs))  

TCGAvisualize_EAbarplot(tf = rownames(ansEA$ResBP),
                        GOBPTab = ansEA$ResBP,
                        GOCCTab = ansEA$ResCC,
                        GOMFTab = ansEA$ResMF,
                        PathTab = ansEA$ResPat,
                        nRGTab = rownames(dataDEGs),
                        nBar = 20)

最终TCGAbiolinks输出bar chart,其中包含GO分析的GO:biological processGO:cellular componentGO:molecular functionPathway的结果

上面代码得出的结果如下所示:

2.4. 差异基因生存分析

生存分析有两种,分别是单因素生存分析和多因素生存分析。下面我们分别介绍两种分析方法。

2.4.1. 单因素生存分析

我们使用Kaplan-Meier analysis来计算每一个基因的survival univariate curves,并使用TCGAanalyze_SurvivalKM函数,通过log-Ratio test来评估相应的统计显著性(statistical significance);最终,我们从3,390个差异表达基因(DEGs)中获得555个对生存有影响的基因(p.value <0.05)

group1 <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("NT"))
group2 <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("TP"))

dataSurv <- TCGAanalyze_SurvivalKM(clinical_patient = dataClin,
                                   dataGE = dataFilt,
                                   Genelist = rownames(dataDEGs),
                                   Survresult = FALSE,
                                   ThreshTop = 0.67,
                                   ThreshDown = 0.33,
                                   p.cut = 0.05, group1, group2)

2.4.2. 多因素生存分析

我们使用Cox-regression analysis来计算survival multivariate curves,并使用TCGAnalyze_SurvivalCoxNET函数计算cox p-value值来评估统计显着性。Survival multivariate analysis发现了160个具有显著意义的基因(cox p-value FDR 5.00e-02)。

2.5. 差异基因PPI网络

在差异表达基因中(DEGs)中,通过univariate and multivariate analyses,我们发现了与生存相关的差异表达基因。下面,我们基于这个信息,进行网络分析。

通过使用STRING.,org.Hs.string version 10 (Human functional protein association network),我们构建出了PPI互作网络。并通过dnet包从PPI互作网络中提取出multivariate survival genes相关的网络,这个网络的的节点之间具备strong interaction (threshold = 700)。最终这个网络包含24个节点(nodes)和31条边(edges)。

require(dnet)  # to change
org.Hs.string <- dRDataLoader(RData = "org.Hs.string")

TabCoxNet <- TCGAvisualize_SurvivalCoxNET(dataClin,
                                          dataFilt, 
                                          Genelist = rownames(dataSurv),
                                          scoreConfidence = 700,
                                          org.Hs.string = org.Hs.string,
                                          titlePlot = "Case Study n.1 dnet")

上面代码获得的图如下所示:

3. 案例研究 2:泛癌下游分析LGG

在这个案例中,我们专注于脑低级别胶质瘤(TCGA-LGG)样品的分析。特别是,我们使用TCGAbiolinks下载293个具有分子亚型的样本。在这一节中,我们会以下几个分析:

  1. 处理样本的表达数据
  2. 对样本进行分类
  3. 不同类别的样本对肿瘤生存的影响
  4. 不同类别的样本的基因表达情况
  5. 不同类别的样本中基因突变情况

3.1. 基因表达数据下载

library(TCGAbiolinks)
library(SummarizedExperiment)

query.exp <- GDCquery(project = "TCGA-LGG", 
                      legacy = TRUE,
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq", 
                      file.type = "results",
                      experimental.strategy = "RNA-Seq",
                      sample.type = "Primary solid Tumor")
GDCdownload(query.exp)
lgg.exp <- GDCprepare(query = query.exp, save = TRUE, save.filename = "lggExp.rda")

3.2. 样品聚类分析 - 基于基因表达数据

在这里,我们将排除数据异常的样本、样本表达数据标准化、基因过滤,最终进行样本聚类,将LGG样本进行分类

  1. 异常样本数据排除

    首先,我们使用TCGAanalyze_Preprocessing函数执行一次Array Array Intensity correlation AAIC来搜索可能的异常值。我们使用表达数据中包含了分子亚型信息的全部样本,然后过滤掉没有分子信息的样本,并且仅使用IDHmut-codel (n=85)、IDHmut-non-codel (n=141)、IDHwt (n=56) 和 NA (11)来定义定义所有样本中皮尔逊相关的方形对称矩阵(n = 293)。根据这个矩阵,我们发现没有样本可以被识别为可能的异常值(low correlation (cor.cut = 0.6))。因此我们继续使用70个样本进行分析。

  2. 样本表达数据标准化

    其次,我们使用TCGAanalyze_Normalization函数借助EDASeq包对mRNA transcriptsmiRNA进行标准化:1)该函数使用Within-lane normalization procedures来调整read counts中的GC-content effect(或者其他gene-level effects):loess robust local regressionglobal-scalingfull-quantile normalization (Risso et al. 2011) ;2)使用between-lane normalization procedures来调整distributional differences between lanes(比如sequencing depth):global-scalingfull-quantile normalization (Bullard et al. 2010).

  3. 基因过滤

    第三,我们使用TCGAanalyze_Filtering函数,通过3个过滤器去除样品中低信号(low signal)的features / mRNAs,第一次过滤后获得4578个mRNA,第二次过滤后进一步获得4284个mRNA,第三次过滤后最终获得1187个mRNA。

  4. 样本聚类

    然后,我们在上述三次过滤后获得的1187个mRNA的基础上,应用了两个层次聚类(Hierarchical Cluster):1)第一个聚类使用ward.D2;2)第二个聚类使用ConsensusClusterPlus

在两次聚类分析之后,使用cut.tree = 4,我们获得n = 4个表达簇(EC)。

library(dplyr)

dataPrep <- TCGAanalyze_Preprocessing(object = lgg.exp, cor.cut = 0.6)
dataNorm <- TCGAanalyze_Normalization(tabDF = dataPrep,
                                      geneInfo = geneInfo,
                                      method = "gcContent")

datFilt <- dataNorm %>% TCGAanalyze_Filtering(method = "varFilter") %>%
    TCGAanalyze_Filtering(method = "filter1") %>%  TCGAanalyze_Filtering(method = "filter2",foldChange = 0.2)

data_Hc2 <- TCGAanalyze_Clustering(tabDF = datFilt,
                                   method = "consensus",
                                   methodHC = "ward.D2") 
# Add  cluster information to Summarized Experiment
colData(lgg.exp)$groupsHC <- paste0("EC",data_Hc2[[4]]$consensusClass)

3.3. 四类样本的生存分析

接下来的步骤是可视化数据。首先,我们绘制四个表达簇的生存曲线:

TCGAanalyze_survival(data = colData(lgg.exp),
                     clusterCol = "groupsHC",
                     main = "TCGA kaplan meier survival plot from consensus cluster",
                     legend = "RNA Group",height = 10,
                     risk.table = T,conf.int = F,
                     color = c("black","red","blue","green3"),
                     filename = "survival_lgg_expression_subtypes.png")

结果如下所示:

3.4. 四类样本的基因表达热图

同时,我们还绘制基因表达的热图:

TCGAvisualize_Heatmap(t(datFilt),
                      col.metadata =  colData(lgg.exp)[,c("barcode",
                                                          "groupsHC",
                                                          "subtype_Histology",
                                                          "subtype_IDH.codel.subtype")],
                      col.colors =  list(
                          groupsHC = c("EC1"="black",
                                       "EC2"="red",
                                       "EC3"="blue",
                                       "EC4"="green3")),
                      sortCol = "groupsHC",
                      type = "expression", # sets default color
                      scale = "row", # use z-scores for better visualization. Center gene expression level around 0.
                      title = "Heatmap from concensus cluster", 
                      filename = "case2_Heatmap.png",
                      cluster_rows = TRUE,
                      color.levels = colorRampPalette(c("green", "black", "red"))(n = 11),
                      extrems =seq(-5,5,1),
                      cluster_columns = FALSE,
                      width = 1000,
                      height = 1000)

结果如下所示:

3.5. 基因突变分析

最后,我们将看一下突变基因情况。我们首先通过GDCquery_Maf函数下载MAF文件。在这个例子中,我们将研究基因“ATRX”的突变情况。

LGGmut <- GDCquery_Maf(tumor = "LGG", pipelines = "muse")
# Selecting gene
mRNAsel <- "ATRX"
LGGselected <- LGGmut[LGGmut$Hugo_Symbol == mRNAsel,]

dataMut <- LGGselected[!duplicated(LGGselected$Tumor_Sample_Barcode),]
dataMut$Tumor_Sample_Barcode <- substr(dataMut$Tumor_Sample_Barcode,1,12)

# Adding the Expression Cluster classification found before
dataMut <- merge(dataMut, cluster, by.y="patient", by.x="Tumor_Sample_Barcode")
dataMut <- dataMut[dataMut$Variant_Classification!=0,]

4. 案例研究 3:ACC的甲基化和表达的整合

近年来的研究,已经描述了DNA甲基化与基因表达之间的关系,并且这种关系的研究通常很难实现。

本案例研究将展示研究DNA甲基化与基因表达之间关系的步骤。

4.1. DNA甲基化数据和基因表达数据下载

首先,我们下载ACC DNA甲基化数据(HumanMethylation450k platforms),以及ACC RNA表达数据(Illumina HiSeq platform)。

TCGAbiolinks将默认添加了研究人员已经发布的亚型分类。我们将使用这些分类作为示例。因此,选择CIMP-lowCIMP-high组进行RNA表达和DNA甲基化比较。

library(TCGAbiolinks)
library(SummarizedExperiment)
dir.create("case3")
setwd("case3")
#-----------------------------------
# STEP 1: Search, download, prepare |
#-----------------------------------
# 1.1 - DNA methylation
# ----------------------------------
query.met <- GDCquery(project = "TCGA-ACC", 
                      legacy = TRUE,
                      data.category = "DNA methylation",
                      platform = "Illumina Human Methylation 450")
GDCdownload(query.met)

acc.met <- GDCprepare(query = query.met,
                      save = TRUE, 
                      save.filename = "accDNAmet.rda",
                      summarizedExperiment = TRUE)

#-----------------------------------
# 1.2 - RNA expression
# ----------------------------------
query.exp <- GDCquery(project = "TCGA-ACC", 
                      legacy = TRUE,
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq", 
                      file.type = "results")
GDCdownload(query.exp)
acc.exp <- GDCprepare(query = query.exp, save = TRUE, save.filename = "accExp.rda")

4.2. DNA甲基化差异分析

对于DNA甲基化,我们进行DMR (different methylated region) 分析,获得探针在不同组的DNA甲基化差异及其显着性值。输出的结果可以在火山图中看到。

注意:该函数会运行wilcoxon test,因此根据样品的数量,函数的运行可能非常慢,需要数小时到数天。

# na.omit
acc.met <- subset(acc.met,subset = (rowSums(is.na(assay(acc.met))) == 0))

# Volcano plot
acc.met <- TCGAanalyze_DMR(acc.met, groupCol = "subtype_MethyLevel",
                           group1 = "CIMP-high",
                           group2="CIMP-low",
                           p.cut = 10^-5,
                           diffmean.cut = 0.25,
                           legend = "State",
                           plot.filename = "CIMP-highvsCIMP-low_metvolcano.png")

上面代码得到的火山图如下所示:

4.3. 基因表达差异分析

对于表达分析,我们进行差异表达分析(DEA, differential expression analysis),其将给出基因表达的fold change及其显着性值。

#-------------------------------------------------
# 2.3 - DEA - Expression analysis - volcano plot
# ------------------------------------------------
acc.exp.aux <- subset(acc.exp, 
                      select = colData(acc.exp)$subtype_MethyLevel %in% c("CIMP-high","CIMP-low"))

idx <- colData(acc.exp.aux)$subtype_MethyLevel %in% c("CIMP-high")
idx2 <- colData(acc.exp.aux)$subtype_MethyLevel %in% c("CIMP-low")

dataPrep <- TCGAanalyze_Preprocessing(object = acc.exp.aux, cor.cut = 0.6)

dataNorm <- TCGAanalyze_Normalization(tabDF = dataPrep,
                                      geneInfo = geneInfo,
                                      method = "gcContent")

dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm,
                                  qnt.cut = 0.25,
                                  method='quantile')

dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[,idx],
                            mat2 = dataFilt[,idx2],
                            Cond1type = "CIMP-high",
                            Cond2type = "CIMP-low",
                            method = "glmLRT")

TCGAVisualize_volcano(x = dataDEGs$logFC,
                      y = dataDEGs$FDR,
                      filename = "Case3_volcanoexp.png",
                      x.cut = 3,
                      y.cut = 10^-5,
                      names = rownames(dataDEGs),
                      color = c("black","red","darkgreen"),
                      names.size = 2,
                      xlab = " Gene expression fold change (Log2)",
                      legend = "State",
                      title = "Volcano plot (CIMP-high vs CIMP-low)",
                      width = 10)

上面代码得到的图如下所示:

4.4. DNA甲基化和基因表达关联

最后,这个和前面两个分析结果,我们绘制一个starburst来选择具有重要生物学意义的基因。

观察:随着时间的推移,LGG样本的数量增加,临床数据也进行了更新。这个案例中,我们仅使用了示例中具有分类的样本。

#------------------------------------------
# 2.4 - Starburst plot
# -----------------------------------------
# If true the argument names of the geenes in circle 
# (biologically significant genes, has a change in gene
# expression and DNA methylation and respects all the thresholds)
# will be shown
# these genes are returned by the function see starburst object after the function is executed
starburst <- TCGAvisualize_starburst(met = acc.met, 
                                     exp = dataDEGs,
                                     genome = "hg19"
                                     group1 = "CIMP-high",
                                     group2 = "CIMP-low",
                                     filename = "starburst.png",
                                     met.platform = "450K",
                                     met.p.cut = 10^-5,
                                     exp.p.cut = 10^-5,
                                     diffmean.cut = 0.25,
                                     logFC.cut = 3,
                                     names = FALSE, 
                                     height=10,
                                     width=15,
                                     dpi=300)

上面代码得到的starburst图如下所示:

5. 案例研究 4:Elmer管道 - KIRC

ELMER是进行DNA甲基化和RNA表达分析的一个实例包(L. Yao et al. 2015,Chedraoui Silva et al. (2017),Yao, Berman, and Farnham (2015))。它旨在结合人类组织的DNA甲基化和基因表达数据,推断出多层次顺式调控网络(multi-level cis-regulatory networks)。ELMER使用DNA甲基化来识别远端探针(distal probes),并将它们与附近基因的表达相关联,以识别一个或多个转录靶标(transcriptional targets)。将这些反相关远端探针(anti-correlated distal probes)进行转录因子结合位点分析(Transcription factor (TF) binding site analysis),并欧联所有转录因子(TF)的表达分析,以此推断上游调节因子(upstream regulators)。该包可以很容易地应用于TCGA的癌症数据集,用于分析DNA甲基化和基因表达数据集。

ELMER分析包括以下步骤:

  1. 将数据转变为MultiAssayExperiment对象
  2. 当比较两组样本的时候,鉴定具DNA甲基化显著差异的远端探针(distal probes)。
  3. 使用DNA甲基化与基因表达相关性(methylation vs. expression correlation),鉴定DNA甲基化显著差异的远端探针的假定靶基因(putative target genes)
  4. 从每一个significant probe-gene队中,鉴定在对应探针上的富集的基元(motifs)
  5. 确定主要调控转录因子(TF),其表达与DNA甲基化相关,而该DNA甲基化在多个调控区域发生变化。

下面,我们将通过整合TCGAbiolinksELMER进行肾脏透明肾细胞癌(KIRC)的研究。

ELMER包

有关更多信息,请参阅ELMER包:

以及下面几篇文章:

5.1. DNA甲基化数据下载处理

首先,我们从搜索并下载KIRC的DNA甲基化数据(HumanMethylation450),并将下载数据转化为SummarizedExperiment对象:

library(TCGAbiolinks)
library(SummarizedExperiment)
library(ELMER)
library(parallel)
dir.create("case4")
setwd("case4")
#-----------------------------------
# STEP 1: Search, download, prepare |
#-----------------------------------
# 1.1 - DNA methylation
# ----------------------------------
query.met <- GDCquery(project = "TCGA-KIRC", 
                      data.category = "DNA Methylation",
                      platform = "Illumina Human Methylation 450")
GDCdownload(query.met)
kirc.met <- GDCprepare(query = query.met,
                       save = TRUE, 
                       save.filename = "kircDNAmet.rda",
                       summarizedExperiment = TRUE)

5.2. 基因表达数据下载处理

对于基因表达,我们将使用的数据类型是:Gene Expression Quantification

# Step 1.2 download expression data
#-----------------------------------
# 1.2 - RNA expression
# ----------------------------------
query.exp <- GDCquery(project = "TCGA-KIRC",
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification", 
                      workflow.type = "HTSeq - FPKM-UQ")
GDCdownload(query.exp)
kirc.exp <- GDCprepare(query = query.exp, 
                       save = TRUE, 
                       save.filename = "kircExp.rda")
kirc.exp <- kirc.exp

5.3. DNA甲基化显著差异的远端探针鉴定

来自r BiocStyle::Biocpkg(“MultiAssayExperiment”)的一个MultiAssayExperiment对象作为r BiocStyle::Biocpkg(“ELMER”)的多个主要函数的输入

我们首先需要获得远端探针(2 KB away from TSS)

distal.probes <- get.feature.probe(genome = "hg38", met.platform = "450K")

然后,我们使用createMAE函数保留具有DNA甲基化和基因表达数据的样品。

library(MultiAssayExperiment)
mae <- createMAE(exp = kirc.exp, 
                 met = kirc.met,
                 save = TRUE,
                 linearize.exp = TRUE,
                 filter.probes = distal.probes,
                 save.filename = "mae_kirc.rda",
                 met.platform = "450K",
                 genome = "hg38",
                 TCGA = TRUE)
# Remove FFPE samples
mae <- mae[,!mae$is_ffpe]

然后,我们执行ELMER以鉴定与正常样品相比在肿瘤样品中低甲基化(hypomethylated)的探针。

group.col <- "definition"
group1 <-  "Primary solid Tumor"
group2 <- "Solid Tissue Normal"
direction <- "hypo"
dir.out <- file.path("kirc",direction)
dir.create(dir.out, recursive = TRUE)
#--------------------------------------
# STEP 3: Analysis                     |
#--------------------------------------
# Step 3.1: Get diff methylated probes |
#--------------------------------------
sig.diff <- get.diff.meth(data = mae, 
                          group.col = group.col,
                          group1 =  group1,
                          group2 = group2,
                          minSubgroupFrac = 0.2,
                          sig.dif = 0.3,
                          diff.dir = direction, # Search for hypomethylated probes in group 1
                          cores = 1, 
                          dir.out = dir.out, 
                          pvalue = 0.01)

#-------------------------------------------------------------
# Step 3.2: Identify significant probe-gene pairs            |
#-------------------------------------------------------------
# Collect nearby 20 genes for Sig.probes
nearGenes <- GetNearGenes(data = mae, 
                          probes = sig.diff$probe, 
                          numFlankingGenes = 20, # 10 upstream and 10 dowstream genes
                          cores = 1)

pair <- get.pair(data = mae,
                 group.col = group.col,
                 group1 =  group1,
                 group2 = group2,
                 nearGenes = nearGenes,
                 minSubgroupFrac = 0.4, # % of samples to use in to create groups U/M
                 permu.dir = file.path(dir.out,"permu"),
                 permu.size = 100, # Please set to 100000 to get significant results
                 raw.pvalue  = 0.05,   
                 Pe = 0.01, # Please set to 0.001 to get significant results
                 filter.probes = TRUE, # See preAssociationProbeFiltering function
                 filter.percentage = 0.05,
                 filter.portion = 0.3,
                 dir.out = dir.out,
                 cores = 1,
                 label = direction)

# Identify enriched motif for significantly hypomethylated probes which 
# have putative target genes.
enriched.motif <- get.enriched.motif(data = mae,
                                     probes = pair$Probe, 
                                     dir.out = dir.out, 
                                     label = direction,
                                     min.incidence = 10,
                                     lower.OR = 1.1)
TF <- get.TFs(data = mae, 
              group.col = group.col,
              group1 =  group1,
              group2 = group2,
              minSubgroupFrac = 0.4,
              enriched.motif = enriched.motif,
              dir.out = dir.out, 
              cores = 1, 
              label = direction)

5.4. 远端探针与基因的关系

随后,更具这个分析结果,我们可以验证该探针附近的20个基因的表达与DNA甲基化之间的关系。其结果由ELMER散点图显示。

scatter.plot(data = mae,
             byProbe = list(probe = sig.diff$probe[1], numFlankingGenes = 20), 
             category = "definition", 
             dir.out = "plots",
             lm = TRUE, # Draw linear regression curve
             save = TRUE) 

结果如下所示:

5.5. Motif与基因、转录因子的关系

此外,我们还可以通过散点图观察KIRC样品中,具有UA6 motif的位点(sites)的平均DNA甲基化水平与转录因子("RUNX1","RUNX2","RUNX3")表达水平的关系。

scatter.plot(data = mae,
             byTF = list(TF = c("RUNX1","RUNX2","RUNX3"),
                         probe = enriched.motif[[names(enriched.motif)[10]]]), 
             category = "definition",
             dir.out = "plots",
             save = TRUE, 
             lm_line = TRUE)

结果如下所示:

我们还可以通过绘制热图来查看反相关(anticorrelated)的基因和探针对:

heatmapPairs(data = mae, 
             group.col = "definition",
             group1 = "Primary solid Tumor", 
             annotation.col = c("gender"),
             group2 = "Solid Tissue Normal",
             pairs = pair,
             filename =  "heatmap.pdf")

结果如下所示:

在下图中显示了选定的motif(lower boundary of OR above 1.8)的Odds Ratio(x轴),对应的Odds Ratio的下限高于1.8。并展示了每一个Odds Ratio的95%置信区间(95% confidence interval):

6. 会话信息

sessionInfo()
## R version 3.5.3 (2019-03-11)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.8-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.8-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] grid      parallel  stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] TCGAbiolinks_2.10.5         png_0.1-7                  
##  [3] DT_0.5                      dplyr_0.8.0.1              
##  [5] SummarizedExperiment_1.12.0 DelayedArray_0.8.0         
##  [7] BiocParallel_1.16.6         matrixStats_0.54.0         
##  [9] Biobase_2.42.0              GenomicRanges_1.34.0       
## [11] GenomeInfoDb_1.18.2         IRanges_2.16.0             
## [13] S4Vectors_0.20.1            BiocGenerics_0.28.0        
## [15] testthat_2.0.1             
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.1.3               circlize_0.4.5               
##   [3] AnnotationHub_2.14.5          aroma.light_3.12.0           
##   [5] plyr_1.8.4                    selectr_0.4-1                
##   [7] ConsensusClusterPlus_1.46.0   lazyeval_0.2.2               
##   [9] splines_3.5.3                 usethis_1.4.0                
##  [11] ggplot2_3.1.0                 sva_3.30.1                   
##  [13] digest_0.6.18                 foreach_1.4.4                
##  [15] htmltools_0.3.6               magrittr_1.5                 
##  [17] memoise_1.1.0                 cluster_2.0.7-1              
##  [19] doParallel_1.0.14             limma_3.38.3                 
##  [21] remotes_2.0.2                 ComplexHeatmap_1.20.0        
##  [23] Biostrings_2.50.2             readr_1.3.1                  
##  [25] annotate_1.60.1               sesameData_1.0.0             
##  [27] R.utils_2.8.0                 prettyunits_1.0.2            
##  [29] colorspace_1.4-1              ggrepel_0.8.0                
##  [31] blob_1.1.1                    rvest_0.3.2                  
##  [33] xfun_0.5                      callr_3.2.0                  
##  [35] crayon_1.3.4                  RCurl_1.95-4.12              
##  [37] jsonlite_1.6                  genefilter_1.64.0            
##  [39] survival_2.43-3               zoo_1.8-4                    
##  [41] iterators_1.0.10              glue_1.3.1                   
##  [43] survminer_0.4.3               gtable_0.2.0                 
##  [45] sesame_1.0.0                  zlibbioc_1.28.0              
##  [47] XVector_0.22.0                GetoptLong_0.1.7             
##  [49] wheatmap_0.1.0                pkgbuild_1.0.2               
##  [51] shape_1.4.4                   scales_1.0.0                 
##  [53] DESeq_1.34.1                  DBI_1.0.0                    
##  [55] edgeR_3.24.3                  ggthemes_4.1.0               
##  [57] Rcpp_1.0.1                    xtable_1.8-3                 
##  [59] progress_1.2.0                cmprsk_2.2-7                 
##  [61] matlab_1.0.2                  bit_1.1-14                   
##  [63] preprocessCore_1.44.0         km.ci_0.5-2                  
##  [65] htmlwidgets_1.3               httr_1.4.0                   
##  [67] RColorBrewer_1.1-2            pkgconfig_2.0.2              
##  [69] XML_3.98-1.19                 R.methodsS3_1.7.1            
##  [71] DNAcopy_1.56.0                locfit_1.5-9.1               
##  [73] labeling_0.3                  later_0.8.0                  
##  [75] tidyselect_0.2.5              rlang_0.3.1                  
##  [77] AnnotationDbi_1.44.0          munsell_0.5.0                
##  [79] tools_3.5.3                   downloader_0.4               
##  [81] cli_1.1.0                     ExperimentHub_1.8.0          
##  [83] generics_0.0.2                RSQLite_2.1.1                
##  [85] devtools_2.0.1                broom_0.5.1                  
##  [87] evaluate_0.13                 stringr_1.4.0                
##  [89] yaml_2.2.0                    processx_3.3.0               
##  [91] knitr_1.22                    bit64_0.9-7                  
##  [93] fs_1.2.7                      survMisc_0.5.5               
##  [95] randomForest_4.6-14           purrr_0.3.2                  
##  [97] EDASeq_2.16.3                 nlme_3.1-137                 
##  [99] mime_0.6                      R.oo_1.22.0                  
## [101] xml2_1.2.0                    biomaRt_2.38.0               
## [103] compiler_3.5.3                rstudioapi_0.10              
## [105] curl_3.3                      interactiveDisplayBase_1.20.0
## [107] tibble_2.1.1                  geneplotter_1.60.0           
## [109] stringi_1.4.3                 highr_0.7                    
## [111] ps_1.3.0                      GenomicFeatures_1.34.6       
## [113] desc_1.2.0                    lattice_0.20-38              
## [115] Matrix_1.2-16                 KMsurv_0.1-5                 
## [117] pillar_1.3.1                  BiocManager_1.30.4           
## [119] GlobalOptions_0.1.0           data.table_1.12.0            
## [121] bitops_1.0-6                  httpuv_1.5.0                 
## [123] rtracklayer_1.42.2            R6_2.4.0                     
## [125] latticeExtra_0.6-28           hwriter_1.3.2                
## [127] promises_1.0.1                ShortRead_1.40.0             
## [129] gridExtra_2.3                 codetools_0.2-16             
## [131] sessioninfo_1.1.1             assertthat_0.2.0             
## [133] pkgload_1.0.2                 rprojroot_1.3-2              
## [135] rjson_0.2.20                  withr_2.1.2                  
## [137] GenomicAlignments_1.18.1      Rsamtools_1.34.1             
## [139] GenomeInfoDbData_1.2.0        mgcv_1.8-27                  
## [141] hms_0.4.2                     tidyr_0.8.3                  
## [143] rmarkdown_1.12                ggpubr_0.2                   
## [145] shiny_1.2.0

7. 参考

  • Bullard, James H, Elizabeth Purdom, Kasper D Hansen, and Sandrine Dudoit. 2010. “Evaluation of Statistical Methods for Normalization and Differential Expression in mRNA-Seq Experiments.” BMC Bioinformatics 11 (1). BioMed Central Ltd:94.
  • Chedraoui Silva, Tiago, Simon G. Coetzee, Lijing Yao, Dennis J. Hazelett, Houtan Noushmehr, and Benjamin P. Berman. 2017. “Enhancer Linking by Methylation/Expression Relationships with the R Package Elmer Version 2.” bioRxiv. Cold Spring Harbor Labs Journals. https://doi.org/10.1101/148726.
  • Risso, Davide, Katja Schwartz, Gavin Sherlock, and Sandrine Dudoit. 2011. “GC-Content Normalization for Rna-Seq Data.” BMC Bioinformatics 12 (1). BioMed Central Ltd:480.
  • Yao, Lijing, Benjamin P Berman, and Peggy J Farnham. 2015. “Demystifying the Secret Mission of Enhancers: Linking Distal Regulatory Elements to Target Genes.” Critical Reviews in Biochemistry and Molecular Biology 50 (6). Taylor & Francis:550–73.
  • Yao, L, H Shen, PW Laird, PJ Farnham, and BP Berman. 2015. “Inferring Regulatory Element Landscapes and Transcription Factor Networks from Cancer Methylomes.” Genome Biology 16 (1):105–5.

评论

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×