Translation - peakPantheR-3: Parallel Annotation

Parallel Annotation

Arnaud Wolfer

2018-06-13

peakPantheR is designed for the detection, integration and reporting of pre-defined features in MS files.

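My current understanding of "detection, integration and reporting of pre-defined features" is as follows:

detect: find the peaks in the raw data

integrate: compute the area of each peak

features: the targeted signals themselves, i.e. peaks of known compounds defined by an expected RT and m/z window; differences between groups are assessed afterwards from the integrated areas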
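As a rough illustration of what "integrate" means here, the sketch below computes the area under a synthetic Gaussian-shaped peak with the trapezoid rule. The data and the helper name trapezoid_area are made up for illustration; this is not how peakPantheR computes peakArea internally.

```r
# Hypothetical helper: area under a sampled curve by the trapezoid rule
trapezoid_area <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  sum(diff(x) * (y[-length(y)] + y[-1]) / 2)
}

# Synthetic peak: Gaussian centred at 3345 s, sd 5 s, apex intensity 1e6
rt        <- seq(3310, 3390, by = 0.5)
intensity <- 1e6 * exp(-(rt - 3345)^2 / (2 * 5^2))

peak_area <- trapezoid_area(rt, intensity)   # "integrate": area of the peak
apex_rt   <- rt[which.max(intensity)]        # RT of the most intense point
```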

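Parallel Annotation is set up to detect and integrate multiple compounds in multiple files in parallel, and to store the results in a single object.

Using an example dataset, this vignette will:

  • detail the Parallel Annotation concept
  • apply Parallel Annotation to an example dataset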

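1. Parallel Annotation Concept

Parallel compound integration proceeds as follows:

  • process multiple compounds in multiple files in parallel, storing the results in a single object

  • load the list of expected RT / m/z regions of interest (ROI) and the list of files to process

    An ROI is simply the RT / m/z information of a compound you want to investigate

  • initialise the output object with the expected ROI and the file paths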

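  • first pass (without peak filling):

    • for each file, detect features in each ROI and keep the one with the highest intensity

      Several candidate peaks may be detected inside one ROI; keeping the highest intensity one means that only the most intense candidate is retained as the feature for that ROI

    • determine peak statistics for each feature

    • store the results and the EIC (extracted ion chromatogram) for each ROI

      An EIC is a chromatogram (intensity against RT) restricted to a given m/z range, here the m/z range of the ROI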

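  • visual inspection of the first pass results, update ROI:

    • diagnostic plots: all EICs, peak apex and peakwidth are evaluated
    • correct the ROI (remove interfering features, correct RT shifts)
    • define fallback integration regions (FIR) for when no feature is detected (median of the RT / m/z start and end of the found features)
  • initialise a new output object with the updated regions of interest (uROI) and the fallback integration regions (FIR)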

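  • second pass (with peak filling):

    • for each file, detect features in each uROI and keep the one with the highest intensity
    • determine peak statistics for each feature
    • integrate the FIR when no peak is found
    • store the results and the EIC for each uROI
  • summary statistics:

    • plot the EICs, apex and peakwidth for evaluation
    • compare the first and second pass
  • return the resulting object and/or table (rows: files, i.e. samples; columns: compounds)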
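A minimal base-R sketch of the first pass for a single file and a single ROI: points inside the ROI are collapsed into an EIC, and when several candidate peaks exist only the most intense one is kept. The raw points, the candidate table and all variable names are invented for illustration; peakPantheR's own detection and curve fitting are more involved.

```r
# Toy raw data: one row per (scan, m/z) point
raw <- data.frame(
  rt        = c(3300, 3320, 3320, 3340, 3340, 3360, 3380, 3400),
  mz        = c(522.2, 522.2, 600.1, 522.2, 522.201, 522.199, 522.2, 522.2),
  intensity = c(10, 200, 999, 900, 50, 300, 20, 5)
)

# Region of interest (ROI): RT and m/z window of the targeted compound
roi <- list(rtMin = 3310, rtMax = 3390, mzMin = 522.194778, mzMax = 522.205222)

# EIC: keep points inside the ROI, then sum intensities per retention time
in_roi <- raw$rt >= roi$rtMin & raw$rt <= roi$rtMax &
          raw$mz >= roi$mzMin & raw$mz <= roi$mzMax
eic <- aggregate(intensity ~ rt, data = raw[in_roi, ], FUN = sum)

# If several candidate features were detected in the ROI (invented here),
# keep only the one with the highest intensity
candidates <- data.frame(rt = c(3320, 3340), maxInt = c(200, 950))
best <- candidates[which.max(candidates$maxInt), ]
```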

2. Parallel Annotation Example

Given 3 MS spectra files from the faahKO package, we can target two features with peakPantheR_parallelAnnotation(). The faahKO package is installed with:

setRepositories(ind=1:4)
install.packages('faahKO')

2.1. Input Data

Input spectra:

library(faahKO)
## file paths
input_spectraPaths  <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
                         system.file('cdf/KO/ko16.CDF', package = "faahKO"),
                         system.file('cdf/KO/ko18.CDF', package = "faahKO"))
input_spectraPaths
#> [1] "C:/R/R-3.5.1/library/faahKO/cdf/KO/ko15.CDF"
#> [2] "C:/R/R-3.5.1/library/faahKO/cdf/KO/ko16.CDF"
#> [3] "C:/R/R-3.5.1/library/faahKO/cdf/KO/ko18.CDF"

Define the two targeted features and store their information in a table with the following columns:

  • cpdID (numeric)
  • cpdName (character)
  • rtMin (sec)
  • rtMax (sec)
  • rt (sec, optional / NA)
  • mzMin (m/z)
  • mzMax (m/z)
  • mz (m/z, optional / NA)
# targetFeatTable
input_targetFeatTable     <- data.frame(matrix(vector(), 2, 8,
                                               dimnames=list(c(), c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin", "mz", "mzMax"))),
                                        stringsAsFactors=FALSE)
input_targetFeatTable[1,] <- c("ID-1", "Cpd 1", 3310., 3344.888, 3390., 522.194778, 522.2, 522.205222)
input_targetFeatTable[2,] <- c("ID-2", "Cpd 2", 3280., 3385.577, 3440., 496.195038, 496.2, 496.204962)
input_targetFeatTable[,c(3:8)] <- sapply(input_targetFeatTable[,c(3:8)], as.numeric)
cpdID  cpdName  rtMin  rt        rtMax  mzMin       mz     mzMax
ID-1   Cpd 1    3310   3344.888  3390   522.194778  522.2  522.205222
ID-2   Cpd 2    3280   3385.577  3440   496.195038  496.2  496.204962
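The mzMin and mzMax values above are consistent with a window of ±10 ppm around mz (for example 522.2 × 10 / 10^6 ≈ 0.005222). The sketch below builds the same table directly with data.frame() from a ppm tolerance; the helper ppm_window and the 10 ppm value are assumptions inferred from the numbers, not something peakPantheR requires.

```r
# Hypothetical helper: m/z window of +/- `ppm` parts per million around `mz`
ppm_window <- function(mz, ppm = 10) {
  delta <- mz * ppm / 1e6
  data.frame(mzMin = mz - delta, mz = mz, mzMax = mz + delta)
}

target_mz <- c(522.2, 496.2)

# Same columns, in the same order, as input_targetFeatTable
targetFeatTable_alt <- cbind(
  data.frame(
    cpdID   = c("ID-1", "ID-2"),
    cpdName = c("Cpd 1", "Cpd 2"),
    rtMin   = c(3310, 3280),
    rt      = c(3344.888, 3385.577),
    rtMax   = c(3390, 3440),
    stringsAsFactors = FALSE
  ),
  ppm_window(target_mz)
)
```

Building the columns with their final types avoids the later conversion step with as.numeric.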

Additional compound and spectra metadata can be provided, but they are not used during fitting:

# spectra Metadata
input_spectraMetadata  <- data.frame(matrix(c("sample type 1", "sample type 2", "sample type 1"), 3, 1, dimnames=list(c(), c("sampleType"))), stringsAsFactors=FALSE)
sampleType
sample type 1
sample type 2
sample type 1

2.2. Initialise and Run Parallel Annotation

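A peakPantheRAnnotation object is first initialised with the paths of the files to process (spectraPaths) and the features to integrate (targetFeatTable). Additional information and parameters such as spectraMetadata, uROI and FIR can also be supplied; uROI and FIR are only used when useUROI=TRUE and useFIR=TRUE are set.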

library(peakPantheR)
init_annotation <- peakPantheRAnnotation(spectraPaths = input_spectraPaths,
                                         targetFeatTable = input_targetFeatTable,
                                         spectraMetadata = input_spectraMetadata)

The resulting peakPantheRAnnotation object, init_annotation, is not annotated yet; it contains no uROI or FIR and does not use them.

init_annotation
#> An object of class peakPantheRAnnotation
#>  2 compounds in 3 samples.
#>   updated ROI do not exist (uROI)
#>   does not use updated ROI (uROI)
#>   does not use fallback integration regions (FIR)
#>   is not annotated

peakPantheR_parallelAnnotation() runs the annotation (in parallel if ncores is greater than 0) and returns the successful annotations (result$annotation) and the failures (result$failures):

# annotate files serially
annotation_result <- peakPantheR_parallelAnnotation(init_annotation, ncores=0, verbose=TRUE)
#> Processing 2 compounds in 3 samples:
#>   uROI:  FALSE
#>   FIR:   FALSE
#> ----- ko15 -----
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Check input, mzMLPath must be a .mzML
#> Reading data from 2 windows
#> Data read in: 0.59 secs
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
#> Found 2/2 features in 0.12 secs
#> Peak statistics done in: 0.03 secs
#> Feature search done in: 1.26 secs
#> ----- ko16 -----
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Check input, mzMLPath must be a .mzML
#> Reading data from 2 windows
#> Data read in: 0.59 secs
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #2
#> Found 2/2 features in 0.04 secs
#> Peak statistics done in: 0 secs
#> Feature search done in: 1.33 secs
#> ----- ko18 -----
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Check input, mzMLPath must be a .mzML
#> Reading data from 2 windows
#> Data read in: 0.56 secs
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #2
#> Found 2/2 features in 0.04 secs
#> Peak statistics done in: 0 secs
#> Feature search done in: 1.03 secs
#> Annotation object cannot be reordered by sample acquisition date
#> ----------------
#> Parallel annotation done in: 4.77 secs
#>   0 failure(s)

# successful fit
nbSamples(annotation_result$annotation)
#> [1] 3
data_annotation   <- annotation_result$annotation
data_annotation
#> An object of class peakPantheRAnnotation
#>  2 compounds in 3 samples.
#>   updated ROI do not exist (uROI)
#>   does not use updated ROI (uROI)
#>   does not use fallback integration regions (FIR)
#>   is annotated

# list failed fit
annotation_result$failures
#> [1] file  error
#> <0 rows> (or 0-length row.names)
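The run above is serial (ncores=0). On a multi-core machine a worker count could be chosen along these lines; leaving one core free is only a common convention, not a peakPantheR recommendation, and pick_ncores is a hypothetical helper.

```r
# Hypothetical helper: number of workers to request, leaving one core free
pick_ncores <- function(n_files, available = parallel::detectCores()) {
  if (is.na(available)) available <- 1L      # detectCores() can return NA
  as.integer(max(0, min(n_files, available - 1)))
}

pick_ncores(n_files = 3, available = 8)   # 3: never more workers than files
pick_ncores(n_files = 6, available = 1)   # 0: fall back to serial processing
```

The returned value could then be passed as ncores to peakPantheR_parallelAnnotation().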

2.3. Process Parallel Annotation Results

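Based on the fit results, uROI and FIR are determined with annotationParamsDiagnostic():

  • uROI are established as the min/max (rt and m/z) of the found peaks (+/- 5% in RT)
  • FIR are established as the median of the found rtMin, rtMax, mzMin and mzMax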
updated_annotation  <- annotationParamsDiagnostic(data_annotation, verbose=TRUE)
#> uROI will be set as mimimum/maximum of found peaks (+/-5% of ROI in retention time)
#> FIR will be calculated as the median of found "rtMin","rtMax","mzMin","mzMax"

# uROI now exist
updated_annotation
#> An object of class peakPantheRAnnotation
#>  2 compounds in 3 samples.
#>   updated ROI exist (uROI)
#>   does not use updated ROI (uROI)
#>   does not use fallback integration regions (FIR)
#>   is annotated
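To make the two rules concrete, here is a base-R sketch for one compound. The found-peak values are invented, and the margin is assumed to be 5% of the ROI retention-time width, which is one reading of the log message above; the package's exact computation may differ.

```r
# Original ROI of the compound (from the target feature table)
roi <- list(rtMin = 3310, rtMax = 3390)

# Invented peak boundaries found in three samples during the first pass
found <- data.frame(
  rtMin = c(3310.0, 3326.0, 3330.0),
  rtMax = c(3400.0, 3407.0, 3408.0),
  mzMin = c(522.1948, 522.1949, 522.1950),
  mzMax = c(522.2050, 522.2051, 522.2052)
)

# uROI: min/max of the found peaks, widened in RT by the assumed 5% margin
rt_margin <- 0.05 * (roi$rtMax - roi$rtMin)
uROI <- list(
  rtMin = min(found$rtMin) - rt_margin,
  rtMax = max(found$rtMax) + rt_margin,
  mzMin = min(found$mzMin),
  mzMax = max(found$mzMax)
)

# FIR: column-wise median of the found peak boundaries
FIR <- as.list(vapply(found, median, numeric(1)))
```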

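outputAnnotationDiagnostic() saves the results to disk: annotationParameters_summary.csv contains the original ROI together with the newly determined uROI and FIR, for manual validation. In addition, a diagnostic plot is saved for each compound for reference; the plots can be generated in parallel with ncores: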

# create a colourScale based on the sampleType
uniq_sType <- sort(unique(spectraMetadata(updated_annotation)$sampleType),na.last=TRUE)
col_sType  <- unname( setNames(c('blue', 'red'),c(uniq_sType))[spectraMetadata(updated_annotation)$sampleType] )

# output fit diagnostic to disk
outputAnnotationDiagnostic(updated_annotation, saveFolder='/output_folder/', savePlots=TRUE, sampleColour=col_sType, verbose=TRUE, ncores=2)
cpdID  cpdName  X  ROI_rt    ROI_mz  ROI_rtMin  ROI_rtMax  ROI_mzMin
ID-1   Cpd 1    |  3344.888  522.2   3310       3390       522.194778
ID-2   Cpd 2    |  3385.577  496.2   3280       3440       496.195038

ROI_mzMax   X  uROI_rtMin  uROI_rtMax  uROI_mzMin  uROI_mzMax  uROI_rt
522.205222  |  3305.75893  3411.43628  522.194778  522.205222  3344.888
496.204962  |  3337.37666  3462.44903  496.195038  496.204962  3385.577

uROI_mz  X  FIR_rtMin   FIR_rtMax   FIR_mzMin   FIR_mzMax
522.2    |  3326.10635  3407.27265  522.194778  522.205222
496.2    |  3365.02386  3453.40496  496.195038  496.204962

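Diagnostic plot for compound 1:

1) The top panel is an overlay of the EICs extracted from all samples, with the fitted curve shown as a dotted line.

2) The panel below the EICs shows the RT peakwidth of each found peak (rtMin, rtMax, with the apex marked as a dot), in run order with the first sample at the top.

3) The 3 bottom panels show the found RT (peakwidth), m/z (peakwidth) and peak area by run order, each with the corresponding histogram on the right.

The ROI exported to .csv can be updated based on the compound diagnostic plots; uROI (updated ROI, potentially used for all samples) and FIR (fallback integration regions, used when no peak is found) can also be tuned to better fit the peaks.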
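The manual correction can also be scripted. The sketch below writes a cut-down stand-in for annotationParameters_summary.csv to a temporary file, widens one uROI and saves it again. Only a subset of the columns listed in the table above is used, and the assumption that the real file can be round-tripped through read.csv()/write.csv() in this way should be checked against an actual export.

```r
# Stand-in for the exported parameter file (subset of columns, values from above)
params <- data.frame(
  cpdID      = c("ID-1", "ID-2"),
  cpdName    = c("Cpd 1", "Cpd 2"),
  uROI_rtMin = c(3305.75893, 3337.37666),
  uROI_rtMax = c(3411.43628, 3462.44903),
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.csv(params, csv_path, row.names = FALSE)

# Reload, move the upper RT bound of compound ID-2 by an arbitrary 5 s, save
edited <- read.csv(csv_path, stringsAsFactors = FALSE)
edited$uROI_rtMax[edited$cpdID == "ID-2"] <- 3467.44903
write.csv(edited, csv_path, row.names = FALSE)
```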

2.4. New Initialisation with Updated Parameters to be Applied to All Study Samples

After manual validation of the fit on the reference samples, the modified parameters can be reloaded from the .csv file and applied to all study samples.

2.4.1. Load new fit parameters

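peakPantheR_loadAnnotationParamsCSV() loads the new fit parameters from the .csv (as generated by outputAnnotationDiagnostic()) and initialises a peakPantheRAnnotation object without spectraPaths, spectraMetadata or cpdMetadata; these need to be added before annotation. useUROI and useFIR are set to FALSE by default and need to be set as required afterwards. uROIExist is established from the uROI columns of the .csv file and is set to TRUE only if they contain no NA.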

update_csv_path <- '/path_to_new_csv/'

# load csv
new_annotation <- peakPantheR_loadAnnotationParamsCSV(update_csv_path)
#> uROIExist set to TRUE
#> New peakPantheRAnnotation object initialised for 2 compounds

new_annotation
#> An object of class peakPantheRAnnotation
#>  2 compounds in 0 samples.
#>   updated ROI exist (uROI)
#>   does not use updated ROI (uROI)
#>   does not use fallback integration regions (FIR)
#>   is not annotated
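The uROIExist rule can be pictured with a few lines of base R: the flag is TRUE only when none of the uROI columns contains an NA. The column names follow the parameter table shown earlier; the function uroi_exist is a hypothetical stand-in, not the package's internal check.

```r
# Hypothetical stand-in for the uROIExist check
uroi_exist <- function(params) {
  uroi_cols <- grep("^uROI_", names(params), value = TRUE)
  length(uroi_cols) > 0 && !any(is.na(params[, uroi_cols, drop = FALSE]))
}

complete <- data.frame(cpdID = c("ID-1", "ID-2"),
                       uROI_rtMin = c(3305.75893, 3337.37666),
                       uROI_rtMax = c(3411.43628, 3462.44903))
partial <- complete
partial$uROI_rtMax[2] <- NA   # one missing value is enough

uroi_exist(complete)  # TRUE
uroi_exist(partial)   # FALSE
```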

2.4.2. Add new samples to process

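Now that the fit parameters have been set on the QC samples, the same processing can be applied to all study samples. resetAnnotation() reinitialises all results and modifies the targeted samples or compounds (if required):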

## new files
new_spectraPaths  <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
                       system.file('cdf/WT/wt15.CDF', package = "faahKO"),
                       system.file('cdf/KO/ko16.CDF', package = "faahKO"),
                       system.file('cdf/WT/wt16.CDF', package = "faahKO"),
                       system.file('cdf/KO/ko18.CDF', package = "faahKO"),
                       system.file('cdf/WT/wt18.CDF', package = "faahKO"))
new_spectraPaths
#> [1] "C:/R/R-3.5.1/library/faahKO/cdf/KO/ko15.CDF"
#> [2] "C:/R/R-3.5.1/library/faahKO/cdf/WT/wt15.CDF"
#> [3] "C:/R/R-3.5.1/library/faahKO/cdf/KO/ko16.CDF"
#> [4] "C:/R/R-3.5.1/library/faahKO/cdf/WT/wt16.CDF"
#> [5] "C:/R/R-3.5.1/library/faahKO/cdf/KO/ko18.CDF"
#> [6] "C:/R/R-3.5.1/library/faahKO/cdf/WT/wt18.CDF"

## new spectra metadata
new_spectraMetadata  <- data.frame(matrix(c("KO", "WT", "KO", "WT", "KO", "WT"), 6, 1, dimnames=list(c(), c("Group"))), stringsAsFactors=FALSE)
Group
KO
WT
KO
WT
KO
WT
## add new samples to the annotation loaded from csv, useUROI, useFIR

new_annotation <- resetAnnotation(new_annotation, spectraPaths=new_spectraPaths, spectraMetadata=new_spectraMetadata, useUROI=TRUE, useFIR=TRUE)
#> peakPantheRAnnotation object being reset:
#>   Previous "ROI", "cpdID" and "cpdName" value kept
#>   Previous "uROI" value kept
#>   Previous "FIR" value kept
#>   Previous "cpdMetadata" value kept
#>   New "spectraPaths" value set
#>   New "spectraMetadata" value set
#>   Previous "uROIExist" value kept
#>   New "useUROI" value set
#>   New "useFIR" value set
new_annotation
#> An object of class peakPantheRAnnotation
#>  2 compounds in 6 samples.
#>   updated ROI exist (uROI)
#>   uses updated ROI (uROI)
#>   uses fallback integration regions (FIR)
#>   is not annotated

2.5. Run Final Parallel Annotation

Run the final annotation:

# annotate files serially
new_annotation_result <- peakPantheR_parallelAnnotation(new_annotation, ncores=0, verbose=FALSE)
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.

# successful fit
nbSamples(new_annotation_result$annotation)
#> [1] 6

final_annotation      <- new_annotation_result$annotation
final_annotation
#> An object of class peakPantheRAnnotation
#>  2 compounds in 6 samples.
#>   updated ROI exist (uROI)
#>   uses updated ROI (uROI)
#>   uses fallback integration regions (FIR)
#>   is annotated

# list failed fit
new_annotation_result$failures
#> [1] file  error
#> <0 rows> (or 0-length row.names)

2.5.1. Output final results

outputAnnotationDiagnostic() saves the final fit results to disk:

# create a colourScale based on the Group
uniq_group <- sort(unique(spectraMetadata(final_annotation)$Group),na.last=TRUE)
col_group  <- unname( setNames(c('blue', 'red'),c(uniq_group))[spectraMetadata(final_annotation)$Group] )

# output fit diagnostic to disk
outputAnnotationDiagnostic(final_annotation, saveFolder='/final_output_folder/', savePlots=TRUE, sampleColour=col_group, verbose=TRUE)
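The colour assignment relies on indexing a named vector by group label. In isolation, with the group labels typed in by hand instead of read from spectraMetadata(), the lookup works as in this sketch (colour_map is an illustrative name):

```r
# Group labels as they would come from spectraMetadata(...)$Group
group <- c("KO", "WT", "KO", "WT", "KO", "WT")

uniq_group <- sort(unique(group), na.last = TRUE)          # distinct labels
colour_map <- setNames(c("blue", "red"), uniq_group)       # KO -> blue, WT -> red
col_group  <- unname(colour_map[group])                    # one colour per sample
```

The names of the lookup vector must match the labels being looked up; any label missing from the names yields NA instead of a colour.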

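For each processed sample, a peakTables entry holds all the fit information for all targeted compounds. annotationTable( , column) gathers the values of any peakTables column across all samples and compounds: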

# peakTables for the first sample
peakTables(final_annotation)[[1]]
found  rtMin  rt    rtMax  mzMin  mz     mzMax  peakArea
TRUE   3342   3342  3395   522.2  522.2  522.2  18409123
TRUE   3345   3387  3428   496.2  496.2  496.2  35467323

maxIntMeasured  maxIntPredicted  is_filled  ppm_error  rt_dev_sec
889280          907347           FALSE      0.02338    -2.928
1128960         1113682          FALSE      0.0246     0.9518

tailingFactor  asymmetryFactor  cpdID  cpdName
203.5          377.4            ID-1   Cpd 1
1.005          1.009            ID-2   Cpd 2
# Extract the found peak area for all compounds and all samples
annotationTable(final_annotation, column='peakArea')
                                             ID-1      ID-2
C:/R/R-3.5.1/library/faahKO/cdf/KO/ko15.CDF  18409123  35467323
C:/R/R-3.5.1/library/faahKO/cdf/WT/wt15.CDF  23871264  37965512
C:/R/R-3.5.1/library/faahKO/cdf/KO/ko16.CDF  24775525  37795145
C:/R/R-3.5.1/library/faahKO/cdf/WT/wt16.CDF  25012332  34499235
C:/R/R-3.5.1/library/faahKO/cdf/KO/ko18.CDF  21909568  36717689
C:/R/R-3.5.1/library/faahKO/cdf/WT/wt18.CDF  21729136  36961319
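Because the table has one row per sample and one column per compound, it combines easily with the sample metadata. The sketch below re-enters the peak areas shown above by hand (with shortened row names) and averages them per group; in a real session the table would come straight from annotationTable().

```r
# Peak areas copied from the table above (rows: samples, columns: compounds)
peak_area <- data.frame(
  `ID-1` = c(18409123, 23871264, 24775525, 25012332, 21909568, 21729136),
  `ID-2` = c(35467323, 37965512, 37795145, 34499235, 36717689, 36961319),
  row.names   = c("ko15", "wt15", "ko16", "wt16", "ko18", "wt18"),
  check.names = FALSE
)
group <- c("KO", "WT", "KO", "WT", "KO", "WT")

# Mean peak area per group for each compound
group_means <- aggregate(peak_area, by = list(Group = group), FUN = mean)
```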

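Finally, all annotation results can be saved to disk as .csv files with outputAnnotationResult(). These files contain the compound metadata, the spectra metadata and one file for each column of peakTables (rows: samples; columns: compounds):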

# save
outputAnnotationResult(final_annotation, saveFolder='/final_output_folder/', annotationName='ProjectName', verbose=TRUE)
#> Compound metadata saved at /final_output_folder/ProjectName_cpdMetadata.csv
#> Spectra metadata saved at /final_output_folder/ProjectName_spectraMetadata.csv
#> Peak measurement "found" saved at /final_output_folder/ProjectName_found.csv
#> Peak measurement "rtMin" saved at /final_output_folder/ProjectName_rtMin.csv
#> Peak measurement "rt" saved at /final_output_folder/ProjectName_rt.csv
#> Peak measurement "rtMax" saved at /final_output_folder/ProjectName_rtMax.csv
#> Peak measurement "mzMin" saved at /final_output_folder/ProjectName_mzMin.csv
#> Peak measurement "mz" saved at /final_output_folder/ProjectName_mz.csv
#> Peak measurement "mzMax" saved at /final_output_folder/ProjectName_mzMax.csv
#> Peak measurement "peakArea" saved at /final_output_folder/ProjectName_peakArea.csv
#> Peak measurement "maxIntMeasured" saved at /final_output_folder/ProjectName_maxIntMeasured.csv
#> Peak measurement "maxIntPredicted" saved at /final_output_folder/ProjectName_maxIntPredicted.csv
#> Peak measurement "is_filled" saved at /final_output_folder/ProjectName_is_filled.csv
#> Peak measurement "ppm_error" saved at /final_output_folder/ProjectName_ppm_error.csv
#> Peak measurement "rt_dev_sec" saved at /final_output_folder/ProjectName_rt_dev_sec.csv
#> Peak measurement "tailingFactor" saved at /final_output_folder/ProjectName_tailingFactor.csv
#> Peak measurement "asymmetryFactor" saved at /final_output_folder/ProjectName_asymmetryFactor.csv
#> Summary saved at /final_output_folder/ProjectName_summary.csv

3. See Also

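Last updated: 2019-05-22 22:29:31

Written by 石九流
Licensed under the Creative Commons Attribution 4.0 International License
Unless marked as reposted or attributed to another source, articles on this site are original works or translations by this site; please give attribution when reposting
Original link: https://blog.computsystmed.com/archives/---peakpanther-3-parallel-annotation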