Arnaud Wolfer
2018-06-13
peakPantheR
peakPantheR is designed for the detection, integration and reporting of pre-defined features in MS files.
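My current understanding of "detection, integration and reporting of pre-defined features" is:
- detect: find the peaks in the raw data
- integrate: compute the area of each peak
- features: the pre-defined targets, i.e. the compounds of interest, each described by an expected RT / m/z region; differences between groups can then be assessed from the integrated areas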
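To make "integrate" concrete: a peak's area is the area under its chromatogram between `rtMin` and `rtMax`. The minimal sketch below applies the plain trapezoidal rule to a synthetic Gaussian peak. `peak_area()` and the simulated data are hypothetical, and this is a swapped-in illustration rather than peakPantheR's own method: the package fits a curve to each peak, which is why its results also report a predicted maximum intensity.

```r
# Hypothetical helper: trapezoidal area under a sampled chromatogram,
# restricted to the retention-time window [rtMin, rtMax]
peak_area <- function(rt, intensity, rtMin, rtMax) {
  keep <- rt >= rtMin & rt <= rtMax
  x <- rt[keep]
  y <- intensity[keep]
  if (length(x) < 2) return(0)  # at least two points are needed to form a trapezoid
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Synthetic Gaussian peak: apex at 3345 s, sd 8 s, height 1e6, one scan every 1.5 s
rt <- seq(3300, 3400, by = 1.5)
intensity <- 1e6 * exp(-(rt - 3345)^2 / (2 * 8^2))

area <- peak_area(rt, intensity, rtMin = 3310, rtMax = 3390)
# For comparison, the exact area of this Gaussian is height * sd * sqrt(2 * pi)
exact <- 1e6 * 8 * sqrt(2 * pi)
```

With one scan every 1.5 s, the trapezoidal estimate agrees with the exact Gaussian area to well under 0.1%.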
Parallel Annotation detects and integrates multiple compounds in multiple files in parallel and stores the results in a single object. The rest of this post applies it to an example dataset.
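The parallel compound integration proceeds as follows:

- Process multiple compounds in multiple files in parallel, storing the results in a single object
- Load the list of expected RT / m/z regions of interest (ROI) and the list of files to process
  - An ROI is simply the expected RT / m/z information of a compound you want to investigate
- Initialise the output object with the expected ROI and the file paths
- First pass (without peak filling):
  - For each file, detect features in each ROI and keep the one with the highest intensity
    - A peak in the raw data is made up of individual data points (scans), each with an intensity. "Keep the highest intensity one" means that when several peaks are detected within the same ROI, only the most intense one is retained.
  - Determine the peak statistics for each feature
  - Store the results and the EIC (extracted ion chromatogram) for each ROI
    - An EIC is a chromatogram built only from the ions within a given m/z range (whereas the TIC sums all ions); here, that range is the ROI's m/z window
- Visual inspection of the first-pass results, and update of the ROI:
  - Initialise a new output object with updated regions of interest (uROI) and fallback integration regions (FIR)
- Second pass (with peak filling):
- Summary statistics:
  - Return the result object and/or a table (rows: files, i.e. samples; columns: compounds)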
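Two of the first-pass steps above can be illustrated with toy data: building the EIC of one ROI (summing, scan by scan, the intensity of the data points inside the m/z window), and keeping only the most intense candidate when several peaks are detected in the same ROI. This is a minimal sketch assuming a hypothetical layout with one row per data point; `extract_eic()` and `keep_most_intense()` are illustrative names, not peakPantheR functions.

```r
# Hypothetical helper: EIC of one ROI, i.e. for each scan in [rtMin, rtMax]
# the summed intensity of the data points whose m/z lies in [mzMin, mzMax]
extract_eic <- function(scans, mzMin, mzMax, rtMin, rtMax) {
  s <- scans[scans$rt >= rtMin & scans$rt <= rtMax, ]
  in_mz <- s$mz >= mzMin & s$mz <= mzMax
  # points outside the m/z window are multiplied by FALSE (0), so every scan keeps a value
  eic <- tapply(s$intensity * in_mz, s$rt, sum)
  data.frame(rt = as.numeric(names(eic)), intensity = as.vector(eic))
}

# Hypothetical helper: among candidate peaks, keep the most intense one per ROI
keep_most_intense <- function(candidates) {
  idx <- tapply(seq_len(nrow(candidates)), candidates$roi,
                function(i) i[which.max(candidates$maxInt[i])])
  candidates[as.vector(idx), ]
}

# Toy data points: two ions per scan, only m/z 522.2 falls in the ROI of Cpd 1
scans <- data.frame(
  rt        = c(3340, 3340, 3342, 3342, 3344, 3344),
  mz        = c(522.2, 600.1, 522.2, 600.1, 522.2, 600.1),
  intensity = c(100, 999, 400, 999, 250, 999)
)
eic <- extract_eic(scans, mzMin = 522.194778, mzMax = 522.205222,
                   rtMin = 3310, rtMax = 3390)
apex_rt <- eic$rt[which.max(eic$intensity)]  # scan with the highest EIC intensity

# Toy candidates: two peaks detected in ROI ID-1, one in ROI ID-2
candidates <- data.frame(
  roi    = c("ID-1", "ID-1", "ID-2"),
  rt     = c(3330, 3345, 3386),
  maxInt = c(2e5, 9e5, 1.1e6),
  stringsAsFactors = FALSE
)
kept <- keep_most_intense(candidates)
```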
We can use `peakPantheR_parallelAnnotation()` to target two features in 3 MS spectra files from the faahKO package. Install faahKO first:
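```r
# faahKO is a Bioconductor experiment-data package:
# ind = 1:4 selects CRAN plus the Bioconductor repositories
setRepositories(ind = 1:4)
install.packages('faahKO')
```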
Input spectra:
```r
library(faahKO)
## file paths
input_spectraPaths <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
                        system.file('cdf/KO/ko16.CDF', package = "faahKO"),
                        system.file('cdf/KO/ko18.CDF', package = "faahKO"))
input_spectraPaths
#> [1] "C:/R/R-3.5.1/library/faahKO/cdf/KO/ko15.CDF"
#> [2] "C:/R/R-3.5.1/library/faahKO/cdf/KO/ko16.CDF"
#> [3] "C:/R/R-3.5.1/library/faahKO/cdf/KO/ko18.CDF"
```
Define the two targeted features and store them in a table with the following columns:
cpdID | cpdName | rtMin | rtMax | rt | mzMin | mzMax | mz |
---|---|---|---|---|---|---|---|
numeric | character | sec | sec | sec, optional / NA | m/z | m/z | m/z, optional / NA |

```r
# targetFeatTable: one row per target compound
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(), c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin", "mz", "mzMax"))), stringsAsFactors=FALSE)
input_targetFeatTable[1,] <- c("ID-1", "Cpd 1", 3310., 3344.888, 3390., 522.194778, 522.2, 522.205222)
input_targetFeatTable[2,] <- c("ID-2", "Cpd 2", 3280., 3385.577, 3440., 496.195038, 496.2, 496.204962)
# the rows were filled from character vectors, so convert the RT and m/z columns back to numeric
input_targetFeatTable[,c(3:8)] <- sapply(input_targetFeatTable[,c(3:8)], as.numeric)
```
cpdID | cpdName | rtMin | rt | rtMax | mzMin | mz | mzMax |
---|---|---|---|---|---|---|---|
ID-1 | Cpd 1 | 3310 | 3344.888 | 3390 | 522.194778 | 522.2 | 522.205222 |
ID-2 | Cpd 2 | 3280 | 3385.577 | 3440 | 496.195038 | 496.2 | 496.204962 |
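The m/z windows above are symmetric around `mz`: 522.2 × 10 × 10⁻⁶ = 0.005222, so [522.194778, 522.205222] is a ±10 ppm window (likewise 496.2 ± 0.004962). As a sketch, the same table could be built more directly with `data.frame()`, deriving `mzMin`/`mzMax` from a ppm tolerance. `ppm_window()` is a hypothetical helper, and ±10 ppm is simply the tolerance these values imply.

```r
# Hypothetical helper: m/z window of +/- ppm around a target m/z
ppm_window <- function(mz, ppm = 10) {
  delta <- mz * ppm * 1e-6
  c(mzMin = mz - delta, mzMax = mz + delta)
}

targets <- data.frame(
  cpdID   = c("ID-1", "ID-2"),
  cpdName = c("Cpd 1", "Cpd 2"),
  rtMin   = c(3310, 3280),
  rt      = c(3344.888, 3385.577),
  rtMax   = c(3390, 3440),
  mz      = c(522.2, 496.2),
  stringsAsFactors = FALSE
)
# one row per compound, columns mzMin and mzMax
win <- t(sapply(targets$mz, ppm_window))
targets$mzMin <- win[, "mzMin"]
targets$mzMax <- win[, "mzMax"]
# same column order as input_targetFeatTable
targets <- targets[, c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin", "mz", "mzMax")]
```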
Additional compound and spectra metadata can be provided; they are not used during fitting:
```r
# spectra Metadata
input_spectraMetadata <- data.frame(matrix(c("sample type 1", "sample type 2", "sample type 1"), 3, 1, dimnames=list(c(), c("sampleType"))), stringsAsFactors=FALSE)
```
sampleType |
---|
sample type 1 |
sample type 2 |
sample type 1 |
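A `peakPantheRAnnotation` object is first initialised with the files to process (`spectraPaths`) and the known compounds (`targetFeatTable`). Additional information and parameters can be supplied through `spectraMetadata`, `uROI` and `FIR`; to use `uROI` and `FIR`, also set `useUROI=TRUE` and `useFIR=TRUE`: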
```r
library(peakPantheR)
init_annotation <- peakPantheRAnnotation(spectraPaths = input_spectraPaths,
                                         targetFeatTable = input_targetFeatTable,
                                         spectraMetadata = input_spectraMetadata)
```
The object `init_annotation` returned by `peakPantheRAnnotation()` is not annotated yet, contains no `uROI` or `FIR` information and does not use them:
```r
init_annotation
#> An object of class peakPantheRAnnotation
#> 2 compounds in 3 samples.
#> updated ROI do not exist (uROI)
#> does not use updated ROI (uROI)
#> does not use fallback integration regions (FIR)
#> is not annotated
```
`peakPantheR_parallelAnnotation()` runs the annotation in parallel (if `ncores` is greater than 0) and returns the successful annotations (`result$annotation`) and the failures (`result$failures`):
```r
# annotate files serially
annotation_result <- peakPantheR_parallelAnnotation(init_annotation, ncores=0, verbose=TRUE)
#> Processing 2 compounds in 3 samples:
#> uROI: FALSE
#> FIR: FALSE
#> ----- ko15 -----
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Check input, mzMLPath must be a .mzML
#> Reading data from 2 windows
#> Data read in: 0.59 secs
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
#> Found 2/2 features in 0.12 secs
#> Peak statistics done in: 0.03 secs
#> Feature search done in: 1.26 secs
#> ----- ko16 -----
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Check input, mzMLPath must be a .mzML
#> Reading data from 2 windows
#> Data read in: 0.59 secs
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #2
#> Found 2/2 features in 0.04 secs
#> Peak statistics done in: 0 secs
#> Feature search done in: 1.33 secs
#> ----- ko18 -----
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Check input, mzMLPath must be a .mzML
#> Reading data from 2 windows
#> Data read in: 0.56 secs
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #2
#> Found 2/2 features in 0.04 secs
#> Peak statistics done in: 0 secs
#> Feature search done in: 1.03 secs
#> Annotation object cannot be reordered by sample acquisition date
#> ----------------
#> Parallel annotation done in: 4.77 secs
#> 0 failure(s)

# successful fit
nbSamples(annotation_result$annotation)
#> [1] 3
data_annotation <- annotation_result$annotation
data_annotation
#> An object of class peakPantheRAnnotation
#> 2 compounds in 3 samples.
#> updated ROI do not exist (uROI)
#> does not use updated ROI (uROI)
#> does not use fallback integration regions (FIR)
#> is annotated

# list failed fit
annotation_result$failures
#> [1] file error
#> <0 rows> (or 0-length row.names)
```
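The returned list separates successful annotations from failures, so one unreadable file does not abort the whole run. The pattern can be sketched in base R as follows; `run_annotation()` and `toy_worker()` are hypothetical and are not peakPantheR internals. `parallel::mclapply()` relies on forking, so on Windows it cannot use more than one core; the serial `lapply()` branch corresponds to `ncores = 0`.

```r
# Hypothetical sketch: apply a per-file worker, collecting errors instead of stopping
run_annotation <- function(paths, worker, ncores = 0) {
  safe_worker <- function(p) tryCatch(worker(p), error = function(e) e)
  res <- if (ncores > 0) {
    parallel::mclapply(paths, safe_worker, mc.cores = ncores)  # forked workers (not on Windows)
  } else {
    lapply(paths, safe_worker)                                 # serial, like ncores = 0
  }
  failed <- vapply(res, inherits, logical(1), what = "error")
  list(
    annotation = res[!failed],
    failures = data.frame(
      file  = paths[failed],
      error = vapply(res[failed], conditionMessage, character(1)),
      stringsAsFactors = FALSE
    )
  )
}

# Toy worker: "processes" a file name, failing on one of them
toy_worker <- function(p) {
  if (p == "broken.CDF") stop("file error")
  toupper(p)
}
result <- run_annotation(c("ko15.CDF", "broken.CDF", "ko18.CDF"), toy_worker)
```

As in the real output above, `result$failures` has one row per failed file and zero rows when everything succeeds.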
Based on the fit results, `annotationParamsDiagnostic()` determines the `uROI` and `FIR`:
- `uROI`: set from the minimum/maximum (`rt` and `m/z`) of the found peaks, widened by 5% of the ROI in retention time
- `FIR`: set from the median of the found `rtMin`, `rtMax`, `mzMin` and `mzMax`

```r
updated_annotation <- annotationParamsDiagnostic(data_annotation, verbose=TRUE)
#> uROI will be set as mimimum/maximum of found peaks (+/-5% of ROI in retention time)
#> FIR will be calculated as the median of found "rtMin","rtMax","mzMin","mzMax"

# uROI now exists
updated_annotation
#> An object of class peakPantheRAnnotation
#> 2 compounds in 3 samples.
#> updated ROI exist (uROI)
#> does not use updated ROI (uROI)
#> does not use fallback integration regions (FIR)
#> is annotated
```
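The two rules printed above can be reproduced on toy numbers. This is a sketch under assumptions, not peakPantheR's implementation: the found peaks are hypothetical, "+/-5% of ROI" is read as 5% of the ROI's retention-time width, and the m/z bounds are not widened (consistent with the summary table below, where `uROI_mzMin`/`uROI_mzMax` equal the ROI values).

```r
# Hypothetical found peaks for one compound (Cpd 2-like) in three samples
found <- data.frame(
  rtMin = c(3345.4, 3350.1, 3348.0),
  rtMax = c(3428.0, 3454.4, 3440.2),
  mzMin = c(496.195, 496.196, 496.195),
  mzMax = c(496.205, 496.204, 496.205)
)
roi <- c(rtMin = 3280, rtMax = 3440)  # original ROI of Cpd 2

# uROI (assumed rule): min/max of the found peaks, RT widened by 5% of the ROI width
rt_pad <- 0.05 * (roi[["rtMax"]] - roi[["rtMin"]])
uROI <- c(
  rtMin = min(found$rtMin) - rt_pad,
  rtMax = max(found$rtMax) + rt_pad,
  mzMin = min(found$mzMin),
  mzMax = max(found$mzMax)
)

# FIR: median of each boundary over the found peaks
FIR <- sapply(found[, c("rtMin", "rtMax", "mzMin", "mzMax")], median)
```

Here the ROI is 160 s wide, so the uROI extends 8 s beyond the extreme found peaks, while the FIR stays at the "typical" peak boundaries.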
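`outputAnnotationDiagnostic()` saves the results to disk. The saved `annotationParameters_summary.csv` file contains the original `ROI` alongside the newly determined `uROI` and `FIR`, for manual verification. In addition, a diagnostic plot is saved for each compound for reference; these outputs can be generated in parallel, as set by `ncores`: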
```r
# create a colourScale based on the sampleType
uniq_sType <- sort(unique(spectraMetadata(updated_annotation)$sampleType), na.last=TRUE)
col_sType <- unname( setNames(c('blue', 'red'), c(uniq_sType))[spectraMetadata(updated_annotation)$sampleType] )
# output fit diagnostic to disk
outputAnnotationDiagnostic(updated_annotation, saveFolder='/output_folder/', savePlots=TRUE, sampleColour=col_sType, verbose=TRUE, ncores=2)
```
Contents of `annotationParameters_summary.csv`, shown as three tables (ROI, uROI, FIR) with the empty `X` separator columns omitted:

cpdID | cpdName | ROI_rt | ROI_mz | ROI_rtMin | ROI_rtMax | ROI_mzMin | ROI_mzMax |
---|---|---|---|---|---|---|---|
ID-1 | Cpd 1 | 3344.888 | 522.2 | 3310 | 3390 | 522.194778 | 522.205222 |
ID-2 | Cpd 2 | 3385.577 | 496.2 | 3280 | 3440 | 496.195038 | 496.204962 |

cpdID | uROI_rtMin | uROI_rtMax | uROI_mzMin | uROI_mzMax | uROI_rt | uROI_mz |
---|---|---|---|---|---|---|
ID-1 | 3305.75893 | 3411.43628 | 522.194778 | 522.205222 | 3344.888 | 522.2 |
ID-2 | 3337.37666 | 3462.44903 | 496.195038 | 496.204962 | 3385.577 | 496.2 |

cpdID | FIR_rtMin | FIR_rtMax | FIR_mzMin | FIR_mzMax |
---|---|---|---|---|
ID-1 | 3326.10635 | 3407.27265 | 522.194778 | 522.205222 |
ID-2 | 3365.02386 | 3453.40496 | 496.195038 | 496.204962 |
Diagnostic plot for compound 1:
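1) The top panel overlays the EICs extracted from all samples, with the fitted curves shown as dashed lines.
2) The panel below the EIC shows the RT peak width of each found peak (`rtMin`, `rtMax`, and the apex marked by a dot), one row per sample in order, with the first sample at the top.
3) The bottom 3 panels show the found `RT` (peak width), `m/z` (peak width) and `peak area` by run order, each with the corresponding histogram on the right.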
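Based on the compound diagnostic plots, the `ROI` exported to the .csv can be updated; the `uROI` (updated ROI, potentially applied to all samples) and the `FIR` (fallback integration regions, integrated when no peak is found) can also be adjusted to better fit the peaks.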
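Besides editing the .csv by hand in a spreadsheet, the adjustment can be scripted. A minimal sketch, assuming a simplified stand-in for `annotationParameters_summary.csv` that only holds `cpdID` and the FIR retention-time columns (the real file has more columns); widening by 5 s is an arbitrary example.

```r
# Hypothetical stand-in for annotationParameters_summary.csv (FIR RT columns only)
params <- data.frame(
  cpdID     = c("ID-1", "ID-2"),
  FIR_rtMin = c(3326.10635, 3365.02386),
  FIR_rtMax = c(3407.27265, 3453.40496),
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.csv(params, csv_path, row.names = FALSE)

# Reload, widen the fallback integration region of ID-2 by 5 s on each side, save
p <- read.csv(csv_path, stringsAsFactors = FALSE)
row <- p$cpdID == "ID-2"
p$FIR_rtMin[row] <- p$FIR_rtMin[row] - 5
p$FIR_rtMax[row] <- p$FIR_rtMax[row] + 5
write.csv(p, csv_path, row.names = FALSE)
```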
New initialisation with updated parameters, applied to all study samples
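After manually validating the fits on the reference samples, the modified parameters can be reloaded from the .csv file and applied to all study samples.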
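`peakPantheR_loadAnnotationParamsCSV()` loads the new parameters from the .csv (as generated by `outputAnnotationDiagnostic()`) and initialises a `peakPantheRAnnotation` object. This object is initialised without `spectraPaths`, `spectraMetadata` or `cpdMetadata` (hence the 0 samples in the output below); the samples are added before annotation. `useUROI` and `useFIR` are set to `FALSE` and must be set as needed afterwards. `uROIExist` is established from the `uROI` columns of the .csv: if they contain no `NA`, `uROIExist` is set to `TRUE`: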
```r
update_csv_path <- '/path_to_new_csv/'
# load csv
new_annotation <- peakPantheR_loadAnnotationParamsCSV(update_csv_path)
#> uROIExist set to TRUE
#> New peakPantheRAnnotation object initialised for 2 compounds
new_annotation
#> An object of class peakPantheRAnnotation
#> 2 compounds in 0 samples.
#> updated ROI exist (uROI)
#> does not use updated ROI (uROI)
#> does not use fallback integration regions (FIR)
#> is not annotated
```
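The `uROIExist` rule described above amounts to a check for missing values. `uroi_exists()` is a hypothetical helper, not the package's code, and it assumes only the four boundary columns must be complete (`uROI_rt` and `uROI_mz` are treated as optional, like `rt` and `mz` in the target table).

```r
# Hypothetical check mirroring the described rule (not peakPantheR's code)
uroi_exists <- function(params) {
  cols <- c("uROI_rtMin", "uROI_rtMax", "uROI_mzMin", "uROI_mzMax")
  # && short-circuits, so missing columns never reach the NA check
  all(cols %in% names(params)) && !any(is.na(params[, cols]))
}

params <- data.frame(
  cpdID      = c("ID-1", "ID-2"),
  uROI_rtMin = c(3305.75893, 3337.37666),
  uROI_rtMax = c(3411.43628, 3462.44903),
  uROI_mzMin = c(522.194778, 496.195038),
  uROI_mzMax = c(522.205222, 496.204962),
  stringsAsFactors = FALSE
)
before <- uroi_exists(params)   # complete uROI columns

params$uROI_rtMax[2] <- NA
after <- uroi_exists(params)    # one missing boundary
```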
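Now that the fit parameters have been established on the QC samples, the same processing can be applied to all study samples. `resetAnnotation()` reinitialises all results and, if needed, changes the target samples or compounds: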
```r
## new files
new_spectraPaths <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
                      system.file('cdf/WT/wt15.CDF', package = "faahKO"),
                      system.file('cdf/KO/ko16.CDF', package = "faahKO"),
                      system.file('cdf/WT/wt16.CDF', package = "faahKO"),
                      system.file('cdf/KO/ko18.CDF', package = "faahKO"),
                      system.file('cdf/WT/wt18.CDF', package = "faahKO"))
new_spectraPaths
#> [1] "C:/R/R-3.5.1/library/faahKO/cdf/KO/ko15.CDF"
#> [2] "C:/R/R-3.5.1/library/faahKO/cdf/WT/wt15.CDF"
#> [3] "C:/R/R-3.5.1/library/faahKO/cdf/KO/ko16.CDF"
#> [4] "C:/R/R-3.5.1/library/faahKO/cdf/WT/wt16.CDF"
#> [5] "C:/R/R-3.5.1/library/faahKO/cdf/KO/ko18.CDF"
#> [6] "C:/R/R-3.5.1/library/faahKO/cdf/WT/wt18.CDF"
```
```r
## new spectra metadata
new_spectraMetadata <- data.frame(matrix(c("KO", "WT", "KO", "WT", "KO", "WT"), 6, 1, dimnames=list(c(), c("Group"))), stringsAsFactors=FALSE)
```
Group |
---|
KO |
WT |
KO |
WT |
KO |
WT |
```r
## add new samples to the annotation loaded from csv, useUROI, useFIR
new_annotation <- resetAnnotation(new_annotation, spectraPaths=new_spectraPaths, spectraMetadata=new_spectraMetadata, useUROI=TRUE, useFIR=TRUE)
#> peakPantheRAnnotation object being reset:
#> Previous "ROI", "cpdID" and "cpdName" value kept
#> Previous "uROI" value kept
#> Previous "FIR" value kept
#> Previous "cpdMetadata" value kept
#> New "spectraPaths" value set
#> New "spectraMetadata" value set
#> Previous "uROIExist" value kept
#> New "useUROI" value set
#> New "useFIR" value set
new_annotation
#> An object of class peakPantheRAnnotation
#> 2 compounds in 6 samples.
#> updated ROI exist (uROI)
#> uses updated ROI (uROI)
#> uses fallback integration regions (FIR)
#> is not annotated
```
Run the final annotation:
```r
# annotate files serially
new_annotation_result <- peakPantheR_parallelAnnotation(new_annotation, ncores=0, verbose=FALSE)
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.

# successful fit
nbSamples(new_annotation_result$annotation)
#> [1] 6
final_annotation <- new_annotation_result$annotation
final_annotation
#> An object of class peakPantheRAnnotation
#> 2 compounds in 6 samples.
#> updated ROI exist (uROI)
#> uses updated ROI (uROI)
#> uses fallback integration regions (FIR)
#> is annotated

# list failed fit
new_annotation_result$failures
#> [1] file error
#> <0 rows> (or 0-length row.names)
```
Save the final fit results to disk with `outputAnnotationDiagnostic()`:
```r
# create a colourScale based on the Group
uniq_group <- sort(unique(spectraMetadata(final_annotation)$Group), na.last=TRUE)
col_group <- unname( setNames(c('blue', 'red'), c(uniq_group))[spectraMetadata(final_annotation)$Group] )
# output fit diagnostic to disk
outputAnnotationDiagnostic(final_annotation, saveFolder='/final_output_folder/', savePlots=TRUE, sampleColour=col_group, verbose=TRUE)
```
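The colour scale above hard-codes two colours, so it only works with exactly two groups. A slightly more general sketch (hypothetical helper, base R only) assigns one colour per sorted group level and fails loudly when the palette is too short:

```r
# Hypothetical helper: one colour per group level, returned in sample order
group_colours <- function(groups, palette = c("blue", "red", "darkgreen", "orange")) {
  lv <- sort(unique(groups), na.last = TRUE)
  if (length(lv) > length(palette)) {
    stop("palette has ", length(palette), " colours for ", length(lv), " groups")
  }
  unname(setNames(palette[seq_along(lv)], lv)[groups])
}

# Same Group values as new_spectraMetadata
groups <- c("KO", "WT", "KO", "WT", "KO", "WT")
cols <- group_colours(groups)
```

A vector like `cols` could then be passed as `sampleColour`.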
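For each processed sample, a `peakTables` entry holds all the fit information for all target compounds. `annotationTable()`, given the annotation and a `column` name, gathers the values of any `peakTables` column across all samples and compounds: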
```r
# peakTables for the first sample
peakTables(final_annotation)[[1]]
```
found | rtMin | rt | rtMax | mzMin | mz | mzMax | peakArea |
---|---|---|---|---|---|---|---|
TRUE | 3342 | 3342 | 3395 | 522.2 | 522.2 | 522.2 | 18409123 |
TRUE | 3345 | 3387 | 3428 | 496.2 | 496.2 | 496.2 | 35467323 |
maxIntMeasured | maxIntPredicted | is_filled | ppm_error | rt_dev_sec |
---|---|---|---|---|
889280 | 907347 | FALSE | 0.02338 | -2.928 |
1128960 | 1113682 | FALSE | 0.0246 | 0.9518 |
tailingFactor | asymmetryFactor | cpdID | cpdName |
---|---|---|---|
203.5 | 377.4 | ID-1 | Cpd 1 |
1.005 | 1.009 | ID-2 | Cpd 2 |
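The `ppm_error` and `rt_dev_sec` columns compare each found peak with the expected `mz` and `rt` of the target table. The helpers below use assumed definitions, not taken from peakPantheR's source: an absolute m/z deviation in ppm, and a signed RT deviation (found minus expected). The sign convention is at least consistent with the first row above: an expected `rt` of 3344.888 s and `rt_dev_sec` of -2.928 s put the apex near 3341.96 s, which displays as 3342.

```r
# Assumed definitions (hypothetical helpers, not peakPantheR code)
ppm_error <- function(mz_found, mz_expected) {
  1e6 * abs(mz_found - mz_expected) / mz_expected   # absolute deviation in ppm
}
rt_dev_sec <- function(rt_found, rt_expected) {
  rt_found - rt_expected                           # signed deviation in seconds
}

ppm_example <- ppm_error(500.001, 500)          # about 2 ppm
rt_example  <- rt_dev_sec(3341.96, 3344.888)    # about -2.928 s
```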
```r
# Extract the found peak area for all compounds and all samples
annotationTable(final_annotation, column='peakArea')
```
sample | ID-1 | ID-2 |
---|---|---|
C:/R/R-3.5.1/library/faahKO/cdf/KO/ko15.CDF | 18409123 | 35467323 |
C:/R/R-3.5.1/library/faahKO/cdf/WT/wt15.CDF | 23871264 | 37965512 |
C:/R/R-3.5.1/library/faahKO/cdf/KO/ko16.CDF | 24775525 | 37795145 |
C:/R/R-3.5.1/library/faahKO/cdf/WT/wt16.CDF | 25012332 | 34499235 |
C:/R/R-3.5.1/library/faahKO/cdf/KO/ko18.CDF | 21909568 | 36717689 |
C:/R/R-3.5.1/library/faahKO/cdf/WT/wt18.CDF | 21729136 | 36961319 |
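Conceptually, this reshapes the per-sample `peakTables` into a samples × compounds table. A minimal base-R sketch of that reshaping, assuming hypothetical per-sample tables with a `cpdID` column (values copied from the first two rows above, with the second table's rows deliberately shuffled to show the alignment on `cpdID`); it is not the package's implementation and only handles numeric columns:

```r
# Hypothetical per-sample peak tables
peak_tables <- list(
  ko15 = data.frame(cpdID = c("ID-1", "ID-2"), peakArea = c(18409123, 35467323),
                    stringsAsFactors = FALSE),
  wt15 = data.frame(cpdID = c("ID-2", "ID-1"), peakArea = c(37965512, 23871264),
                    stringsAsFactors = FALSE)
)

# Gather one numeric column into a samples x compounds matrix,
# aligning the rows of each table on cpdID
gather_column <- function(tables, column) {
  cpd <- tables[[1]]$cpdID
  vals <- vapply(tables, function(tb) tb[[column]][match(cpd, tb$cpdID)],
                 numeric(length(cpd)))
  out <- t(vals)          # rows: samples (list names), columns: compounds
  colnames(out) <- cpd
  out
}
area <- gather_column(peak_tables, "peakArea")
```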
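Finally, all annotation results can be saved to disk as .csv files with `outputAnnotationResult()`. These files contain the compound metadata, the spectra metadata, one file per `peakTables` column (rows: samples; columns: compounds) and a summary: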
```r
# save
outputAnnotationResult(final_annotation, saveFolder='/final_output_folder/', annotationName='ProjectName', verbose=TRUE)
#> Compound metadata saved at /final_output_folder/ProjectName_cpdMetadata.csv
#> Spectra metadata saved at /final_output_folder/ProjectName_spectraMetadata.csv
#> Peak measurement "found" saved at /final_output_folder/ProjectName_found.csv
#> Peak measurement "rtMin" saved at /final_output_folder/ProjectName_rtMin.csv
#> Peak measurement "rt" saved at /final_output_folder/ProjectName_rt.csv
#> Peak measurement "rtMax" saved at /final_output_folder/ProjectName_rtMax.csv
#> Peak measurement "mzMin" saved at /final_output_folder/ProjectName_mzMin.csv
#> Peak measurement "mz" saved at /final_output_folder/ProjectName_mz.csv
#> Peak measurement "mzMax" saved at /final_output_folder/ProjectName_mzMax.csv
#> Peak measurement "peakArea" saved at /final_output_folder/ProjectName_peakArea.csv
#> Peak measurement "maxIntMeasured" saved at /final_output_folder/ProjectName_maxIntMeasured.csv
#> Peak measurement "maxIntPredicted" saved at /final_output_folder/ProjectName_maxIntPredicted.csv
#> Peak measurement "is_filled" saved at /final_output_folder/ProjectName_is_filled.csv
#> Peak measurement "ppm_error" saved at /final_output_folder/ProjectName_ppm_error.csv
#> Peak measurement "rt_dev_sec" saved at /final_output_folder/ProjectName_rt_dev_sec.csv
#> Peak measurement "tailingFactor" saved at /final_output_folder/ProjectName_tailingFactor.csv
#> Peak measurement "asymmetryFactor" saved at /final_output_folder/ProjectName_asymmetryFactor.csv
#> Summary saved at /final_output_folder/ProjectName_summary.csv
```
This article was written by 石九流.
Licensed under the Creative Commons Attribution 4.0 International License.
Original link: https://blog.computsystmed.com/archives/---peakpanther-3-parallel-annotation
Last updated: 2019-05-22 22:29:31