CIESC Journal

• Biochemical Engineering and Technology •

Metabolomics data mining method: WT-HCA

XIA Jinmei (夏金梅); WU Xiaojian (吴晓建); YUAN Yingjin (元英进)

  Department of Pharmaceutical Engineering, School of Chemical Engineering, Tianjin University

  • Online: 2007-07-05  Published: 2007-07-05

Abstract:

Existing methods for mining metabolomics data have shortcomings; in particular, they are very sensitive to the data being processed and are therefore difficult to generalize. To address this, wavelet analysis was combined with hierarchical clustering analysis (HCA), an unsupervised pattern-recognition method. The combination integrates the ability of wavelet analysis to remove noise and extract information in the frequency domain with the objectivity of HCA, yielding a new method: wavelet transform-hierarchical clustering analysis (WT-HCA). Its ability to extract metabolomics information was evaluated on Arabidopsis metabolome data taken from the literature. The results showed that WT-HCA extracts information from metabolomics data effectively. Under the software's default distance definition, WT-HCA completely separated the samples of the two parental lines, whereas HCA could hardly distinguish the samples at all. Under an alternative distance definition (Euclidean distance between samples, Ward's distance between clusters), WT-HCA correctly clustered every sample in 3 of the 4 classes, for an overall correct-clustering rate of 93.75%, significantly higher than the 84.375% achieved by HCA.
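The abstract does not state the wavelet family, decomposition depth, thresholding rule, or how the correct-clustering rate was scored, so the following is only a minimal sketch of the general WT-HCA idea under stated assumptions: a Haar wavelet, soft thresholding of the detail coefficients with the universal threshold (noise estimated from the median absolute deviation), Euclidean distance between samples with Ward linkage (the second distance scheme described above), and scoring by the best one-to-one mapping of clusters to classes. The function names `haar_dwt`, `wavelet_features`, `ward_hca`, and `clustering_accuracy` are hypothetical, and the toy profiles are synthetic, not the Arabidopsis data; this is not the authors' implementation.

```python
# Minimal WT-HCA sketch. Assumptions (not from the paper): Haar wavelet,
# soft thresholding with the universal threshold, Euclidean sample distance,
# Ward linkage, and best one-to-one cluster-to-class scoring.
import math
from itertools import permutations
from statistics import median


def haar_dwt(signal, levels):
    """Orthonormal multilevel Haar DWT.

    Returns [approximation, coarsest detail, ..., finest detail].
    """
    if len(signal) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    s = 1 / math.sqrt(2)
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) * s for a, b in pairs])
        approx = [(a + b) * s for a, b in pairs]
    return [approx] + details[::-1]


def wavelet_features(profile, levels=2):
    """Soft-threshold the detail bands; return all coefficients as one vector."""
    bands = haar_dwt(profile, levels)
    sigma = median(abs(c) for c in bands[-1]) / 0.6745  # MAD noise estimate
    t = sigma * math.sqrt(2 * math.log(len(profile)))  # universal threshold

    def shrink(c):
        return math.copysign(max(abs(c) - t, 0.0), c)

    # The approximation band is kept; only detail bands are denoised.
    denoised = [bands[0]] + [[shrink(c) for c in band] for band in bands[1:]]
    return [c for band in denoised for c in band]


def ward_hca(points, n_clusters):
    """Agglomerative clustering with Ward linkage (Lance-Williams update on
    squared Euclidean distances); returns one integer label per point."""
    n = len(points)
    clusters = {i: [i] for i in range(n)}
    d = {(i, j): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
         for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > n_clusters:
        i, j = min(d, key=d.get)  # closest pair of clusters
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        dij = d[(i, j)]
        # Drop distances involving i or j, then add distances to the new cluster.
        old = d
        d = {key: v for key, v in old.items() if i not in key and j not in key}
        for k, members in clusters.items():
            nk = len(members)
            dik = old[tuple(sorted((i, k)))]
            djk = old[tuple(sorted((j, k)))]
            d[(k, next_id)] = ((ni + nk) * dik + (nj + nk) * djk
                               - nk * dij) / (ni + nj + nk)
        clusters[next_id] = merged
        next_id += 1
    labels = [0] * n
    for label, members in enumerate(clusters.values()):
        for m in members:
            labels[m] = label
    return labels


def clustering_accuracy(labels, truth):
    """Fraction of samples correct under the best one-to-one mapping of
    cluster labels to class names (an assumed scoring rule)."""
    clusters, classes = sorted(set(labels)), sorted(set(truth))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[c] == t for c, t in zip(labels, truth)))
    return best / len(truth)


if __name__ == "__main__":
    # Synthetic toy profiles (illustrative only): two groups of two samples.
    profiles = [
        [5.1, 5, 5, 5, 0, 0, 0, 0],
        [5, 4.9, 5, 5, 0, 0, 0.1, 0],
        [0, 0, 0, 0, 5, 5, 5.1, 5],
        [0, 0.1, 0, 0, 5, 4.9, 5, 5],
    ]
    truth = ["A", "A", "B", "B"]
    labels = ward_hca([wavelet_features(p) for p in profiles], n_clusters=2)
    print(clustering_accuracy(labels, truth))  # 1.0
```

Because the Haar transform is orthonormal, Euclidean distances between thresholded coefficient vectors equal the distances between the corresponding denoised, reconstructed profiles, so the sketch clusters the coefficients directly instead of inverting the transform. The "default distance definition" mentioned in the abstract is not reproduced here, since the abstract does not specify it.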