CIESC Journal ›› 2013, Vol. 64 ›› Issue (12): 4662-4666. DOI: 10.3969/j.issn.0438-1157.2013.12.058


Eukaryotic promoter LS-SVM with GMM kernel

GUO Shuo¹, YUAN Decheng¹, GUO Wa²

  1. Information Engineering College, Shenyang University of Chemical Technology, Shenyang 110142, Liaoning, China;
  2. Human Resource Department, SGCC Liaoning Electric Power Co., Ltd. Tieling Power Supply Company, Tieling 112000, Liaoning, China
  • Received: 2013-08-19 Revised: 2013-09-07 Online: 2013-12-05 Published: 2013-12-05
  • Supported by: the National Natural Science Foundation of China (61104093).

LS-SVM model of eukaryotic promoters based on a GMM kernel

GUO Shuo (郭烁)¹, YUAN Decheng (袁德成)¹, GUO Wa (郭娲)²

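  1. College of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, Liaoning, China;
  2. Human Resources Department, Tieling Power Supply Company, State Grid Liaoning Electric Power Co., Ltd., Tieling 112000, Liaoning, China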
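  • Corresponding author: GUO Shuo (郭烁)
  • About the author: GUO Shuo (郭烁), born 1978, female, PhD, lecturer.
  • Funding: National Natural Science Foundation of China (61104093); Liaoning Province Scientific Research Fund (L2012141); Liaoning Province Teaching Research Fund (2011A017).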

Abstract: Recognizing gene promoter DNA sequences is difficult because of their complex structure and the huge amount of data involved. In this paper, the positional densities of oligonucleotides are modeled with a Gaussian mixture model (GMM). This makes it possible to identify infrequent but important motifs, since the positional density is independent of how often the oligonucleotide actually occurs. These motifs generally correspond to the consensus sequences of transcription factor binding sites. The GMM is then used as the kernel of the eukaryotic promoter LS-SVM classifier, which reduces the LS-SVM model to a least squares (LS) model, simplifying the algorithm and lowering its computational complexity. Simulation results show that accuracy improves over a Bayesian classifier and matches that of an LS-SVM with an RBF kernel, while the model building time is shorter.
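The positional-density idea can be illustrated with a minimal sketch, assuming a one-dimensional GMM fitted by plain EM to the start positions of a single oligonucleotide. The abstract does not give the authors' feature extraction, number of mixture components, or initialization, so the function names (`kmer_positions`, `fit_gmm_1d`, `positional_density`), the two-component default, the variance floor, and the toy positions below are all illustrative assumptions rather than the paper's method.

```python
import math

def kmer_positions(sequences, kmer):
    """Collect every start index of `kmer` across a list of aligned sequences."""
    k = len(kmer)
    return [i for s in sequences
            for i in range(len(s) - k + 1) if s[i:i + k] == kmer]

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(xs, n_components=2, n_iter=100, min_var=1.0):
    """Fit a 1-D Gaussian mixture to positions `xs` with EM (hypothetical helper)."""
    xs = sorted(xs)
    n = len(xs)
    # Deterministic initialization: means at evenly spaced quantiles of the data.
    means = [float(xs[int((j + 0.5) * n / n_components)]) for j in range(n_components)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, min_var)
    variances = [spread] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each position.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances (floored at min_var).
        for j in range(n_components):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)
    return weights, means, variances

def positional_density(x, weights, means, variances):
    """Evaluate the fitted mixture density at position x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))

# Toy start positions (made up for illustration): two clusters of occurrences.
positions = [19, 20, 20, 21, 22, 69, 70, 70, 71, 72]
weights, means, variances = fit_gmm_1d(positions)
```

Because the mixture is a normalized density, its shape depends on where the occurrences cluster and not on how many there are, which is one way to read the abstract's remark that rare motifs can still stand out.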

Key words: Gaussian mixture model, kernel function, least squares support vector machine, DNA, model reduction, algorithm

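Abstract (Chinese version): Because eukaryotic promoter DNA sequences are structurally complex and the data volume is huge, promoter sequence identification has long been a difficult problem. The positional distribution of oligonucleotides in eukaryotic promoter sequences is first modeled with a Gaussian mixture model, which can extract motifs that occur infrequently but are important. The Gaussian mixture model is then used as the kernel function of the least squares support vector machine classifier for eukaryotic promoters, which simplifies the least squares support vector machine model to a least squares model and reduces the amount of computation. Identification results show that the accuracy of the algorithm is better than that of the Bayesian identification algorithm; compared with an RBF-kernel LS-SVM, the accuracy is essentially the same and the modeling time is slightly shorter.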
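For reference, the standard building blocks named in the abstract can be written out as follows. This is the textbook LS-SVM classifier formulation and the usual Gaussian mixture density, not the paper's derivation: the abstract does not specify the exact form of the GMM kernel or how it reduces the system to a plain least squares model, so the symbols below ($\gamma$, $\sigma$, $\pi_m$, $\mu_m$, $\sigma_m$, $M$) are generic notation chosen here.

```latex
% Standard LS-SVM classifier with labels y_i in {-1, +1}; generic notation, not the paper's.
\[
\begin{aligned}
&\min_{w,\,b,\,e}\ \tfrac{1}{2}\, w^{\top} w + \tfrac{\gamma}{2} \sum_{i=1}^{N} e_i^{2}
\quad \text{s.t.} \quad y_i \left[ w^{\top} \varphi(x_i) + b \right] = 1 - e_i,
\quad i = 1, \dots, N \\[4pt]
&\begin{bmatrix} 0 & y^{\top} \\ y & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix},
\qquad \Omega_{ij} = y_i\, y_j\, K(x_i, x_j) \\[4pt]
&\hat{y}(x) = \operatorname{sign}\!\Big( \sum_{i=1}^{N} \alpha_i\, y_i\, K(x, x_i) + b \Big) \\[4pt]
% RBF kernel used as the comparison baseline (one common parameterization).
&K_{\mathrm{RBF}}(x, x') = \exp\!\Big( -\frac{\lVert x - x' \rVert^{2}}{2 \sigma^{2}} \Big) \\[4pt]
% One-dimensional Gaussian mixture density over sequence position t.
&p(t) = \sum_{m=1}^{M} \pi_m\, \mathcal{N}\!\left( t \mid \mu_m, \sigma_m^{2} \right),
\qquad \sum_{m=1}^{M} \pi_m = 1
\end{aligned}
\]
```

Training an LS-SVM therefore amounts to solving one linear system of size $N + 1$, which is why the choice of kernel directly affects the model building time reported in the abstract.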

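Key words (Chinese version): Gaussian mixture model, kernel function, least squares support vector machine, deoxyribonucleic acid, model reduction, algorithm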
