CIESC Journal (化工学报) ›› 2016, Vol. 67 ›› Issue (3): 820-826. DOI: 10.11949/j.issn.0438-1157.20151921

• Research Paper •

A novel mega-trend-diffusion for small sample

ZHU Bao1, CHEN Zhongsheng2, YU Le'an1

  1. School of Economics and Management Science, Beijing University of Chemical Technology, Beijing 100029, China;
  2. College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
  • Received: 2015-12-17 Revised: 2016-01-06 Online: 2016-03-05 Published: 2016-01-12
  • Contact: YU Le'an
  • Supported by:

    National Natural Science Foundation of China (71433001).

Abstract:

Data-driven modeling, optimization, and control of production processes are active topics of research and application in both academia and industry. Even in the era of big data, small-sample problems cannot be ignored. Traditional modeling methods such as artificial neural networks (ANNs) and extreme learning machines (ELMs) struggle to reach high learning accuracy on small sample sets. To address this, a novel multi-distribution mega-trend-diffusion (MD-MTD) technique is proposed to improve learning accuracy on small sample sets. Mega-trend-diffusion (MTD) is used to estimate the acceptable range of each attribute of the small sample. On top of MTD, a uniform distribution and a triangular distribution are added to describe the characteristics of the small-sample data; these are used to generate virtual samples that fill the information gaps between observations. Benchmark samples generated from a benchmark function are used in an orthogonal test and a non-uniform-sample test to verify the rationality and effectiveness of MD-MTD, and two real-world industrial datasets, MLCC and PTA, further confirm its practicality. The experimental results show that MD-MTD improves small-sample learning accuracy by more than 8%.

Key words: small-sample-set, mega-trend-diffusion, virtual sample, orthogonal test
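
The range-estimation and sampling steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The bound formula is the one commonly quoted in the MTD literature (center of the observed range, skewness weights from the counts on each side, and a diffusion term based on the sample variance and ln(10^-20)); it does not appear on this page and is an assumption here. The abstract also does not say how MD-MTD combines the uniform and triangular distributions, so the sketch simply offers both over the estimated range. The names `mtd_bounds` and `virtual_values` are hypothetical.

```python
import math
import random
import statistics


def mtd_bounds(values):
    """Estimate (lower, center, upper) of the acceptable range of one attribute.

    Assumed formula (commonly cited for MTD, not taken from this page):
      center = (min + max) / 2
      lower  = center - skew_l * sqrt(-2 * var / n_l * ln(1e-20))
      upper  = center + skew_u * sqrt(-2 * var / n_u * ln(1e-20))
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    lo, hi = min(values), max(values)
    center = (lo + hi) / 2.0
    if lo == hi:
        # All observations are identical: nothing to diffuse.
        return lo, center, hi
    # Counts of observations strictly below / above the center.
    n_l = sum(1 for v in values if v < center)
    n_u = sum(1 for v in values if v > center)
    skew_l = n_l / (n_l + n_u)
    skew_u = n_u / (n_l + n_u)
    var = statistics.variance(values)          # sample variance (n - 1)
    spread = -2.0 * var * math.log(1e-20)      # positive, since log(1e-20) < 0
    lower = center - skew_l * math.sqrt(spread / n_l)
    upper = center + skew_u * math.sqrt(spread / n_u)
    # The estimated range must at least cover the observed data.
    return min(lower, lo), center, max(upper, hi)


def virtual_values(values, n, dist="triangular", seed=0):
    """Draw n virtual values for one attribute inside its estimated range.

    Assumption: 'uniform' samples the whole range evenly; 'triangular'
    peaks at the center of the observed range.
    """
    lower, center, upper = mtd_bounds(values)
    rng = random.Random(seed)
    if dist == "uniform":
        return [rng.uniform(lower, upper) for _ in range(n)]
    if dist == "triangular":
        return [rng.triangular(lower, upper, center) for _ in range(n)]
    raise ValueError("dist must be 'uniform' or 'triangular'")
```

For example, `virtual_values([1.0, 2.0, 2.5, 4.0], 10, dist="uniform")` returns ten values drawn from a range that extends beyond the observed interval [1.0, 4.0]. Generating the matching output values for these virtual inputs is a separate step that the abstract does not describe, so it is left out of the sketch.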

CLC number: