CIESC Journal ›› 2025, Vol. 76 ›› Issue (3): 1143-1155. DOI: 10.11949/0438-1157.20240872

• Process system engineering

Multi-output tri-training heterogeneous soft sensor modeling based on time difference

Dafen WANG1, Lili TANG2, Xinyan ZHANG1, Chunyu NIE1, Mingzhu LI3, Jing WU1,3

  1. School of Data Science and Information Engineering, Guizhou Minzu University, Guiyang 550025, Guizhou, China
  2. School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, Guangdong, China
  3. ZX-YZ School of Network Science, Haikou University of Economics, Haikou 570203, Hainan, China
  • Received: 2024-08-01 Revised: 2024-09-27 Online: 2025-03-28 Published: 2025-03-25
  • Contact: Jing WU

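  • About the author: Dafen WANG (born 1998), female, master's student, wangdafen2024@163.com
  • Funding:
    Natural Science Research Project of the Guizhou Provincial Department of Education ([2023]012); Guizhou Province High-Level Innovative Talents Project (GCC[2023]027); Guizhou Minzu University Doctoral Research Start-up Project (GZMUZK [2024]QD11); Hainan Provincial Natural Science Foundation Project (623QN256)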

Abstract:

Soft sensor techniques provide an effective means of predicting important but hard-to-measure variables in industrial processes. However, because industrial processes are complex and data acquisition is costly, the distribution of labeled and unlabeled data is unbalanced, which makes it challenging to construct high-performance soft sensor models. To address this problem, a multi-output tri-training heterogeneous soft sensor method based on time difference is proposed. Within a new tri-training framework, three multi-output models, namely MGPR (multi-output Gaussian process regression), MRVM (multi-output relevance vector machine), and MLSSVM (multi-output least squares support vector machine), serve as baseline supervised regressors that are trained and iterated using labeled data. Meanwhile, TD (time difference) is introduced to improve the dynamic characteristics of the model, and the model parameters are optimized by KF (Kalman filtering) to improve prediction performance. Finally, the model is validated on a simulated wastewater treatment platform (benchmark simulation model 1, BSM1) and on an actual wastewater treatment plant. The results show that, compared with traditional soft sensor modeling approaches, the proposed model significantly improves the adaptability and prediction performance of soft sensors under unbalanced data distributions.

Key words: tri-training, soft sensor, time difference, co-training strategy, ensemble, prediction, process control
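
The time-difference (TD) idea named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used TD formulation, in which a regressor f is fitted on differenced data and a prediction is rebuilt as y(t) = y(t - lag) + f(x(t) - x(t - lag)). The abstract does not give the paper's exact formulation, so the function names, the lag of 1, the least-squares stand-in regressor (used here in place of MGPR, MRVM, or MLSSVM), and the synthetic data are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def time_difference(series: Sequence[Sequence[float]], lag: int = 1) -> List[Vector]:
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [
        [cur - prev for cur, prev in zip(series[t], series[t - lag])]
        for t in range(lag, len(series))
    ]


def fit_delta_model(dX: Sequence[Vector], dY: Sequence[Vector]) -> Callable[[Vector], Vector]:
    """Stand-in regressor (assumption, not the paper's model): one-input,
    multi-output least squares through the origin, fitted on differenced data."""
    sxx = sum(dx[0] * dx[0] for dx in dX)
    n_out = len(dY[0])
    slopes = [
        sum(dx[0] * dy[k] for dx, dy in zip(dX, dY)) / sxx
        for k in range(n_out)
    ]
    return lambda dx: [s * dx[0] for s in slopes]


def td_predict(delta_model: Callable[[Vector], Vector],
               x_now: Vector, x_ref: Vector, y_ref: Vector) -> Vector:
    """Rebuild the prediction: y(t) = y(t - lag) + f(x(t) - x(t - lag))."""
    dx = [a - b for a, b in zip(x_now, x_ref)]
    return [ref + d for ref, d in zip(y_ref, delta_model(dx))]


# Synthetic data (made up for illustration): two outputs depend linearly on
# one input, plus constant offsets that cancel out when differencing.
X = [[0.0], [1.0], [3.0], [4.0], [6.0]]
Y = [[5.0 + 2.0 * x[0], -1.0 - 0.5 * x[0]] for x in X]

model = fit_delta_model(time_difference(X), time_difference(Y))
y_hat = td_predict(model, x_now=[7.0], x_ref=X[-1], y_ref=Y[-1])
print(y_hat)
```

Because the model only learns how changes in the input map to changes in the outputs, the constant offsets in the synthetic data never have to be estimated; the most recent labeled sample supplies them at prediction time.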

CLC Number: