CIESC Journal (化工学报) ›› 2023, Vol. 74 ›› Issue (10): 4208-4217. DOI: 10.11949/0438-1157.20230858

• Process Systems Engineering •

Data mining-based screening of key points in the corn starch fructose production process

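Zhongyi ZHANG (张忠义)¹, Lei ZHANG (张磊)¹, Yu WANG (王宇)², Yachao DONG (董亚超)¹, Jin TAO (陶进)², Yi LI (李义)², Yi TONG (佟毅)², Yu ZHUANG (庄钰)¹, Linlin LIU (刘琳琳)¹, Jian DU (都健)¹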

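  1. School of Chemical Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China
  2. COFCO Biotechnology Co., Ltd., Beijing 100005, China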
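  • Received: 2023-08-18 Revised: 2023-10-17 Online: 2023-10-25 Published: 2023-12-22
  • Corresponding authors: Yu ZHUANG, Jian DU
  • About the author: Zhongyi ZHANG (born 2000), male, master's student, dgzzy@mail.dlut.edu.cn
  • Funding:
    National Key Research and Development Program of China (2021YFD2101000)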

Data mining-based screening of key points for corn starch and sugar production process

Zhongyi ZHANG¹, Lei ZHANG¹, Yu WANG², Yachao DONG¹, Jin TAO², Yi LI², Yi TONG², Yu ZHUANG¹, Linlin LIU¹, Jian DU¹

  1. School of Chemical Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China
  2. COFCO Biotechnology Co., Ltd., Beijing 100005, China
  • Received: 2023-08-18 Revised: 2023-10-17 Online: 2023-10-25 Published: 2023-12-22
  • Contact: Yu ZHUANG, Jian DU

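Abstract:

The process that makes fructose by deep processing of corn is held back by outdated control and coarse, unrefined operation, yet its complexity makes mechanism-based models hard to build and optimize. Big data technology offers an effective way forward: large volumes of production data are mined for process knowledge and used to screen out key points. First, the key target variables of the process were chosen, and the raw production data were preprocessed with big data techniques, including missing-value handling, outlier handling, noise reduction, and dimensionality reduction. Then three machine learning models were built, namely random forest (RF), extreme gradient boosting (XGBoost), and artificial neural network (ANN), each reaching an R² above 0.90. Finally, the SHAP method was applied to interpret the different models, which verified their credibility and gave the contribution of each model's features to its predictions. The interpretation results from the different models were combined to rank the importance of the points in the production process, and a mechanistic analysis informed by production experience gave the final table of key points.

Key words: data mining, food processing, systems engineering, explainability, neural network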

Abstract:

The production of fructose by deep processing of corn suffers from outdated control and a lack of refinement in production and processing. However, because the process is complex, mechanism-based models are difficult to establish and optimize. Big data technology offers an effective solution, using a large volume of production data to uncover process knowledge and identify key points. First, the key target variables of the process were selected, and big data technology was used to preprocess the raw production data by handling missing values and outliers, reducing noise, and reducing dimensionality. Next, three machine learning models were constructed: random forest (RF), extreme gradient boosting (XGBoost), and artificial neural network (ANN), all achieving R² values above 0.90. Finally, the SHAP method was used to explain the different machine learning models, which validated their credibility and gave the contribution of each feature to the prediction results in each model. Integrating the explanation results from the different models yielded a ranking of the importance of the different points in the production process. Combined with production experience, a mechanistic analysis then produced the final table of key points.

Key words: data mining, food processing, systems engineering, explainability, neural network
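
The abstract says that the explanations of the three models are integrated into a single importance ranking, but it does not say how. The Python sketch below shows one plausible way to do this, under stated assumptions: each model's per-sample attribution values (for example SHAP values) are reduced to a mean absolute value per feature, normalized so that they sum to 1 within the model, and then averaged across models. The function names, the feature names (`temperature`, `pH`, `flow`), and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def mean_abs_importance(attributions):
    """Mean absolute attribution per feature for one model.

    `attributions` maps a feature name to its per-sample attribution
    values (for example SHAP values).
    """
    return {name: mean(abs(v) for v in values)
            for name, values in attributions.items()}


def normalize(importance):
    """Scale importances to sum to 1 so that models are comparable."""
    total = sum(importance.values())
    if total == 0:
        raise ValueError("all importances are zero")
    return {name: value / total for name, value in importance.items()}


def combined_ranking(per_model_attributions):
    """Average the normalized importances over models, sorted descending.

    Assumes every model reports the same set of features.
    """
    shares = [normalize(mean_abs_importance(a))
              for a in per_model_attributions.values()]
    features = shares[0].keys()
    combined = {f: mean(s[f] for s in shares) for f in features}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical attribution values for three process variables (made-up data).
example = {
    "RF":      {"temperature": [0.4, -0.6], "pH": [0.1, -0.1], "flow": [0.2, 0.2]},
    "XGBoost": {"temperature": [0.5, -0.5], "pH": [0.3, -0.1], "flow": [0.1, -0.1]},
    "ANN":     {"temperature": [0.3, -0.3], "pH": [0.2, 0.2], "flow": [0.1, -0.1]},
}
ranking = combined_ranking(example)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Normalizing within each model before averaging keeps a model with larger raw attribution magnitudes from dominating the combined ranking. Averaging ranks instead of normalized shares would be an equally reasonable alternative.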

CLC number: