CIESC Journal ›› 2025, Vol. 76 ›› Issue (6): 2838-2847. DOI: 10.11949/0438-1157.20241249

• Process system engineering

Optimal control for neutralization process of citric acid through tricalcium reaction based on reinforcement learning algorithm

Lina ZHU1, Maodong MIAO2, Sai JIN2, Zhonggai ZHAO1, Fuxin SUN2, Guiyang SHI3, Fei LIU1

  1. Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi 214122, Jiangsu, China
  2. Jiangsu Guoxin Union Energy Co., Ltd., Wuxi 214203, Jiangsu, China
  3. National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, Jiangnan University, Wuxi 214122, Jiangsu, China
  • Received: 2024-11-05 Revised: 2024-12-24 Online: 2025-07-09 Published: 2025-06-25
  • Contact: Zhonggai ZHAO

柠檬酸三钙中和过程的强化学习优化控制

祝丽娜1, 苗茂栋2, 金赛2, 赵忠盖1, 孙福新2, 石贵阳3, 刘飞1

  1. 江南大学轻工过程先进控制教育部重点实验室,江苏 无锡 214122
  2. 江苏国信协联能源有限公司,江苏 无锡 214203
  3. 江南大学粮食发酵工艺与技术国家工程研究中心,江苏 无锡 214122
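  • Corresponding author: Zhonggai ZHAO
  • About the author: Lina ZHU (born 2000), female, master's student, 3218385228@qq.com
  • Funding:
    National Natural Science Foundation of China (62473175)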

Abstract:

Tricalcium neutralization is a crucial step in citric acid extraction and a key stage affecting the quality and yield of the final citric acid product. The process is characterized by time delay, the absence of a reference trajectory, large variations in the initial materials, and an irreversible reaction, which together make it difficult to control optimally with traditional control algorithms. To address these problems, the actual tricalcium citrate neutralization process is optimized and controlled using the deep deterministic policy gradient (DDPG) reinforcement learning algorithm. Because a model-based reinforcement learning approach lets the agent explore at no cost within a learned model, a long short-term memory (LSTM) model of the tricalcium neutralization process is established, and its loss function is modified to reduce the gap between the simulation model and the actual environment. The model is then used for reinforcement learning training, and the trained control strategy is finally applied to the actual tricalcium neutralization process. The experimental results indicate that the method can successfully transfer the optimal strategy trained in simulation to the actual tricalcium neutralization process, with satisfactory results.

Key words: tricalcium neutralization process, optimal control, DDPG, model-based reinforcement learning, LSTM
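
The training workflow summarized in the abstract (explore inside a learned model, store the experience, then deploy the trained policy) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `surrogate_step` is a toy scalar stand-in for the learned LSTM model, `reward` uses a hypothetical target state, and `actor` is a two-parameter linear policy rather than a neural network. The critic and actor gradient updates of DDPG are deliberately omitted; only the rollout inside the surrogate, the replay buffer, and the soft (Polyak) target update that DDPG uses are shown.

```python
import random

def surrogate_step(x, u):
    """Toy stand-in for the learned process model (assumed, not the paper's LSTM)."""
    return 0.9 * x + 0.2 * u

def reward(x, target=1.0):
    """Assumed reward: negative distance to a hypothetical target state."""
    return -abs(x - target)

def actor(params, x):
    """Deterministic policy: linear in the state, clipped to [0, 1]."""
    gain, bias = params
    return max(0.0, min(1.0, gain * x + bias))

def rollout(params, x0, steps, noise_std, rng):
    """Collect (x, u, r, x_next) transitions inside the surrogate model."""
    transitions, x = [], x0
    for _ in range(steps):
        u = actor(params, x) + rng.gauss(0.0, noise_std)  # exploration noise
        u = max(0.0, min(1.0, u))                         # keep action in range
        x_next = surrogate_step(x, u)
        transitions.append((x, u, reward(x_next), x_next))
        x = x_next
    return transitions

def soft_update(target, online, tau):
    """Polyak averaging, as used by DDPG for its target networks."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]

rng = random.Random(0)
online_params = [0.0, 0.8]   # hypothetical initial policy parameters
target_params = [0.0, 0.0]
replay_buffer = []

for episode in range(5):
    x0 = rng.uniform(0.0, 2.0)  # varied initial condition per episode
    replay_buffer.extend(rollout(online_params, x0, 20, 0.05, rng))
    # A full DDPG step would sample a minibatch from replay_buffer here and
    # update the critic and actor by gradient descent; that part is omitted.
    target_params = soft_update(target_params, online_params, tau=0.1)
```

Because every transition comes from `surrogate_step` rather than the plant, exploration noise costs nothing in material or product quality, which is the motivation the abstract gives for the model-based approach.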

摘要:

三钙中和过程是柠檬酸提取工艺的重要工序,是影响柠檬酸成品质量、产品收率的关键工段。该过程具有时滞、无参考轨迹、初始物料变化大、反应不可逆等特点,传统控制算法很难对其进行优化控制。针对上述问题,用强化学习算法深度确定性策略(DDPG)对实际的三钙中和过程进行优化控制。考虑到基于模型的强化学习方法可使智能体在学习的模型中进行无成本的探索,建立三钙中和过程的长短期记忆(LSTM)模型,并对其损失函数进行改进,减小了仿真模型与实际环境的差距,然后利用该模型进行强化学习训练,并将训练好的控制策略用于实际三钙中和过程。实验结果表明,该方法可以将仿真训练出的最优策略成功应用于实际三钙中和过程,并取得较好的结果。

关键词: 三钙中和过程, 优化控制, 深度确定性策略, 基于模型的强化学习, 长短期记忆

CLC Number: