CIESC Journal ›› 2023, Vol. 74 ›› Issue (8): 3429-3437. DOI: 10.11949/0438-1157.20230451

• Process system engineering •

基于改进深度强化学习的乙烯裂解炉操作优化

诸程瑛, 王振雷

  1. 华东理工大学信息科学与工程学院,上海 200237
  • 收稿日期:2023-05-09 修回日期:2023-07-12 出版日期:2023-08-25 发布日期:2023-10-18
  • 通讯作者: 王振雷
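  • About the author: Chengying ZHU (b. 1995), female, master's student, zhulady@me.com
  • Funding:
    Basic Science Center Program of the National Natural Science Foundation of China (61988101); General Program of the National Natural Science Foundation of China (62073142); Fundamental Research Funds for the Central Universities and Pudong New Area Science and Technology Development Fund (PKX2021-R03)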

Operation optimization of ethylene cracking furnace based on improved deep reinforcement learning algorithm

Chengying ZHU, Zhenlei WANG

  1. College of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received:2023-05-09 Revised:2023-07-12 Online:2023-08-25 Published:2023-10-18
  • Contact: Zhenlei WANG

摘要:

乙烯裂解炉是乙烯生产的核心,对其生产操作优化的研究在提高乙烯工厂生产水平和经济效益方面具有重要意义。裂解炉中的裂解过程具有高维度、多模态和非线性的特征,传统优化方法难以实现根据工况变化的操作优化。针对上述问题,提出基于改进TD3深度强化学习算法的乙烯裂解炉操作优化,首先结合裂解过程将裂解炉一个运行周期内的操作策略视为顺序决策,利用实际生产过程数据和人工神经网络对裂解炉生产过程建模作为强化学习智能体交互的环境,然后引入多评价网络机制估计动作价值,有效缓解TD3训练缓慢和策略保守的现象,最后应用该算法求解乙烯裂解炉生产操作优化问题得到有效的优化策略,验证了所提算法的有效性。实验结果表明,所提出的操作优化策略显著提高了裂解炉主要产物的收率。

关键词: 深度强化学习, 乙烯裂解炉, 操作优化, 裂解过程, 模型, 神经网络, 算法

Abstract:

The ethylene cracking furnace is the core unit of ethylene production, and optimizing its operation is of great significance for improving the production level and economic efficiency of ethylene plants. The cracking process is high-dimensional, multi-modal and nonlinear, which makes it difficult for traditional optimization methods to adapt the operation to changing working conditions. To address this problem, an operation optimization method for the ethylene cracking furnace based on an improved twin delayed deep deterministic policy gradient (TD3) deep reinforcement learning algorithm is proposed. Firstly, the operation strategy of the furnace within one run cycle is treated as a sequential decision problem, and the furnace production process is modeled with actual production data and an artificial neural network to serve as the environment with which the reinforcement learning agent interacts. Secondly, a multi-critic network mechanism is introduced to estimate the action value, which effectively alleviates the slow training and conservative policy of TD3. Finally, the algorithm is applied to the operation optimization problem of the ethylene cracking furnace and yields an effective optimization strategy, which verifies the effectiveness of the proposed algorithm. The experimental results show that the proposed operation optimization strategy significantly improves the yields of the main products of the cracking furnace.

Key words: deep reinforcement learning, ethylene cracking furnace, operation optimization, cracking process, model, neural network, algorithm
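
The abstract states that a multi-critic mechanism replaces the standard TD3 value estimate but does not say how the critics are combined. The sketch below contrasts the standard TD3 target (minimum over two critics, which tends to underestimate and can make the policy conservative) with one possible multi-critic target. Averaging over K critics is an assumption made here for illustration and may differ from the paper's actual rule; the function names and the toy numbers are hypothetical.

```python
import numpy as np


def td3_target(reward, done, next_q_values, gamma=0.99):
    """Standard TD3 target: clipped double-Q, i.e. the minimum of two critics.

    next_q_values has shape (K, batch); only the first two critics are used.
    """
    q_min = np.min(next_q_values[:2], axis=0)
    return reward + gamma * (1.0 - done) * q_min


def multi_critic_target(reward, done, next_q_values, gamma=0.99):
    """Hypothetical multi-critic target: the mean over all K critics.

    ASSUMPTION: the aggregation rule (mean) is illustrative only and is
    not taken from the paper.
    """
    q_mean = np.mean(next_q_values, axis=0)
    return reward + gamma * (1.0 - done) * q_mean


# Toy batch of two transitions; the second one is terminal.
reward = np.array([1.0, 0.5])
done = np.array([0.0, 1.0])
# Next-state value estimates from K = 3 target critics, shape (3, 2).
next_q = np.array([[10.0, 4.0],
                   [12.0, 6.0],
                   [14.0, 8.0]])

y_td3 = td3_target(reward, done, next_q)
y_multi = multi_critic_target(reward, done, next_q)
print(y_td3, y_multi)
```

On this toy batch the averaged target for the non-terminal transition is higher than the clipped double-Q target, which illustrates why aggregating several critics can offset the pessimism of the minimum operator. Both targets equal the reward for the terminal transition.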
