Operation optimization of ethylene cracking furnace based on improved deep reinforcement learning algorithm

doi:10.11949/0438-1157.20230451

Abstract

The cracking furnace is the core unit of ethylene production, and optimizing its operation is important for raising the production level and economic efficiency of ethylene plants. The cracking process is high-dimensional, multimodal and nonlinear, which makes it difficult for traditional optimization methods to adapt the operation to changing working conditions. To address this, we propose an operation optimization method for ethylene cracking furnaces based on an improved twin delayed deep deterministic policy gradient (TD3) deep reinforcement learning algorithm. First, the operating strategy of the furnace over one run cycle is treated as a sequential decision problem, and the production process is modeled from actual process data with artificial neural networks; this model serves as the environment with which the reinforcement learning agent interacts. Second, a multi-critic network mechanism is introduced to estimate the state-action value, which effectively mitigates the slow training and overly conservative policies of TD3. Finally, the algorithm is applied to the cracking furnace operation optimization problem and yields an effective operating strategy, which verifies the effectiveness of the proposed algorithm. Experimental results show that the proposed strategy significantly improves the yields of the main products of the furnace.

Key words: deep reinforcement learning, ethylene cracking furnace, operation optimization, cracking process, model, neural network, algorithm

CLC Number:

TP 272

Chengying ZHU, Zhenlei WANG. Operation optimization of ethylene cracking furnace based on improved deep reinforcement learning algorithm[J]. CIESC Journal, 2023, 74(8): 3429-3437.

Figures/Tables 13

Fig.1 The process of reinforcement learning agent interacting with the environment

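Table 1 Algorithm process of MTD3

Algorithm 1: MTD3 algorithm process

Randomly initialize $N$ Critic networks $Q_{θ_{1}}, Q_{θ_{2}}, \dots, Q_{θ_{N}}$ and the Actor network $π_{ω}$;

Initialize the target networks: $θ_{i}^{'} \leftarrow θ_{i}$ ($i = 1, \dots, N$), $ω^{'} \leftarrow ω$;

Initialize the experience replay buffer $B$;

for Step = 1 to $D_{max}$ do:

Initialize the random process and obtain the initial environment state $s$

for t = 1 to T do:

Select an action from the current policy plus exploration noise: $a \sim π_{ω}(s) + ϵ$, $ϵ \sim N(0, σ)$

Execute action $a$, receive reward $r$, and observe the next environment state $s^{'}$

Store the experience tuple $(s, a, r, s^{'})$ in $B$

Randomly sample $K$ experience tuples $(s, a, r, s^{'})$ from $B$

$\tilde{a} \leftarrow π_{ω^{'}}(s^{'}) + ϵ$, $ϵ \sim \mathrm{clip}[N(0, \tilde{σ}), -c, c]$

$y \leftarrow r + \frac{γ}{N} \sum_{i=1}^{N} Q_{θ_{i}^{'}}(s^{'}, \tilde{a})$

Update the Critic network parameters:

$θ_{i} \leftarrow \arg\min_{θ_{i}} \frac{1}{K} \sum [y - Q_{θ_{i}}(s, a)]^{2}$

if t mod d = 0 then: (the policy network parameters are updated only once every d steps)

Update the Actor network parameters along the policy gradient:

$\nabla_{ω} J(ω) = \frac{1}{K} \sum \nabla_{a} Q_{θ_{1}}(s, a) |_{a = π_{ω}(s)} \nabla_{ω} π_{ω}(s)$

Soft-update the target Actor network and the target Critic networks:

$ω^{'} \leftarrow τ ω + (1 - τ) ω^{'}$

$θ_{i}^{'} \leftarrow τ θ_{i} + (1 - τ) θ_{i}^{'}$

end if

end for

end for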
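
The target-value, soft-update and delayed-update steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the critics are toy callables (`make_toy_critic` is hypothetical), network parameters are flat lists of floats, and $\tilde{σ}$ is assumed to be a standard deviation. The `td3_target` function is included only for comparison; standard TD3 takes the minimum over two target critics, whereas Algorithm 1 averages over all $N$.

```python
import random
from statistics import fmean
from typing import Callable, List, Sequence

# A critic is modeled here as any callable Q(state, action) -> float (toy stand-in).
Critic = Callable[[Sequence[float], Sequence[float]], float]


def clipped_noise(rng: random.Random, sigma_tilde: float, c: float) -> float:
    """Draw eps ~ clip[N(0, sigma_tilde), -c, c]; sigma_tilde is assumed to be a std."""
    return max(-c, min(c, rng.gauss(0.0, sigma_tilde)))


def mtd3_target(r: float, gamma: float, target_critics: Sequence[Critic],
                s_next: Sequence[float], a_tilde: Sequence[float]) -> float:
    """y = r + (gamma / N) * sum_i Q_i'(s', a~): mean over all N target critics."""
    q_values = [q(s_next, a_tilde) for q in target_critics]
    return r + gamma * fmean(q_values)


def td3_target(r: float, gamma: float, target_critics: Sequence[Critic],
               s_next: Sequence[float], a_tilde: Sequence[float]) -> float:
    """Standard TD3 target for comparison: minimum over the target critics."""
    return r + gamma * min(q(s_next, a_tilde) for q in target_critics)


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """theta' <- tau * theta + (1 - tau) * theta', applied element-wise."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


def actor_update_due(t: int, d: int) -> bool:
    """Delayed policy update: the actor and targets are updated when t mod d = 0."""
    return t % d == 0


def make_toy_critic(bias: float) -> Critic:
    """Hypothetical critic for the demo only: Q(s, a) = bias + sum(s) + sum(a)."""
    return lambda s, a: bias + sum(s) + sum(a)


toy_critics = [make_toy_critic(b) for b in (0.0, 1.0, 2.0)]  # N = 3 critics
y_mtd3 = mtd3_target(1.0, 0.99, toy_critics, [0.5], [0.5])
y_td3 = td3_target(1.0, 0.99, toy_critics[:2], [0.5], [0.5])
```

With these toy critics the averaged target is larger than the minimum-based one, which illustrates how averaging gives a less pessimistic value estimate than the TD3 minimum.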

Fig.2 MTD3-based production operation optimization framework

Table 2 Information of neural network models

Network input variables	Network output	Number of layers	Neurons per layer	Test-set MSE
COT, DHR, $D_{t}$	$C_{2}H_{4}$	2	10	1.3334×10^-10
COT, DHR, $D_{t}$	$C_{3}H_{6}$	2	10	9.5165×10^-8
COT, DHR, $D_{t}$	$C_{4}H_{6}$	2	10	6.1901×10^-9
COT, $D_{t}$	TMT	2	8	1.5998×10^-9

Table 3 Information of action space

Action	Description	Value range	Normalized range
COT	Coil outlet temperature/℃	[837.00, 844.00]	$[-1, 1]$
DHR	Steam-to-hydrocarbon ratio	[0.50, 0.52]	$[-1, 1]$
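
The mapping between the normalized action range and the physical ranges in Table 3 can be sketched as below. A linear (min-max) mapping is assumed here, since the table gives only the two ranges; the names `ACTION_BOUNDS` and `denormalize_action` are hypothetical, and out-of-range actor outputs are clipped as a precaution.

```python
from typing import Dict, Tuple

# Physical bounds taken from Table 3.
ACTION_BOUNDS: Dict[str, Tuple[float, float]] = {
    "COT": (837.0, 844.0),  # coil outlet temperature, deg C
    "DHR": (0.50, 0.52),    # steam-to-hydrocarbon ratio
}


def denormalize_action(name: str, a_norm: float) -> float:
    """Map a normalized action in [-1, 1] to its physical range (linear map assumed)."""
    lo, hi = ACTION_BOUNDS[name]
    a = max(-1.0, min(1.0, a_norm))  # clip out-of-range actor outputs
    return lo + (a + 1.0) * 0.5 * (hi - lo)


cot_mid = denormalize_action("COT", 0.0)   # midpoint of the COT range
dhr_low = denormalize_action("DHR", -1.0)  # lower bound of the DHR range
```

A normalized action of 0 therefore corresponds to a COT of 840.5 ℃, the midpoint of the allowed range.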

Table 4 Information of state space

State	Description	Range	Normalized range
$D_{t}$	Current run day/d	$[1, 72]$	$[0, 1]$
$T_{t}$	Current tube outer-wall temperature/℃	$[900.00, 1080.00]$	$[0, 1]$
$P_{C_{2}H_{4}, t}$	Current $C_{2}H_{4}$ yield/%	$[25.55, 25.70]$	$[0, 1]$
$P_{C_{3}H_{6}, t}$	Current $C_{3}H_{6}$ yield/%	$[12.80, 12.90]$	$[0, 1]$
$P_{C_{4}H_{6}, t}$	Current $C_{4}H_{6}$ yield/%	$[4.90, 5.20]$	$[0, 1]$
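
A sketch of how a raw state vector could be scaled to the normalized range of Table 4 is given below. Min-max scaling is assumed, and the dictionary keys and function name are hypothetical labels for the five state variables. No clipping is applied, so values outside the listed ranges map outside $[0, 1]$.

```python
from typing import Dict, List, Tuple

# Ranges taken from Table 4; the key names are illustrative only.
STATE_BOUNDS: Dict[str, Tuple[float, float]] = {
    "D": (1.0, 72.0),          # current run day, d
    "T": (900.0, 1080.0),      # tube outer-wall temperature, deg C
    "P_C2H4": (25.55, 25.70),  # ethylene yield, %
    "P_C3H6": (12.80, 12.90),  # propylene yield, %
    "P_C4H6": (4.90, 5.20),    # butadiene yield, %
}
STATE_ORDER = ("D", "T", "P_C2H4", "P_C3H6", "P_C4H6")


def normalize_state(raw: Dict[str, float]) -> List[float]:
    """Min-max scale each state variable to [0, 1] (scaling rule assumed)."""
    scaled = []
    for key in STATE_ORDER:
        lo, hi = STATE_BOUNDS[key]
        scaled.append((raw[key] - lo) / (hi - lo))
    return scaled


example_state = normalize_state(
    {"D": 1.0, "T": 990.0, "P_C2H4": 25.55, "P_C3H6": 12.90, "P_C4H6": 5.05}
)
```

In this example the first run day maps to 0, a tube-wall temperature of 990 ℃ maps to 0.5, and a propylene yield at the upper bound maps to 1.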

Table 5 Parameters setting and grid search range

Parameter	Grid search range	Selected value
Reward coefficients: $μ$, $α$, $β$, $λ$	{1, 10^-1, 5×10^-2, 10^-2}	10^-2, 1, 1, 1
Punishment	{-40, -60, -80, -100}	-100
Number of Critic networks N	{1, 2, 3, 4}	3
Discount factor $γ$	{0.99, 0.95, 0.90}	0.99
Mini-batch size K	{8, 16, 32, 64}	32
Actor network learning rate	{10^-4, 5×10^-4, 10^-3, 5×10^-3}	10^-4
Critic network learning rate	{10^-4, 5×10^-4, 10^-3, 5×10^-3}	10^-3
Soft-update rate $τ$	{10^-4, 5×10^-4, 10^-3, 5×10^-3}	5×10^-3
Replay buffer length	{10^3, 10^4, 10^5}	10^4
Policy update delay d (steps)	{2, 4, 6, 8}	2
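
For readers who want to reuse the settings in Table 5, they can be collected in a small configuration object, as in the sketch below. The class name, field names and the `outside_grid` helper are hypothetical; only the numeric values and grids come from the table. The helper simply checks that every selected value lies inside its search grid.

```python
from dataclasses import dataclass, replace
from typing import Dict, List

# Search grids from Table 5 (the reward coefficients share one grid).
REWARD_COEFF_GRID = [1, 1e-1, 5e-2, 1e-2]
GRID: Dict[str, list] = {
    "punishment": [-40, -60, -80, -100],
    "n_critics": [1, 2, 3, 4],
    "gamma": [0.99, 0.95, 0.90],
    "batch_size": [8, 16, 32, 64],
    "actor_lr": [1e-4, 5e-4, 1e-3, 5e-3],
    "critic_lr": [1e-4, 5e-4, 1e-3, 5e-3],
    "tau": [1e-4, 5e-4, 1e-3, 5e-3],
    "buffer_length": [10**3, 10**4, 10**5],
    "policy_delay": [2, 4, 6, 8],
}


@dataclass(frozen=True)
class MTD3Config:
    """Selected values from Table 5; field names are illustrative."""
    mu: float = 1e-2
    alpha: float = 1.0
    beta: float = 1.0
    lam: float = 1.0
    punishment: float = -100
    n_critics: int = 3
    gamma: float = 0.99
    batch_size: int = 32
    actor_lr: float = 1e-4
    critic_lr: float = 1e-3
    tau: float = 5e-3
    buffer_length: int = 10**4
    policy_delay: int = 2


def outside_grid(cfg: MTD3Config) -> List[str]:
    """Return the names of settings whose value is not in the search grid."""
    bad = [name for name, values in GRID.items() if getattr(cfg, name) not in values]
    bad += [name for name in ("mu", "alpha", "beta", "lam")
            if getattr(cfg, name) not in REWARD_COEFF_GRID]
    return bad


config = MTD3Config()
```

Calling `outside_grid(config)` on the default configuration returns an empty list, while a modified copy such as `replace(config, n_critics=5)` is flagged.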

Fig.3 The training episode reward curves of MTD3, TD3 and PPO

Fig.4 The TMT trend curves corresponding to the strategies obtained by MTD3 and TD3

Fig.5 The optimal COT strategy obtained by different algorithms

Fig.6 The optimal DHR strategy obtained by different algorithms

Fig.7 Yields of three ethylene cracking products corresponding to different optimization strategies

Table 6 Average yields of the three cracking products over a single run cycle under the strategies obtained by different algorithms

Strategy	$C_{2}H_{4}$ average yield/%	$C_{3}H_{6}$ average yield/%	$C_{4}H_{6}$ average yield/%	Combined average yield of the three olefins/%
Before optimization	25.57033224	12.78792472	4.993497869	43.35175483
MCOA	25.68160822	12.84300845	5.001654668	43.52627133
TD3	25.64259930	12.84057390	4.998896120	43.48206925
MTD3	25.68328649	12.80894534	5.038997277	43.53122910
PPO	25.63879185	12.82470653	4.991962284	43.45546066
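
The figures in Table 6 can be cross-checked with a few lines of Python. The sketch below confirms that the last column is the sum of the three individual yields and computes each strategy's gain over the pre-optimization baseline in percentage points. The variable and function names are illustrative; the numbers are copied from the table.

```python
from typing import Dict, Tuple

# (C2H4, C3H6, C4H6, combined) average yields in %, copied from Table 6.
YIELDS: Dict[str, Tuple[float, float, float, float]] = {
    "Before optimization": (25.57033224, 12.78792472, 4.993497869, 43.35175483),
    "MCOA": (25.68160822, 12.84300845, 5.001654668, 43.52627133),
    "TD3": (25.64259930, 12.84057390, 4.998896120, 43.48206925),
    "MTD3": (25.68328649, 12.80894534, 5.038997277, 43.53122910),
    "PPO": (25.63879185, 12.82470653, 4.991962284, 43.45546066),
}
BASELINE = "Before optimization"


def column_sum_error(name: str) -> float:
    """Absolute difference between the reported total and the sum of the three yields."""
    c2, c3, c4, total = YIELDS[name]
    return abs((c2 + c3 + c4) - total)


def gain_over_baseline(name: str) -> float:
    """Gain in combined yield relative to the baseline, in percentage points."""
    return YIELDS[name][3] - YIELDS[BASELINE][3]


gains = {name: gain_over_baseline(name) for name in YIELDS if name != BASELINE}
best_strategy = max(gains, key=gains.get)
```

On these numbers MTD3 gives the largest combined-yield gain, about 0.179 percentage points over the baseline, followed by MCOA, TD3 and PPO. The table also shows that the MTD3 propylene yield is lower than that of MCOA, TD3 and PPO, so its advantage comes from ethylene and butadiene.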
