Operation optimization of ethylene cracking furnace based on improved deep reinforcement learning

doi:10.11949/0438-1157.20230451

Abstract

The ethylene cracking furnace is the core unit of ethylene production, and optimizing its operation is of great importance for raising the production level and economic efficiency of ethylene plants. The cracking process inside the furnace is high-dimensional, multi-modal and nonlinear, so traditional optimization methods struggle to adjust the operation as working conditions change. To address this, we propose an operation optimization method for ethylene cracking furnaces based on an improved twin delayed deep deterministic policy gradient (TD3) algorithm. First, drawing on the cracking process, the operation strategy over one run cycle of the furnace is formulated as a sequential decision-making problem. The furnace production process is then modeled from actual production data with artificial neural networks, and this model serves as the environment with which the reinforcement learning agent interacts. Second, a multi-Critic network mechanism is introduced to estimate the action value, which effectively alleviates the slow training and overly conservative policies of TD3. Finally, the improved algorithm (MTD3) is applied to the ethylene cracking furnace operation optimization problem and yields an effective optimization strategy, which verifies the effectiveness of the proposed algorithm. Experimental results show that the proposed operation optimization strategy significantly improves the yields of the furnace's main products: the combined average yield of $\mathrm{C_2H_4}$, $\mathrm{C_3H_6}$ and $\mathrm{C_4H_6}$ over one run cycle rises from 43.35% before optimization to 43.53% (Table 6).

Key words: deep reinforcement learning, ethylene cracking furnace, operation optimization, cracking process, model, neural network, algorithm

CLC number: TP 272

Chengying ZHU, Zhenlei WANG. Operation optimization of ethylene cracking furnace based on improved deep reinforcement learning algorithm[J]. CIESC Journal, 2023, 74(8): 3429-3437.

Figures/Tables 13

Fig.1 The process of a reinforcement learning agent interacting with the environment
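Fig.1 shows the standard agent-environment loop. The minimal Python sketch below shows what one training episode might look like, under these assumptions:

- One episode covers one 72-day run cycle, matching the $D_t$ range in Table 4.
- The agent sets COT and DHR once per day.

`SurrogateFurnaceEnv`, `yield_model` and the placeholder reward (the predicted yield) are hypothetical stand-ins. They are not the paper's ANN models or its reward function.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

CYCLE_DAYS = 72  # assumed episode length, taken from the D_t range [1, 72] in Table 4

State = Tuple[int, float]     # (operating day, last predicted yield): hypothetical state
Action = Tuple[float, float]  # (COT in degC, DHR)


@dataclass
class SurrogateFurnaceEnv:
    """Hypothetical environment wrapping a surrogate yield model (stand-in for the ANNs)."""
    yield_model: Callable[[float, float, int], float]
    day: int = 1

    def reset(self) -> State:
        self.day = 1
        return (self.day, 0.0)

    def step(self, action: Action) -> Tuple[State, float, bool]:
        cot, dhr = action
        predicted_yield = self.yield_model(cot, dhr, self.day)
        self.day += 1
        done = self.day > CYCLE_DAYS
        reward = predicted_yield  # placeholder reward, not the paper's reward function
        return (self.day, predicted_yield), reward, done


def run_episode(env: SurrogateFurnaceEnv, policy: Callable[[State], Action]) -> Tuple[float, int]:
    """Agent-environment loop: observe the state, act, receive reward and next state."""
    state = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total_reward += reward
        steps += 1
    return total_reward, steps


# Usage with a constant toy yield model and a fixed policy
demo_env = SurrogateFurnaceEnv(yield_model=lambda cot, dhr, day: 25.6)
print(run_episode(demo_env, policy=lambda s: (840.0, 0.51))[1])  # 72 daily decisions
```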

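Table 1 Procedure of the MTD3 algorithm

Algorithm 1: MTD3

Randomly initialize $N$ Critic networks $Q_{\theta_1}, Q_{\theta_2}, \dots, Q_{\theta_N}$ and the Actor network $\pi_\omega$;

Initialize the target networks: $\theta_i' \leftarrow \theta_i$ ($i = 1, \dots, N$), $\omega' \leftarrow \omega$;

Initialize the replay buffer $B$;

for Step $= 1 \to D_{\max}$ do:

  Initialize a random process and obtain the initial environment state $s$;

  for $t = 1$ to $T$ do:

    Select an action from the current policy plus exploration noise: $a = \pi_\omega(s) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$;

    Execute $a$, observe the reward $r$ and the next state $s'$; store $(s, a, r, s')$ in $B$ and set $s \leftarrow s'$;

    Randomly sample a mini-batch of $K$ transitions $(s, a, r, s')$ from $B$;

    $\tilde{a} \leftarrow \pi_{\omega'}(s') + \epsilon$, $\epsilon \sim \operatorname{clip}\big(\mathcal{N}(0, \tilde{\sigma}), -c, c\big)$;

    $y \leftarrow r + \dfrac{\gamma}{N} \sum_{i=1}^{N} Q_{\theta_i'}(s', \tilde{a})$;

    Update the Critic networks: $\theta_i \leftarrow \arg\min_{\theta_i} \dfrac{1}{K} \sum \big(y - Q_{\theta_i}(s, a)\big)^2$;

    if $t \bmod d = 0$ then (the policy network is updated with a delay of $d$ steps):

      Update the Actor network by the deterministic policy gradient:

      $\nabla_\omega J(\omega) = \dfrac{1}{K} \sum \nabla_a Q_{\theta_1}(s, a)\big|_{a = \pi_\omega(s)} \nabla_\omega \pi_\omega(s)$

      Soft-update the target Actor network and the target Critic networks:

      $\omega' \leftarrow \tau \omega + (1 - \tau) \omega'$

      $\theta_i' \leftarrow \tau \theta_i + (1 - \tau) \theta_i'$

    end if

  end for

end for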
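In Algorithm 1 the Critic target is the mean of the $N$ target-critic estimates, $y = r + \frac{\gamma}{N}\sum_{i} Q_{\theta_i'}(s', \tilde{a})$. Standard TD3 instead takes the minimum of two estimates (clipped double-Q learning). The mean of several estimates is never below their minimum, so the MTD3 target is less pessimistic. This is consistent with the stated aim of reducing TD3's conservatism.

The framework-free Python sketch below shows the target-policy smoothing, both targets and the soft update. It makes two simplifications for illustration: critics and actor are plain callables, and parameters are flat lists. A practical implementation would use a deep-learning library and gradient steps.

```python
import random
from typing import Callable, List, Sequence

State = Sequence[float]
Action = List[float]
Critic = Callable[[State, Action], float]


def smoothed_target_action(target_actor: Callable[[State], Action], s_next: State,
                           sigma_tilde: float, c: float, rng: random.Random) -> Action:
    """Target policy smoothing: a~ = pi_w'(s') + eps, eps ~ clip(N(0, sigma~), -c, c)."""
    return [a + max(-c, min(c, rng.gauss(0.0, sigma_tilde))) for a in target_actor(s_next)]


def mtd3_target(r: float, s_next: State, a_tilde: Action,
                target_critics: Sequence[Critic], gamma: float) -> float:
    """MTD3 target from Algorithm 1: r plus gamma times the mean of the N target critics."""
    n = len(target_critics)
    return r + gamma / n * sum(q(s_next, a_tilde) for q in target_critics)


def td3_target(r: float, s_next: State, a_tilde: Action,
               target_critics: Sequence[Critic], gamma: float) -> float:
    """Standard TD3 target for comparison: clipped double-Q takes the minimum."""
    return r + gamma * min(q(s_next, a_tilde) for q in target_critics)


def soft_update(target: Sequence[float], online: Sequence[float], tau: float) -> List[float]:
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


# Toy target critics that return 10, 11 and 12 regardless of their inputs
critics = [lambda s, a, k=k: 10.0 + k for k in range(3)]
print(mtd3_target(1.0, [0.0], [0.0, 0.0], critics, gamma=0.99))  # mean-based, about 11.89
print(td3_target(1.0, [0.0], [0.0, 0.0], critics, gamma=0.99))   # min-based, about 10.9
```

With $N = 3$ (the value selected in Table 5), the toy example shows the gap between the two targets when the critics disagree.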

Fig.2 MTD3-based production operation optimization framework

Table 2 Information of the neural network models

Network inputs	Network output	Number of layers	Neurons per layer	Test-set MSE
COT, DHR, $D_t$	$\mathrm{C_2H_4}$ yield	2	10	$1.3334\times10^{-10}$
COT, DHR, $D_t$	$\mathrm{C_3H_6}$ yield	2	10	$9.5165\times10^{-8}$
COT, DHR, $D_t$	$\mathrm{C_4H_6}$ yield	2	10	$6.1901\times10^{-9}$
COT, $D_t$	TMT	2	8	$1.5998\times10^{-9}$
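Table 2 lists small fully connected surrogate networks. The plain-Python sketch below shows the forward pass of such a network and the test-set MSE metric. It rests on several assumptions that this section does not specify:

- "2 layers" means two hidden layers.
- Hidden layers use tanh and the output layer is linear.
- The weights are random and untrained. In practice they would be fitted to production data, and the training procedure is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def init_mlp(sizes: Sequence[int], rng: random.Random) -> List[Layer]:
    """Random, untrained weights for layer sizes such as [3, 10, 10, 1]."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers: List[Layer], x: Sequence[float]) -> List[float]:
    """Forward pass: tanh on hidden layers, linear output layer (assumed activations)."""
    h = list(x)
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        h = z if i == len(layers) - 1 else [math.tanh(v) for v in z]
    return h


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error, the test-set metric reported in Table 2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(0)
c2h4_model = init_mlp([3, 10, 10, 1], rng)  # inputs: COT, DHR, D_t (normalized)
tmt_model = init_mlp([2, 8, 8, 1], rng)     # inputs: COT, D_t (normalized)
print(forward(c2h4_model, [0.2, -0.5, 0.1]), forward(tmt_model, [0.2, 0.1]))
```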

Table 3 Information of the action space

Action	Description	Range	Normalized range
COT	Coil outlet temperature/℃	[837.00, 844.00]	$[-1, 1]$
DHR	Steam-to-hydrocarbon ratio	[0.50, 0.52]	$[-1, 1]$
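The Actor outputs actions in the normalized range $[-1, 1]$ of Table 3. These must be mapped back to physical set-points before they are fed to the surrogate model. The exact scaling is not stated, so the sketch below assumes a linear (affine) map and clips out-of-range values. The dictionary name `ACTION_BOUNDS` is hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Physical bounds from Table 3; the Actor acts in [-1, 1]
ACTION_BOUNDS: Dict[str, Tuple[float, float]] = {
    "COT": (837.00, 844.00),  # coil outlet temperature, degC
    "DHR": (0.50, 0.52),      # steam-to-hydrocarbon ratio
}


def to_physical(a: float, low: float, high: float) -> float:
    """Affine map from a normalized action in [-1, 1] to [low, high] (assumed scaling)."""
    a = max(-1.0, min(1.0, a))  # clip so that exploration noise cannot leave the range
    return low + (a + 1.0) * (high - low) / 2.0


def to_normalized(x: float, low: float, high: float) -> float:
    """Inverse map from a physical set-point to [-1, 1]."""
    return 2.0 * (x - low) / (high - low) - 1.0


def decode_action(action: Sequence[float]) -> Dict[str, float]:
    """Map an Actor output (a_COT, a_DHR) to physical set-points."""
    return {name: to_physical(a, *ACTION_BOUNDS[name])
            for name, a in zip(("COT", "DHR"), action)}


print(decode_action((0.0, 1.0)))  # mid-range COT, maximum DHR
```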

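Table 4 Information of the state space

State	Description	Range	Normalized range
$D_t$	Current operating day/d	$[1, 72]$	$[0, 1]$
$T_t$	Current tube outer-wall temperature/℃	$[900.00, 1080.00]$	$[0, 1]$
$P_{\mathrm{C_2H_4},t}$	Current $\mathrm{C_2H_4}$ yield/%	$[25.55, 25.70]$	$[0, 1]$
$P_{\mathrm{C_3H_6},t}$	Current $\mathrm{C_3H_6}$ yield/%	$[12.80, 12.90]$	$[0, 1]$
$P_{\mathrm{C_4H_6},t}$	Current $\mathrm{C_4H_6}$ yield/%	$[4.90, 5.20]$	$[0, 1]$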
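Table 4 scales each state variable to $[0, 1]$. The exact scheme is not stated, so the sketch below assumes ordinary min-max scaling with the listed ranges and clips values that fall outside them. The dictionary keys are hypothetical names.

```python
from typing import Dict, List, Tuple

# Ranges from Table 4 (key names are hypothetical)
STATE_BOUNDS: Dict[str, Tuple[float, float]] = {
    "day": (1.0, 72.0),                 # D_t, operating day
    "tube_wall_temp": (900.0, 1080.0),  # T_t, degC
    "yield_c2h4": (25.55, 25.70),       # P_C2H4,t, %
    "yield_c3h6": (12.80, 12.90),       # P_C3H6,t, %
    "yield_c4h6": (4.90, 5.20),         # P_C4H6,t, %
}


def min_max(x: float, low: float, high: float) -> float:
    """Scale x to [0, 1]; values outside [low, high] are clipped (an assumption)."""
    return min(1.0, max(0.0, (x - low) / (high - low)))


def normalize_state(raw: Dict[str, float]) -> List[float]:
    """Build the normalized state vector in the fixed order of STATE_BOUNDS."""
    return [min_max(raw[k], lo, hi) for k, (lo, hi) in STATE_BOUNDS.items()]


example = {"day": 72, "tube_wall_temp": 990.0, "yield_c2h4": 25.6,
           "yield_c3h6": 12.85, "yield_c4h6": 5.0}
print(normalize_state(example))
```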

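Table 5 Parameter settings and grid search ranges

Parameter	Grid search range	Selected value
Reward coefficients $\mu$, $\alpha$, $\beta$, $\lambda$	$\{1, 10^{-1}, 5\times10^{-2}, 10^{-2}\}$	$10^{-2}$, 1, 1, 1
Penalty	$\{-40, -60, -80, -100\}$	$-100$
Number of Critic networks $N$	$\{1, 2, 3, 4\}$	3
Discount factor $\gamma$	$\{0.99, 0.95, 0.90\}$	0.99
Mini-batch size $K$	$\{8, 16, 32, 64\}$	32
Actor learning rate	$\{10^{-4}, 5\times10^{-4}, 10^{-3}, 5\times10^{-3}\}$	$10^{-4}$
Critic learning rate	$\{10^{-4}, 5\times10^{-4}, 10^{-3}, 5\times10^{-3}\}$	$10^{-3}$
Soft-update rate $\tau$	$\{10^{-4}, 5\times10^{-4}, 10^{-3}, 5\times10^{-3}\}$	$5\times10^{-3}$
Replay buffer size	$\{10^3, 10^4, 10^5\}$	$10^4$
Policy update delay $d$	$\{2, 4, 6, 8\}$	2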
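If the hyperparameters in Table 5 were searched jointly, the grid would grow multiplicatively. Five of the rows alone give $4 \times 3 \times 4 \times 4 \times 3 = 576$ configurations, each of which would need a full training run. Whether the authors searched jointly or one parameter at a time is not stated here.

The sketch below enumerates that sub-grid with `itertools.product`. The scoring step is omitted; it would train MTD3 for each configuration. The key names are hypothetical.

```python
import itertools
from typing import Dict, Iterator, List

# Sub-grid of Table 5 (key names are hypothetical)
GRID: Dict[str, List[float]] = {
    "n_critics": [1, 2, 3, 4],
    "gamma": [0.99, 0.95, 0.90],
    "batch_size": [8, 16, 32, 64],
    "policy_delay": [2, 4, 6, 8],
    "buffer_size": [10**3, 10**4, 10**5],
}

# Values reported as selected in Table 5
SELECTED = {"n_critics": 3, "gamma": 0.99, "batch_size": 32,
            "policy_delay": 2, "buffer_size": 10**4}


def iter_grid(grid: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield every combination of grid values as a dict (Cartesian product)."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_grid(GRID))
print(len(configs))  # 576
```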

Fig.3 Training episode-reward curves of MTD3, TD3 and PPO

Fig.4 TMT trends under the strategies obtained by MTD3 and TD3

Fig.5 Optimal COT strategies obtained by different algorithms

Fig.6 Optimal DHR strategies obtained by different algorithms

Fig.7 Yields of the three main cracking products under different optimization strategies

Table 6 Average yields over one run cycle of the three main products ($\mathrm{C_2H_4}$, $\mathrm{C_3H_6}$, $\mathrm{C_4H_6}$) under the strategies obtained by different algorithms

Strategy	Average $\mathrm{C_2H_4}$ yield/%	Average $\mathrm{C_3H_6}$ yield/%	Average $\mathrm{C_4H_6}$ yield/%	Combined average yield/%
Before optimization	25.57033224	12.78792472	4.993497869	43.35175483
MCOA	25.68160822	12.84300845	5.001654668	43.52627133
TD3	25.64259930	12.84057390	4.998896120	43.48206925
MTD3	25.68328649	12.80894534	5.038997277	43.53122910
PPO	25.63879185	12.82470653	4.991962284	43.45546066
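The last column of Table 6 equals the sum of the three per-product averages, to within $10^{-7}$. The short Python check below does two things:

- It recomputes that sum for each strategy.
- It computes each strategy's gain over the pre-optimization baseline.

For MTD3 the combined yield rises by about 0.18 percentage points, roughly a 0.41% relative increase.

```python
# Average yields (%) from Table 6: (C2H4, C3H6, C4H6, reported combined yield)
TABLE6 = {
    "Before optimization": (25.57033224, 12.78792472, 4.993497869, 43.35175483),
    "MCOA": (25.68160822, 12.84300845, 5.001654668, 43.52627133),
    "TD3": (25.64259930, 12.84057390, 4.998896120, 43.48206925),
    "MTD3": (25.68328649, 12.80894534, 5.038997277, 43.53122910),
    "PPO": (25.63879185, 12.82470653, 4.991962284, 43.45546066),
}

baseline = TABLE6["Before optimization"][3]
results = {}
for name, (c2h4, c3h6, c4h6, reported) in TABLE6.items():
    combined = c2h4 + c3h6 + c4h6  # combined yield recomputed from the three columns
    gain = reported - baseline     # absolute gain in percentage points
    results[name] = (combined, gain, gain / baseline)
    print(f"{name:20s} sum={combined:.8f} reported={reported:.8f} "
          f"gain={gain:+.4f} pts ({gain / baseline:+.3%})")
```

The gain of MTD3 over MCOA is much smaller, about 0.005 percentage points. MCOA also gives a slightly higher $\mathrm{C_3H_6}$ yield than MTD3.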
