New operation optimization method with time series based on Levenshtein distance hierarchical clustering

doi:10.11949/j.issn.0438-1157.20180855

Abstract:

In modern process industries, distributed control systems (DCS) collect and store large volumes of time-series operating data. Extracting the valuable operating experience and operating information contained in these data could substantially improve the performance of the operating system. Operating experience, however, is a vague concept that cannot be quantified directly. The time-series operating data are therefore symbolized so that operating experience is represented in block form, and a hierarchical agglomerative clustering algorithm for time series based on the Levenshtein distance is proposed. A similarity search over the historical time-series operating data of the manipulated variables yields several groups of similar operating modes. The process variables corresponding to each operating mode are then analyzed for performance, so that the operating experience needed in actual production is obtained and stored and can be used to optimize process operation. The proposed method is applied to a continuous multi-component distillation process, and the results demonstrate its effectiveness.

Key words: time series, Levenshtein distance, hierarchical clustering, operational optimization, distillation

CLC number:

TQ 028.8

Jian ZHU, Bo YANG, Yongjian WANG, Xiaojie TANG, Hongguang LI. New operation optimization method with time series based on Levenshtein distance hierarchical clustering[J]. CIESC Journal, 2019, 70(2): 581-589.

Figures and tables

Fig.1 Subsequence of time series

Fig.2 Symbolic aggregate approximation (SAX)

Table 1 Probability interval breakpoints

α	β₁	β₂	β₃	β₄	β₅	β₆	β₇	β₈	β₉
3	-0.43	0.43
4	-0.67	0	0.67
5	-0.84	-0.25	0.25	0.84
6	-0.97	-0.43	0	0.43	0.97
7	-1.07	-0.57	-0.18	0.18	0.57	1.07
8	-1.15	-0.67	-0.32	0	0.32	0.67	1.15
9	-1.22	-0.76	-0.43	-0.14	0.14	0.43	0.76	1.22
10	-1.28	-0.84	-0.52	-0.25	0	0.25	0.52	0.84	1.28
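The breakpoints in Table 1 correspond to the quantiles that split the standard normal distribution into α equiprobable intervals, which is the usual SAX convention. The sketch below recomputes them and turns a numeric series into a SAX word. It is a minimal illustration rather than the authors' implementation: the function names `sax_breakpoints`, `paa`, and `sax_word` are hypothetical, and it assumes z-normalization followed by piecewise aggregate approximation with no more segments than data points.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def sax_breakpoints(alphabet_size):
    """Cut points that split N(0, 1) into `alphabet_size` equiprobable intervals."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each contiguous segment.

    Assumes n_segments <= len(series) so that no segment is empty.
    """
    n = len(series)
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [fmean(series[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


def sax_word(series, n_segments, alphabet_size):
    """Z-normalize, reduce with PAA, then map each segment mean to a letter."""
    mu = fmean(series)
    sigma = pstdev(series) or 1.0  # a constant series maps to a single letter
    z = [(v - mu) / sigma for v in series]
    cuts = sax_breakpoints(alphabet_size)
    # bisect_right counts the breakpoints at or below the value: 0 -> "a", 1 -> "b", ...
    return "".join(chr(ord("a") + bisect_right(cuts, m)) for m in paa(z, n_segments))


print([round(b, 2) for b in sax_breakpoints(5)])  # [-0.84, -0.25, 0.25, 0.84]
print(sax_word([1, 1, 5, 5, 9, 9], n_segments=3, alphabet_size=3))  # abc
```

Letters are assigned from the lowest interval upward, so "a" denotes the smallest values.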

Fig.3 Subsequence similarity search

Fig.4 Calculation of Levenshtein distances
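The Levenshtein distance between two symbol strings is the minimum number of single-symbol insertions, deletions, and substitutions needed to turn one into the other. A minimal sketch of the standard dynamic-programming computation follows. The function name `levenshtein` and the unit cost for every edit are assumptions made for illustration, and the figure itself may lay out the calculation differently.

```python
def levenshtein(s, t):
    """Edit distance between sequences s and t with unit-cost edits."""
    # previous[j] holds the distance between the processed prefix of s and t[:j].
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            substitution_cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,                      # delete cs
                current[j - 1] + 1,                   # insert ct
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("aabbc", "abbcc"))  # 2
```

Only two rows of the distance table are kept at a time, so memory grows with the length of `t` rather than with the product of both lengths.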

Table 2 LD-hierarchical agglomerative clustering algorithm

Algorithm	LD-hierarchical agglomerative clustering
1	inputs: sample set Ω = {S_1, S_2, …, S_m}; cluster edit-distance function LD; number of clusters k = 1.
2	for j = 1, 2, …, m
3	C_j = {S_j}
4	end for
5	for i = 1, 2, …, m
6	for j = 1, 2, …, m
7	M(i, j) = LD(C_i, C_j);
8	M(j, i) = M(i, j)
9	end for
10	end for
11	set the current number of clusters q = m
12	while q > k
13	find the two clusters C_{i*} and C_{j*} with the smallest Levenshtein distance;
14	merge C_{i*} and C_{j*}: C_{i*} = C_{i*} ∪ C_{j*};
15	for j = j*+1, j*+2, …, q
16	renumber cluster C_j as C_{j-1};
17	end for
18	delete row j* and column j* of the distance matrix M;
19	for j = 1, 2, …, q-1
20	M(i*, j) = LD(C_{i*}, C_j);
21	M(j, i*) = M(i*, j);
22	end for
23	q = q - 1
24	end while
25	outputs: cluster partition C = {C_1, C_2, …, C_k}
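The procedure in Table 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the table does not say how LD is extended from two strings to two clusters, so average linkage is assumed here, and the sketch recomputes cluster distances in each pass instead of updating the matrix M in place. The names `cluster_distance` and `ld_agglomerative` are hypothetical.

```python
from itertools import combinations


def levenshtein(s, t):
    """Unit-cost edit distance between two symbol strings."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (cs != ct)))  # substitution
        previous = current
    return previous[-1]


def cluster_distance(a, b):
    """Average pairwise Levenshtein distance (average linkage is an assumption)."""
    return sum(levenshtein(x, y) for x in a for y in b) / (len(a) * len(b))


def ld_agglomerative(words, k=1):
    """Merge the two closest clusters until k remain.

    Returns the final clusters and the merge history; the history is what a
    dendrogram would be drawn from when k = 1.
    """
    clusters = [[w] for w in words]  # every sample starts as its own cluster
    merges = []
    while len(clusters) > max(k, 1):
        # Pick the closest pair (i < j); ties go to the first pair found.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        d = cluster_distance(clusters[i], clusters[j])
        merges.append((list(clusters[i]), list(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]                          # later clusters shift down by one
    return clusters, merges


words = ["aabbc", "aabbd", "eedcc", "eedcb", "aabcc"]
clusters, merges = ld_agglomerative(words, k=2)
print(clusters)  # [['aabbc', 'aabbd', 'aabcc'], ['eedcc', 'eedcb']]
```

Recomputing every pairwise cluster distance keeps the sketch short but costs more than the matrix update in the table; for large sample sets the in-place update of M would be preferable.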


Fig.5 AGNES dendrogram

Fig.6 Continuous multi-component distillation unit

Fig.7 Clustering results of operational modes

Fig.8 Comparison of clustering computation time for LD and DTW

Fig.9 Visualization of operational-mode clustering for steam flow rate W

Fig.10 Dynamic concentration curves of overhead benzene and styrene corresponding to different operational modes

Table 3 Raw data interval breakpoints

Symbol	Breakpoint	W/(kmol/h)
a	-3 to -0.84	104.4 to 173.1
b	-0.84 to -0.25	173.1 to 191.7
c	-0.25 to 0.25	191.7 to 207.6
d	0.25 to 0.84	207.6 to 226.2
e	0.84 to 3	226.2 to 294.9
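With the breakpoints of Table 3, a raw steam-flow reading can be mapped to a symbol by locating the interval it falls in. The sketch below assumes that "a" labels the lowest interval and "e" the highest, and that readings outside 104.4 to 294.9 kmol/h fall into the outermost symbols. The names `W_BREAKPOINTS` and `symbolize_w` are hypothetical, and the sample readings are made up for illustration.

```python
from bisect import bisect_right

# Interior breakpoints of the steam flow W in kmol/h, taken from Table 3.
W_BREAKPOINTS = [173.1, 191.7, 207.6, 226.2]
SYMBOLS = "abcde"  # assumption: "a" is the lowest interval, "e" the highest


def symbolize_w(values):
    """Map each steam-flow reading (kmol/h) to its interval symbol."""
    # bisect_right counts the breakpoints at or below the reading, which is
    # exactly the index of the interval the reading belongs to.
    return "".join(SYMBOLS[bisect_right(W_BREAKPOINTS, v)] for v in values)


print(symbolize_w([150.0, 180.0, 200.0, 210.0, 250.0]))  # abcde
```

The resulting strings are the kind of input on which a Levenshtein distance can be computed.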

Fig.11 Dynamic concentration curves of overhead benzene and styrene corresponding to operational mode A

Nomenclature

x	liquid-phase mole fraction
W	steam flow rate, kmol/h

