CIESC Journal ›› 2019, Vol. 70 ›› Issue (2): 581-589. DOI: 10.11949/j.issn.0438-1157.20180855
Received: 2018-07-25
Revised: 2018-11-30
Online: 2019-02-05
Published: 2019-02-05
Corresponding author: Hongguang LI
About the authors: <named-content content-type="corresp-name">Jian ZHU</named-content> (born 1992), male, master's student, <email>1915203154@qq.com</email> | Hongguang LI (born 1963), male, PhD, professor, <email>lihg@mail.buct.edu.cn</email>
Jian ZHU, Bo YANG, Yongjian WANG, Xiaojie TANG, Hongguang LI
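Abstract:
In modern process industries, distributed control systems (DCS) collect and store large volumes of time-series operating data. Extracting the valuable operating experience and operating information contained in these data could substantially improve the performance of the operating system. Operating experience, however, is a vague concept that cannot be quantified directly. The time-series operating data are therefore symbolized so that operating experience can be represented in block form, and a time-series hierarchical agglomerative clustering algorithm based on the Levenshtein distance is proposed. A similarity search over the historical time-series operating data of the manipulated variables yields several types of similar operating patterns. The process variables associated with each type of pattern are then analyzed for performance, so that the operating experience needed in actual production is obtained and stored, with the aim of optimizing process operation. To validate the proposed method, it is applied to a continuous component distillation process; the experimental results demonstrate the effectiveness of the operation optimization method based on Levenshtein distance hierarchical clustering.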
Jian ZHU, Bo YANG, Yongjian WANG, Xiaojie TANG, Hongguang LI. New operation optimization method with time series based on Levenshtein distance hierarchical clustering[J]. CIESC Journal, 2019, 70(2): 581-589.
α | β1 | β2 | β3 | β4 | β5 | β6 | β7 | β8 | β9 |
---|---|---|---|---|---|---|---|---|---|
3 | -0.43 | 0.43 | | | | | | | |
4 | -0.67 | 0 | 0.67 | | | | | | |
5 | -0.84 | -0.25 | 0.25 | 0.84 | | | | | |
6 | -0.97 | -0.43 | 0 | 0.43 | 0.97 | | | | |
7 | -1.07 | -0.57 | -0.18 | 0.18 | 0.57 | 1.07 | | | |
8 | -1.15 | -0.67 | -0.32 | 0 | 0.32 | 0.67 | 1.15 | | |
9 | -1.22 | -0.76 | -0.43 | -0.14 | 0.14 | 0.43 | 0.76 | 1.22 | |
10 | -1.28 | -0.84 | -0.52 | -0.25 | 0 | 0.25 | 0.52 | 0.84 | 1.28 |
Table 1 Probability interval breakpoints
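The breakpoints in Table 1 can be checked numerically. The sketch below assumes they are the quantiles that split a standard normal distribution into α equiprobable intervals, as is usual in SAX-style symbolization; the function name `sax_breakpoints` is hypothetical and not taken from the paper.

```python
from statistics import NormalDist


def sax_breakpoints(alphabet_size):
    """Return the alphabet_size - 1 breakpoints, rounded to two decimals.

    ASSUMPTION: breakpoints are the i/alphabet_size quantiles of N(0, 1),
    so that every symbol interval has the same probability.
    """
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [round(std_normal.inv_cdf(i / alphabet_size), 2)
            for i in range(1, alphabet_size)]


if __name__ == "__main__":
    # Print one row per alphabet size, matching the layout of Table 1.
    for alpha in range(3, 11):
        print(alpha, sax_breakpoints(alpha))
```

Under this assumption the quantiles are symmetric about zero for every α; for α = 8 the outermost breakpoints come out as ±1.15.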
Algorithm | LD-hierarchical agglomerative |
---|---|
1 | Inputs: sample set Ω = {S_1, S_2, …, S_m}; cluster edit-distance function LD; number of clusters k = 1. |
2 | for j = 1, 2, …, m |
3 | C_j = {S_j} |
4 | end for |
5 | for i = 1, 2, …, m |
6 | for j = 1, 2, …, m |
7 | M(i, j) = LD(C_i, C_j); |
8 | M(j, i) = M(i, j) |
9 | end for |
10 | end for |
11 | Set the current number of clusters q = m |
12 | while q > k |
13 | Find the two clusters C_i* and C_j* with the smallest Levenshtein distance; |
14 | Merge C_i* and C_j*: C_i* = C_i* ∪ C_j*; |
15 | for j = j*+1, j*+2, …, q |
16 | Renumber cluster C_j as C_(j-1); |
17 | end for |
18 | Delete row j* and column j* of the distance matrix M; |
19 | for j = 1, 2, …, q-1 |
20 | M(i*, j) = LD(C_i*, C_j); |
21 | M(j, i*) = M(i*, j); |
22 | end for |
23 | q = q - 1 |
24 | end while |
25 | Outputs: cluster partition C = {C_1, C_2, …, C_k} |
Table 2 LD-hierarchical agglomerative clustering algorithm
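The procedure in Table 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The table does not say how LD is extended from two strings to two clusters, so the sketch assumes average linkage (the mean pairwise Levenshtein distance). For brevity it recomputes cluster distances in each pass instead of maintaining the matrix M. The names `levenshtein`, `cluster_distance` and `ld_agglomerative` are hypothetical.

```python
from itertools import combinations


def levenshtein(s, t):
    """Edit distance with unit-cost insert, delete and substitute (row-wise DP)."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute or match
        prev = curr
    return prev[-1]


def cluster_distance(a, b):
    """ASSUMPTION: average linkage, the mean pairwise LD between members."""
    return sum(levenshtein(x, y) for x in a for y in b) / (len(a) * len(b))


def ld_agglomerative(samples, k):
    """Merge the closest pair of clusters until only k clusters remain."""
    clusters = [[s] for s in samples]  # steps 2-4: one cluster per sample
    while len(clusters) > k:           # step 12
        # Step 13: closest pair (i < j), ties resolved by first occurrence.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # step 14: merge into C_i*
        del clusters[j]                          # steps 15-18: drop C_j*
    return clusters


if __name__ == "__main__":
    # Made-up symbol strings, for illustration only.
    patterns = ["aabbc", "aabbd", "eeddc", "eedcc"]
    print(ld_agglomerative(patterns, 2))
```

With k = 1, as in the table's input line, the loop runs until all samples sit in one cluster; recording the merge order along the way would give the full hierarchy.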
Symbol | Breakpoint interval | W/(kmol/h) |
---|---|---|
a | -3 to -0.84 | 104.4 to 173.1 |
b | -0.84 to -0.25 | 173.1 to 191.7 |
c | -0.25 to 0.25 | 191.7 to 207.6 |
d | 0.25 to 0.84 | 207.6 to 226.2 |
e | 0.84 to 3 | 226.2 to 294.9 |
Table 3 Raw data interval breakpoints
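The raw-data breakpoints of Table 3 can be turned into a symbolizer with a few lines of Python. The sketch assumes that symbol a denotes the lowest interval and e the highest, and that a value lying exactly on a breakpoint belongs to the upper interval; neither convention is stated in the table. The function name `symbolize` and the sample values are hypothetical.

```python
from bisect import bisect_right

# Inner breakpoints of W in kmol/h, copied from Table 3.
# The outer limits 104.4 and 294.9 only bound the observed range.
W_BREAKPOINTS = [173.1, 191.7, 207.6, 226.2]
SYMBOLS = "abcde"  # ASSUMPTION: 'a' is the lowest interval, 'e' the highest


def symbolize(values):
    """Map each W value to its symbol and return the resulting string.

    Values below 173.1 map to 'a' and values at or above 226.2 map to 'e',
    so readings outside the 104.4 to 294.9 range are clamped to 'a' or 'e'.
    """
    return "".join(SYMBOLS[bisect_right(W_BREAKPOINTS, v)] for v in values)


if __name__ == "__main__":
    # Made-up readings, one per interval.
    print(symbolize([150.0, 180.0, 200.0, 210.0, 250.0]))  # abcde
```

Strings produced this way are the kind of input that an edit-distance comparison operates on.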