CIESC Journal, 2023, Vol. 74, Issue (3): 1187-1194. DOI: 10.11949/0438-1157.20221216
• Process system engineering •
Xinyuan WU, Qilei LIU, Boyuan CAO, Lei ZHANG, Jian DU
Received: 2022-09-06
Revised: 2022-12-21
Online: 2023-04-19
Published: 2023-03-05
Contact: Qilei LIU
About the author: Xinyuan WU (born 1997), male, master's student, 806973411@mail.dlut.edu.cn
Xinyuan WU, Qilei LIU, Boyuan CAO, Lei ZHANG, Jian DU. Group2vec: group vector representation and its property prediction applications based on unsupervised machine learning[J]. CIESC Journal, 2023, 74(3): 1187-1194.
Group | Vector length |
---|---|
[R;SX2H0] | 18.27 |
[cX3H0][!R;CX4H1] | 18.00 |
[cX3H0][!R;CX4H0] | 17.52 |
[cX3H0][!R;OX2H0] | 17.27 |
[cX3H0][!R;NX3H1] | 16.58 |
Table 1 Top five groups with their vector lengths
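The page does not define "vector length"; a minimal sketch, assuming it means the Euclidean (L2) norm of each group's embedding vector, might rank groups as follows. The function names and the three-dimensional toy embeddings are hypothetical and are not taken from the paper.

```python
import math


def vector_length(vec):
    """Euclidean (L2) norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(x * x for x in vec))


def top_groups_by_length(embeddings, k=5):
    """Return the k (group, length) pairs with the largest norms, longest first."""
    lengths = {group: vector_length(vec) for group, vec in embeddings.items()}
    return sorted(lengths.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical toy embeddings keyed by SMARTS-style group labels.
# The numbers are invented for illustration only.
toy_embeddings = {
    "[R;SX2H0]": [3.0, 4.0, 0.0],
    "[cX3H0][!R;CX4H1]": [1.0, 2.0, 2.0],
    "[cX3H0][!R;OX2H0]": [0.0, 1.0, 0.0],
}

ranked = top_groups_by_length(toy_embeddings, k=2)
for group, length in ranked:
    print(f"{group}: {length:.2f}")
```

With real embeddings, `k=5` would yield a ranking in the same form as Table 1.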
Input features | Property database | Evaluation metric | Training set | Validation set | Test set | Time/s |
---|---|---|---|---|---|---|
Group2vec | ESOL | R² | 0.841 | 0.856 | 0.848 | 0.224 |
Group2vec | FreeSolv | R² | 0.926 | 0.919 | 0.927 | 0.067 |
Group2vec | Tox21 | AUC | 0.855 | 0.834 | 0.838 | 3.040 |
Mol2vec | ESOL | R² | 0.963 | 0.759 | 0.772 | 2.358 |
Mol2vec | FreeSolv | R² | 0.984 | 0.924 | 0.873 | 1.449 |
Mol2vec | Tox21 | AUC | 0.975 | 0.782 | 0.763 | 18.254 |
Group contribution method (GC) | ESOL | R² | 0.862 | 0.788 | | |
Group contribution method (GC) | FreeSolv | R² | 0.944 | 0.855 | | |
Group contribution method (GC) | Tox21 | AUC | 0.812 | 0.815 | | |
Table 2 The training results of the property prediction models
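For readers unfamiliar with the two metrics in Table 2, the sketch below assumes the standard definitions: R² as one minus the ratio of residual to total sum of squares, and AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The page does not state how the metrics were computed (for example, whether AUC on Tox21 is averaged over tasks), so the function names and toy inputs here are illustrative assumptions.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy regression and classification examples (invented numbers).
r2_perfect = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
r2_mean_only = r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
auc_toy = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(r2_perfect, r2_mean_only, auc_toy)
```

A model that always predicts the mean gets R² = 0, which is why values near 1 in the table indicate a close fit.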
Molecule | Top-1 | Top-3 | Top-5 |
---|---|---|---|
O=S(=O)(O)c(cc(c(c1cc(S(=O)(=O)O)c2N)c2)S(=O)(=O)O)c1 | 100% | 100% | 100% |
O=S(=O)(O)c(c(N)cc(N)c1)c1 | 100% | 100% | - |
OC(=O)C(N)CCN | 100% | 100% | 100% |
O=C(O)C(N)CCCN | 100% | 100% | 100% |
O=C(O)C(N)CCCNC(=N)N | 100% | 33% | 40% |
c(cccc1)(c1)CCCCCCCCCCCCC | 0 | 100% | - |
CCCCCCC(C)(C)c1cc2OC(C)(C)C3CC=C(C)CC3c2c(O)c1 | 0 | 33% | 60% |
O=C(OC)CCCCCCCCCCCCCCCCCCCCC | 0 | 67% | - |
c1ccccc1c(c2)ccc(c2)COc3c4C5CC(C)=CCC5C(C)(C)Oc4cc(CCCCC)c3 | 0 | 0 | 0 |
c1ccccc1c(c2)ccc(c2)COc3c4C5CC(=C)CCC5C(C)(C)Oc4cc(CCCCC)c3 | 0 | 0 | 0 |
Table 3 Comparisons of group weights
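The page does not say how the Top-k percentages were obtained or which two sets of group weights are compared. One reading consistent with values such as 33%, 40%, 60%, and 67% is the fraction of the k highest-weighted groups shared by two rankings of the same molecule's groups. The sketch below implements that reading only; the function name, the group labels, and the choice to return `None` when a molecule has fewer than k groups (a guess at the blank Top-5 cells) are all assumptions.

```python
def top_k_overlap(ranking_a, ranking_b, k):
    """Fraction of the top-k items of ranking_a also in the top-k of ranking_b.

    Both rankings are lists ordered from highest to lowest weight.
    Returns None when either ranking has fewer than k items
    (assumed here to correspond to a blank table cell).
    """
    if len(ranking_a) < k or len(ranking_b) < k:
        return None
    shared = set(ranking_a[:k]) & set(ranking_b[:k])
    return len(shared) / k


# Hypothetical group rankings for one molecule from two weighting schemes.
weights_a = ["g1", "g2", "g3", "g4", "g5"]
weights_b = ["g1", "g4", "g6", "g2", "g7"]

overlaps = {k: top_k_overlap(weights_a, weights_b, k) for k in (1, 3, 5)}
for k, value in overlaps.items():
    print(f"Top-{k}: {value:.0%}")
```

Under this reading, a row such as 100% / 33% / 40% would mean the two schemes agree on the single most important group but diverge further down the ranking.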