CIESC Journal ›› 2024, Vol. 75 ›› Issue (9): 3221-3230. DOI: 10.11949/0438-1157.20240334
Received:
2024-03-25
Revised:
2024-05-17
Online:
2024-09-25
Published:
2024-10-10
Corresponding author:
Yi MAN
About the author:
Wuling ZHAO (b. 1998), male, master's student, zwl981950@163.com
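Abstract:
Nanocellulose exhibits a wide range of molecular structures and properties owing to its diverse raw materials, preparation methods, and modification methods. This same structural diversity makes conventional development slow and costly; designing molecular structures at the microscopic scale could shorten the development cycle substantially. However, existing molecular structure prediction models are mostly suited to inorganic materials and adapt poorly to nanocellulose. In this work, a molecular structure prediction model for nanocellulose was built on a variational encoder, and four structure-generation constraints were designed specifically for the structural features of nanocellulose. The model reaches a structure-generation accuracy of about 63.0%. It performs well at recognizing partial structures, with a recognition rate of 87.0% for the main structure, and can effectively decouple the main structure of nanocellulose from the structure of its modifying groups. These results indicate, to a certain extent, that the proposed framework is feasible for predicting the structures of nanocellulose and its derivatives, and that it may support the development and preparation of related materials.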
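The abstract does not spell out the training objective, so the following is only a minimal sketch of the standard variational autoencoder loss (reconstruction term plus KL term), written in plain Python. The function names, the `beta` weight, and the toy inputs are assumptions for illustration; the four nanocellulose-specific generation constraints described in the paper are not reproduced here.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from a diagonal Gaussian N(mu, sigma^2) to N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def reconstruction_nll(token_probs, target_ids):
    """Negative log-likelihood of the target tokens.

    token_probs[i] is the decoder's probability distribution at position i,
    target_ids[i] is the index of the correct token at that position.
    """
    return -sum(math.log(p[t]) for p, t in zip(token_probs, target_ids))


def vae_loss(token_probs, target_ids, mu, log_var, beta=1.0):
    """Standard VAE objective: reconstruction NLL + beta * KL (beta is hypothetical)."""
    return reconstruction_nll(token_probs, target_ids) + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    mu, log_var = [0.2, -0.1], [0.0, -0.5]
    z = reparameterize(mu, log_var, random.Random(0))
    probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]  # toy decoder output for 2 positions
    print(len(z), vae_loss(probs, [0, 1], mu, log_var))
```

In a real model the encoder would produce `mu` and `log_var` from the tokenized molecular sequence and the decoder would produce `token_probs` from the sampled `z`; here both are supplied by hand to keep the sketch self-contained.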
Wuling ZHAO, Yi MAN. Research on framework of nanocellulose molecular structure prediction model based on variational encoder[J]. CIESC Journal, 2024, 75(9): 3221-3230.
Main nanocellulose token sequences | Other main chemical token sequences |
---|---|
‘C1C(C(C(C(O1)O)O)O)O’, | ‘C(C(=O)O)’, ‘C(=O)C’, ‘C’, ‘[C@@]’ |
‘C1CCOC1’, ‘O1CCCC1’, ‘[O]1CCCC1’, | ‘S(=O)(=O)O’, ‘[N@@+]’, ‘[NH2+]’ |
‘OC[C@H](O)[C@@H](O)[C@H](O)CO’ | ‘[C@@H]’, ‘[OH+]’, ‘[CH2-]’ |
‘C(C1[C@H](C(C(C(O1)O)O)O)O [C@H]2C(C(C(C(O2)CO)O)O)O)O’ | ‘[P@+]’, ‘[Cl+2]’, ‘[S@@]’, ‘[Si@]’, ‘[BH3-]’, ‘CC(O)C’, ‘#’, ‘[O-]’ |
Table 1 Selected main token sequences in the database
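Table 1 mixes long substructure tokens (for example the glucose ring) with short bracket atoms such as `[C@@H]`. How the authors segment sequences is not described on this page; the sketch below shows one plausible approach, a greedy longest-match tokenizer over a small vocabulary copied from Table 1, with a fallback to bracket atoms and single characters. The vocabulary subset, the fallback rule, and the function name are assumptions.

```python
import re

# Hypothetical vocabulary: a few multi-character tokens copied from Table 1.
VOCAB = [
    "C1C(C(C(C(O1)O)O)O)O",
    "S(=O)(=O)O",
    "C(C(=O)O)",
    "C1CCOC1",
    "C(=O)C",
]

# Fallback: a whole bracket atom such as [C@@H], otherwise one character.
FALLBACK = re.compile(r"\[[^\]]+\]|.")


def tokenize(smiles, vocab=VOCAB):
    """Greedy longest-match segmentation of a SMILES string."""
    ordered = sorted(vocab, key=len, reverse=True)  # try the longest tokens first
    tokens, i = [], 0
    while i < len(smiles):
        for tok in ordered:
            if smiles.startswith(tok, i):
                tokens.append(tok)
                i += len(tok)
                break
        else:
            # No vocabulary token starts here: emit a bracket atom or a single character.
            match = FALLBACK.match(smiles, i)
            tokens.append(match.group())
            i = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("[C@@H](O)C(=O)C"))
    print(tokenize("OC1CCOC1"))
```

Because every token is a contiguous slice of the input, joining the tokens always restores the original string, which makes the segmentation easy to check.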
Model | 2D structure reconstruction accuracy |
---|---|
HierVAE[ | 0.799 |
NC-VAE (this study) | 0.630 |
JT-VAE[ | 0.585 |
CG-VAE[ | 0.424 |
CVAE[ | 0.215 |
Table 2 Comparison of model accuracy
Nanocellulose or derivative | Model target | Model prediction | Accuracy/% |
---|---|---|---|
Phosphorylated nanocellulose | O=P(O)(O)OC[C@H]1OC[C@H](O)[C@@H](O)[C@@H]1O | O=C(C((CCOC[C@H]1OCC(OC[C@@H][C@@H]O | 70.45 |
TEMPO-oxidized nanocellulose | O=C(O)[C@H]1OC[C@H](O)[C@@H](O)[C@@H]1O | O=CO[C@H][C@H]O[C@H][C@H](O[C@@H]1O | 79.49 |
Phosphorylated nanocellulose Ⅱ | O=[P@H](O)OC[C@H]1OC[C@H](O)[C@@H](O)[C@@H]1O | C=[P@H]OO[C@H]O[C@H][C@H][C@H][C@H]C[C@H][C@H][C@H]O[C@@H]1O | 58.33 |
Nanocellulose 3,5-dimethylphenylcarbamate | C[c:6]1[cH:1][c:2]([CH3:1])[cH:3][c:4](NC(=O)OC[C@@H]2C[C@H] (OC(=O)N[c:4]3[cH:3][c:2]([CH3:1])[cH:1][c:6](C)[cH:5]3)[C@@H] (OC(=O)N[c:4]3[cH:3][c:2]([CH3:1])[cH:1][c:6](C)[cH:5]3)[C@H](O)O2)[cH:5]O1 | O[c:6]1O[c:2]O[CH3:1]O[cH:3]OONCOOOOOCO2C[C@H]([C@H][C@H]([C@H][C@H][C@H]N[c:4]3[cH:3][C@H])[c:6]()[cH:5]3)C=)[cH:3][cH:1][c:6]()[cH:5])[C@H](O1 | 56.85 |
Periodate-oxidized cellulose | O=CCO[C@H](CO)[C@@H](O)C=O | O=CO([C@H]O[C@H][C@H](O | 69.23 |
Table 3 Examples of prediction results for molecular sequences of nanocellulose and its derivatives
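Table 3 reports a per-sequence accuracy for each target and prediction pair, but this page does not state how that score is computed. As a hedged illustration only, the sketch below scores a prediction by the number of characters it shares with the target in order, divided by the length of the longer string. Both the matching rule (Python's `difflib.SequenceMatcher`) and the normalization are assumptions, so the function is not expected to reproduce the table's values.

```python
from difflib import SequenceMatcher


def sequence_accuracy(target, prediction):
    """Percentage of in-order matching characters, relative to the longer sequence.

    Assumed metric for illustration; the paper's exact definition may differ.
    """
    longest = max(len(target), len(prediction))
    if longest == 0:
        return 100.0  # two empty sequences are trivially identical
    # autojunk=False disables the popularity heuristic, which is irrelevant for short strings
    matcher = SequenceMatcher(None, target, prediction, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / longest


if __name__ == "__main__":
    # Toy SMILES strings, not taken from the paper.
    print(round(sequence_accuracy("CCO", "CCN"), 2))
    print(round(sequence_accuracy("OC1CCOC1", "OC1CCOC1"), 2))
```

Normalizing by the longer string penalizes predictions that are either truncated or over-long, which matters for generated sequences whose length is not fixed in advance.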
1 | Li Y L, Xing H P, Li Y H. A comparative study on paper strengthening performance of CNC, CNF, and BC[J]. Transactions of China Pulp and Paper, 2021, 36(3): 81-86. |
2 | Wei L Q, Agarwal U P, Hirth K C, et al. Chemical modification of nanocellulose with canola oil fatty acid methyl ester[J]. Carbohydrate Polymers, 2017, 169: 108-116. |
3 | Xie H X, Du H S, Yang X H, et al. Recent strategies in preparation of cellulose nanocrystals and cellulose nanofibrils derived from raw cellulose materials[J]. International Journal of Polymer Science, 2018, 2018: 1-25. |
4 | Jiménez A, Bidare P, Hassanin H, et al. Powder-based laser hybrid additive manufacturing of metals: a review[J]. The International Journal of Advanced Manufacturing Technology, 2021, 114(1): 63-96. |
5 | Oganov A R, Pickard C J, Zhu Q, et al. Structure prediction drives materials discovery[J]. Nature Reviews Materials, 2019, 4: 331-348. |
6 | Marakana P G, Dey A, Saini B. Isolation of nanocellulose from lignocellulosic biomass: synthesis, characterization, modification, and potential applications[J]. Journal of Environmental Chemical Engineering, 2021, 9(6): 106606. |
7 | Kumar R, Rai B, Gahlyan S, et al. A comprehensive review on production, surface modification and characterization of nanocellulose derived from biomass and its commercial applications[J]. Express Polymer Letters, 2021, 15(2): 104-120. |
8 | Chen Z K, Bononi F C, Sievers C A, et al. UV-visible absorption spectra of solvated molecules by quantum chemical machine learning[J]. Journal of Chemical Theory and Computation, 2022, 18(8): 4891-4902. |
9 | Schleder G R, Padilha A C M, Acosta C M, et al. From DFT to machine learning: recent approaches to materials science—a review[J]. Journal of Physics: Materials, 2019, 2(3): 032001. |
10 | Ryan K, Lengyel J, Shatruk M. Crystal structure prediction via deep learning[J]. Journal of the American Chemical Society, 2018, 140(32): 10158-10168. |
11 | Chen J, Chaudhari N S. Bidirectional segmented-memory recurrent neural network for protein secondary structure prediction[J]. Soft Computing, 2006, 10(4): 315-324. |
12 | Kadurin A, Nikolenko S, Khrabrov K, et al. druGAN: an advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties in silico[J]. Molecular Pharmaceutics, 2017, 14(9): 3098-3104. |
13 | Kim H, Ko S, Kim B J, et al. Predicting chemical structure using reinforcement learning with a stack-augmented conditional variational autoencoder[J]. Journal of Cheminformatics, 2022, 14(1): 83. |
14 | Thakur V, Guleria A, Kumar S, et al. Recent advances in nanocellulose processing, functionalization and applications: a review[J]. Materials Advances, 2021, 2(6): 1872-1895. |
15 | Tajbakhsh N, Jeyaseelan L, Li Q, et al. Embracing imperfect datasets: a review of deep learning solutions for medical image segmentation[J]. Medical Image Analysis, 2020, 63: 101693. |
16 | Xu F X, Jiang L Q, Zheng A Q, et al. Carbon-based solid acid catalyzed the pyrolysis of cellulose to produce levoglucosan and levoglucosenone[J]. CIESC Journal, 2022, 73(3): 1166-1172. |
17 | Bradley A P. The use of the area under the ROC curve in the evaluation of machine learning algorithms[J]. Pattern Recognition, 1997, 30(7): 1145-1159. |
18 | Fu J J, Tian Y, Tao J S. The surface groups and chemical modification of nanocrystalline cellulose[J]. China Pulp & Paper, 2018, 37(1): 50-59. |
19 | Yao Y J, Wang H R. An overview on chemical modification of cellulose[J]. Materials Review, 2018, 32(19): 3478-3488. |
20 | Chen Z J, Tang Y J, Zhu P, et al. Progress in preparation and applications of carboxymethyl cellulose[J]. Transactions of China Pulp and Paper, 2022, 37(3): 144-154. |
21 | Liu X L, Wang A, Wang C P, et al. Research progress in preparation and modification of cellulose nanofibril[J]. China Pulp & Paper, 2020, 39(4): 74-83. |
22 | Yang X P, Biswas S K, Han J Q, et al. Surface and interface engineering for nanocellulosic advanced materials[J]. Advanced Materials, 2021, 33(28): 2002264. |
23 | Lian P, Yan R H, Wu Z G, et al. Thermal performance of novel form-stable disodium hydrogen phosphate dodecahydrate-based composite phase change materials for building thermal energy storage[J]. Advanced Composites and Hybrid Materials, 2023, 6(2): 74. |
24 | Cai Y, Cui J, Chen M, et al. Multifunctional enhancement for highly stable and efficient perovskite solar cells[J]. Advanced Functional Materials, 2021, 31(7): 2005776. |
25 | Gražulis S, Chateigner D, Downs R T, et al. Crystallography open database—an open-access collection of crystal structures[J]. Journal of Applied Crystallography, 2009, 42(4): 726-729. |
26 | Hellenbrandt M. The inorganic crystal structure database (ICSD)—present and future[J]. Crystallography Reviews, 2004, 10(1): 17-22. |
27 | Rosen A S, Iyer S M, Ray D, et al. Machine learning the quantum-chemical properties of metal-organic frameworks for accelerated materials discovery[J]. Matter, 2021, 4(5): 1578-1597. |
28 | Pareek J, Jacob J. Data Compression and Visualization Using PCA and T-SNE[M]. Singapore: Springer Singapore, 2021: 327-337. |
29 | Jin W G, Barzilay R, Jaakkola T. Hierarchical generation of molecular graphs using structural motifs[C]//Proceedings of the 37th International Conference on Machine Learning. ACM, 2020: 4839-4848. |
30 | Jin W G, Barzilay R, Jaakkola T. Junction tree variational autoencoder for molecular graph generation[C]//Proceedings of the International Conference on Machine Learning. 2018. |
31 | Liu Q, Allamanis M, Brockschmidt M, et al. Constrained graph variational autoencoders for molecule design[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: ACM, 2018: 7806-7815. |
32 | Gómez-Bombarelli R, Wei J N, Duvenaud D, et al. Automatic chemical design using a data-driven continuous representation of molecules[J]. ACS Central Science, 2018, 4(2): 268-276. |
33 | Ochiai T, Inukai T, Akiyama M, et al. Variational autoencoder-based chemical latent space for large molecular structures with 3D complexity[J]. Communications Chemistry, 2023, 6(1): 249. |