CIESC Journal (Huagong Xuebao), 2022, Vol. 73, Issue 12: 5461-5468. DOI: 10.11949/0438-1157.20221081

• Process Systems Engineering •

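Prediction of metal ion-organic ligand coordination stability constants based on deep learning

Shuping QI, Wenlong WANG, Lei ZHANG, Jian DU

  1. Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2022-07-30 Revised: 2022-09-23 Online: 2022-12-05 Published: 2023-01-17
  • Corresponding author: Lei ZHANG
  • About the author: Shuping QI (born 1997), female, master's student, 1137272405@qq.com
  • Funding:
    National Natural Science Foundation of China (22278053); Dalian High-Level Talent Innovation Support Program (2021RQ105)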

A deep learning-based model for predicting the stability constants of metal ions with organic ligands

Shuping QI, Wenlong WANG, Lei ZHANG, Jian DU

  1. Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2022-07-30 Revised: 2022-09-23 Online: 2022-12-05 Published: 2023-01-17
  • Contact: Lei ZHANG

Abstract:

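The stability of a metal ion-organic ligand complex is determined by three factors: the metal ion species, the structure of the organic ligand, and the experimental conditions. Obtaining stability constants by traditional methods is time-consuming and labor-intensive, which hinders high-throughput screening of specific metal chelators. This work therefore proposes a high-throughput prediction model for complex stability constants, built on a multi-head graph attention network, that accounts for these factors jointly. First, molecular attribute graphs are generated for the 1371 organic molecules involved in the 7127 complexes extracted from the database. Next, a multi-head graph attention network extracts features from the molecular attribute graphs, and the extracted molecular features are concatenated with the encoded metal ion and experimental conditions. Finally, the concatenated features are fed into a fully connected layer to predict the stability constant of the complex. On the test set, the model achieves an R² of 0.956 and an RMSE of 1.251, indicating good generalization ability. In addition, chelate stability constants calculated in the literature with density functional theory were compared with the model's predictions, and the results show that the model is more reliable and efficient.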

Keywords: complex, stability constant, multi-head graph attention network, deep learning
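
The pipeline summarized in the abstract (attributed molecular graph, multi-head graph attention, concatenation with the encoded metal ion and experimental conditions) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the toy three-atom graph, the two-dimensional atom features, the random weights, the mean readout, and the `METALS` and `TEMPERATURES` vocabularies are all hypothetical, and the nonlinearity that normally follows aggregation is omitted for brevity.

```python
import math
import random

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_head(node_feats, neighbors, W, a):
    """One graph-attention head; returns an updated feature vector per atom."""
    # Project every atom feature vector: h_i = W x_i
    h = [[sum(w * x for w, x in zip(row, feat)) for row in W] for feat in node_feats]
    updated = []
    for i, nbrs in enumerate(neighbors):
        idx = [i] + nbrs  # attend over the atom itself and its bonded neighbors
        # Unnormalized score e_ij = LeakyReLU(a . [h_i || h_j])
        scores = [leaky_relu(sum(c * v for c, v in zip(a, h[i] + h[j]))) for j in idx]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]  # softmax over the neighborhood
        updated.append([sum(al * h[j][k] for al, j in zip(alphas, idx))
                        for k in range(len(W))])
    return updated

def multi_head_gat(node_feats, neighbors, heads):
    """Run every head and concatenate the per-atom outputs."""
    per_head = [attention_head(node_feats, neighbors, W, a) for W, a in heads]
    return [sum((out[i] for out in per_head), []) for i in range(len(node_feats))]

def mean_readout(node_feats):
    """Average atom vectors into one molecule-level vector (assumed readout)."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]

def one_hot(value, vocabulary):
    return [1.0 if value == v else 0.0 for v in vocabulary]

# Toy ligand graph (hypothetical): 3 atoms in a chain, 2 features per atom.
atom_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [[1], [0, 2], [1]]  # adjacency list: atom index -> bonded atoms

# Random, untrained weights purely for illustration.
rng = random.Random(0)
IN_DIM, OUT_DIM, N_HEADS = 2, 3, 2
heads = [
    ([[rng.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(OUT_DIM)],
     [rng.uniform(-1, 1) for _ in range(2 * OUT_DIM)])
    for _ in range(N_HEADS)
]

# Hypothetical vocabularies for the one-hot encoded inputs.
METALS = ["Cu2+", "Zn2+", "Ni2+"]
TEMPERATURES = ["20C", "25C", "37C"]

mol_vec = mean_readout(multi_head_gat(atom_feats, bonds, heads))
features = mol_vec + one_hot("Cu2+", METALS) + one_hot("25C", TEMPERATURES)
print(len(features))  # 2 heads x 3 dims + 3 metals + 3 temperatures = 12
```

In a full model, a vector like `features` would be the input to the fully connected layer that outputs the predicted stability constant; a trained network would learn the projection and attention weights instead of drawing them at random.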

Abstract:

The stability of a metal ion-organic ligand complex is determined by three factors: the metal ion species, the organic ligand structure, and the experimental conditions. Obtaining the stability constants of complexes by traditional methods is time-consuming and labor-intensive, which hinders high-throughput screening of specific metal chelators. This paper therefore proposes a high-throughput model for predicting complex stability constants, based on a multi-head graph attention network (multi-head GAT). First, attributed molecular graphs are generated for the 1371 organic molecules involved in the 7127 complexes extracted from the mini stability constant database. Second, the multi-head graph attention network extracts features from each attributed molecular graph. The extracted molecular features are then concatenated with the metal ion and the experimental conditions, both encoded by one-hot encoding. Finally, the concatenated feature vector is passed to the fully connected layer to predict the stability constant of the complex. The coefficient of determination (R²) and root mean square error (RMSE) of the model on the test set are 0.956 and 1.251, respectively, indicating that the model generalizes well. In addition, when the model is used to predict the stability constants of chelates reported in the literature, it proves more reliable and efficient than calculations based on density functional theory (DFT).

Keywords: complex, stability constant, multi-head graph attention network, deep learning
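
For readers unfamiliar with the two test-set metrics quoted in the abstract, the sketch below computes them with their standard textbook definitions. The abstract does not give the formulas, so treating these definitions as the ones used is an assumption, and the `y_true` and `y_pred` values are invented toy numbers, not data from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Invented toy values standing in for measured and predicted stability constants.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]

toy_rmse = rmse(y_true, y_pred)
toy_r2 = r2(y_true, y_pred)
print(f"RMSE = {toy_rmse:.3f}, R2 = {toy_r2:.3f}")
```

RMSE carries the same units as the predicted quantity, while R² is dimensionless and reports the fraction of variance in the observed values that the predictions explain.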

CLC number: