CIESC Journal, 2025, Vol. 76, Issue (1): 93-106. DOI: 10.11949/0438-1157.20240663

• Thermodynamics

Predicting and interpreting the toxicity of ionic liquids using graph neural network

Haijun FENG1, Bingxuan ZHANG1, Jian ZHOU2

  1. School of Computer Sciences, Shenzhen Institute of Information Technology, Shenzhen 518172, Guangdong, China
    2. School of Chemistry and Chemical Engineering, Guangdong Provincial Key Laboratory for Green Chemical Product Technology, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2024-06-14  Revised: 2024-09-25  Online: 2025-02-08  Published: 2025-01-25
  • Contact: Haijun FENG

Study on Predicting and Interpreting the Toxicity of Ionic Liquids with Graph Neural Network Models

  • About the author: Haijun FENG (born 1982), male, Ph.D., lecturer, fenghj@sziit.edu.cn
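  • Funding:
    2024 Scientific Research Platforms and Projects for Higher Education Institutions, Department of Education of Guangdong Province (2024KTSCX256); 2023 Shenzhen Educational Science Planning Project (xbjy23002)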

Abstract:

Ionic liquids are potentially toxic to the environment, so controlling their toxicity is a key concern. To understand their toxicity mechanisms, three traditional machine learning methods (support vector machine, random forest, multilayer perceptron) and three graph neural network models (graph attention network, message passing neural network, graph convolutional model) were built to predict the toxicity of ionic liquids toward four biological test systems (the leukemia rat cell line IPC-81, acetylcholinesterase, Escherichia coli, and Vibrio fischeri). The simplified molecular-input line-entry system (SMILES) string of each molecule serves as the input, and the toxicity value lgEC50 serves as the output. The three traditional machine learning methods represent molecules with extended-connectivity fingerprints (ECFPs), whereas the three graph neural network models represent them as molecular graphs. Benefiting from molecular structure information, the graph convolutional model (GCM) achieved lower RMSE and MAE and higher R² than the other models on all four datasets, making it the best of the six models at predicting the toxicity of ionic liquids. Based on the GCM, an interpretability model was also established to analyze, in a data-driven way, the contribution of atomic groups to the toxicity of ionic liquids. Aromatic rings in the cations and long alkyl chains were found to contribute to toxicity. Atomic groups such as S+, P+, N+, and NH+ significantly enhance the toxicity of ionic liquids, while atomic groups such as P-, F, B-, and C effectively reduce it. These findings provide a theoretical basis for the rapid screening and development of greener, low-toxicity ionic liquids.
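The model comparison rests on three regression metrics computed between observed and predicted lgEC50 values. As a minimal sketch, assuming the standard definitions (RMSE and MAE as the root-mean-square and mean absolute residual, R² as the coefficient of determination 1 - SS_res/SS_tot), they can be computed in plain Python as follows. The function name and the toy values are illustrative only and are not taken from the paper's datasets.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return RMSE, MAE and R^2 for paired observed/predicted lgEC50 values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)   # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        # Coefficient of determination; undefined when all targets are equal.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical toy values for illustration, not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 2.5, 4.0]
    print(regression_metrics(observed, predicted))
```

Lower RMSE and MAE and higher R² all indicate a closer fit, which is why the three metrics can agree on a single best model as reported above.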

Key words: ionic liquids, toxicity, machine learning, graph neural network, model, prediction, interpretability

Abstract (Chinese version):

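Ionic liquids are potentially toxic to the environment. To understand their toxicity mechanisms, three traditional machine learning models (support vector machine, random forest, multilayer perceptron) and three graph neural network models (graph attention network, message passing neural network, graph convolutional model) were built to predict the toxicity of ionic liquids toward four biological test systems, including the rat IPC-81 cell line. Benefiting from molecular structure information, the graph convolutional model achieved the lowest RMSE and MAE and the highest R² on all four datasets, so it performs best at predicting the toxicity of ionic liquids. Based on the graph convolutional model, a toxicity interpretation model was also established to analyze, in a data-driven way, the contribution of atomic groups to toxicity. The aromatic rings and long alkyl chains of the cations give rise to toxicity; atomic groups such as S+, P+, N+, and NH+ significantly enhance the toxicity of ionic liquids, whereas atomic groups such as P-, F, B-, and C effectively reduce it. These findings can provide a theoretical basis for the rapid screening and development of greener, low-toxicity ionic liquids.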
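The abstract does not describe the layers of the graph convolutional model or how the atom-group contributions are derived. As a rough illustration only, the sketch below assumes a generic graph convolution in the style of Kipf and Welling (adjacency with self-loops, symmetric degree normalization, then ReLU) followed by a hypothetical additive readout: each atom is mapped to one scalar and the molecule's predicted lgEC50 is the sum of those scalars, so the per-atom terms decompose the prediction exactly and could be aggregated by atom group. The function names, the toy 3-atom graph, and the weights are hypothetical and untrained; a real pipeline would parse SMILES into atoms and bonds with a cheminformatics toolkit and learn the weights from data.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def normalized_adjacency(n_atoms: int, bonds: Sequence[Tuple[int, int]]) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2 for an undirected molecular graph."""
    # Start from the identity so every atom keeps its own features (self-loops).
    a = [[1.0 if i == j else 0.0 for j in range(n_atoms)] for i in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # degree including the self-loop, always >= 1
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_atoms)]
            for i in range(n_atoms)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def predict_with_contributions(a_hat: Matrix, h: Matrix, w: Matrix,
                               readout: Sequence[float]) -> Tuple[float, List[float]]:
    """Return (graph-level prediction, per-atom contributions).

    The readout is linear per atom and summed over atoms, so the
    per-atom terms add up exactly to the prediction.
    """
    h1 = gcn_layer(a_hat, h, w)
    contributions = [sum(v * r for v, r in zip(row, readout)) for row in h1]
    return sum(contributions), contributions


if __name__ == "__main__":
    # Hypothetical 3-atom chain (0-1-2) with 2 toy features per atom.
    a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights = [[0.5, -0.2], [0.1, 0.3]]  # untrained, arbitrary
    readout = [1.0, -1.0]                # untrained, arbitrary
    pred, contrib = predict_with_contributions(a_hat, features, weights, readout)
    print(pred, contrib)
```

Under this additive assumption, an atom with a positive term raises the predicted lgEC50 and one with a negative term lowers it; since a lower EC50 means higher toxicity, the sign convention must be kept in mind when reading such contributions as "toxicity-enhancing" or "toxicity-reducing".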

