CIESC Journal, 2023, Vol. 74, Issue (2): 630-641. DOI: 10.11949/0438-1157.20221060

• Thermodynamics

A critical discussion on developing molecular property prediction models: density of ionic liquids as example

Jiahui CHEN, Xinze YANG, Guzhong CHEN, Zhen SONG, Zhiwen QI

  1. State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2022-07-27  Revised: 2022-09-22  Online: 2023-03-21  Published: 2023-02-05
  • Contact: Zhen SONG



Molecular property prediction models are powerful tools for screening or designing chemicals that meet specific application requirements. However, many key aspects of model development, such as the size and diversity of the dataset, the method used to partition the test set, cross-validation, and algorithm selection, are not treated with enough rigor, which can lead to unreliable estimates of a model's true predictive performance. Using a group contribution method for predicting the density of ionic liquids (ILs) as an example, this work discusses the importance of dataset partitioning and cross-validation in developing molecular property prediction models. An automatic group fragmentation method for ILs is proposed, and the effect of the group occurrence threshold (the number of ILs in the dataset that contain a given group) on prediction accuracy is investigated. Among the five regression algorithms compared (multiple linear regression, ridge regression, random forest, support vector machine, and neural network), the group contribution model based on ridge regression gives the best predictive performance, with an average relative error of 1.88% on the compiled dataset.
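The workflow summarized above (group fragmentation with an occurrence threshold, a ridge-regression group contribution model, cross-validation, and a held-out test set scored by average relative error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the group names, contributions and densities are synthetic and hypothetical, a simple linear form ρ = b + Σ n_k c_k is assumed, ILs that contain a rejected group are simply discarded, and a random 80/20 split with 5-fold cross-validation is only one possible partitioning scheme.

```python
import numpy as np

# Hypothetical groups and contributions (synthetic; NOT the paper's groups or fitted values).
GROUPS = ["CH3", "CH2", "imidazolium_ring", "BF4-", "NTf2-", "rare_group"]
TRUE_C = np.array([-20.0, -15.0, 60.0, 150.0, 250.0, 30.0])


def make_synthetic_ils(n=200, seed=42):
    """Synthetic ILs as {group: count} dicts plus a noisy linear 'density' in kg/m3."""
    rng = np.random.default_rng(seed)
    counts = rng.integers(0, 4, size=(n, len(GROUPS)))
    counts[:, -1] = 0
    counts[:3, -1] = 1  # rare_group occurs in only the first 3 ILs
    y = 1000.0 + counts @ TRUE_C + rng.normal(0.0, 5.0, size=n)
    ils = [{g: int(c) for g, c in zip(GROUPS, row) if c > 0} for row in counts]
    return ils, y


def filter_groups(ils, threshold):
    """Groups whose occurrence (number of ILs containing them) is >= threshold."""
    occurrence = {}
    for groups in ils:
        for g in groups:
            occurrence[g] = occurrence.get(g, 0) + 1
    return sorted(g for g, c in occurrence.items() if c >= threshold)


def build_matrix(ils, groups):
    """Group-count matrix; ILs containing any rejected group are dropped (an assumption)."""
    col = {g: j for j, g in enumerate(groups)}
    keep = [i for i, d in enumerate(ils) if all(g in col for g in d)]
    X = np.zeros((len(keep), len(groups)))
    for row, i in enumerate(keep):
        for g, c in ils[i].items():
            X[row, col[g]] = c
    return X, np.array(keep)


def fit_ridge(X, y, alpha):
    """Closed-form ridge on centered data, so the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def ard_percent(y_true, y_pred):
    """Average relative deviation in percent."""
    return 100.0 * np.mean(np.abs(y_pred - y_true) / y_true)


def kfold_ard(X, y, alpha, k=5, seed=0):
    """Mean validation ARD over k shuffled folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    scores = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coef, b = fit_ridge(X[train], y[train], alpha)
        scores.append(ard_percent(y[fold], X[fold] @ coef + b))
    return float(np.mean(scores))


def run_demo(threshold=10, alphas=(0.01, 0.1, 1.0, 10.0), seed=0):
    ils, y_all = make_synthetic_ils()
    groups = filter_groups(ils, threshold)
    X, keep = build_matrix(ils, groups)
    y = y_all[keep]
    # Hold out a test set once; it is never used when choosing alpha.
    idx = np.random.default_rng(seed).permutation(len(y))
    n_test = len(y) // 5
    test, train = idx[:n_test], idx[n_test:]
    cv = {a: kfold_ard(X[train], y[train], a) for a in alphas}
    best = min(cv, key=cv.get)
    coef, b = fit_ridge(X[train], y[train], best)
    test_ard = ard_percent(y[test], X[test] @ coef + b)
    return groups, best, cv, test_ard


if __name__ == "__main__":
    groups, best, cv, test_ard = run_demo()
    print("kept groups:", groups)
    print("CV ARD% by alpha:", {a: round(s, 3) for a, s in cv.items()})
    print(f"alpha={best}, test ARD = {test_ard:.3f}%")
```

The regularization strength is picked only from cross-validation on the training portion, so the reported test error is not biased by tuning. If a dataset holds several temperature or pressure points per IL, forming the folds and the test set by IL rather than by individual data point keeps the same IL from appearing on both sides of a split.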

Key words: molecular property prediction, modelling, dataset partitioning, cross-validation, algorithms, ionic liquids, density

