CIESC Journal

Knowledge-enhanced strategy based olefin hydroformylation catalyst focused large language model

Weijian CHEN, Haoxiang XU, Daojian CHENG

  1. School of Chemical Engineering, Beijing University of Chemical Technology, Beijing 100029, China
  • Received: 2025-12-11 Revised: 2026-01-09 Online: 2026-01-14
  • Contact: Haoxiang XU, Daojian CHENG
  • Author profile: Weijian CHEN (born 2002), male, master's student, 2024210100@buct.edu.cn
  • Funding: National Natural Science Foundation of China (U23A20110); National Natural Science Foundation of China (22522802)

Abstract:

Large language models (LLMs) have shown considerable promise in chemical research, but their performance in specialized domains such as catalyst development has not yet been quantified in a systematic, multi-level way. To address this gap, this study used the olefin hydroformylation catalytic system as a probe and built a multi-level evaluation framework covering four task types: cloze (fill-in-the-blank), short-answer, analytical, and reasoning tasks. The framework was used to compare the baseline capabilities of four general-purpose LLMs (DeepSeek-V3.2-Think, Qwen3, Gemini-3-Pro-Preview, and GPT-4o) and to examine how much different knowledge-enhancement strategies improve model performance. First, each model was evaluated on the same task set, and the model with the best overall performance was selected as the base model for the subsequent experiments. Three knowledge-enhancement methods (a large knowledge graph, a small knowledge graph, and vector similarity retrieval) were then each connected to this base model separately, and their gains were examined across all task types. Cloze tasks were scored by accuracy, whereas short-answer, analytical, and reasoning tasks were scored on four quantitative dimensions: compliance, correctness, robustness, and completeness. The results show that Gemini-3-Pro-Preview performed best overall and that knowledge enhancement improved performance further. The enhancement methods also showed task-specific advantages: knowledge graphs performed particularly well on complex reasoning and mechanism-related tasks, whereas vector similarity retrieval gave substantial gains on information-retrieval tasks. Through systematic evaluation and knowledge-enhancement experiments, this study provides a methodological reference and empirical evidence for building and optimizing vertical-domain LLMs for specialized chemical engineering fields, and offers new ideas for developing intelligent modeling and decision-support systems for chemical processes.

Key words: large language model, knowledge enhancement, vertical-domain large model, intelligentization, olefin hydroformylation
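The abstract reports that knowledge-graph enhancement helped most on complex reasoning and mechanism-related tasks, but it does not say how the graph is queried. The following is a minimal sketch of one common pattern, assuming the graph is stored as (head, relation, tail) triples. Entities mentioned in the question are matched against graph nodes, and facts within a few hops are collected as context for the LLM. The triples, the naive substring entity linking, and the helper names `build_graph`, `link_entities` and `subgraph_facts` are hypothetical illustrations rather than the authors' pipeline. The sketch also does not model the difference between the large and small graphs.

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triples, for illustration only.
TRIPLES = [
    ("hydroformylation", "converts", "alkene"),
    ("hydroformylation", "produces", "aldehyde"),
    ("hydroformylation", "catalyzed_by", "rhodium phosphine complex"),
    ("hydroformylation", "catalyzed_by", "cobalt carbonyl"),
    ("rhodium phosphine complex", "ligand_property", "bite angle"),
    ("bite angle", "influences", "linear/branched ratio"),
]


def build_graph(triples):
    """Adjacency list: head entity -> list of outgoing (relation, tail) edges."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def link_entities(question, graph):
    """Naive entity linking: keep head entities whose name occurs in the question."""
    text = question.lower()
    return [entity for entity in graph if entity in text]


def subgraph_facts(graph, seeds, hops=2):
    """Breadth-first walk from the seed entities, collecting facts up to `hops` edges away."""
    facts, frontier, seen = [], list(seeds), set(seeds)
    for _ in range(hops):
        next_frontier = []
        for head in frontier:
            # .get avoids inserting empty entries into the defaultdict
            for relation, tail in graph.get(head, []):
                facts.append(f"{head} --{relation}--> {tail}")
                if tail not in seen:
                    seen.add(tail)
                    next_frontier.append(tail)
        frontier = next_frontier
    return facts


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    question = "How does the rhodium phosphine complex affect selectivity?"
    for fact in subgraph_facts(graph, link_entities(question, graph)):
        print(fact)
```

Each returned fact is an explicit edge, so a two-hop walk gives the model a chain that runs from catalyst to ligand property to selectivity. This is one plausible reason graph context could suit mechanism-style questions, offered as an interpretation rather than a finding of the study.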
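Vector similarity retrieval, one of the three enhancement methods named in the abstract, usually works in three steps. The query and the knowledge snippets are encoded as vectors, the snippets are ranked by cosine similarity to the query, and the top-k snippets are prepended to the prompt. The minimal sketch below follows that pattern under stated assumptions. Toy bag-of-words term-frequency vectors stand in for the neural text embeddings a real system would use. The snippets, the default `k`, and the helpers `embed`, `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge snippets; a real system would index a domain corpus.
KNOWLEDGE_BASE = [
    "Hydroformylation converts an alkene, CO and H2 into an aldehyde.",
    "Rhodium phosphine complexes catalyze hydroformylation under milder conditions than cobalt carbonyls.",
    "The linear to branched aldehyde ratio depends strongly on the phosphine ligand bite angle.",
]


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens (stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Toy term-frequency vector; assumption: replaces a neural embedding model."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k (score, snippet) pairs most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, docs: list[str], k: int = 2) -> str:
    """Prepend the retrieved snippets to the question before it goes to the base LLM."""
    context = "\n".join(f"- {d}" for _, d in retrieve(question, docs, k))
    return f"Reference knowledge:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("Which ligand property controls the linear to branched ratio?", KNOWLEDGE_BASE))
```

Retrieval like this matches surface wording, so it is well suited to looking up stated facts. It does not chain facts together the way a graph walk can.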
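The abstract scores cloze tasks by accuracy and scores the other three task types on compliance, correctness, robustness and completeness. It does not give the scoring scale, the answer-matching rule, or how the four dimensions are combined. The sketch below makes three hypothetical choices: cloze answers are matched exactly after whitespace normalization, per-dimension scores in [0, 1] come from a grader, and an equal-weight mean serves as the overall score. Case is deliberately kept during matching, because chemical formulas such as Co (cobalt) and CO (carbon monoxide) differ only in case. All function names are illustrative.

```python
from statistics import mean

DIMENSIONS = ("compliance", "correctness", "robustness", "completeness")


def normalize(answer):
    """Collapse whitespace only; case is kept because e.g. Co and CO differ only by case."""
    return " ".join(answer.split())


def cloze_accuracy(predictions, references):
    """Fraction of cloze items whose normalized prediction equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def dimension_means(graded):
    """Mean score per dimension over all graded answers of one task type."""
    return {dim: mean(answer[dim] for answer in graded) for dim in DIMENSIONS}


def overall_score(graded):
    """Hypothetical equal-weight aggregate of the four dimension means."""
    return mean(dimension_means(graded).values())


if __name__ == "__main__":
    graded = [
        {"compliance": 1.0, "correctness": 0.5, "robustness": 1.0, "completeness": 0.5},
        {"compliance": 1.0, "correctness": 1.0, "robustness": 0.5, "completeness": 0.5},
    ]
    print(cloze_accuracy([" Rh ", "Co"], ["Rh", "CO"]))  # 0.5
    print(dimension_means(graded))
    print(overall_score(graded))  # 0.75
```

Reporting the per-dimension means alongside the aggregate keeps the overall score interpretable. A model can score high on compliance yet low on correctness, and a single number would hide that difference.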
