CIESC Journal ›› 2025, Vol. 76 ›› Issue (12): 6497-6507. DOI: 10.11949/0438-1157.20250802
• Intelligent process engineering •
Ting YU1, Yingqi LIU2, Hengfei WANG2, Tao ZHU2, Helin GONG3, Zonghui LU1, Yuanzheng XIN1, Hui HE1, Guoan YE1
Received: 2025-07-21
Revised: 2025-08-07
Online: 2026-01-23
Published: 2025-12-31
Contact: Guoan YE
About the first author: Ting YU (1986—), female, PhD, associate research fellow, yuting043703@126.com
Ting YU, Yingqi LIU, Hengfei WANG, Tao ZHU, Helin GONG, Zonghui LU, Yuanzheng XIN, Hui HE, Guoan YE. Evaluating large language models for prediction of pulsed column extraction process for spent fuel reprocessing[J]. CIESC Journal, 2025, 76(12): 6497-6507.
Table 1 Brief introduction of test cases

| Case | Name and core task | Complexity | Core cognitive ability assessed | Focus of assessment |
|---|---|---|---|---|
| 1 | U/Pu co-extraction in the PUREX process | Low | Knowledge retrieval and recall | Basic macroscopic simulation; ability to reproduce standard flowsheets and models |
| 2 | U/Pu separation in the PUREX process | Medium | Knowledge integration | Ability to introduce redox reaction kinetics terms |
| 3 | Effect of pulsation parameters on mass transfer | Medium-high | Mechanistic reasoning | Depth of understanding of the coupling between hydrodynamics and mass transfer |
| 4 | Minor actinide separation | High | System synthesis and extrapolation | Ability to analyze multi-component competition effects in an unfamiliar chemical system |
| 5 | Process fault diagnosis | High | Abductive reasoning | Ability to infer potential causes from ambiguous symptoms |
Table 2 Definitions of evaluation dimensions

| Dimension | Core question | Evaluation points |
|---|---|---|
| Compliance | Does the response strictly follow the instructions? | Format conformity; task completion; no out-of-scope information |
| Robustness | How is uncertainty handled? | No fabricated facts; recognizes missing information; acknowledges knowledge boundaries |
| Accuracy | Are the technical details precise? | Precision of formulas, data, and concepts; logical rigor |
| Completeness | Are all key points covered? | No key information omitted; all sub-questions answered |
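To make the scoring scheme concrete, the sketch below encodes the four dimensions of Table 2 and averages expert scores over dimensions to yield the per-case values reported in Table 4. This is only an illustrative layout: the data structure, function name, and example scores are assumptions, not artifacts from the study.

```python
from statistics import mean

# The four evaluation dimensions defined in Table 2.
DIMENSIONS = ("compliance", "robustness", "accuracy", "completeness")

def case_average(expert_scores):
    """Average one case's scores over all experts and all four dimensions.

    `expert_scores` is a list with one dict per expert, each mapping a
    dimension name to an integer score on the 1-5 rubric (Table B.1).
    """
    flat = [scores[d] for scores in expert_scores for d in DIMENSIONS]
    return mean(flat)

# Hypothetical example: three experts scoring one model on one case.
example = [
    {"compliance": 5, "robustness": 4, "accuracy": 5, "completeness": 5},
    {"compliance": 5, "robustness": 4, "accuracy": 4, "completeness": 5},
    {"compliance": 4, "robustness": 4, "accuracy": 5, "completeness": 4},
]
print(round(case_average(example), 2))  # 4.5
```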
Table 3 Comparison of time efficiency for basic modeling tasks

| Task stage | Traditional approach (PhD student / junior researcher) | LLM-assisted approach (expert prompting + review) |
|---|---|---|
| Literature survey and model screening | 8–16 h | <5 min |
| Compilation of core equations and symbol definitions | 4–8 h | <5 min |
| Drafting and formatting | 2–4 h | <10 min |
| Expert review and final confirmation | — | ~10 min |
| Estimated total time | 1–3 d | <30 min |
Table 4 Average scores of each model in five cases

| Model | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 |
|---|---|---|---|---|---|
| ChatGPT-o3 | 4.67 | 4.75 | 4.33 | 4.25 | 3.92 |
| DeepSeek-R1 | 4.58 | 4.58 | 4.33 | 4.42 | 4.08 |
| Qwen3 | 4.33 | 4.67 | 4.33 | 4.25 | 3.83 |
| Gemini 2.5 Pro Preview | 4.67 | 4.75 | 4.25 | 4.25 | 3.92 |
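The following snippet simply takes the per-case averages tabulated above and computes an unweighted mean per model. It is a convenience calculation over the published values, not a ranking metric defined by the study.

```python
from statistics import mean

# Per-case average scores from Table 4 (cases 1-5).
case_scores = {
    "ChatGPT-o3":             [4.67, 4.75, 4.33, 4.25, 3.92],
    "DeepSeek-R1":            [4.58, 4.58, 4.33, 4.42, 4.08],
    "Qwen3":                  [4.33, 4.67, 4.33, 4.25, 3.83],
    "Gemini 2.5 Pro Preview": [4.67, 4.75, 4.25, 4.25, 3.92],
}

for model, scores in case_scores.items():
    print(f"{model}: mean over five cases = {mean(scores):.2f}")
```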
Table 5 Comprehensive practicality comparison of LLMs

| Model | API cost/(USD per million tokens) | Average response latency/(s per case) | Open source | Brief remarks |
|---|---|---|---|---|
| ChatGPT-o3 | Input: 10.00 / Output: 40.00 | ~120.78 | No | Reliable on standard tasks, but extremely costly; poor overall cost-effectiveness |
| DeepSeek-R1 | Input: 0.55 / Output: 2.19 | ~131.00 | Yes | Outstanding on diagnostic tasks with a large cost advantage, but the longest response latency |
| Qwen3 | Input: 0.40 / Output: 4.00 | ~109.07 | Yes | Fastest responses and highly competitive cost; the most efficient option |
| Gemini 2.5 Pro Preview | Input: 1.25 / Output: 10.00 | ~109.53 | No | The balanced option among closed-source models: fast, but markedly more expensive than the open-source models |
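To turn the price columns into a per-case figure, the sketch below multiplies the listed input/output prices by token counts. The token counts used here are hypothetical placeholders for illustration, not measurements from the study.

```python
# Published API prices from Table 5, in USD per million tokens.
PRICES = {
    "ChatGPT-o3":             {"input": 10.00, "output": 40.00},
    "DeepSeek-R1":            {"input": 0.55,  "output": 2.19},
    "Qwen3":                  {"input": 0.40,  "output": 4.00},
    "Gemini 2.5 Pro Preview": {"input": 1.25,  "output": 10.00},
}

def case_cost_usd(model, input_tokens, output_tokens):
    """Estimate the API cost of one case from its input/output token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1e6

# Hypothetical token counts (~2k-token prompt, ~6k-token answer) for illustration.
for model in PRICES:
    print(f"{model}: ~${case_cost_usd(model, 2_000, 6_000):.4f} per case")
```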
Table B.1 Scoring rubric for expert evaluation

| Score | Compliance | Accuracy | Robustness | Completeness |
|---|---|---|---|---|
| 5 | Fully follows the instructions; nothing missing in format or task. | Core and supporting content (models, equations, concepts) entirely correct. | Proactively identifies missing information and clearly states the assumptions made and their impact. | Answers all sub-questions comprehensively, with no omissions. |
| 4 | Largely follows the instructions; occasional minor flaws in format or secondary instructions. | Core content correct; minor flaws in secondary details (e.g., auxiliary equations, boundary conditions). | Identifies missing information, but the discussion of the assumptions' impact is insufficient. | Answers all core questions but overlooks a minor sub-question. |
| 3 | Completes the core task, but with clear deviations from some instructions or the required format. | Core model choice is reasonable, but key equations or concepts contain obvious errors. | Response is conservative but does not explicitly flag the missing information. | Omits several sub-questions or analysis points. |
| 2 | Core task not completed, or the response structure is seriously disorganized. | Inappropriate model choice, or fundamental errors in the core equations (factual hallucination). | Fails to identify missing information and gives overconfident conclusions. | Response is severely incomplete, covering only a small fraction of the questions. |
| 1 | Response is entirely unrelated to the instructions. | Riddled with factual errors; completely untrustworthy. | Actively fabricates facts (hallucinations) to cope with missing information. | Answers none of the specific questions posed in the prompt. |