PRIME, a temperature-guided language model revolutionizes protein engineering

Yuanxi Yu^b, Qianhui Wang^a, Yike Zou^a,b,*

Acta Pharmaceutica Sinica B | 2025, 15(6): 3343-3345

• HIGHLIGHT •

Affiliations

^aSchool of Pharmaceutical Sciences, Shanghai Jiao Tong University, Shanghai 200240, China

^bZhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai 201203, China

doi: 10.1016/j.apsb.2025.04.010

Key words

Protein engineering / Directed evolution / Pre-trained language model / Deep learning / Thermal stability / Enzyme activity / Protein language model / Mutant fitness prediction

Recently, the Liang Hong group published a novel protein language model (PLM) in Science Advances¹: PRIME (PRotein language model for Intelligent Masked pretraining and Environment prediction, Fig. 1). PRIME is a deep learning model designed to predict and improve protein stability and activity without relying on experimental mutagenesis data. It is trained on a dataset of 96 million protein sequences annotated with the optimal growth temperatures (OGTs) of their host bacteria, yielding a model that effectively guides protein engineering across various applications.

Protein engineering for pharmaceutical and industrial applications faces several major challenges. Traditional methods, such as directed evolution and rational design, typically demand extensive experimental screening or deep mechanistic insights into protein structures and functions²,³. In recent years, PLMs have emerged as promising tools for protein engineering⁴. However, many existing PLMs struggle to recommend mutations that enhance both stability and activity, two critical properties for engineered proteins.

PRIME addresses these challenges with a data-driven approach that predicts mutations likely to increase both stability and activity without relying on experimental data. The model's architecture is built on a transformer-based encoder, augmented with two specialized modules: one for Masked Language Modeling (MLM⁵) and another for OGT prediction⁶. This setup enables the model to capture the fundamental relationship between sequences and the temperature-related attributes that are crucial for protein stability and function, making it particularly advantageous for engineering industrial enzymes or proteins that need high-temperature tolerance and resilience in practical applications.
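To make the two-module design concrete, the following is a minimal sketch of how a masked-language-modeling loss and an OGT regression loss might be combined into one training objective. It is an illustration under stated assumptions, not the published objective: the function names, the use of mean squared error for the OGT term, and the `ogt_weight` value are all hypothetical choices.

```python
import math

def mlm_cross_entropy(pred_probs, true_tokens):
    """Mean negative log-likelihood of the true residue at each masked position.

    pred_probs: list of dicts mapping residue -> predicted probability.
    true_tokens: list of the residues that were masked out.
    """
    nll = [-math.log(p[t]) for p, t in zip(pred_probs, true_tokens)]
    return sum(nll) / len(nll)

def ogt_mse(pred_ogt, true_ogt):
    """Mean squared error between predicted and annotated OGTs (degrees C)."""
    return sum((p - t) ** 2 for p, t in zip(pred_ogt, true_ogt)) / len(true_ogt)

def joint_loss(pred_probs, true_tokens, pred_ogt, true_ogt, ogt_weight=0.1):
    """Weighted sum of the two task losses.

    ogt_weight is an illustrative hyperparameter, not a value from the paper.
    """
    return (mlm_cross_entropy(pred_probs, true_tokens)
            + ogt_weight * ogt_mse(pred_ogt, true_ogt))

# Toy batch: two masked positions and one sequence-level OGT label.
probs = [{"A": 0.5, "G": 0.5}, {"L": 0.25, "V": 0.75}]
loss = joint_loss(probs, ["A", "V"], pred_ogt=[40.0], true_ogt=[37.0])
```

Sharing one encoder between both heads is what would let temperature supervision shape the same sequence representation that the masked-residue predictions are read from.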

One of the most notable strengths of PRIME lies in its “zero-shot” capability, which allows it to identify beneficial mutations for a given protein without any experimental data. The authors compared PRIME's zero-shot performance against several state-of-the-art models, including deep learning approaches such as SaProt⁷ and Stability Oracle⁸, as well as traditional computational methods like GEMME⁹ and Rosetta¹⁰.
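Zero-shot scoring with a masked language model is often done by comparing the predicted probability of the mutant residue with that of the wild-type residue at a masked position. The sketch below shows that common log-odds heuristic on toy probabilities. Whether PRIME scores mutants in exactly this way is an assumption here, and `site_probs`, `masked_marginal_score`, and `rank_single_mutants` are hypothetical names.

```python
import math

def masked_marginal_score(site_probs, wild_type, mutations):
    """Sum of log-odds (mutant vs. wild type) over the mutated positions.

    site_probs: dict position -> dict residue -> probability, as a model might
                predict with that position masked (toy values here).
    wild_type:  wild-type sequence as a string, 0-indexed.
    mutations:  list of (position, new_residue) pairs.
    """
    score = 0.0
    for pos, new_res in mutations:
        probs = site_probs[pos]
        score += math.log(probs[new_res]) - math.log(probs[wild_type[pos]])
    return score

def rank_single_mutants(site_probs, wild_type):
    """Score every single-site substitution and sort best-first."""
    ranked = []
    for pos, probs in site_probs.items():
        for res in probs:
            if res != wild_type[pos]:
                s = masked_marginal_score(site_probs, wild_type, [(pos, res)])
                ranked.append((s, pos, res))
    return sorted(ranked, reverse=True)

# Toy example: a three-residue "protein" with predictions at two positions.
wt = "MKV"
toy_probs = {
    1: {"K": 0.2, "R": 0.6, "E": 0.2},
    2: {"V": 0.5, "I": 0.4, "A": 0.1},
}
ranking = rank_single_mutants(toy_probs, wt)
best_score, best_pos, best_res = ranking[0]  # K -> R at position 1
```

A positive score means the model considers the substitution more plausible than the wild-type residue in that context. No assay data is needed to compute it, which is what makes the ranking zero-shot.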

Across 283 protein assays, PRIME demonstrated superior performance in predicting changes in melting temperature (ΔT_m) and excelled in the ProteinGym benchmark¹¹, which encompasses diverse protein properties including catalytic activity, binding affinity, stability, and fluorescence intensity. Notably, PRIME achieved a score of 0.486 on the ProteinGym benchmark, significantly surpassing the second-best model, SaProt, which scored 0.457 (P = 1 × 10⁻⁴, Wilcoxon test).
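Benchmarks of this kind usually compare model scores with measured fitness through a rank correlation computed per assay and then averaged. The sketch below computes a Spearman rank correlation in plain Python, on the assumption that the reported benchmark scores are Spearman-type correlations. The text above does not name the metric, so this is illustrative only, and the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Toy assay: model scores vs. measured fitness for four mutants.
model_scores = [0.1, 0.4, 0.2, 0.9]
measured = [1.0, 3.0, 2.5, 2.0]
rho = spearman(model_scores, measured)
```

Because only ranks matter, the metric rewards a model for ordering mutants correctly even when its raw scores are on a different scale from the experimental readout.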

To validate PRIME's efficacy, the authors conducted wet-lab experiments on five distinct proteins: LbCas12a, T7 RNA polymerase, creatinase, nonnatural nucleic acid polymerase, and the variable domain of the heavy chain of a nano-antibody against growth hormone (VHH). PRIME was used to select top-ranking single-site mutants for each protein. Remarkably, over 30% of these mutations demonstrated notable improvements in the targeted properties, such as thermostability, catalytic activity, binding affinity, resilience to extreme alkaline conditions, or the ability to polymerize nonnatural nucleic acids.

The effectiveness of PRIME was further demonstrated through the engineering of LbCas12a and T7 RNA polymerase. For LbCas12a, a complex multidomain protein with 1228 amino acids, PRIME guided an iterative optimization process through three rounds of mutagenesis and experimental validation. In the final round, all 30 multisite mutants exhibited higher melting temperatures (T_m) than the wild type. The best-performing eight-site mutant achieved a T_m of 48.15 ℃, a 6.25 ℃ improvement over the wild type. The engineering of T7 RNA polymerase further showcased PRIME's capabilities. Aiming to enhance the enzyme's thermostability for applications such as mRNA vaccine production and isothermal amplification detection techniques, the team conducted AI-guided mutagenesis and wet-lab validation of 95 mutants. This process yielded a 12-site mutant with a melting temperature 12.8 ℃ higher than that of the wild type.

Notably, in both the LbCas12a and T7 RNA polymerase projects, PRIME demonstrated the ability to selectively combine certain individually negative single-site mutations into positive multi-site mutants. Such epistatic insights are typically elusive in conventional protein engineering but proved crucial for generating superior variants.
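One simple way to quantify this kind of epistasis is to compare the measured change of a multi-site mutant with the sum of the changes of its single-site constituents. The sketch below does that for melting-temperature shifts. The numbers are invented for illustration and are not measurements from the study, and the additive baseline is an assumed convention.

```python
def epistasis(delta_tm_singles, delta_tm_combined):
    """Deviation of a multi-site mutant from the additive expectation (degrees C).

    delta_tm_singles:  T_m change of each single-site mutant vs. wild type.
    delta_tm_combined: measured T_m change of the mutant carrying all of them.
    """
    return delta_tm_combined - sum(delta_tm_singles)

# Hypothetical values: two singles are individually destabilizing,
# yet the combined mutant is more stable than the additive model predicts.
singles = [-0.5, 1.0, -0.25]
combined = 2.0
eps = epistasis(singles, combined)
print(eps)  # 1.75
```

A positive value indicates that the mutations cooperate: the combination performs better than its parts would suggest. This is the situation described above, in which individually negative mutations contribute to a positive multi-site variant.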

These case studies illustrate PRIME's efficiency in protein engineering. PRIME guided the development of notably improved enzyme variants in just a few rounds of mutagenesis. This approach not only enhances the precision of protein engineering but also substantially reduces the time and resources required for experimental validation.

Still, several limitations warrant further exploration. The reliance of PRIME on bacterial OGTs may restrict its applicability to certain protein families. Additionally, integrating structural information or combining PRIME with other computational methods could expand its applications in drug development, enzyme design, and synthetic biology. As researchers continue to refine and adapt PRIME, it holds great promise for transforming how we discover, design, and optimize proteins in a growing range of industrial and pharmaceutical applications.

References

1. Jiang, Li, Dong, Yu, Sun, Wu, et al. A general temperature-guided language model to design proteins of enhanced stability and activity. Sci Adv 2024;10:eadr2641.

2. Jiang, Yan, Di Bernardo, Sgrizzi, Villiger, Kayabolen, et al. Rapid in silico directed evolution by a protein language model with EVOLVEpro. Science 2025;387:eadr6006.

3. Woolfson. A brief history of de novo protein design: minimal, rational, and computational. J Mol Biol 2021;433:167160.

4. Ruffolo, Madani. Designing proteins with language models. Nat Biotechnol 2024;42:200–2.

5. Devlin, Chang, Lee, Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). Minneapolis, Minnesota: Association for Computational Linguistics; 2019. p. 4171–86.

6. Li, Rabe, Nielsen, Engqvist. Machine learning applied to predicting microorganism growth temperatures and enzyme catalytic optima. ACS Synth Biol 2019;8:1411–20.

7. Su, Han, Zhou, Shan, Zhou, Yuan. SaProt: protein language modeling with structure-aware vocabulary. bioRxiv 2023. https://doi.org/10.1101/2023.10.01.560349.

8. Diaz, Gong, Ouyang-Zhang, Loy, Wells, Yang, et al. Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations. Nat Commun 2024;15:6170.

9. Laine, Karami, Carbone. GEMME: a simple and fast global epistatic model predicting mutational effects. Mol Biol Evol 2019;36:2604–19.

10. Das, Baker. Macromolecular modeling with Rosetta. Annu Rev Biochem 2008;77:363–82.

11. Notin, Dias, Frazer, Marchena-Hurtado, Gomez, Marks, et al. Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In: Proceedings of the 39th International Conference on Machine Learning, PMLR 162; 2022. p. 16990–7017.

Article Info

Received: 2025-01-06
Revised: 2025-01-25
Accepted: 2025-01-26
Online: 2026-04-03

* Corresponding author.
