Scientists at the University of Glasgow have developed an advanced machine learning system that promises to transform how researchers understand the “language” of proteins—arguably the most fundamental communication system in biology.
The new model, PLM-interact, is designed to predict how proteins interact, how mutations disrupt these interactions, and how viruses exploit human proteins to cause disease. The work marks a significant step forward for computational biology and could accelerate research into cancers, viral infections, and emerging pathogens.
Published in Nature Communications, the research combines cutting-edge protein language modelling with high-performance computing resources typically used by astrophysicists.
PLM-interact extends the capability of existing large language models (LLMs) by jointly encoding protein pairs rather than analysing them individually—a key limitation of most existing models. By enabling a protein to “read” its potential partner, the model reveals how these molecules cooperate, clash, or malfunction inside cells.
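The idea of jointly encoding a pair can be sketched as building one input sequence from two proteins, in the style of BERT's paired-sentence inputs, so that attention can flow between residues of both partners. This is an illustrative sketch, not the authors' code; the token names are assumptions.

```python
def encode_pair(seq_a: str, seq_b: str) -> list[str]:
    """Build a single token sequence from a protein pair.

    Each amino acid becomes one token; special boundary tokens
    (hypothetical names) mirror BERT-style paired inputs, so a
    transformer sees both proteins in one context window.
    """
    return ["[CLS]", *seq_a, "[SEP]", *seq_b, "[SEP]"]

tokens = encode_pair("MKTAY", "GAVLI")
# In a pair-encoding model, the [CLS] position would feed a binary
# "interacts / does not interact" classification head.
```

Encoding the pair jointly, rather than embedding each protein separately and comparing the embeddings afterwards, is what lets the model capture residue-level dependencies between the two partners.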
Supercomputer to Decode Molecular Behaviour
Training the model required extraordinary computing power. The team used the UK DiRAC Tursa supercomputer—usually deployed to simulate black holes and cosmic events—to process more than 421,000 human protein pairs. The final model contains over 650 million parameters, enabling it to capture fine-grained biological signals that govern how proteins “talk” to each other.
Researchers from Glasgow’s School of Cancer Sciences, the School of Computing Science, and the MRC–University of Glasgow Centre for Virus Research led the effort. Dr Ke Yuan, corresponding author, said the ability to repurpose a supercomputing system built for astrophysics to study molecular interactions illustrates the versatility of modern HPC platforms.
AI Model Outperforms Protein Mapping Tools
Early tests show PLM-interact surpasses leading prediction systems in both accuracy and generalisability. Benchmarked across five species—mouse, fly, worm, yeast, and E. coli—the model delivered the highest area under the precision–recall curve (AUPR), improving prediction quality by between 2% and 21% depending on the species.
In more evolutionarily distant species such as yeast and E. coli, PLM-interact delivered 10% and 7% higher AUPR scores respectively than the next-best model.
Notably, PLM-interact correctly identified crucial protein interactions in biological processes ranging from RNA polymerisation to mitochondrial protein transport—scenarios where competing AI tools, including AlphaFold3, failed.
The model also assigned consistently higher probabilities to true interacting protein pairs, allowing it to identify positive interactions more reliably.
Understanding How Mutations Reshape Diseases
A key innovation is the model’s ability to predict how mutations alter protein behaviour. Using a dataset of 6,979 annotated mutation effects from the IntAct database, the team showed PLM-interact could determine whether a mutation strengthens or weakens a protein–protein interaction—a capability crucial to studying genetic diseases, cancer pathways, and drug resistance.
For instance, the model correctly identified the MCM7 Y600E mutation—known to enhance interaction activity during DNA replication and implicated in several cancers—as increasing protein binding. It also correctly predicted that the FXN N151A mutation, associated with the neurodegenerative disorder Friedreich’s ataxia, weakens its interaction with partner proteins.
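Conceptually, classifying a mutation's effect reduces to comparing the model's interaction probability for the wild-type pair against the mutant pair. The sketch below is illustrative only: the function, the margin, and the probabilities are hypothetical, not taken from the paper.

```python
def mutation_effect(p_wildtype: float, p_mutant: float,
                    margin: float = 0.05) -> str:
    """Classify a mutation by how it shifts the predicted
    interaction probability (hypothetical decision rule)."""
    delta = p_mutant - p_wildtype
    if delta > margin:
        return "strengthens interaction"
    if delta < -margin:
        return "weakens interaction"
    return "no clear effect"

# A Y600E-like case, where the mutant pair scores higher:
print(mutation_effect(0.62, 0.88))  # strengthens interaction
# An N151A-like case, where the mutant pair scores lower:
print(mutation_effect(0.81, 0.34))  # weakens interaction
```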
Fine-tuning the full model improved performance by 150% in AUPR and 36% in AUROC over zero-shot approaches, underscoring the benefit of adapting PLM-interact to mutation-specific tasks.
Predicting Virus–Human Protein Interactions
In another breakthrough, PLM-interact outperformed leading models in predicting virus–host protein interactions using a dataset of over 22,000 virus–human pairs. Against established models such as STEP and LSTM-PHV, PLM-interact achieved improvements of 5.7% in AUPR, 10.9% in F1 score, and 11.9% in MCC. These capabilities could support rapid analysis during future pandemics, helping scientists identify host targets exploited by emerging viruses.
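The F1 score and Matthews correlation coefficient (MCC) reported for the virus–human benchmark both derive from the binary confusion matrix; a small helper shows the standard definitions:

```python
import math

def f1_and_mcc(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """F1 score and Matthews correlation coefficient from the
    counts of true/false positives and negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return f1, mcc

print(f1_and_mcc(8, 2, 2, 8))  # (0.8, 0.6)
```

Unlike F1, MCC also credits correctly rejected non-interacting pairs, which is why both are reported on imbalanced virus–host datasets.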
The researchers also demonstrated accurate predictions for known virus–human complexes, such as herpesvirus glycoprotein D binding to human TNFRSF14, and the Nipah virus attachment glycoprotein binding ephrin-B2—interactions documented in structural biology databases.
By showing that transformer-based language models can learn relationships between pairs of sequences—not just single proteins—the Glasgow team has opened new directions in computational biology. The authors note that future models could incorporate longer contexts, structural data, and multi-modal signals involving DNA, RNA, and other biomolecules.
The study also establishes a foundation for deeper exploration into disease mechanisms and pathogen emergence. As Dr David Robertson, co-corresponding author, notes, tools like PLM-interact could become invaluable in understanding virus–host dynamics, accelerating development of new therapeutics, and enhancing global pandemic preparedness.