Reshaping biomolecular structure prediction through strategic conformational exploration with HelixFold-S1
Abstract
Generating large ensembles of candidate conformations is standard for improving biomolecular structure prediction. Yet aimless sampling is inefficient and costly, producing many redundant conformations with limited diversity, particularly for complex multimeric assemblies. Here we present HelixFold-S1, a guided planning approach specifically designed to enhance the structural prediction of biomolecular complexes by strategically targeting the most informative regions of conformational space to produce accurate conformations. For each complex, predicted interchain contact probabilities serve as a blueprint of the conformational space, guiding computational effort towards higher-probability, low-redundancy contacts that constrain structure generation. Across diverse biomolecular complex benchmarks, HelixFold-S1 achieves markedly higher structural accuracy than traditional unguided methods while reducing sampling requirements by an order of magnitude. Predicted contact probabilities also provide a rough indicator of prediction difficulty and sampling utility. These results demonstrate that guided planning reshapes conformational exploration and enables more efficient and accurate structural inference.
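The selection idea described in the abstract (favouring higher-probability, low-redundancy interchain contacts) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_guiding_contacts`, the greedy strategy, and the parameters `max_contacts`, `min_prob` and `min_separation` are all hypothetical, chosen only to show how a contact-probability map might be reduced to a few non-redundant guiding contacts.

```python
import numpy as np


def select_guiding_contacts(contact_prob, max_contacts=5, min_prob=0.3,
                            min_separation=8):
    """Greedily pick high-probability interchain contacts that are not redundant.

    contact_prob: (n_a, n_b) array; entry [i, j] is the predicted probability
    that residue i of chain A contacts residue j of chain B.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    contact_prob = np.asarray(contact_prob, dtype=float)
    # Rank every residue pair from most to least probable.
    order = np.argsort(contact_prob, axis=None)[::-1]
    selected = []
    for flat_idx in order:
        i, j = np.unravel_index(flat_idx, contact_prob.shape)
        p = contact_prob[i, j]
        if p < min_prob:
            break  # all remaining pairs are even less probable
        # Redundancy filter: skip a pair that lies close to an already
        # selected pair on both chains, since it describes the same interface patch.
        redundant = any(
            abs(i - si) < min_separation and abs(j - sj) < min_separation
            for si, sj, _ in selected
        )
        if not redundant:
            selected.append((int(i), int(j), float(p)))
        if len(selected) == max_contacts:
            break
    return selected


if __name__ == "__main__":
    probs = np.zeros((20, 20))
    probs[2, 3] = 0.9     # strong contact
    probs[3, 4] = 0.85    # neighbour of the first contact, treated as redundant
    probs[15, 16] = 0.6   # distinct interface patch
    probs[10, 1] = 0.2    # below the probability threshold
    for i, j, p in select_guiding_contacts(probs):
        print(f"chain A residue {i} - chain B residue {j}: p={p:.2f}")
```

In this toy example the pair (3, 4) is dropped because it sits next to the stronger pair (2, 3), and (10, 1) is dropped for low probability. Each selected contact could then serve as a separate constraint for structure generation, which is one way to read the abstract's claim that guided sampling needs fewer samples than unguided sampling.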
Data availability
The PDB structures used to train HelixFold-S1 (HF-S1) can be downloaded from https://www.rcsb.org/docs/programmatic-access/file-download-services, and the AlphaFold Protein Structure Database, used as the distillation dataset, can be downloaded from https://ftp.ebi.ac.uk/pub/databases/alphafold/v2/. The test set was filtered and clustered from the PDB under the conditions detailed in Methods. The protein–antibody complexes used for testing can be downloaded from https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab/. The data processing procedures, including filtering, clustering and dataset construction, are described in Methods.
Code availability
The source code, trained weights and inference scripts for HF-S1 are publicly available via GitHub at https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/HelixFold-S1. To ensure long-term availability, the version of the code at the time of acceptance is available via Zenodo at https://doi.org/10.5281/zenodo.8202943 (ref. 44).
References
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Evans, R. et al. Protein complex prediction with AlphaFold-multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2021).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Fang, X. et al. A method for multiple-sequence-alignment-free protein structure prediction using a protein language model. Nat. Mach. Intell. 5, 1087–1096 (2023).
Fang, X. et al. HelixFold-Multimer: elevating protein complex structure prediction to new heights. Preprint at https://arxiv.org/abs/2404.10260 (2024).
Liu, L. et al. Technical report of HelixFold3 for biomolecular structure prediction. Preprint at https://arxiv.org/abs/2408.16975 (2024).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Hayes, T. et al. Simulating 500 million years of evolution with a language model. Science 387, 850–858 (2025).
Bryant, P. & Noé, F. Improved protein complex prediction with AlphaFold-multimer by denoising the MSA profile. PLoS Comput. Biol. 20, 1012253 (2024).
Mirabello, C., Wallner, B., Nystedt, B., Azinas, S. & Carroni, M. Unmasking AlphaFold to integrate experiments and predictions in multimeric complexes. Nat. Commun. 15, 8724 (2024).
Bryant, P., Pozzati, G. & Elofsson, A. Improved prediction of protein–protein interactions using AlphaFold2. Nat. Commun. 13, 1265 (2022).
Akiyama, Y., Zhang, Z., Mirdita, M., Steinegger, M. & Ovchinnikov, S. Scaling down protein language modeling with MSA Pairformer. Preprint at bioRxiv https://doi.org/10.1101/2025.08.02.668173 (2025).
Wallner, B. AFsample: improving multimer prediction with AlphaFold using massive sampling. Bioinformatics 39, 573 (2023).
Kalakoti, Y. & Wallner, B. AFsample2 predicts multiple conformations and ensembles with AlphaFold2. Commun. Biol. 8, 373 (2025).
Stein, R. A. & Mchaourab, H. S. SPEACH_AF: sampling protein ensembles and conformational heterogeneity with AlphaFold2. PLoS Comput. Biol. 18, 1010483 (2022).
Yin, R. & Pierce, B. G. Evaluation of AlphaFold antibody–antigen modeling with implications for improving predictive accuracy. Protein Sci. 33, e4865 (2024).
Xing, E., Zhang, J., Wang, S. & Cheng, X. Leveraging sequence purification for accurate prediction of multiple conformational states with AlphaFold2. Preprint at Research Square https://doi.org/10.21203/rs.3.rs-6087969/v1 (2025).
Silva, G., Cui, J. Y., Dalgarno, D. C., Lisi, G. P. & Rubenstein, B. M. High-throughput prediction of protein conformational distributions with subsampled AlphaFold2. Nat. Commun. 15, 2464 (2024).
Wayment-Steele, H. K. et al. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 625, 832–839 (2024).
Bryant, P. & Noé, F. Structure prediction of alternative protein conformations. Nat. Commun. 15, 7328 (2024).
Stahl, K., Graziadei, A., Dau, T., Brock, O. & Rappsilber, J. Protein structure prediction with in-cell photo-crosslinking mass spectrometry and deep learning. Nat. Biotechnol. 41, 1810–1819 (2023).
Stahl, K. et al. Modelling protein complexes with crosslinking mass spectrometry and deep learning. Nat. Commun. 15, 7866 (2024).
Heo, L. & Feig, M. Multi-state modeling of G-protein coupled receptors at experimental accuracy. Proteins Struct. Funct. Bioinform. 90, 1873–1885 (2022).
Burley, S. K. et al. Protein Data Bank (PDB): the single global macromolecular structure archive. Methods Mol. Biol. 1607, 627–641 (2017).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
ByteDance AML AI4Science Team et al. Protenix-advancing structure prediction through a comprehensive AlphaFold3 reproduction. Preprint at bioRxiv https://doi.org/10.1101/2025.01.08.631967 (2025).
Chai Discovery et al. Chai-1: decoding the molecular interactions of life. Preprint at bioRxiv https://doi.org/10.1101/2024.10.10.615955 (2024).
Passaro, S. et al. Boltz-2: towards accurate and efficient binding affinity prediction. Preprint at bioRxiv https://doi.org/10.1101/2025.06.14.659707 (2025).
Yan, C., Wu, F., Jernigan, R. L., Dobbs, D. & Honavar, V. Characterization of protein–protein interfaces. Protein J. 27, 59–70 (2008).
Kastritis, P. L. & Bonvin, A. M. J. J. On the binding affinity of macromolecular interactions: daring to ask why proteins interact. J. R. Soc. Interface 10, 20120835 (2013).
Zhou, H.-X. & Pang, X. Electrostatic interactions in protein structure, folding, binding, and condensation. Chem. Rev. 118, 1691–1741 (2018).
Seychell, B. C. & Beck, T. Molecular basis for protein–protein interactions. Beilstein J. Org. Chem. 17, 1–10 (2021).
Redrado-Hernández, S. et al. Broad protection against invasive fungal disease from a nanobody targeting the active site of fungal β-1,3-glucanosyltransferases. Angew. Chem. 136, 202405823 (2024).
Omura, S. N. et al. Mechanistic and evolutionary insights into a type V-M CRISPR–Cas effector enzyme. Nat. Struct. Mol. Biol. 30, 1172–1182 (2023).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Schneider, C., Raybould, M. I. J. & Deane, C. M. SAbDab in the age of biotherapeutics: updates including SAbDab-nano, the nanobody structure tracker. Nucleic Acids Res. 50, 1368–1372 (2022).
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
Basu, S. & Wallner, B. DockQ: a quality measure for protein–protein docking models. PLoS ONE 11, e0161879 (2016).
Mariani, V., Biasini, M., Barbato, A. & Schwede, T. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 29, 2722–2728 (2013).
Landrum, G. RDKit: open-source cheminformatics. http://www.rdkit.org (2025).
Fang, X. et al. PaddleHelix: HelixFold-S1 source code and inference scripts. Zenodo https://doi.org/10.5281/zenodo.8202943 (2023).
Acknowledgements
This work was supported by the National Science and Technology Major Project (no. 2023ZD0120803).
Author information
These authors contributed equally: Lihang Liu, Yang Liu, Xianbin Ye.
Authors and Affiliations
Lihang Liu, Yang Liu, Xianbin Ye, Shanzhuo Zhang, Yuxin Li, Kunrui Zhu, Yang Xue, Jingbo Zhou, Xiaonan Zhang & Xiaomin Fang
Contributions
X.F. and X.Z. led the research. L.L. and X.F. contributed technical ideas. L.L., Y. Liu and X.Y. developed the proposed method. S.Z., Y. Li, K.Z., Y.X. and J.Z. developed the analytics. L.L., X.F., Y. Liu and X.Y. wrote the paper.
Peer review
Peer review information
Nature Machine Intelligence thanks Sonya M. Hanson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
About this article
Cite this article
Liu, L., Liu, Y., Ye, X. et al. Reshaping biomolecular structure prediction through strategic conformational exploration with HelixFold-S1. Nat Mach Intell (2026). https://doi.org/10.1038/s42256-026-01264-2