Paper · Nature ML · 07-02

BoneCoT: multicentre validation of a whole-body skeleton foundation model for bone metastases guided by clinician-derived chain of thought

Abstract

Given the rising incidence of bone metastases, computed tomography is widely used worldwide as the initial imaging modality for their detection. Accurate diagnosis of bone metastases demands comprehensive evaluation, yet divergent interpretations among specialists can result in diagnostic discrepancies. In clinical practice, precision diagnosis of bone metastases necessitates multidisciplinary collaboration involving radiologists, pathologists and oncologists. Here, to meet the need for an automated tool that can deliver expert-level insights and predictions by jointly considering multidisciplinary information, we propose BoneCoT, a whole-body skeleton foundation model enhanced through a chain-of-thought (CoT) fine-tuning approach. We pretrained the model on 29.3 million computed tomography images from 30,267 patients across 12 skeletal sites and refined it over a graph of 26 clinically relevant tasks spanning diagnosis, complications, tumour type and biomarkers. Evaluated across 26 tasks and multicentre cohorts from 10 hospitals, BoneCoT outperformed state-of-the-art methods by 20% in area under the receiver operating characteristic curve. Critically, BoneCoT achieved a 40% area under the receiver operating characteristic curve improvement in distinguishing primary from metastatic lesions, significantly surpassing experienced radiologists. These findings show how clinician-derived reasoning can move artificial intelligence towards more integrated diagnostic assessment in complex disease.

Data availability

Individual-level patient data are available upon reasonable request and with the approval of the data management committee at the respective institutions. Due to ethical and privacy considerations, these data are not publicly accessible. Source data are provided with this paper.

Code availability

The code for pretraining the BoneFM backbone and fine-tuning the BoneCoT and BoneFM models is available via GitHub at https://github.com/FrankZhangRp/BoneCoT.git.

Acknowledgements

We thank all the investigators and study participants involved in this study. We thank J. Zhang from Shanghai General Hospital, Q. Lu from Shanghai East Hospital, S. Ai from Shanghai Ninth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, H. Sui from Pudong People’s Hospital, B. Song from Minhang Hospital, G. Wu from Qingpu Branch of Zhongshan Hospital Affiliated to Fudan University, W. Tan from Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, H. Jiang from Xuzhou Cancer Hospital, S. Chen from Yancheng No. 1 People’s Hospital and C. Wang from Yancheng Third People’s Hospital for their contributions to this study. Y.L. is supported by the National Natural Science Foundation of China (grant number 8225024), the Key Program of the National Natural Science Foundation of China (grant number 82530065), the Medicine and Engineering Interdisciplinary Program of Shanghai Jiao Tong University (grant number YG2024LC08), the National Key Research and Development Program of China (grant number 2023YFF1204804), the Shanghai Municipal Commission of Science and Technology Explorer Program (grant number 23TS1400400) and Shanghai Key Clinical Specialty (grant number shslczdzk03203). The funders had no role in study design, data collection, data analysis, data interpretation or writing of the paper.

Author information

These authors contributed equally: Hui Zhao, Ruipeng Zhang, Zhiyu Wang.

Authors and Affiliations

Metastatic Bone Tumor Clinical Center, Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China

Institute of Diagnostic and Interventional Radiology, Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China

Mailman School of Public Health, Columbia University, New York, NY, USA

Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA

Contributions

H.Z., R.Z. and Z.W. contributed equally to this work. H.Z., Z.W., Y.G. and Y.L. conceived the medically oriented task design and organized data collection. Z.W. and Y.G. annotated the data. H.Z. and Y.L. performed final data curation and review. R.Z., S.W. and S.X. conceived and designed the algorithmic framework. R.Z. implemented the method and carried out the experiments. R.Z. and S.X. performed the statistical analysis. H.Z., R.Z., Z.W. and S.W. drafted the paper. H.Z., R.Z. and Z.W. prepared the figures. H.Z., S.W. and Y.L. supervised the study. All authors interpreted the results, critically revised the paper and approved the final version.

Peer review

Peer review information

Nature Biomedical Engineering thanks Fahmida Haque, Suk Hyun Lee and Suraj Pai for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Cohort construction flowcharts for BoneCoT.

a, Flowchart illustrating the composition of the pre-training dataset used to develop BoneFM, the CT imaging backbone model of BoneCoT, assembled from retrospective skeletal CT examinations at Shanghai Sixth People’s Hospital. b, Flowchart depicting the construction of the internal fine-tuning/validation dataset used for downstream task training and 5-fold cross-validation, including eligibility criteria and exclusions. c, Flowchart depicting the construction of the external validation dataset, including harmonized eligibility criteria, reference-standard definition (pathology for tumor categories; follow-up confirmation for normal controls), exclusion causes, and participating hospitals from two regions: Region 1 (Shanghai Ninth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai East Hospital, Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Pudong People’s Hospital, Minhang Hospital, Qingpu Branch of Zhongshan Hospital Affiliated to Fudan University, and Shanghai General Hospital) and Region 2 (Xuzhou Cancer Hospital, Yancheng No. 1 People’s Hospital, and Yancheng Third People’s Hospital).

Extended Data Fig. 2 Inference architecture of BoneCoT and BoneFM.

Illustration of the inference structures for BoneCoT and BoneFM, both based on the ViT architecture. BoneFM exclusively accepts CT images as input for feature extraction and prediction. In contrast, BoneCoT integrates CT images with hidden features from related tasks as additional inputs, enabling multimodal reasoning through the incorporation of task interdependencies.
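
The idea of feeding hidden features from related tasks into a downstream task can be sketched as a walk over a task dependency graph. The snippet below is only an illustration under stated assumptions: the three task names come from the core diagnostic tasks listed in the extended data, but the dependency edges, the `toy_head` function and the one-dimensional hidden feature are hypothetical stand-ins and are not taken from the BoneCoT code, which uses ViT-based models.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each task maps to the tasks whose hidden
# features it consumes. The edges are an assumption made for illustration.
TASK_GRAPH = {
    "lesion_detection": [],
    "benign_vs_malignant": ["lesion_detection"],
    "primary_vs_metastatic": ["lesion_detection", "benign_vs_malignant"],
}


def toy_head(image_feature, upstream_hidden):
    """Stand-in for a task head (not the real model).

    Concatenates the image feature with the hidden features of upstream
    tasks, then returns a toy 1-d hidden feature and a score in [0, 1].
    """
    joint = list(image_feature)
    for h in upstream_hidden:
        joint.extend(h)
    hidden = [sum(joint) / len(joint)]
    score = max(0.0, min(1.0, hidden[0]))
    return hidden, score


def run_task_graph(image_feature, graph=TASK_GRAPH, head=toy_head):
    """Evaluate tasks so that every task runs after the tasks it depends on."""
    hidden, scores = {}, {}
    # TopologicalSorter takes a mapping of node -> predecessors.
    order = list(TopologicalSorter(graph).static_order())
    for task in order:
        upstream = [hidden[dep] for dep in graph[task]]
        hidden[task], scores[task] = head(image_feature, upstream)
    return order, scores


if __name__ == "__main__":
    order, scores = run_task_graph([0.2, 0.4, 0.6])
    print(order)
    print(scores)
```

In this sketch a model without task context (the BoneFM-style case described above) would correspond to calling `head(image_feature, [])` for every task, whereas the graph walk passes upstream hidden features along the assumed edges.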

Extended Data Fig. 3 Supplementary diagnostic performance of BoneCoT.

a, AUPRC performance comparison of Merlin, DINOv2, and BoneCoT on the internal fine-tuning/validation dataset across 26 classification tasks, categorized into five types: BM diagnosis, BM quality classification, complications, tumor classification and biomarkers. The ‘Type of primary tumor’ result represents the average performance across predictions for nine original solid cancer types. b,c, AUROC and AUPRC results comparing BoneCoT’s reasoning inference with fine-tuning BoneFM independently for each task. Error bars for BoneFM represent the standard deviation across five folds.
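
For readers less familiar with the metrics in this caption, the following is a minimal sketch of how AUROC, AUPRC (computed here as average precision) and a mean with standard deviation across folds can be obtained. It is a generic illustration, not the authors' evaluation code: the toy labels, scores and fold values are made up, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative.

    Equivalent to the normalized Mann-Whitney U statistic; ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPRC estimated as the mean precision at the rank of each positive.

    Tied scores are ranked in input order, which is a simplification.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive sample")
    return sum(precisions) / len(precisions)


def summarize_folds(values):
    """Mean and sample standard deviation (n - 1) of per-fold metric values."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]            # toy labels, not study data
    y_score = [0.1, 0.4, 0.35, 0.8]  # toy model scores
    print(auroc(y_true, y_score))
    print(average_precision(y_true, y_score))
    print(summarize_folds([0.90, 0.92, 0.91, 0.89, 0.93]))
```

An averaged result such as the 'Type of primary tumor' entry would, under the same assumptions, be the mean of the per-class metric over the nine cancer types.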

Extended Data Fig. 4 Regional breakdown of core diagnostic tasks (AUROC).

Detailed results for three core diagnostic tasks, bone lesion detection, benign versus malignant classification, and primary versus metastatic classification, evaluated across four anatomical regions: vertebrae, rib, pelvis, and extremities.

Extended Data Fig. 5 Regional breakdown of core diagnostic tasks (AUPRC).

Detailed results for three core diagnostic tasks, bone lesion detection, benign versus malignant classification, and primary versus metastatic classification, evaluated across four anatomical regions: vertebrae, rib, pelvis, and extremities.

Extended Data Fig. 6 Results of external data evaluation.

a,b, AUPRC curves comparing the performance of BoneCoT, DINOv2 and Merlin on three core diagnostic tasks using multi-center external validation datasets from two distinct geographic regions.

Supplementary information

Supplementary Information (PDF)

Supplementary Text Sections 1–5, Figs. 1–3, Tables 1–7 and References.

Reporting Summary (PDF)

Supplementary Data 1 (XLSX)

Supplementary Data 2 (XLSX)

Supplementary Data 3 (XLSX)

Source data

Source Data Fig. 1 (XLSX)

Source Data Fig. 2 (XLSX)

Source Data Fig. 3 (XLSX)

Source Data Fig. 4 (XLSX)

Source Data Extended Data Fig. 3 (XLSX)

Source Data Extended Data Fig. 4 (XLSX)

Source Data Extended Data Fig. 5 (XLSX)

Source Data Extended Data Fig. 6 (XLSX)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, H., Zhang, R., Wang, Z. et al. BoneCoT: multicentre validation of a whole-body skeleton foundation model for bone metastases guided by clinician-derived chain of thought. Nat. Biomed. Eng (2026). https://doi.org/10.1038/s41551-026-01736-1
