Result: Evaluation of ChatGPT accuracy and reliability in answering questions about exercise recommendations for breast cancer survivors.

Title:
Evaluation of ChatGPT accuracy and reliability in answering questions about exercise recommendations for breast cancer survivors.
Authors:
Bernal-Utrera C; Departamento de Fisioterapia, Facultad de Enfermería, Fisioterapia y Podología, Universidad de Sevilla, Sevilla, Spain., Bravo-Vázquez A; Programa Doctoral en Ciencias de la Salud, Universidad de Sevilla, Sevilla, Spain., Montero-Bancalero FJ; Escuela Universitaria de Osuna (Centre Attached to the University of Seville), Sevilla, Spain., Suárez-Vega A; Emergency Department, Juan Ramón Jiménez University Hospital, Huelva, Spain., Casuso-Holgado MJ; Departamento de Fisioterapia, Facultad de Enfermería, Fisioterapia y Podología, Universidad de Sevilla, Sevilla, Spain; Instituto de Biomedicina de Sevilla-IBiS (Hospitales Universitarios Virgen del Rocío y Macarena/CSIC/Universidad de Sevilla), Sevilla, Spain; CTS 1110: UMSS Research Group, Spain. Electronic address: mcasuso@us.es., Anarte-Lazo E; Faculty of Health, UNIE University, Madrid, Spain.
Source:
Physiotherapy [Physiotherapy] 2026 Mar; Vol. 130, pp. 101838. Date of Electronic Publication: 2025 Aug 16.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Chartered Society of Physiotherapy London; Country of Publication: England; NLM ID: 0401223; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1873-1465 (Electronic); Linking ISSN: 00319406; NLM ISO Abbreviation: Physiotherapy; Subsets: MEDLINE
Imprint Name(s):
Publication: London : Chartered Society of Physiotherapy London
Original Publication: London.
Contributed Indexing:
Keywords: Artificial intelligence; Breast cancer; ChatGPT; Therapeutic exercise
Entry Date(s):
Date Created: 20251121; Date Completed: 20260123; Latest Revision: 20260123
Update Code:
20260130
DOI:
10.1016/j.physio.2025.101838
PMID:
41270301
Database:
MEDLINE

Further information

Objective: To assess the accuracy and reliability of an artificial intelligence (AI) chatbot (ChatGPT) in providing answers about exercise recommendations for breast cancer survivors.
Design: Cross-sectional study.
Methods: We extracted recommendations from recent systematic reviews of clinical practice guidelines (CPGs) on therapeutic physical exercise in breast cancer survivors. Clinical questions were developed and posed to ChatGPT-4. We evaluated the performance of ChatGPT-4 as a counselling tool by assessing the accuracy of its responses (percentage of agreement with CPG recommendations, weighted Cohen's kappa and percentage of text wording similarity) and the intra-rater and inter-rater reliability in grading ChatGPT answers (kappa value).
Results: We tested 15 clinical questions. The accuracy of the AI chatbot's responses was considered unacceptable, with only 7% of responses being reasonably accurate (1/14), 64% partially accurate (9/14) and 29% completely incorrect (4/14). A low kappa coefficient was observed (k = 0.244, CI: 0.089 to 0.577), and the similarity of responses was also considered unacceptable, with 27.2% overlapping text wording. Intra- and inter-rater reliability showed moderate to good values in all cases.
Conclusions: ChatGPT does not appear to be an accurate counselling tool for answering questions about exercise recommendations for breast cancer survivors. Compared with CPG recommendations, the accuracy of ChatGPT responses was considered poor, while rater reliability was moderate to good. Patients should be made aware that they should not base their decisions solely on information from ChatGPT.
(Copyright © 2025 The Authors. Published by Elsevier Ltd. All rights reserved.)
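
The agreement statistics named in the abstract (percentage of agreement and weighted Cohen's kappa) can be illustrated with a minimal sketch. The abstract does not state the weighting scheme, the grading scale, or the software the authors used, so the linear/quadratic weights, the function names `percent_agreement` and `weighted_kappa`, and the toy ratings below are all assumptions for illustration, not the study's method or data.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of items on which two rating lists give the same grade."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return 100.0 * matches / len(ratings_a)


def weighted_kappa(ratings_a, ratings_b, categories, weights="linear"):
    """Weighted Cohen's kappa for two lists of ordinal grades.

    `categories` lists the grades in order (lowest to highest).
    `weights` is "linear" or "quadratic"; which one the study used is
    not stated in the abstract, so this choice is an assumption.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions for each pair of grades.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each list of grades.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    obs = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when both lists use one grade")
    return 1.0 - obs / exp


if __name__ == "__main__":
    # Hypothetical grades (0 = incorrect, 1 = partial, 2 = accurate);
    # these are NOT the study's data.
    guideline = [2, 2, 1, 0, 2, 1]
    chatbot = [1, 2, 1, 1, 0, 1]
    print(percent_agreement(guideline, chatbot))
    print(weighted_kappa(guideline, chatbot, categories=[0, 1, 2]))
    # The percentages reported in the abstract follow from its counts out of 14.
    print([round(100 * c / 14) for c in (1, 9, 4)])
```

Weighted kappa penalises a "partially accurate" versus "accurate" mismatch less than an "incorrect" versus "accurate" one, which is why it suits ordinal grading scales better than raw percentage agreement.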

Conflict of interest: The authors declare no conflict of interest.