Evaluation of ChatGPT accuracy and reliability in answering questions about exercise recommendations for breast cancer survivors.
Original Publication: London.
Objective: To assess the accuracy and reliability of an artificial intelligence (AI) chatbot (ChatGPT) in providing answers about exercise recommendations for breast cancer survivors.
Design: Cross-sectional study.
Methods: We extracted recommendations from recent systematic reviews of clinical practice guidelines (CPGs) on therapeutic physical exercise in breast cancer survivors. Clinical questions were developed and submitted to ChatGPT-4. We evaluated the performance of ChatGPT-4 as a counseling tool by assessing the accuracy of its responses (percentage of agreement with CPG recommendations, weighted Cohen's kappa, and percentage of text-wording similarity) and the intra-rater and inter-rater reliability in grading ChatGPT answers (kappa value).
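To illustrate the agreement statistic named above, a linear-weighted Cohen's kappa can be computed directly from two raters' ordinal gradings. This is a minimal sketch of the general formula, not the study's actual analysis code; the function name and example ratings are hypothetical placeholders.

```python
from collections import Counter

def weighted_kappa(rater_a, rater_b, categories):
    """Linear-weighted Cohen's kappa for two raters over ordinal categories.

    Disagreements are penalized in proportion to their distance on the
    ordinal scale: w_ij = |i - j| / (k - 1).
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint counts for each pair of assigned categories.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        obs[idx[a]][idx[b]] += 1

    # Marginal counts per rater, used for chance-expected agreement.
    marg_a, marg_b = Counter(rater_a), Counter(rater_b)

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i, ci in enumerate(categories):
        for j, cj in enumerate(categories):
            w = abs(i - j) / (k - 1)
            observed += w * obs[i][j] / n
            expected += w * (marg_a[ci] / n) * (marg_b[cj] / n)

    return 1.0 - observed / expected

# Hypothetical gradings on a 3-point accuracy scale
# (0 = incorrect, 1 = partially accurate, 2 = accurate).
print(weighted_kappa([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], [0, 1, 2]))
```

Identical ratings yield kappa = 1, systematic maximal disagreement yields -1, and chance-level agreement yields values near 0, which is the scale against which the study's k = 0.244 reads as low.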
Results: We tested 15 clinical questions. The accuracy of the AI chatbot's responses was considered unacceptable: only 7% of responses were reasonably accurate (1/14), 64% were partially accurate (9/14), and 29% were completely incorrect (4/14). A low kappa coefficient was observed (k = 0.244, CI: 0.089 to 0.577), and the similarity of responses was also considered unacceptable, with 27.2% of text wording overlapping. Intra- and inter-rater reliability showed moderate to good values in all cases.
Conclusions: ChatGPT does not appear to be an accurate counselling tool for answering questions about exercise recommendations for breast cancer survivors. Compared with CPG recommendations, the accuracy of ChatGPT's responses was considered poor, with moderate to good reliability. It is important that patients understand they should not base their decisions solely on information from ChatGPT.
(Copyright © 2025 The Authors. Published by Elsevier Ltd. All rights reserved.)
Conflict of interest: The authors declare no conflict of interest.