Authors: Gunes, Yasin Celal; Cesur, Turay; Camur, Eren
Dates: 2025-01-21; 2025-01-21
Year: 2024
ISSN: 2564-7784; 2564-7040
DOI link: https://doi.org/10.58600/eurjther2225
Handle: https://hdl.handle.net/20.500.12587/24366

Abstract:
Objective: To evaluate and compare the abilities of large language models (LLMs) in simplifying Turkish ultrasound (US) findings for patients.
Methods: We assessed the simplification performance of four LLMs (ChatGPT-4, Gemini 1.5 Pro, Claude 3 Opus, and Perplexity) using fifty fictional Turkish US findings. The comparison was based on Ateşman's Readability Index and word count. Three radiologists rated medical accuracy, consistency, and comprehensibility on a Likert scale from 1 to 5. Statistical tests (Friedman, Wilcoxon, and Spearman correlation) examined differences in the LLMs' performance.
Results: Gemini 1.5 Pro, ChatGPT-4, and Claude 3 Opus received high Likert scores for medical accuracy, consistency, and comprehensibility (mean: 4.7-4.8). Perplexity scored significantly lower (mean: 4.1, p<0.001). Gemini 1.5 Pro achieved the highest readability score (mean: 61.16), followed by ChatGPT-4 (mean: 58.94) and Claude 3 Opus (mean: 51.16). Perplexity had the lowest readability score (mean: 47.01). Gemini 1.5 Pro and ChatGPT-4 used significantly more words than Claude 3 Opus and Perplexity (p<0.001). Linear correlation analysis revealed a positive correlation between the word count of the fictional US findings and that of the responses generated by Gemini 1.5 Pro (correlation coefficient = 0.38, p<0.05) and ChatGPT-4 (correlation coefficient = 0.43, p<0.001).
Conclusion: This study highlights the strong potential of LLMs in simplifying Turkish US findings, with Claude 3 Opus among the models that performed well, underscoring their effectiveness in healthcare communication. Further research is required to fully understand the integration of making.

Language: en
Rights: info:eu-repo/semantics/openAccess
Keywords: Large Language Models; ChatGPT; Claude 3 Opus; Ultrasound; Simplify
Title: Comparative Analysis of Large Language Models in Simplifying Turkish Ultrasound Reports to Enhance Patient Understanding
Type: Article
DOI: 10.58600/eurjther2225
Web of Science ID: WOS:001289504400001
N/A
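
Note on the readability metric: the abstract compares models by the Ateşman readability index but does not state the formula. The sketch below uses the commonly cited form of that index, 198.825 - 40.175 x (syllables per word) - 2.610 x (words per sentence), where a higher score means easier text. The function name `atesman_score`, the tokenization rules, and the two sample sentences are assumptions made for illustration; they are not taken from the study, whose exact tooling is not described in this record.

```python
import re

# Turkish syllables each contain exactly one vowel, so counting vowels
# gives the syllable count. Circumflexed vowels are included for loanwords.
TURKISH_VOWELS = set("aeıioöuüâîûAEIİOÖUÜÂÎÛ")


def atesman_score(text: str) -> float:
    """Return the Ateşman readability score of a Turkish text.

    Assumes the commonly cited form of the index:
        198.825 - 40.175 * (syllables / words) - 2.610 * (words / sentences)
    Sentence and word splitting here is deliberately simple and is an
    assumption of this sketch, not the study's procedure.
    """
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # A word is a run of Unicode letters (no digits or underscores).
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    syllables = sum(1 for w in words for ch in w if ch in TURKISH_VOWELS)
    syllables_per_word = syllables / len(words)
    words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * syllables_per_word - 2.610 * words_per_sentence


if __name__ == "__main__":
    # Hypothetical sentences: a plain one and a report-style one.
    print(atesman_score("Ali eve geldi."))
    print(atesman_score("Karaciğer parankimi homojen görünümdedir."))
```

Under these assumptions, the short everyday sentence scores much higher than the report-style sentence, which is the direction of difference the study relies on when ranking the simplified outputs.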