A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS
Date
2024
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Istanbul Univ, Fac Medicine, Publ Off
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
Objective: This study evaluated the effectiveness of various large language models (LLMs) in simplifying Turkish computed tomography (CT) reports; CT is a common imaging modality. Material and Method: The study used fictional CT findings and followed the Standards for Reporting of Diagnostic Accuracy Studies (STARD) and the Declaration of Helsinki. Fifty fictional Turkish CT findings were generated. Four LLMs (ChatGPT-4, ChatGPT-3.5, Gemini 1.5 Pro, and Claude 3 Opus) simplified the reports using the prompt: "Please explain them in a way that someone without a medical background can understand in Turkish." The outputs were evaluated with the Ateşman Readability Index and with a Likert scale for accuracy and readability. Results: Claude 3 Opus scored highest in readability (58.9), followed by ChatGPT-3.5 (54.5), Gemini 1.5 Pro (53.7), and ChatGPT-4 (45.1). Likert scores for Claude 3 Opus (mean: 4.7) and ChatGPT-4 (mean: 4.5) did not differ significantly (p>0.05). ChatGPT-4 had the highest word count (96.98), compared with Claude 3 Opus (90.6), Gemini 1.5 Pro (74.4), and ChatGPT-3.5 (38.7) (p<0.001). Conclusion: LLMs can simplify Turkish CT reports to a level that individuals without medical knowledge can understand, with high readability and accuracy. ChatGPT-4 and Claude 3 Opus produced the most comprehensible simplifications. Claude 3 Opus's simpler sentences may make it the optimal choice for simplifying Turkish CT reports.
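The readability scores above come from the Ateşman Readability Index, a Turkish adaptation of Flesch-style readability that is commonly given as 198.825 − 40.175 × (average syllables per word) − 2.610 × (average words per sentence). Higher scores mean easier text. The sketch below shows how that formula can be applied to a simplified report. It is not the authors' tooling. The sentence splitting on ".", "!" and "?", the letter-only word tokenization, and the vowel-count syllable rule are all simplifying assumptions, and the function names are hypothetical. The vowel-count rule relies on the fact that every Turkish syllable contains exactly one vowel.

```python
import re

# Turkish vowels in both cases, including dotted/dotless i (ı, İ).
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜ")


def count_syllables(word: str) -> int:
    """Hypothetical helper: each Turkish syllable has exactly one vowel,
    so the number of vowels gives the number of syllables."""
    return sum(1 for ch in word if ch in TURKISH_VOWELS)


def atesman_readability(text: str) -> float:
    """Sketch of the Ateşman Readability Index for a Turkish text."""
    # Naive sentence split (assumption: abbreviations and decimals are ignored).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of Unicode letters (no digits or underscores).
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    x1 = syllables / len(words)       # average syllables per word
    x2 = len(words) / len(sentences)  # average words per sentence
    return 198.825 - 40.175 * x1 - 2.610 * x2


if __name__ == "__main__":
    sample = "Karaciğer normal boyuttadır. Böbreklerde taş yoktur."
    print(round(atesman_readability(sample), 2))  # 77.17
```

For this two-sentence sample (6 words, 17 syllables), the score is about 77.17. Long, polysyllabic sentences push the score down, which is consistent with the lower score reported for the longest outputs.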
Description
Keywords
Large language model; radiology reports; readability; computed tomography; Turkish; simplifying
Source
Journal of Istanbul Faculty of Medicine-Istanbul Tip Fakultesi Dergisi
WoS Quartile
N/A
Scopus Quartile
Volume
87
Issue
4