A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS


Date

2024

Journal Title

Journal ISSN

Volume Title

Publisher

Istanbul Univ, Fac Medicine, Publ Off

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Objective: This study evaluated the effectiveness of various large language models (LLMs) in simplifying Turkish reports of computed tomography (CT), a common imaging modality. Material and Method: Fifty fictional Turkish CT findings were generated, and the study followed the Standards for Reporting of Diagnostic Accuracy Studies (STARD) and the Declaration of Helsinki. Four LLMs (ChatGPT 4, ChatGPT-3.5, Gemini 1.5 Pro, and Claude 3 Opus) simplified the reports using the prompt: "Please explain them in a way that someone without a medical background can understand in Turkish." The outputs were evaluated using the Ateşman Readability Index and a Likert scale for accuracy and readability. Results: Claude 3 Opus scored the highest in readability (58.9), followed by ChatGPT-3.5 (54.5), Gemini 1.5 Pro (53.7), and ChatGPT 4 (45.1). Likert scores for Claude 3 Opus (mean: 4.7) and ChatGPT 4 (mean: 4.5) showed no significant difference (p>0.05). ChatGPT 4 had the highest word count (96.98) compared to Claude 3 Opus (90.6), Gemini 1.5 Pro (74.4), and ChatGPT-3.5 (38.7) (p<0.001). Conclusion: This study shows that LLMs can simplify Turkish CT reports to a level that individuals without medical knowledge can understand, with high readability and accuracy. ChatGPT 4 and Claude 3 Opus produced the most comprehensible simplifications. Claude 3 Opus's simpler sentences may make it the optimal choice for simplifying Turkish CT reports.
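The abstract names the Ateşman Readability Index but does not state how the scores were computed. As a rough illustration only, the sketch below uses the formula as it is commonly cited for Turkish text (198.825 − 40.175 × syllables per word − 2.610 × words per sentence). The coefficients, the vowel-based syllable count, the naive sentence splitting, and the function names `count_syllables` and `atesman_score` are assumptions made for this example, not details taken from the study.

```python
import re

# Assumption: in Turkish, each syllable contains exactly one vowel,
# so counting vowels approximates counting syllables.
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜâîûÂÎÛ")


def count_syllables(word: str) -> int:
    """Approximate the syllable count of a Turkish word by counting vowels."""
    return sum(1 for ch in word if ch in TURKISH_VOWELS)


def atesman_score(text: str) -> float:
    """Compute an Ateşman-style readability score (higher means easier).

    Uses the commonly cited coefficients; the study itself may have
    used a different tool or tokenization.
    """
    # Naive sentence split on '.', '!' and '?'; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of Unicode letters; digits and punctuation are skipped.
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * syllables_per_word - 2.610 * words_per_sentence


if __name__ == "__main__":
    sample = "Bu bir test. Hasta iyi."
    print(round(atesman_score(sample), 3))
```

Under this formula, shorter words and shorter sentences both raise the score, which is in line with the abstract attributing Claude 3 Opus's higher readability to simpler sentences.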

Description

Keywords

Large language model; radiology reports; readability; computed tomography; Turkish; simplifying

Source

Journal of Istanbul Faculty of Medicine-Istanbul Tip Fakultesi Dergisi

WoS Q Value

N/A

Scopus Q Value

Volume

87

Issue

4

Citation