Yazar "Cesur, Turay" seçeneğine göre listele
Showing 1 - 9 of 9
Item: A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS (Istanbul Univ, Fac Medicine, Publ Off, 2024) Camur, Eren; Cesur, Turay; Gunes, Yasin Celal
Objective: This study evaluated the effectiveness of various large language models (LLMs) in simplifying Turkish computed tomography (CT) reports, a common imaging modality. Material and Method: Using fictional CT findings, we followed the Standards for Reporting of Diagnostic Accuracy Studies (STARD) and the Declaration of Helsinki. Fifty fictional Turkish CT findings were generated. Four LLMs (ChatGPT 4, ChatGPT-3.5, Gemini 1.5 Pro, and Claude 3 Opus) simplified the reports using the prompt: "Please explain them in a way that someone without a medical background can understand in Turkish." Evaluations were based on the Ateşman Readability Index and a Likert scale for accuracy and readability. Results: Claude 3 Opus scored the highest in readability (58.9), followed by ChatGPT-3.5 (54.5), Gemini 1.5 Pro (53.7), and ChatGPT 4 (45.1). Likert scores for Claude 3 Opus (mean: 4.7) and ChatGPT 4 (mean: 4.5) showed no significant difference (p>0.05). ChatGPT 4 had the highest word count (96.98), compared to Claude 3 Opus (90.6), Gemini 1.5 Pro (74.4), and ChatGPT-3.5 (38.7) (p<0.001). Conclusion: This study shows that LLMs can simplify Turkish CT reports to a level that individuals without medical knowledge can understand, with high readability and accuracy. ChatGPT 4 and Claude 3 Opus produced the most comprehensible simplifications. Claude 3 Opus' simpler sentences may make it the optimal choice for simplifying Turkish CT reports.

Item: Accuracies of large language models in answering radiation protection questions (IOP Publishing Ltd, 2024) Camur, Eren; Cesur, Turay; Gunes, Yasin Celal
[Abstract not available]

Item: Accuracy of Large Language Models in ACR Manual on Contrast Media-Related Questions (Elsevier Science Inc, 2024) Gunes, Yasin Celal; Cesur, Turay
[Abstract not available]

Item: Comparative Analysis of Large Language Models in Simplifying Turkish Ultrasound Reports to Enhance Patient Understanding (Pera Yayincilik Hizmetleri, 2024) Gunes, Yasin Celal; Cesur, Turay; Camur, Eren
Objective: To evaluate and compare the abilities of large language models (LLMs) in simplifying Turkish ultrasound (US) findings for patients. Methods: We assessed the simplification performance of four LLMs: ChatGPT 4, Gemini 1.5 Pro, Claude 3 Opus, and Perplexity, using fifty fictional Turkish US findings. Comparison was based on the Ateşman Readability Index and word count. Three radiologists rated medical accuracy, consistency, and comprehensibility on a Likert scale from 1 to 5. Statistical tests (Friedman, Wilcoxon, and Spearman correlation) examined differences in the LLMs' performance. Results: Gemini 1.5 Pro, ChatGPT-4, and Claude 3 Opus received high Likert scores for medical accuracy, consistency, and comprehensibility (mean: 4.7-4.8). Perplexity scored significantly lower (mean: 4.1, p<0.001). Gemini 1.5 Pro achieved the highest readability score (mean: 61.16), followed by ChatGPT-4 (mean: 58.94) and Claude 3 Opus (mean: 51.16). Perplexity had the lowest readability score (mean: 47.01). Gemini 1.5 Pro and ChatGPT-4 used significantly more words than Claude 3 Opus and Perplexity (p<0.001).
Linear correlation analysis revealed a positive correlation between the word count of the fictional US findings and the responses generated by Gemini 1.5 Pro (correlation coefficient = 0.38, p<0.05) and ChatGPT-4 (correlation coefficient = 0.43, p<0.001). Conclusion: This study highlights the strong potential of LLMs in simplifying Turkish US findings, with Claude 3 Opus among the models that performed well, highlighting their effectiveness in healthcare communication. Further research is required to fully understand their integration into clinical practice.

Item: Comparison of the performance of large language models and general radiologist on Ovarian-Adnexal Reporting and Data System (O-RADS)-related questions (AME Publishing Company, 2024) Camur, Eren; Cesur, Turay; Gunes, Yasin Celal
[Abstract not available]

Item: Correspondence on 'Evaluation of ChatGPT in knowledge of newly evolving neurosurgery: middle meningeal artery embolization for subdural hematoma management' by Koester et al (BMJ Publishing Group, 2024) Gunes, Yasin Celal; Camur, Eren; Cesur, Turay
[Abstract not available]

Item: Large Language Models: Could They Be the Next Generation of Clinical Decision Support Systems in Cardiovascular Diseases? (Kare Publ, 2024) Gunes, Yasin Celal; Cesur, Turay
[Abstract not available]

Item: Letter to Editor Regarding Assessing the Capability of ChatGPT, Google Bard, and Microsoft Bing in Solving Radiology Case Vignettes (Thieme Medical Publ Inc, 2024) Gunes, Yasin Celal; Cesur, Turay
[Abstract not available]

Item: Optimizing Diagnostic Performance of ChatGPT: The Impact of Prompt Engineering on Thoracic Radiology Cases (Springer Nature, 2024) Cesur, Turay; Gunes, Yasin Celal
Background: Recent studies have highlighted the diagnostic performance of ChatGPT 3.5 and GPT-4 in a text-based format, demonstrating their radiological knowledge across different areas. Our objective is to investigate the impact of prompt engineering on the diagnostic performance of ChatGPT 3.5 and GPT-4 in diagnosing thoracic radiology cases, highlighting how the complexity of prompts influences model performance. Methodology: We conducted a retrospective cross-sectional study using 124 publicly available Case of the Month examples from the Thoracic Society of Radiology website. We initially input the cases into the ChatGPT versions without prompting. Then, we employed five different prompts, ranging from basic task-oriented to complex role-specific formulations, to measure the diagnostic accuracy of the ChatGPT versions. The differential diagnosis lists generated by the models were compared against the radiological diagnoses listed on the Thoracic Society of Radiology website, with a scoring system in place to comprehensively assess accuracy. Diagnostic accuracy and differential diagnosis scores were analyzed using the McNemar, chi-square, Kruskal-Wallis, and Mann-Whitney U tests. Results: Without any prompts, ChatGPT 3.5's accuracy was 25% (31/124), which increased to 56.5% (70/124) with the most complex prompt (P < 0.001). GPT-4 showed a high baseline accuracy of 53.2% (66/124) without prompting. This accuracy increased to 59.7% (74/124) with complex prompts (P = 0.09). Notably, there was no statistical difference in peak performance between ChatGPT 3.5 (70/124) and GPT-4 (74/124) (P = 0.55). Conclusions: This study emphasizes the critical influence of prompt engineering on enhancing the diagnostic performance of ChatGPT versions, especially ChatGPT 3.5.
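Several of the abstracts above score simplified reports with the Ateşman Readability Index, a Turkish-specific formula (Ateşman, 1997): score = 198.825 - 40.175 x (average syllables per word) - 2.610 x (average words per sentence), where higher values indicate easier text. The short Python sketch below illustrates how such a score could be computed; it is not the authors' code, and the vowel-based syllable count and punctuation-based sentence splitting are simplifying assumptions.

import re

# Turkish syllable count is approximated by counting vowels (one vowel per syllable).
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜ")

def count_syllables(word: str) -> int:
    """Approximate syllable count for a Turkish word."""
    return max(1, sum(1 for ch in word if ch in TURKISH_VOWELS))

def atesman_readability(text: str) -> float:
    """Ateşman (1997) readability score; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    syllables = sum(count_syllables(w) for w in words)
    avg_syllables_per_word = syllables / len(words)
    avg_words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * avg_syllables_per_word - 2.610 * avg_words_per_sentence

# Example: score a short simplified Turkish report (illustrative text only).
print(round(atesman_readability("Karaciğerde kitle yoktur. Böbrekler normaldir."), 1))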