Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.isci.2023.108163
Title: Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries
Authors: Pushpanathan, K 
Lim, ZW
Er Yew, SM 
Chen, DZ
Hui'En Lin, HA 
Lin Goh, JH
Wong, WM
Wang, X
Jin Tan, MC 
Chang Koh, VT 
Tham, YC 
Keywords: Artificial intelligence
Ophthalmology
Issue Date: 17-Nov-2023
Citation: Pushpanathan, K, Lim, ZW, Er Yew, SM, Chen, DZ, Hui'En Lin, HA, Lin Goh, JH, Wong, WM, Wang, X, Jin Tan, MC, Chang Koh, VT, Tham, YC (2023-11-17). Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries. iScience 26 (11): 108163. ScholarBank@NUS Repository. https://doi.org/10.1016/j.isci.2023.108163
Abstract: In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. 89.2% of ChatGPT-4.0 responses were ‘good’-rated, outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) significantly (all p < 0.001). All three LLM-Chatbots showed optimal mean comprehensiveness scores as well (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Future rigorous validation of their performance is crucial to ensure their reliability and appropriateness for actual clinical use.
Source Title: iScience
URI: https://scholarbank.nus.edu.sg/handle/10635/245924
ISSN: 2589-0042
DOI: 10.1016/j.isci.2023.108163
Appears in Collections: Staff Publications
Elements

Files in This Item:
File: main.pdf
Description: Published version
Size: 2.95 MB
Format: Adobe PDF
Access Settings: OPEN
Version: Published

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.