Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.isci.2023.108163

Title:	Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries
Authors:	Pushpanathan, K; Lim, ZW; Er Yew, SM; Chen, DZ; Hui'En Lin, HA; Lin Goh, JH; Wong, WM; Wang, X; Jin Tan, MC; Chang Koh, VT; Tham, YC
Keywords:	Artificial intelligence; Ophthalmology
Issue Date:	17-Nov-2023
Citation:	Pushpanathan, K, Lim, ZW, Er Yew, SM, Chen, DZ, Hui'En Lin, HA, Lin Goh, JH, Wong, WM, Wang, X, Jin Tan, MC, Chang Koh, VT, Tham, YC (2023-11-17). Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries. iScience 26 (11) : 108163-. ScholarBank@NUS Repository. https://doi.org/10.1016/j.isci.2023.108163
Abstract:	In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. 89.2% of ChatGPT-4.0 responses were ‘good’-rated, outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) significantly (all p < 0.001). All three LLM-Chatbots showed optimal mean comprehensiveness scores as well (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Future rigorous validation of their performance is crucial to ensure their reliability and appropriateness for actual clinical use.
Source Title:	iScience
URI:	https://scholarbank.nus.edu.sg/handle/10635/245924
ISSN:	2589-0042
DOI:	10.1016/j.isci.2023.108163
Appears in Collections:	Staff Publications; Elements

Files in This Item:

File	Description	Size	Format	Access Settings	Version
main.pdf	Published version	2.95 MB	Adobe PDF	OPEN	Published
