Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.isci.2023.108163
DC Field: Value
dc.title: Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries
dc.contributor.author: Pushpanathan, K
dc.contributor.author: Lim, ZW
dc.contributor.author: Er Yew, SM
dc.contributor.author: Chen, DZ
dc.contributor.author: Hui'En Lin, HA
dc.contributor.author: Lin Goh, JH
dc.contributor.author: Wong, WM
dc.contributor.author: Wang, X
dc.contributor.author: Jin Tan, MC
dc.contributor.author: Chang Koh, VT
dc.contributor.author: Tham, YC
dc.date.accessioned: 2023-11-14T04:22:48Z
dc.date.available: 2023-11-14T04:22:48Z
dc.date.issued: 2023-11-17
dc.identifier.citation: Pushpanathan, K, Lim, ZW, Er Yew, SM, Chen, DZ, Hui'En Lin, HA, Lin Goh, JH, Wong, WM, Wang, X, Jin Tan, MC, Chang Koh, VT, Tham, YC (2023-11-17). Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries. iScience 26 (11): 108163-. ScholarBank@NUS Repository. https://doi.org/10.1016/j.isci.2023.108163
dc.identifier.issn: 2589-0042
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/245924
dc.description.abstract: In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. 89.2% of ChatGPT-4.0 responses were ‘good’-rated, outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) significantly (all p < 0.001). All three LLM-Chatbots showed optimal mean comprehensiveness scores as well (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Future rigorous validation of their performance is crucial to ensure their reliability and appropriateness for actual clinical use.
dc.source: Elements
dc.subject: Artificial intelligence
dc.subject: Ophthalmology
dc.type: Article
dc.date.updated: 2023-11-11T11:46:12Z
dc.contributor.department: DEAN'S OFFICE (DUKE-NUS MEDICAL SCHOOL)
dc.contributor.department: OPHTHALMOLOGY
dc.description.doi: 10.1016/j.isci.2023.108163
dc.description.sourcetitle: iScience
dc.description.volume: 26
dc.description.issue: 11
dc.description.page: 108163-
dc.published.state: Unpublished
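
Illustrative note: the abstract describes masking and randomly shuffling the chatbot responses before the three ophthalmologists graded them. The following is a minimal sketch of that blinded-grading preparation step, not the authors' actual code; all file names, function names, and the example responses are hypothetical.

```python
import csv
import random

# Hypothetical responses for one query; the study used 37 ocular-symptom
# questions answered by each of the three chatbots.
responses = {
    "ChatGPT-3.5": "Response text from ChatGPT-3.5 ...",
    "ChatGPT-4.0": "Response text from ChatGPT-4.0 ...",
    "Google Bard": "Response text from Google Bard ...",
}

def build_blinded_sheet(query_id, responses, seed=None):
    """Mask model identity and shuffle response order for one query."""
    rng = random.Random(seed)
    items = list(responses.items())
    rng.shuffle(items)  # random presentation order for graders
    rows = []
    for idx, (model, text) in enumerate(items, start=1):
        rows.append({
            "query_id": query_id,
            "masked_label": "Response %d" % idx,  # graders see only this label
            "response_text": text,
            "model": model,  # kept in a separate key file, never shown to graders
        })
    return rows

rows = build_blinded_sheet(query_id=1, responses=responses, seed=42)

# Grader-facing sheet: the model column is omitted so grading stays masked.
with open("grading_sheet_q1.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["query_id", "masked_label", "response_text"])
    writer.writeheader()
    writer.writerows({k: r[k] for k in writer.fieldnames} for r in rows)

# Unmasking key stored separately for analysis after grading is complete.
with open("unmasking_key_q1.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["query_id", "masked_label", "model"])
    writer.writeheader()
    writer.writerows({k: r[k] for k in writer.fieldnames} for r in rows)
```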
Appears in Collections: Staff Publications; Elements

Files in This Item:
File: main.pdf
Description: Published version
Size: 2.95 MB
Format: Adobe PDF
Access Settings: Open
Version: Published

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.