Please use this identifier to cite or link to this item:
https://doi.org/10.1016/j.isci.2023.108163
DC Field | Value
---|---
dc.title | Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries
dc.contributor.author | Pushpanathan, K
dc.contributor.author | Lim, ZW
dc.contributor.author | Er Yew, SM
dc.contributor.author | Chen, DZ
dc.contributor.author | Hui'En Lin, HA
dc.contributor.author | Lin Goh, JH
dc.contributor.author | Wong, WM
dc.contributor.author | Wang, X
dc.contributor.author | Jin Tan, MC
dc.contributor.author | Chang Koh, VT
dc.contributor.author | Tham, YC
dc.date.accessioned | 2023-11-14T04:22:48Z
dc.date.available | 2023-11-14T04:22:48Z
dc.date.issued | 2023-11-17
dc.identifier.citation | Pushpanathan, K, Lim, ZW, Er Yew, SM, Chen, DZ, Hui'En Lin, HA, Lin Goh, JH, Wong, WM, Wang, X, Jin Tan, MC, Chang Koh, VT, Tham, YC (2023-11-17). Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries. iScience 26 (11) : 108163-. ScholarBank@NUS Repository. https://doi.org/10.1016/j.isci.2023.108163
dc.identifier.issn | 2589-0042
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/245924
dc.description.abstract | In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. 89.2% of ChatGPT-4.0 responses were ‘good’-rated, significantly outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) (all p < 0.001). All three LLM-Chatbots also showed high mean comprehensiveness scores (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Future rigorous validation of their performance is crucial to ensure their reliability and appropriateness for actual clinical use.
dc.source | Elements
dc.subject | Artificial intelligence
dc.subject | Ophthalmology
dc.type | Article
dc.date.updated | 2023-11-11T11:46:12Z
dc.contributor.department | DEAN'S OFFICE (DUKE-NUS MEDICAL SCHOOL)
dc.contributor.department | OPHTHALMOLOGY
dc.description.doi | 10.1016/j.isci.2023.108163
dc.description.sourcetitle | iScience
dc.description.volume | 26
dc.description.issue | 11
dc.description.page | 108163-
dc.published.state | Unpublished
Appears in Collections: Staff Publications Elements
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
main.pdf | Published version | 2.95 MB | Adobe PDF | OPEN | Published
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.