Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.patter.2020.100129
DC Field: Value
dc.title: Extensions of the External Validation for Checking Learned Model Interpretability and Generalizability
dc.contributor.author: Ho, S.Y.
dc.contributor.author: Phua, K.
dc.contributor.author: Wong, L.
dc.contributor.author: Bin Goh, W.W.
dc.date.accessioned: 2021-08-25T14:16:27Z
dc.date.available: 2021-08-25T14:16:27Z
dc.date.issued: 2020
dc.identifier.citation: Ho, S.Y., Phua, K., Wong, L., Bin Goh, W.W. (2020). Extensions of the External Validation for Checking Learned Model Interpretability and Generalizability. Patterns 1 (8): 100129. ScholarBank@NUS Repository. https://doi.org/10.1016/j.patter.2020.100129
dc.identifier.issn: 26663899
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/199384
dc.description.abstract: We discuss the validation of machine learning models, which is standard practice for determining model efficacy and generalizability. We argue that internal validation approaches, such as cross-validation and bootstrap, cannot guarantee the quality of a machine learning model, owing to potentially biased training data and the complexity of the validation procedure itself. To better evaluate the generalization ability of a learned model, we suggest using independently sourced external data as validation datasets, namely external validation. Given the lack of research attention on external validation, especially a well-structured and comprehensive study, we discuss the necessity for external validation and propose two extensions of the external validation approach that may help reveal the true domain-relevant model from a candidate set. Moreover, we suggest a procedure to check whether a set of validation datasets is valid and introduce statistical reference points for detecting external data problems. External validation is critical for establishing machine learning model quality. To improve rigor and introduce structure into external validation processes, we propose two extensions, convergent and divergent validation. Using a case study, we demonstrate how convergent and divergent validations are set up, and we discuss technical considerations for gauging performance, including establishing statistical rigor, acquiring valid external data, determining how many times an external validation needs to be performed, and deciding what to do when multiple external validations disagree with each other. Finally, we highlight that external validation remains, and will continue to be, highly relevant, even to new machine learning paradigms. External validation is an important step for confirming the potential of a machine learning or artificial intelligence algorithm for practical deployment. Unfortunately, this process is not well structured. We discuss how to make external validations more robust and systematic and also introduce two new extensions: convergent and divergent validations. © 2020 The Authors
dc.publisher: Cell Press
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source: Scopus OA2020
dc.subject: computational biology
dc.subject: data science
dc.subject: descriptive statistics
dc.subject: DSML 3: Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems
dc.subject: exploratory data analysis
dc.subject: scientific method
dc.type: Review
dc.contributor.department: DEPARTMENT OF COMPUTER SCIENCE
dc.description.doi: 10.1016/j.patter.2020.100129
dc.description.sourcetitle: Patterns
dc.description.volume: 1
dc.description.issue: 8
dc.description.page: 100129
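
The abstract above contrasts internal validation (e.g., cross-validation, bootstrap) with external validation on independently collected data. Below is a minimal illustrative sketch of that distinction; it is not taken from the paper itself, and the use of scikit-learn, the synthetic cohorts, and the simulated covariate shift standing in for an "external" dataset are all assumptions made for illustration.

```python
# Minimal sketch: internal validation (cross-validation) vs. external validation.
# Assumption: scikit-learn is available; the "external" cohort is simulated by
# perturbing a held-aside portion of synthetic data, standing in for data
# collected at a different site, instrument, or time period.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# One synthetic task, split into an internal cohort and a stand-in external cohort.
X, y = make_classification(n_samples=800, n_features=20, n_informative=5, random_state=0)
X_int, y_int = X[:500], y[:500]            # internal cohort (training + CV)
X_ext, y_ext = X[500:], y[500:]            # stand-in for an external cohort
X_ext = X_ext + rng.normal(scale=0.5, size=X_ext.shape)  # simulated batch/site shift

model = LogisticRegression(max_iter=1000)

# Internal validation: k-fold cross-validation within the internal cohort only.
cv_scores = cross_val_score(model, X_int, y_int, cv=5)
print(f"internal (5-fold CV) accuracy: {cv_scores.mean():.3f}")

# External validation: fit on the full internal cohort, score on independent data.
model.fit(X_int, y_int)
print(f"external validation accuracy:  {model.score(X_ext, y_ext):.3f}")
```

In a real study the external set would come from a genuinely independent source rather than a perturbed split, and a gap between the two scores is what motivates the structured external-validation procedures discussed in the paper.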
Appears in Collections: Elements; Staff Publications

Files in This Item:
File: 10_1016_j_patter_2020_100129.pdf
Size: 980.36 kB
Format: Adobe PDF
Access Settings: OPEN
Version: None

This item is licensed under a Creative Commons License.