Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.gsf.2020.03.017
Title: Probabilistic outlier detection for sparse multivariate geotechnical site investigation data using Bayesian learning
Authors: Zheng, Shuo
Zhu, Yu-Xin
Li, Dian-Qing
Cao, Zi-Jun
Deng, Qin-Xuan
Phoon, Kok-Kwang 
Keywords: Bayesian machine learning
Mahalanobis distance
Outlier detection
Resampling by half-means
Site investigation
Sparse multivariate data
Issue Date: 1-Jan-2021
Publisher: Elsevier B.V.
Citation: Zheng, Shuo, Zhu, Yu-Xin, Li, Dian-Qing, Cao, Zi-Jun, Deng, Qin-Xuan, Phoon, Kok-Kwang (2021-01-01). Probabilistic outlier detection for sparse multivariate geotechnical site investigation data using Bayesian learning. Geoscience Frontiers 12 (1) : 425-439. ScholarBank@NUS Repository. https://doi.org/10.1016/j.gsf.2020.03.017
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
Abstract: Various uncertainties arising during the acquisition of geoscience data may produce anomalous data instances (i.e., outliers) that do not conform to the expected pattern of regular data instances. With sparse multivariate data obtained from geotechnical site investigation, it is impossible to identify outliers with certainty, because outliers distort the statistics of geotechnical parameters and because data sparsity introduces statistical uncertainty into those statistics. This paper develops a probabilistic outlier detection method for sparse multivariate data obtained from geotechnical site investigation. The proposed approach quantifies the outlying probability of each data instance based on Mahalanobis distance and identifies as outliers those data instances with outlying probabilities greater than 0.5. It tackles the distortion of statistics estimated from a dataset containing outliers using a re-sampling technique and rationally accounts for the statistical uncertainty using Bayesian machine learning. Moreover, the proposed approach also suggests an exclusive method to determine the outlying components of each outlier. The proposed approach is illustrated and verified using simulated and real-life datasets. The results show that the proposed approach properly identifies outliers among sparse multivariate data, together with their corresponding outlying components, in a probabilistic manner. It can significantly reduce the masking effect (i.e., missing some actual outliers due to the distortion of statistics by the outliers and statistical uncertainty). It is also found that outliers among sparse multivariate data instances significantly affect the construction of the multivariate distribution of geotechnical parameters for uncertainty quantification. This emphasizes the necessity of a data cleaning process (e.g., outlier detection) for uncertainty quantification based on geoscience data. © 2020 China University of Geosciences (Beijing) and Peking University
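Illustrative sketch (not taken from the paper): the abstract combines Mahalanobis distance, re-sampling, and a 0.5 cut-off on an outlying probability. The Python below is a deliberately simplified stand-in for that idea, not the authors' algorithm. It replaces the Bayesian learning step with a plain frequency count over random half-samples and is restricted to two parameters so that the covariance inverse has a closed form. The function names, the 97.5% chi-square threshold, the number of resamples, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def mean_and_cov(points):
    """Sample mean and unbiased 2x2 covariance of bivariate points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance via the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def outlying_frequency(points, threshold, n_resamples=500, seed=0):
    """Fraction of random half-samples in which each point exceeds the threshold.

    Assumption: this frequency is used as a crude proxy for an outlying
    probability; the paper instead derives it with Bayesian learning.
    """
    rng = random.Random(seed)
    half = len(points) // 2
    counts = [0] * len(points)
    for _ in range(n_resamples):
        subset = rng.sample(points, half)  # half-sample without replacement
        mean, cov = mean_and_cov(subset)   # statistics from the half-sample only
        for i, p in enumerate(points):
            if mahalanobis_sq(p, mean, cov) > threshold:
                counts[i] += 1
    return [c / n_resamples for c in counts]


# Synthetic data (hypothetical): 29 regular points plus one planted outlier.
data_rng = random.Random(1)
data = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(29)]
data.append((8.0, -8.0))

# 97.5% quantile of a chi-square distribution with 2 degrees of freedom,
# which has the closed form -2 * ln(1 - p).
threshold = -2.0 * math.log(1.0 - 0.975)

probs = outlying_frequency(data, threshold)
flagged = [i for i, p in enumerate(probs) if p > 0.5]  # 0.5 rule from the abstract
print(flagged)
```

A point excluded from a half-sample is judged against statistics it could not distort, which is the intuition behind using re-sampling against the masking effect. With very small datasets this frequency count becomes unreliable, which is the kind of sparsity the Bayesian treatment described in the abstract is meant to address.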
Source Title: Geoscience Frontiers
URI: https://scholarbank.nus.edu.sg/handle/10635/232564
ISSN: 1674-9871
DOI: 10.1016/j.gsf.2020.03.017
Appears in Collections: Elements
Staff Publications
