
AUTOMATED PIPELINES FOR ENHANCED ENERGY DATA QUALITY: ANOMALY DETECTION, DATA IMPUTATION, AND GENERATIVE MODELING

FU CHUN
Abstract
With the proliferation of smart meters and sensors, enormous volumes of energy data are being generated. However, anomalies and missing values frequently compromise data quality, undermining the accuracy of machine learning models for tasks such as forecasting and optimization. Although techniques ranging from statistical methods to neural networks have been applied to data imputation and anomaly detection, they remain limited in scalability, robustness, and their ability to leverage contextual patterns. This thesis introduces an automated three-phase pipeline to enhance energy data quality and completeness. First, a generalizable tree-based ensemble model is proposed for anomaly detection, achieving an AUC of 0.98 on a benchmark dataset. Second, by transforming one-dimensional energy data into two-dimensional images, state-of-the-art image inpainting techniques are shown to reduce the MSE of imputed missing values, outperforming the baselines. Third, a novel meta-driven generative model is developed to generate customizable, high-fidelity synthetic energy data by incorporating building metadata. Validated on a dataset of 3,053 meters, the proposed diffusion model outperforms competing models in both diversity and fidelity, as evidenced by significant reductions in both FID score and KL divergence. The thesis presents a generalizable pipeline for improving energy data quality, enhancing downstream tasks and paving the way for smarter energy management and sustainability.
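The second phase depends on recasting a one-dimensional meter series as a two-dimensional image. The abstract does not specify the layout used in the thesis, so the following is only a minimal sketch under an assumed layout: hourly readings, one row per day and one column per hour, with missing readings stored as NaN. The function name `series_to_image` and the parameter `steps_per_day` are hypothetical and are not taken from the thesis.

```python
import numpy as np

def series_to_image(values, steps_per_day=24):
    """Reshape a 1-D load series into a (days, steps_per_day) array.

    Assumed layout (not from the thesis): rows are days, columns are
    time steps within a day, and missing readings are NaN.
    """
    values = np.asarray(values, dtype=float)
    n_days = len(values) // steps_per_day
    # Drop any trailing partial day so the array is rectangular.
    trimmed = values[: n_days * steps_per_day]
    image = trimmed.reshape(n_days, steps_per_day)
    # True where a reading is missing; an inpainting model would fill these cells.
    mask = np.isnan(image)
    return image, mask

# Synthetic example: one week of hourly readings with a 3-hour gap.
hourly = np.arange(7 * 24, dtype=float)
hourly[30:33] = np.nan
image, mask = series_to_image(hourly)
```

In this layout a gap of consecutive missing hours becomes a small horizontal hole in the image, and the same hours on neighbouring days sit directly above and below it, which is the kind of contextual pattern an inpainting model can draw on.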
Keywords
Machine learning, Building energy, Smart meter, Anomaly detection, Data imputation, Generative model
Date
2023-12-15
Type
Thesis