Date of Award
3-10-2025
Document Type
Doctoral Thesis
Degree Name
Doctor of Philosophy
First Advisor
Dr. Robert Sheehy
Second Advisor
Dr. Ted Scully
Third Advisor
Dr. Pat Doody
Abstract
Anomaly detection is an important area of research within data mining, with applications in many domains, including cybersecurity, finance, and healthcare. Conventional methods depend on labelled training data and assume a balanced class distribution. This restricts their real-world applicability, because anomalies are rare and unlabelled data are the norm. This research addresses these limitations by investigating how unlabelled data can support anomaly detection in complex, class-imbalanced, data-scarce environments. It addresses three research questions: how to integrate unlabelled data into anomaly detection models, how to improve performance when labelled anomalies are scarce, and how graph-based methods can capture complex relational anomalies. Three empirical studies were conducted. The first applies semi-supervised learning and Variational Autoencoders (VAEs) to improve detection in class-imbalanced datasets. The second combines Positive Unlabelled (PU) learning with Generative Adversarial Networks (GANs) to enhance detection in sparse data scenarios. The third introduces a graph-based framework that uses knowledge graph embeddings, ensemble methods, and pseudo-labelling to detect anomalies in structured graph data. Each study is evaluated on synthetic, benchmark, and real-world datasets. Statistical tests compare performance against baseline models, and the results show that incorporating unlabelled data improves detection accuracy and robustness. This research extends existing methods with methodological and theoretical contributions, indicating the potential of unlabelled data and graph structure for sophisticated, flexible anomaly detection. It provides actionable insights for optimising scalable, flexible systems and outlines directions for future work on improving generalisability and computational performance.
Recommended Citation
Shields, Andrew, "Uncovering the efficacy of unlabelled data in anomaly detection" (2025). Theses [online].
Available at: https://sword.mtu.ie/allthe/840
Access Level
info:eu-repo/semantics/openAccess