Generative Modeling for Interpretable Anomaly Detection in Medical Imaging: Applications in Failure Detection and Data Curation

McKell E. Woodland, Mais Altaie, Caleb S. O’Connor, Austin H. Castelo, Olubunmi C. Lebimoyo, Aashish C. Gupta, Joshua P. Yung, Paul E. Kinahan, Clifton D. Fuller, Eugene J. Koay, Bruno C. Odisio, Ankit B. Patel, Kristy K. Brock

Research output: Contribution to journal, Article, peer-review

Abstract

This work aims to leverage generative modeling-based anomaly detection to enhance interpretability in AI failure detection systems and to aid data curation for large repositories. For failure detection interpretability, this retrospective study utilized 3339 CT scans (525 patients), divided patient-wise into training, baseline test, and anomaly (having failure-causing attributes—e.g., needles, ascites) test datasets. For data curation, 112,120 ChestX-ray14 radiographs were used for training and 2036 radiographs from the Medical Imaging and Data Resource Center for testing, categorized as baseline or anomalous based on attribute alignment with ChestX-ray14. StyleGAN2 networks modeled the training distributions. Test images were reconstructed with backpropagation and scored using mean squared error (MSE) and Wasserstein distance (WD). Scores should be high for anomalous images, as StyleGAN2 cannot model unseen attributes. Area under the receiver operating characteristic curve (AUROC) evaluated anomaly detection, i.e., baseline and anomaly dataset differentiation. The proportion of highest-scoring patches containing needles or ascites assessed anomaly localization. Permutation tests determined statistical significance. StyleGAN2 did not reconstruct anomalous attributes (e.g., needles, ascites), enabling the unsupervised detection of these attributes: mean (±standard deviation) AUROCs were 0.86 (±0.13) for failure detection and 0.82 (±0.11) for data curation. 81% (±13%) of the needles and ascites were localized. WD outperformed MSE on CT (p < 0.001), while MSE outperformed WD on radiography (p < 0.001). Generative models detected anomalous image attributes, demonstrating promise for model failure detection interpretability and large-scale data curation.
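The scoring and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`mse_score`, `wd_score`, `auroc`) are hypothetical, the Wasserstein distance is assumed here to be the 1-D distance between the pixel-intensity distributions of an image and its reconstruction, and the StyleGAN2 reconstruction step itself is not shown.

```python
import numpy as np


def mse_score(image, reconstruction):
    """Mean squared error between an image and its reconstruction."""
    diff = np.asarray(image, dtype=np.float64) - np.asarray(reconstruction, dtype=np.float64)
    return float(np.mean(diff ** 2))


def wd_score(image, reconstruction):
    """1-D Wasserstein-1 distance between pixel-intensity distributions.

    Assumption: intensities are treated as two empirical distributions of
    equal size, for which the distance is the mean absolute difference
    between the sorted values.
    """
    a = np.sort(np.asarray(image, dtype=np.float64).ravel())
    b = np.sort(np.asarray(reconstruction, dtype=np.float64).ravel())
    if a.size != b.size:
        raise ValueError("image and reconstruction must have the same number of pixels")
    return float(np.mean(np.abs(a - b)))


def auroc(baseline_scores, anomaly_scores):
    """AUROC as the probability that an anomaly outscores a baseline image.

    Ties count as one half, which matches the Mann-Whitney U formulation.
    """
    base = np.asarray(baseline_scores, dtype=np.float64)[None, :]
    anom = np.asarray(anomaly_scores, dtype=np.float64)[:, None]
    return float(np.mean((anom > base) + 0.5 * (anom == base)))


# Toy data: a "baseline" image that is reconstructed almost perfectly and an
# "anomalous" image whose bright patch the reconstruction fails to reproduce.
rng = np.random.default_rng(0)
baseline_image = rng.random((16, 16))
baseline_recon = baseline_image + rng.normal(0.0, 0.01, size=(16, 16))

anomaly_image = baseline_image.copy()
anomaly_image[4:8, 4:8] = 5.0          # unseen attribute (hypothetical)
anomaly_recon = baseline_image.copy()  # reconstruction lacks the attribute

print("MSE baseline / anomaly:", mse_score(baseline_image, baseline_recon),
      mse_score(anomaly_image, anomaly_recon))
print("WD  baseline / anomaly:", wd_score(baseline_image, baseline_recon),
      wd_score(anomaly_image, anomaly_recon))
```

In this sketch, higher scores indicate a poorer reconstruction, so an AUROC near 1 means the anomaly dataset is well separated from the baseline dataset, while 0.5 corresponds to chance.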

Original language: English (US)
Article number: 1106
Journal: Bioengineering
Volume: 12
Issue number: 10
State: Published - Oct 14 2025

Keywords

  • anomaly detection
  • data curation
  • failure detection
  • generative adversarial network
  • generative modeling

ASJC Scopus subject areas

  • Bioengineering
