k-POD: A Method for k-Means Clustering of Missing Data

Jocelyn T. Chi, Eric C. Chi, Richard G. Baraniuk

Research output: Contribution to journal › Article

37 Scopus citations

Abstract

The k-means algorithm is often used in clustering applications, but it requires a complete data matrix. Missing data, however, are common in many applications. Mainstream approaches to clustering missing data reduce the problem to a complete-data formulation through either deletion or imputation, but these solutions may incur significant costs. Our k-POD method presents a simple extension of k-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data. [Received November 2014. Revised August 2015.]
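The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea usually associated with k-POD: alternate between filling each missing entry with the matching coordinate of the row's current cluster centroid and updating the k-means assignments and centroids on the filled matrix. It is not the authors' reference implementation. The function name `kpod`, the column-mean initial fill, the farthest-point seeding, the fixed iteration count, and the use of a single Lloyd step per pass (rather than a full k-means run) are all illustrative assumptions. Missing values are encoded as `None`, and every column is assumed to have at least one observed value.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kpod(rows, k, n_iter=50):
    """Hypothetical k-POD-style sketch; None marks a missing entry."""
    n, p = len(rows), len(rows[0])

    # Initial fill (assumption): mean of the observed values in each column.
    col_means = []
    for j in range(p):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(sum(observed) / len(observed))
    filled = [[r[j] if r[j] is not None else col_means[j] for j in range(p)]
              for r in rows]

    # Deterministic farthest-point seeding (assumption, chosen for repeatability).
    centers = [list(filled[0])]
    while len(centers) < k:
        far = max(filled, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(far))

    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: nearest centroid on the filled data.
        labels = [min(range(k), key=lambda c: sq_dist(x, centers[c]))
                  for x in filled]
        # Update step: centroid = mean of the filled rows in each cluster.
        for c in range(k):
            members = [filled[i] for i in range(n) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Refill step: overwrite only the missing entries with centroid values.
        for i, r in enumerate(rows):
            for j in range(p):
                if r[j] is None:
                    filled[i][j] = centers[labels[i]][j]
    return labels, centers, filled


# Toy data: two well-separated groups, each with missing entries.
rows = [
    [0.0, 0.1], [0.2, None], [None, 0.0],
    [10.0, 10.1], [10.2, None], [None, 9.9],
]
labels, centers, filled = kpod(rows, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Observed entries are never modified. Only the missing cells are re-estimated on each pass, so no rows are deleted and no external imputation model is needed.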

Original language: English (US)
Pages (from-to): 91-99
Number of pages: 9
Journal: American Statistician
Volume: 70
Issue number: 1
State: Published - Jan 2 2016

Keywords

  • Clustering
  • Imputation
  • Majorization-minimization
  • Missing data
  • k-means

ASJC Scopus subject areas

  • Statistics and Probability
  • Mathematics (all)
  • Statistics, Probability and Uncertainty
