TY - GEN
T1 - NO MORE THAN 6FT APART
T2 - 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
AU - Humayun, Ahmed Imtiaz
AU - Balestriero, Randall
AU - Kyrillidis, Anastasios
AU - Baraniuk, Richard
N1 - Publisher Copyright:
© 2022 IEEE
PY - 2022
Y1 - 2022
N2 - Centroid based clustering methods such as k-means, kmedoids and k-centers are heavily applied as a go-to tool in exploratory data analysis. In many cases, those methods are used to obtain representative centroids of the data manifold for visualization or summarization of a dataset. Real world datasets often contain inherent abnormalities, e.g., repeated samples and sampling bias, that manifest imbalanced clustering. We propose to remedy such a scenario by introducing a maximal radius constraint r on the clusters formed by the centroids, i.e., samples from the same cluster should not be more than 2r apart in terms of 2 distance. We achieve this constraint by solving a semi-definite program, followed by a linear assignment problem with quadratic constraints. Through qualitative results, we show that our proposed method is robust towards dataset imbalances and sampling artifacts. To the best of our knowledge, ours is the first constrained k-means clustering method with hard radius constraints.
AB - Centroid based clustering methods such as k-means, kmedoids and k-centers are heavily applied as a go-to tool in exploratory data analysis. In many cases, those methods are used to obtain representative centroids of the data manifold for visualization or summarization of a dataset. Real world datasets often contain inherent abnormalities, e.g., repeated samples and sampling bias, that manifest imbalanced clustering. We propose to remedy such a scenario by introducing a maximal radius constraint r on the clusters formed by the centroids, i.e., samples from the same cluster should not be more than 2r apart in terms of 2 distance. We achieve this constraint by solving a semi-definite program, followed by a linear assignment problem with quadratic constraints. Through qualitative results, we show that our proposed method is robust towards dataset imbalances and sampling artifacts. To the best of our knowledge, ours is the first constrained k-means clustering method with hard radius constraints.
KW - clustering
KW - constrained optimization
KW - data imbalance
KW - radius constraint
KW - robust k-means
UR - http://www.scopus.com/inward/record.url?scp=85131262043&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131262043&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9746288
DO - 10.1109/ICASSP43922.2022.9746288
M3 - Conference contribution
AN - SCOPUS:85131262043
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4433
EP - 4437
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 23 May 2022 through 27 May 2022
ER -