TY - GEN
T1 - Deep Imbalanced Attribute Classification Using Visual Attention Aggregation
AU - Sarafianos, Nikolaos
AU - Xu, Xiang
AU - Kakadiaris, Ioannis A.
N1 - Publisher Copyright:
© 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
N2 - For many computer vision applications, such as image description and human identification, recognizing the visual attributes of humans is an essential yet challenging problem. Its challenges originate from its multi-label nature, the large underlying class imbalance and the lack of spatial annotations. Existing methods follow either a computer vision approach while failing to account for class imbalance, or explore machine learning solutions, which disregard the spatial and semantic relations that exist in the images. With that in mind, we propose an effective method that extracts and aggregates visual attention masks at different scales. We introduce a loss function to handle class imbalance both at class and at an instance level and further demonstrate that penalizing attention masks with high prediction variance accounts for the weak supervision of the attention mechanism. By identifying and addressing these challenges, we achieve state-of-the-art results with a simple attention mechanism in both PETA and WIDER-Attribute datasets without additional context or side information.
AB - For many computer vision applications, such as image description and human identification, recognizing the visual attributes of humans is an essential yet challenging problem. Its challenges originate from its multi-label nature, the large underlying class imbalance and the lack of spatial annotations. Existing methods follow either a computer vision approach while failing to account for class imbalance, or explore machine learning solutions, which disregard the spatial and semantic relations that exist in the images. With that in mind, we propose an effective method that extracts and aggregates visual attention masks at different scales. We introduce a loss function to handle class imbalance both at class and at an instance level and further demonstrate that penalizing attention masks with high prediction variance accounts for the weak supervision of the attention mechanism. By identifying and addressing these challenges, we achieve state-of-the-art results with a simple attention mechanism in both PETA and WIDER-Attribute datasets without additional context or side information.
KW - Deep imbalanced learning
KW - Visual attention
KW - Visual attributes
UR - http://www.scopus.com/inward/record.url?scp=85055086597&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055086597&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-01252-6_42
DO - 10.1007/978-3-030-01252-6_42
M3 - Conference contribution
AN - SCOPUS:85055086597
SN - 9783030012519
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 708
EP - 725
BT - Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
A2 - Ferrari, Vittorio
A2 - Sminchisescu, Cristian
A2 - Weiss, Yair
A2 - Hebert, Martial
PB - Springer-Verlag
T2 - 15th European Conference on Computer Vision, ECCV 2018
Y2 - 8 September 2018 through 14 September 2018
ER -