TY - JOUR
T1 - Mad Max
T2 - Affine Spline Insights into Deep Learning
AU - Balestriero, Randall
AU - Baraniuk, Richard G.
N1 - Funding Information:
Manuscript received August 5, 2020; revised November 19, 2020; accepted November 26, 2020. Date of publication December 17, 2020; date of current version April 30, 2021. This work was supported in part by the K2I Graduate Fellowship (BP) from Rice University; in part by NSF under Grant CCF-1911094, Grant IIS-1838177, and Grant IIS-1730574; in part by Army Research Office (ARO) under Grant W911NF-15-1-0316; in part by Air Force Office of Scientific Research (AFOSR) under Grant FA9550-14-1-0088 and Grant FA9550-18-1-0478; in part by Office of Naval Research (ONR) under Grant N00014-17-1-2551 and Grant N00014-18-12571; in part by Defense Advanced Research Projects Agency (DARPA) under Grant G001534-7500; and in part by the DOD Vannevar Bush Faculty Fellowship of ONR under Grant N00014-18-1-2047. (Corresponding author: Randall Balestriero.) The authors are with the Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005 USA (e-mail: randallbalestriero@gmail.com; richb@rice.edu).
Publisher Copyright:
© 1963-2012 IEEE.
PY - 2021/5
Y1 - 2021/5
AB - We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs) that provide a powerful portal through which we view and analyze their inner workings. For instance, conditioned on the spline partition region containing the input signal, the output of an MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal to each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space that is implicitly induced by an MASO directly links DNs to the theory of vector quantization (VQ) and K-means clustering, which opens up new geometric avenues to study how DNs organize signals in a hierarchical fashion. To validate the utility of the VQ interpretation, we develop a new distance metric for signals and images that quantifies the difference between their VQ encodings.
KW - Classification
KW - continuous piecewise affine
KW - deep neural networks
KW - input space partition
KW - max affine splines
KW - template matching
KW - Voronoi diagram
UR - http://www.scopus.com/inward/record.url?scp=85098754076&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098754076&partnerID=8YFLogxK
DO - 10.1109/JPROC.2020.3042100
M3 - Article
AN - SCOPUS:85098754076
VL - 109
SP - 704
EP - 727
JO - Proceedings of the IEEE
JF - Proceedings of the IEEE
SN - 0018-9219
IS - 5
M1 - 9296823
ER -