Supervised learning, the task of finding a decision rule that predicts a label Y from features X using a set of training examples, is a central problem in machine learning. Minimizing the average loss over the training samples, known as empirical risk minimization (ERM), is the standard approach to supervised learning. Despite the many applications of ERM algorithms, their interpretation and robustness properties remain poorly understood, which motivates new frameworks for analyzing ERM problems.
In this presentation, we introduce an information-theoretic framework for supervised learning tasks. Within this framework, we formulate a minimax optimization problem whose solution is the decision rule that is robust against the worst-case distribution in an ambiguity set of distributions matching certain moments of the empirical distribution of the data. We interpret the proposed minimax problem as the maximization of a generalized entropy function, and we show that this problem is dual to a convex ERM problem. This duality result provides an information-theoretic understanding of widely used ERM algorithms and also suggests novel ERM formulations.
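To make the formulation concrete, a schematic version of the minimax problem, with notation introduced here purely for illustration, is

\[
\min_{\delta \in \mathcal{D}} \; \max_{P \in \Gamma(\hat{P})} \; \mathbb{E}_{(X,Y) \sim P}\big[\, \ell(Y, \delta(X)) \,\big],
\]

where \(\delta\) ranges over decision rules in a set \(\mathcal{D}\), \(\ell\) is the loss function, and \(\Gamma(\hat{P})\) is the ambiguity set of distributions whose specified moments match those of the empirical distribution \(\hat{P}\). The inner maximization gives rise to the generalized entropy function mentioned above, and convex duality relates the outer minimization to a convex ERM problem.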
Applying the proposed framework to logarithmic loss, we recover the maximum likelihood problem for well-known generalized linear models. For 0-1 loss, the minimax approach results in a novel ERM algorithm for classification tasks, which we call the maximum entropy machine (MEM). We show numerically that the MEM can outperform other ERM-based classification algorithms such as the support vector machine (SVM). Furthermore, we demonstrate the application of the minimax approach to nonlinear hypothesis sets, including kernel spaces and deep neural networks. We also use the proposed supervised learning framework to understand the role of the discriminator in generative adversarial networks (GANs), generalizing an existing interpretation of GANs to more practical settings.
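As a point of reference for the logarithmic-loss case, the sketch below illustrates ERM with log loss over a linear model, i.e., maximum-likelihood logistic regression, which is the generalized-linear-model special case mentioned above. It is not an implementation of the maximum entropy machine; the function name, step size, and small ridge term are assumptions made only for this illustration.

import numpy as np

def fit_logistic_erm(X, y, lr=0.1, n_iters=2000, reg=1e-3):
    """Minimize the empirical log loss (negative log-likelihood) by gradient descent.

    X: (n, d) feature matrix; y: (n,) labels in {0, 1}.
    `reg` is a small ridge penalty added for numerical stability (an assumption,
    not part of the framework described in this abstract).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(Y = 1 | X)
        grad_w = X.T @ (p - y) / n + reg * w      # gradient of the mean log loss
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) + 0.3 > 0).astype(float)
w, b = fit_logistic_erm(X, y)
print("training accuracy:", np.mean(((X @ w + b) > 0) == y.astype(bool)))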