Tse Lab

Machine Learning & Statistics

We take real-world problems, abstract mathematical models from them, and develop algorithms using first-principles approaches. We consider a wide range of topics in machine learning and statistics, including classification, clustering, multi-armed bandits, deep learning, empirical Bayes, and multiple hypothesis testing.

Adaptive Monte Carlo Computation

In modern data science, datasets are growing at tremendous speed, making many commonly used algorithms computationally expensive. To accelerate existing algorithms, we pursue a general idea called adaptive Monte Carlo computation. At a high level, it consists of two steps: 1) convert the computational problem into a statistical estimation problem; 2) accelerate the estimation by designing an adaptive sampling procedure via multi-armed bandits. In several applications, this idea brings improvements of several orders of magnitude in wall-clock time.
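As a concrete illustration, here is a minimal sketch of the bandit-based elimination idea, applied to the medoid problem (finding the point that minimizes the average distance to all others) on a one-dimensional dataset. The function name `adaptive_medoid`, the Hoeffding-style confidence widths, and all constants are illustrative assumptions, not the lab's actual implementation.

```python
import numpy as np

def adaptive_medoid(points, delta=0.01, batch=20, rng=None):
    """Illustrative sketch: treat each candidate medoid as a bandit arm.
    Pulling an arm samples one random reference point and measures a
    single distance, so clearly suboptimal candidates are eliminated
    after only a few cheap samples instead of n exact comparisons."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(points)
    sums = np.zeros(n)         # running sum of sampled distances per arm
    pulls = np.zeros(n)        # number of distance samples per arm
    active = np.arange(n)      # candidate medoids still in contention
    sigma = np.std(points)     # crude scale proxy for confidence widths

    while len(active) > 1 and pulls[active].min() < n:
        for i in active:       # pull each surviving arm `batch` times
            refs = rng.integers(0, n, size=batch)
            sums[i] += np.abs(points[i] - points[refs]).sum()
            pulls[i] += batch
        means = sums[active] / pulls[active]
        # Hoeffding-style confidence width; shrinks as pulls grow
        width = sigma * np.sqrt(2 * np.log(1 / delta) / pulls[active])
        # eliminate arms whose lower bound exceeds the best upper bound
        best_ub = (means + width).min()
        active = active[means - width <= best_ub]

    # exact evaluation among the few survivors
    return min(active, key=lambda i: np.abs(points[i] - points).sum())
```

The adaptivity is the point: sampling effort concentrates on the candidates that remain statistically indistinguishable from the best, which is what turns an exact quadratic computation into a much cheaper estimation problem.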

Deep Learning

Deep learning algorithms achieve state-of-the-art performance on a wide range of machine learning tasks. However, the current theoretical understanding of their success cannot explain the robustness and generalization behavior of deep learning models. We seek to analyze and improve the approximation, generalization, and stability properties of deep learning algorithms.

Maximum Entropy Framework for Supervised Learning

Minimizing the empirical 0-1 loss for classification is known to be NP-hard. In this project, we formulate a minimax learning problem that minimizes the worst-case 0-1 loss over an ambiguity set around the empirical distribution of the data. We interpret and efficiently solve this minimax problem by connecting its dual to maximizing the entropy function.
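As a hedged sketch of the formulation we have in mind (the ambiguity-set notation Γ(P̂) and the conditional-entropy form of the dual are our illustrative choices, not the project's exact statement):

```latex
% Minimax learning: minimize the worst-case expected 0-1 loss of a
% (randomized) predictor \psi over an ambiguity set \Gamma(\hat{P})
% centered at the empirical distribution \hat{P}:
\min_{\psi} \; \max_{P \in \Gamma(\hat{P})} \;
  \mathbb{E}_{P}\!\left[ \ell_{0\text{-}1}\bigl(Y, \psi(X)\bigr) \right]

% Duality connects this problem to maximizing a (conditional)
% entropy function over the same ambiguity set:
\max_{P \in \Gamma(\hat{P})} \; H_{P}(Y \mid X)
```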

Generative Adversarial Nets (GANs)

A generative adversarial network (GAN) is a minimax game between two players, a generator and a discriminator. While generative models learned by GANs have achieved great success in practice, their approximation, generalization, and optimization properties remain poorly understood. We focus on analyzing these theoretical aspects of GANs, providing both a comprehensive treatment of all three aspects for a simplified GAN architecture and a more focused analysis of the approximation aspect for general GAN problems.
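For concreteness, the original GAN value function (Goodfellow et al., 2014), which the discriminator D maximizes and the generator G minimizes:

```latex
% Two-player minimax game: D maximizes, G minimizes
\min_{G} \, \max_{D} \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```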

Covariate-Adaptive Multiple Hypothesis Testing

In multiple hypothesis testing, the data for each test boils down to a p-value, while additional side information is often available for each test, e.g., functional annotations in genetic testing. Such information can indicate how likely a small p-value is to be due to noise. Covariate-adaptive multiple testing methods use this information to improve detection power while controlling false positives.
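As one concrete instance of this idea, here is a minimal sketch of a covariate-weighted Benjamini-Hochberg procedure; in practice the weights would be learned from covariates such as functional annotations. The function `weighted_bh` and its details are illustrative, not the lab's specific method.

```python
import numpy as np

def weighted_bh(pvals, weights, alpha=0.05):
    """Illustrative sketch of covariate-weighted Benjamini-Hochberg:
    each p-value is divided by a covariate-derived weight (weights
    averaging to 1), so hypotheses the covariate flags as promising
    face a looser rejection threshold."""
    pvals = np.asarray(pvals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.mean()           # normalize: mean weight 1
    m = len(pvals)
    adj = pvals / weights                        # weighted p-values
    order = np.argsort(adj)
    thresh = alpha * np.arange(1, m + 1) / m     # BH step-up thresholds
    below = adj[order] <= thresh
    if not below.any():
        return np.zeros(m, dtype=bool)           # no rejections
    k = np.max(np.nonzero(below)[0])             # largest index passing
    reject = np.zeros(m, dtype=bool)
    reject[order[: k + 1]] = True                # reject the k+1 smallest
    return reject
```

With constant weights this reduces to the ordinary Benjamini-Hochberg procedure; informative weights shift the rejection budget toward hypotheses the covariate deems more likely to be non-null.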