Session I: 12:00-2:00 pm, Thursday March 23
Compression in deep generative models
Presenters: Aditya Grover, Neal Jean, Jesse Zhang
Abstract: Deep learning techniques have led to massive improvements in many applications of machine learning in recent years. Generative models, in particular, have quickly advanced the field of representation learning. The goal of representation learning is to learn data representations that are simultaneously informative and concise; this goal is intimately related to fundamental concepts from information theory such as the information bottleneck principle. In this paper, we review the relevant literature and discuss experimental results examining related phenomena.
Information Theoretic Limits of Compressed Sensing in Magnetic Resonance Imaging
Presenters: Kevin Chen, Rachel Luo, David Zeng
Abstract: Compressed sensing allows for high probability recovery of a signal with sub-Nyquist sampling. We first establish the connection between compressed sensing theory and channel capacity. We then establish the channel capacity of a magnetic resonance imaging (MRI) system, as limited by Maxwell's equations. MRI data with successively increasing undersampling (thus higher rates) are reconstructed and reflect the strong converse to the noisy channel coding theorem.
The Efficient Coding Hypothesis and Visual Neuroscience: Modelling Efficient Source Encoding in Retinal Ganglion Cells
Presenters: Nora Brackbill, Brett Larsen
Abstract: A guiding principle in visual neuroscience is the efficient coding hypothesis: the idea that the nervous system evolved to efficiently represent naturalistic stimuli. Indeed, the first step in the visual system must be efficient, as it maps information from 100 million photoreceptors down to 1 million optic nerve fibers. Here, we summarize recent developments in this area, and use simulation to investigate the apparent redundancy of having overlapping ON and OFF populations of cells.
Compressed Sensing and Wireless Sensor Networks
Presenters: Saleil Bhat, Nicole Grimwood, Tong Mu
Abstract: Compressed sensing is a method that exploits the sparsity of a signal to reduce the sampling rate required to reconstruct it. Such techniques have great potential in the field of wireless sensor networks (WSNs). In this paper, we consider the use of compressed sensing for event detection in a large WSN. Via simulation, we verify the information-theoretic limits for compressed sensing on both the compression ratio and the received SNR in this problem setting; namely, that with O(k log(n/k)) measurements and O(log n) receive SNR, the input signal can be fully reconstructed. We use Rayleigh fading as the channel model and the approximate message passing algorithm as the signal recovery mechanism.
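As a rough illustration of the measurement scaling mentioned above, the sketch below (our own, not the project's code) generates a k-sparse signal, takes on the order of k log(n/k) random Gaussian measurements, and recovers the signal with iterative soft-thresholding (ISTA) rather than the approximate message passing algorithm the authors use; the constants and noise level are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 10                             # signal length and sparsity
m = int(np.ceil(4 * k * np.log(n / k)))     # O(k log(n/k)) measurements (constant 4 chosen ad hoc)

# k-sparse signal and Gaussian measurement matrix
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x + 0.01 * rng.standard_normal(m)

# Recover with iterative soft-thresholding (ISTA), a simple l1 solver.
L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the gradient
lam = 0.01
x_hat = np.zeros(n)
for _ in range(500):
    grad = A.T @ (A @ x_hat - y)
    z = x_hat - grad / L
    x_hat = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

print("relative error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))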
Spatially Coupled Low-Density Parity Check Codes
Presenters: Geng Zhao, Bradley Emi
Abstract: Low-Density Parity Check (LDPC) coding is a simple but powerful linear block coding scheme that has been proven to approach the Shannon channel capacity in the large-block-size limit. In the LDPC scheme, redundant parity checks between bits in the block are sent over the channel. Parity checks and bit values can be viewed as nodes in a probabilistic graphical model called a factor graph, where each parity check is connected to the bit values that contribute to the check. A decoder based on the belief propagation algorithm on the factor graph can be used to estimate the maximum-likelihood codeword. Spatially coupled LDPC codes, inspired by the supercooling phenomenon in statistical physics, are a special case of LDPC codes that introduce a structured irregularity into the parity-check matrix. This structured irregularity acts as a seed for the belief propagation algorithm to converge with high probability, in the same way a crystal can act as a seed for freezing a supercooled liquid. We implement and analyze spatially coupled codes and discuss their strengths and weaknesses.
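For readers unfamiliar with decoding on a parity-check structure, the toy sketch below (ours, not the project's code) runs a hard-decision bit-flipping decoder on a small Hamming parity-check matrix; it is far simpler than belief propagation on a spatially coupled LDPC code, but shows how bit nodes and parity checks interact.

import numpy as np

def bit_flip_decode(H, r, max_iters=50):
    """Hard-decision bit-flipping decoding for a binary linear code with
    parity-check matrix H (a toy stand-in for belief propagation).
    r is the received hard-decision bit vector."""
    x = r.copy()
    for _ in range(max_iters):
        syndrome = H @ x % 2                 # which parity checks are violated
        if not syndrome.any():
            break                            # all parity checks satisfied
        # count, for each bit, how many unsatisfied checks it participates in
        unsatisfied = H.T @ syndrome
        # flip the bit(s) involved in the largest number of unsatisfied checks
        x[unsatisfied == unsatisfied.max()] ^= 1
    return x

# Toy (7,4) Hamming parity-check matrix (not an LDPC code), enough to see
# the decoder correct a single bit flip.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
codeword = np.zeros(7, dtype=int)            # the all-zeros codeword
received = codeword.copy()
received[2] ^= 1                             # introduce one bit error
print(bit_flip_decode(H, received))          # recovers the all-zeros codeword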
Information theoretic regularity lemma: interpretation and application
Presenter: Huy Pham
Abstract: In this project, we discuss information-theoretic aspects of the regularity lemma, starting from Terence Tao's information-theoretic proof of this classical combinatorial result. In particular, we derive and provide intuition for proofs of other versions of the regularity lemma and the counting lemma using information-theoretic language and interpretation. We also use the information-theoretic setting to obtain regularity results for binary sequences, and explain the tradeoff between the simplicity of adversary measurements and computational efficiency. This gives an operational meaning to the regularity lemma in terms of data compression: when we only need to distinguish data approximately with respect to some metric that is not too strong, we can achieve compression with a very low number of bits.
Information Theory-Based Analysis in Double Thompson Sampling for Dueling Multi-Armed Bandit Problems
Presenters: Dennis Wu, David Wugofski, Tung-Yu Wu
Abstract: Existing algorithms for multi-armed bandit (MAB) problems focus primarily on conventional settings with quantified feedback. This project aims to provide an information-theoretic regret-bound analysis for the dueling setting. Double Thompson Sampling (DTS) is our main algorithm of investigation; additional comparisons and analysis are also carried out in the paper.
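For context, the sketch below (our own simplification, not the project's code) implements standard Beta-Bernoulli Thompson sampling for a conventional bandit; Double Thompson Sampling builds on this idea by drawing posterior samples twice per round to select the pair of arms to duel.

import numpy as np

def thompson_sampling(true_means, horizon=10_000, seed=0):
    """Beta-Bernoulli Thompson sampling for a standard (non-dueling) bandit."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    successes = np.ones(n_arms)   # Beta(1, 1) priors
    failures = np.ones(n_arms)
    regret = 0.0
    best = max(true_means)
    for _ in range(horizon):
        theta = rng.beta(successes, failures)   # one posterior sample per arm
        arm = int(np.argmax(theta))
        reward = rng.random() < true_means[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret

print(thompson_sampling([0.3, 0.5, 0.7]))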
Information Theoretic Co-Clustering
Presenter: Ryan Wong
Abstract: Co-clustering is the simultaneous clustering across both dimensions of a data matrix and has many advantages over traditional one-way clustering. It has applications across data mining, unsupervised learning, and bioinformatics. Bregman co-clustering is an approach inspired by information theory that applies to a large class of loss functions, including KL divergence and squared Euclidean distance. This project analysed Bregman co-clustering and evaluated its performance on both synthetic and real-world data sets.
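As a quick way to see what simultaneous row/column clustering looks like, the snippet below (our illustration) uses scikit-learn's SpectralCoclustering on a synthetic block matrix; note this is Dhillon-style spectral co-clustering, not the Bregman co-clustering framework studied in the project.

import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

# Synthetic matrix with 3 planted co-clusters, then shuffled.
data, rows, cols = make_biclusters(shape=(120, 80), n_clusters=3,
                                   noise=5, shuffle=True, random_state=0)

model = SpectralCoclustering(n_clusters=3, random_state=0).fit(data)

# Reordering rows and columns by their cluster labels reveals the block structure.
reordered = data[np.argsort(model.row_labels_)][:, np.argsort(model.column_labels_)]
print(model.row_labels_[:10], model.column_labels_[:10])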
Maximum Entropy Feature Selection for Sound Texture Classification
Presenters: David Lindell, Nitish Padmanaban, Daniel Villamizar
Abstract: Many naturally occurring sounds, such as wind or waves, take the form of semi-stationary “textures” which lend themselves more readily to classification via Fourier-domain methods than arbitrary sounds. The statistics needed to realistically synthesize such sounds number in the thousands and are expensive to compute, especially for low-power applications. We expect that classification, a simpler problem, will require far fewer features (the statistics) to achieve reasonable performance. In reducing the size of the feature set, however, we inevitably run into the “winner’s curse,” in which the selection process yields biased feature values and a sub-optimal feature set: randomness in the training data causes sub-optimal features to score better on the selection metric than the optimal ones. To combat this, we use maximum entropy feature selection, which randomizes the selection process to reduce the bias inherent in the winner’s curse while retaining high classification accuracy.
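One hedged reading of a randomized, entropy-injecting selection rule is a softmax over feature scores; the sketch below is our own illustration under that assumption (the function name, temperature parameter, and scores are made up, not taken from the project).

import numpy as np

def max_entropy_select(scores, k, temperature=1.0, rng=None):
    """Sample k features (without replacement) with probabilities proportional
    to exp(score / temperature). As temperature -> 0 this degenerates to greedy
    top-k selection; a higher temperature injects entropy into the selection,
    which is one way to reduce winner's-curse bias."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    logits = (scores - scores.max()) / temperature   # subtract max for numerical stability
    p = np.exp(logits)
    p /= p.sum()
    return rng.choice(len(scores), size=k, replace=False, p=p)

# Example: 20 noisy feature scores, pick 5.
rng = np.random.default_rng(0)
print(max_entropy_select(rng.normal(size=20), k=5, temperature=0.5, rng=rng))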
Self-Improving Algorithms and Information Theory
Presenters: Kevin Fry, Kate Pregler, Vihan Lakshman
Abstract: Worst-case analysis, the dominant paradigm for analyzing algorithms, sometimes reports overly pessimistic outlooks on the performance of certain computational procedures. One response to this shortcoming developed in the past decade is the notion of self-improving algorithms that take advantage of the structure in the distribution of possible inputs to beat worst-case bounds. Information theory is central to the analysis of such self-improving algorithms, both in identifying fine-grained lower bounds and, on the positive side, in encoding arguments to show that self-improving algorithms hit these bounds. In our presentation, we will highlight the role of information theory in the analysis of self-improving algorithms for bread-and-butter computational problems such as sorting, clustering, and low-dimensional geometry.
Empirical Investigation of Information-Optimal Community Detection Algorithms in the Symmetric SBM
Presenters: Benji Pastel, Mark Nishimura, Reese Pathak, Sagnik Majumder
Abstract: Given a graph G, the task of community detection is to identify (or “recover”) subsets (or “communities”) of nodes of G that are more closely connected with each other than to the rest of the nodes of G. The stochastic block model (SBM) is a random graph model that simulates a graph with communities by first generating communities of nodes and then placing edges between nodes with probability depending on whether those two nodes are in the same community or not. Recent thresholds for recovery and detection in the SBM have made the SBM an intriguing avenue of research. In this work, we perform an empirical investigation of some recent algorithms that efficiently achieve the information-theoretic threshold for strong recovery on the SBM.
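To make the setup concrete, the sketch below (ours, not the project's code) samples a symmetric two-community SBM and recovers the partition with a basic spectral method; the information-theoretically optimal algorithms the project studies are considerably more sophisticated.

import numpy as np

def sample_sbm(n, p_in, p_out, rng):
    """Symmetric two-community SBM: first half of the nodes form community +1,
    second half community -1; edges appear with probability p_in within a
    community and p_out across communities."""
    labels = np.ones(n)
    labels[n // 2:] = -1
    same = np.equal.outer(labels, labels)
    probs = np.where(same, p_in, p_out)
    A = (rng.random((n, n)) < probs).astype(float)
    A = np.triu(A, 1)
    return A + A.T, labels

rng = np.random.default_rng(0)
A, labels = sample_sbm(n=400, p_in=0.10, p_out=0.02, rng=rng)

# Spectral recovery: the sign of the eigenvector for the second-largest
# eigenvalue of the adjacency matrix estimates the partition (a simple baseline).
eigvals, eigvecs = np.linalg.eigh(A)
guess = np.sign(eigvecs[:, -2])
accuracy = max(np.mean(guess == labels), np.mean(-guess == labels))
print("fraction of nodes recovered:", accuracy)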
Applying Information Theoretic Measures to Multi-Arm Bandit Problems
Presenters: Connor Brinton, Thomas Ayoul
Abstract: In both the casino and reinforcement learning settings, an agent is faced with the challenge of taking actions that maximize the resulting reward per action. However, in order to discover which actions achieve this goal, the agent must balance exploration of possibly sub-optimal actions against exploitation of historically well-performing actions. Concepts from information theory provide valuable methods for this problem, such as Information-Directed Sampling (IDS). We investigate some problems, such as the sparse linear bandit, where IDS finds the optimal action before Thompson Sampling (TS).
Real World Limitations and Applications of Global Caching
Presenters: George Herring, Richard Wan
Abstract: Maddah-Ali et al. propose a coding scheme that uses a “global” cache in addition to local caches to further decrease peak network load. The global cache uses information about the number of users and the number of files to distribute unique information to different users so that later simulcast information can be decoded more efficiently when a file is requested. The limitations of this coding scheme mean that it is directly applicable in only a few network environments. We focus on applying the global caching scheme to in-flight entertainment, a fairly ideal application because it deals well with the limitations of the scheme: there is a fixed number of users, a central network, a spike in network demand, and a limited number of files. We demonstrate the computational complexity that this “global” caching scheme entails and discuss possible ways to address this challenge.
Review of Coded Caching and its Applications
Presenters: Liangcheng Tao, Burak Bartan, Shiv Kaul
Abstract: The caching problem aims to find the optimal scheme of delivering requested files from a single server to multiple users over a shared link. We are interested in the online version of this problem, in which users have no knowledge of each other's requests and can only recall data from their most recent request. We simulate this model with more realistic constraints, and explore its applications and limitations in practice.
Information-Directed Sampling in Bandit Problems: Controlling the Exploration-Exploitation Tradeoff
Presenters: Anna Thomas, John Cherian, Leonard Bronner
Abstract: In this project, we review several papers in which information theory is used to determine the appropriate balance between exploration and exploitation in online decision making problems. In particular, we focus on information-directed sampling (IDS), proposed by Russo et al. (2014). Here, we explore several variations of the objective in IDS, and also consider extensions to the more general Markov decision process (MDP) setting. We evaluate our methods on widely used bandit models, namely the beta-Bernoulli, independent Gaussian, and linear Gaussian models.
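A minimal sketch of one IDS variant for the Beta-Bernoulli bandit is given below (our own simplification of the deterministic variant; the randomized IDS of Russo et al. optimizes over distributions on pairs of arms): it estimates each arm's expected regret and its information gain about the optimal arm from posterior samples, then minimizes the ratio regret^2 / gain.

import numpy as np

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def ids_step(successes, failures, rng, n_samples=1000):
    """One step of a deterministic IDS variant for a Beta-Bernoulli bandit."""
    theta = rng.beta(successes, failures, size=(n_samples, len(successes)))
    a_star = np.argmax(theta, axis=1)                       # optimal arm per posterior sample
    # Expected single-step regret of each arm under the posterior.
    delta = theta.max(axis=1).mean() - theta.mean(axis=0)
    # Information gain about A*: I(A*; Y_a) for a Bernoulli observation Y_a.
    gain = np.zeros(len(successes))
    p_star = np.bincount(a_star, minlength=len(successes)) / n_samples
    for a in range(len(successes)):
        marginal = theta[:, a].mean()
        cond_entropy = 0.0
        for j in range(len(successes)):
            if p_star[j] > 0:
                cond_entropy += p_star[j] * binary_entropy(theta[a_star == j, a].mean())
        gain[a] = binary_entropy(marginal) - cond_entropy
    return int(np.argmin(delta ** 2 / np.maximum(gain, 1e-12)))

# Example: pick the next arm given Beta posteriors over three arms.
rng = np.random.default_rng(0)
print(ids_step(np.array([5.0, 2.0, 8.0]), np.array([5.0, 8.0, 2.0]), rng))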
Session II: 2:00-4:00 pm, Thursday March 23
The Generalized Information Bottleneck Method and its Applications to Clustering
Presenters: Adam Jaffe, Jack Lindsey, Eric Luxenberg, Jaydeep Singh
Abstract: The information bottleneck method and its more recently developed generalized form are used to compress a random variable into a form that retains information about another variable. Among the many applications of this general method is clustering data. The original IB method has received much study over the years; here we explore the theoretical and empirical properties of its generalization. We focus in particular on the clustering problem, using k-means as a benchmark.
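For reference, a sketch of the original IB fixed-point iteration is given below (our own implementation of the standard self-consistent equations, not the project's generalized method); the parameter beta trades compression of X against preserved information about Y.

import numpy as np

def information_bottleneck(p_xy, n_clusters, beta, n_iters=200, seed=0):
    """Fixed-point iteration for the original information bottleneck:
    find p(t|x) compressing X into clusters T while preserving information
    about Y, given the joint distribution p_xy over (X, Y)."""
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)                              # marginal p(x)
    p_y_given_x = p_xy / p_x[:, None]                   # p(y|x)
    n_x = p_xy.shape[0]
    p_t_given_x = rng.dirichlet(np.ones(n_clusters), size=n_x)   # random init
    for _ in range(n_iters):
        p_t = p_t_given_x.T @ p_x                                        # p(t)
        p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x / (p_t[:, None] + 1e-12)
        # KL(p(y|x) || p(y|t)) for every (x, t) pair
        kl = np.sum(p_y_given_x[:, None, :] *
                    np.log((p_y_given_x[:, None, :] + 1e-12) /
                           (p_y_given_t[None, :, :] + 1e-12)), axis=2)
        logits = np.log(p_t + 1e-12)[None, :] - beta * kl
        logits -= logits.max(axis=1, keepdims=True)
        p_t_given_x = np.exp(logits)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)
    return p_t_given_x

# Toy joint distribution over |X| = 6, |Y| = 3.
rng = np.random.default_rng(1)
p_xy = rng.dirichlet(np.ones(18)).reshape(6, 3)
print(information_bottleneck(p_xy, n_clusters=2, beta=5.0).round(2))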
An Information-Theoretic Perspective on Adaptive Compressed Sensing
Presenters: Shawn Hu, Stanley Lin Sen, Edward Huang Renyong
Abstract: We introduce and motivate some basic results in the field of adaptive compressed sensing. In particular, we recast the compressed sensing problem in information-theoretic terms and evaluate some fundamental lower bounds on the number of adaptive sensing measurements in terms of information-theoretic quantities.
Least-Informative Distribution in a Strategic Context
Presenters: Junjie Qin, Yizheng Liao
Abstract: Engineering applications involve cost functions that might not be those commonly used in statistics and machine learning. In this poster, we provide examples where the maximum entropy principle fails to provide the worst case distribution. We then discuss implications for the application as well as ideas to incorporate the actual cost functions into the maximum entropy procedure.
Polar Codes for Flash Memory Applications
Presenters: Fei Liu, Xin Zheng
Abstract: Polar codes are powerful coding schemes. However, when applied to flash memories, several issues must be dealt with. First, polar codes require the block length to be a power of 2, which does not fit flash pages of different sizes. Second, the flash channel may degrade gradually and the channel statistics may change over time, so the coding scheme needs to adapt to the channel statistics as the channel degrades. In this presentation, we review the application of polar codes to flash memories, with emphasis on the special requirements mentioned above.
Application of Compressed Sensing Algorithms on Various Signal Sources
Presenters: Nipun Agarwala, Yuki Inoue, Chayakorn Pongsiri
Abstract: Compressed sensing allows recovery of signals sampled at sub-Nyquist rates. Though many different algorithms have been proposed to accomplish this task, most publications have focused only on signal sources with a well-defined probability distribution, making it hard to understand and compare their performance. In this project, we implement and apply major compressed-sensing algorithms to “real” signal sources, such as images and audio, to compare the pros and cons of the algorithms.
Rate-Distortion Performance of Deep Autoencoders
Presenters: Yash Savani, Russell Kaplan, Michael Xie
Abstract: This work presents an information-theoretic analysis of the performance of deep autoencoders under different choices of encoding bottleneck size and input distribution. We compare the rate-distortion performance of autoencoders and other baseline compression schemes to the theoretical limit, and we provide a distortion upper bound for the IID Gaussian case. We test performance on synthetic data drawn from simple IID Gaussian, multivariate Gaussian, and more complex XOR-related distributions, as well as real-world data from a reduced MNIST handwritten digits dataset.
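For reference, the theoretical limit in the IID Gaussian case is the classical rate-distortion function for a source with variance \sigma^2 under squared-error distortion:

R(D) = \frac{1}{2}\log_2\frac{\sigma^2}{D}, \quad 0 \le D \le \sigma^2,
\qquad\text{equivalently}\qquad
D(R) = \sigma^2\, 2^{-2R}.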
An information theoretic view of deep generative models
Presenter: Mojtaba Tefagh
Abstract: This work shows connections among the concepts of three different but related fields: deep learning, statistical mechanics, and information theory. We review the variational learning of deep generative models through information-theoretic results such as the strong data processing inequality (SDPI) and the information bottleneck (IB), and then interpret it in terms of bifurcations in the phase transitions of an annealing process. Finally, these connections are applied to the special case of convolutional neural networks to examine their potential practical implications.
Information Theoretical Approaches for Pure Exploration in Linear Bandit Problems
Presenters: Cem Kalkanli, Aykan Ozturk
Abstract: In this work, we analyze how information theory can be used to devise efficient pure exploration strategies in multi-armed bandit problems. We first start by analyzing information-directed sampling (IDS), and discussing how it can be extended to the pure exploration problem. We use information theoretical tools such as Fano's inequality to show bounds on their performance. We then introduce some novel exploration methods inspired by the Gaussian channel communication problem. Our experiments on the linear bandit problem suggest that these methods are competitive with other well known exploration strategies.
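One standard form of the Fano-type bound used in such analyses, stated here for an optimal arm A^* uniformly distributed over K arms and observed data \mathcal{D}, is

\Pr[\hat{A} \ne A^*] \;\ge\; 1 - \frac{I(A^*;\,\mathcal{D}) + \log 2}{\log K},

so any exploration strategy whose measurements carry little mutual information about A^* must misidentify the best arm with substantial probability.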
Towards information theoretic time-space lower bounds in learning theory
Presenters: Rishi Sharma, Daria Reshetova
Abstract: We look into information-theoretic lower bounds in computer science. We provide toy examples and reasoning for applying information-theoretic techniques to lower-bound the time-space complexity of algorithms. We also introduce the natural problem of “learning a little bit” and conjecture its time-space hardness. Although we do not manage to prove the conjecture, we discuss several potentially promising approaches to it, and go over some partial proofs that light the way toward the result and illuminate further directions for interesting work.
Generalizing Information Directed Methods from Multi-armed Bandit Problem to Reinforcement Learning
Presenters: Martin Zhang, Daniel Liu, Nima Hamidi
Abstract: In the multi-armed bandit setting, Thompson sampling is one of the most popular algorithms for balancing exploration and exploitation when learning to optimize actions. Here, we study an information-theoretic analysis of Thompson sampling, which in turn leads to another algorithm that uses mutual information to find the best arm. We generalize this approach to Markov decision processes (MDPs), resulting in a new information-directed reinforcement learning method. Simulations are used to study the performance of this method.
Optimal Encodings for Single-Mode Bosonic Gaussian Channels
Presenters: Edwin Ng, Tatsuhiro Onodera
Abstract: In modern communication systems, the representation of information as optical signals is ubiquitous, but in the limit of low power, where the particle nature of light (i.e., the photon) becomes manifest, quantum physics is needed to properly describe the effects of the environment on the communication channel. For optical systems, these quantum channels are the single-mode bosonic channels, and the simplest class of them, corresponding to conventional laser optical systems, are the bosonic Gaussian channels (BGCs). We study recently obtained results on optimal encodings for quantum-classical communication using BGCs (so-called “Gaussian encodings”), restricting our attention to the pure-loss BGC. We describe our efforts to strengthen these results by asking whether Gaussian encodings are uniquely optimal. Whereas uniqueness is often irrelevant in designing classical codes, unique optimality in the quantum regime has the interesting corollary that the use of entangled states (often touted as a key resource in quantum information) necessarily hurts the achievable rates of a quantum-classical code for pure-loss BGCs.
The Information Bottleneck Method, with applications to machine learning
Presenters: Stephen Barnes, Varun Vijay
Abstract: Our presentation will explain the information bottleneck method. This method involves producing a summary of input data subject to certain information-theoretic constraints. The method has several possible applications in machine learning, as a means of regularization, preprocessing, or clustering, or as a defense against adversarial inputs.
Information Bottleneck Regularization for Stochastic Neural Networks
Presenters: Allen Nie, Gabriel Maher, Mihir Mongia
Abstract: We derive methods to approximate mutual information as a measure for investigating the generalizability of a stochastic neural network. By adding mutual information as a form of loss regularization, we can train neural networks much more efficiently.
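A minimal sketch of this kind of regularizer, in the style of a variational information bottleneck (our own illustrative code; the class and parameter names are not from the project), adds a KL term that upper-bounds the mutual information between the input and the stochastic representation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBClassifier(nn.Module):
    """Minimal stochastic classifier with a Gaussian bottleneck (sketch)."""
    def __init__(self, in_dim, z_dim, n_classes):
        super().__init__()
        self.encoder = nn.Linear(in_dim, 2 * z_dim)   # outputs mean and log-variance
        self.decoder = nn.Linear(z_dim, n_classes)

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(z), mu, logvar

def vib_loss(logits, labels, mu, logvar, beta=1e-3):
    # Cross-entropy plus a KL term to a standard normal prior; the KL term
    # upper-bounds the mutual information between the input and the code z.
    ce = F.cross_entropy(logits, labels)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=-1).mean()
    return ce + beta * kl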