Tse Lab

PhD Oral Defense: Martin Jinye Zhang

Topic: Adaptive Monte Carlo Testing: Where Multiple Testing Meets Multi-armed Bandits

  • Advisors: David Tse
  • Date: Thursday, April 25, 2019
  • Time: 10:00 am (refreshments at 9:45 am)
  • Location: Packard 202

Abstract

Progress in modern computational biology has been largely driven by increases in computing power and dataset sizes. However, genomic data is growing exponentially faster than computing power, imposing a tremendous challenge for data analysis.

I will first briefly present a new algorithm that improves the complexity of computing medoid, used in k-medoids clustering, from quadratic to almost-linear time. Here, the medoid is the point among n points with the smallest mean distance to other points. The core idea is to adaptively estimate the mean distances instead of exactly computing them, where the adaptation is via multi-armed bandits.

Then, we further extend this idea to accelerate the workflow of Monte Carlo (MC) multiple testing. This workflow is considered the gold standard in statistical hypothesis testing but is rarely used in large-scale problems due to its prohibitive computational cost. We develop a new algorithm, Adaptive MC Multiple Testing (AMT), to estimate MC p-values and control the false discovery rate (FDR) in multiple testing. We also prove that it achieves optimal computational complexity. On a Parkinson genome-wide association study (GWAS) dataset, this algorithm reduces the wall clock time from 2 months for the standard workflow to an hour.