Tse Lab

PhD Oral Defense: Jesse Zhang

Topic: Rethinking Single-Cell RNA-Seq Analysis

  • Advisor: David N. Tse
  • Date: Monday, June 3, 2019
  • Time: 2:00 pm (refreshments at 1:45 pm)
  • Location: Allen 101x

Abstract

Since the Human Genome Project was completed in 2003, scientists have developed technologies for measuring the RNA content of a single cell. In the last decade, the number of individual cells profiled per study has grown exponentially to over 1,000,000 cells. In this talk, we discuss some of the computational and statistical challenges associated with the analysis of such large single-cell datasets. First, we introduce a more powerful way of extracting information from raw sequencing data. Next, with a focus on the biologist end user, we introduce an interpretable clustering method that organizes cells based on the definition of cell type. Finally, we turn to the recently formulated post-selection inference problem, which we observe in the single-cell setting. State-of-the-art single-cell computational pipelines perform differential analysis after clustering on the same dataset. Because clustering forces separation between groups, testing for differences on the same dataset yields artificially low p-values and hence false discoveries. We introduce a valid post-clustering differential analysis framework that corrects for this problem.
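
The post-clustering problem described above can be reproduced in a few lines. The sketch below is only an illustration under stated assumptions, not the framework presented in the talk: it simulates one hypothetical gene measured in 200 cells drawn from a single population, splits the cells with a toy one-dimensional 2-means routine (`two_means_1d`, a stand-in for a real clustering pipeline), and then tests for a difference in means on the same data. The p-value uses a normal approximation to the Welch statistic, which seems reasonable at this sample size. All names and parameters are illustrative.

```python
import math
import random
import statistics


def welch_p_value(a, b):
    """Two-sided p-value for a difference in means.

    Uses the Welch statistic with a normal approximation (an assumption
    made to keep the sketch dependency-free).
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def two_means_1d(values, iterations=20):
    """Toy 1-D 2-means: repeatedly split at the midpoint of the two centroids."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        cut = (lo + hi) / 2
        left = [v for v in values if v <= cut]
        right = [v for v in values if v > cut]
        lo, hi = statistics.mean(left), statistics.mean(right)
    return left, right


rng = random.Random(0)

# One hypothetical gene, 200 cells, a single population: no true cell types.
expression = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Cluster, then test on the same data (the practice the abstract warns about).
cluster_a, cluster_b = two_means_1d(expression)
p_double_dip = welch_p_value(cluster_a, cluster_b)

# Baseline: groups assigned at random, independent of the expression values.
shuffled = expression[:]
rng.shuffle(shuffled)
p_random_split = welch_p_value(shuffled[:100], shuffled[100:])

print(f"cluster-then-test p-value: {p_double_dip:.3g}")
print(f"random-split p-value:      {p_random_split:.3g}")
```

Because the clusters are defined by the very values being tested, the first p-value comes out extremely small even though every cell was drawn from the same distribution. The random split has no such dependence on the data, so its p-value behaves as an ordinary null p-value would.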