Computational Biology Ph.D. Candidate Rebecca Elyanow Thesis Proposal:
Improving RNA-seq data analysis using prior knowledge of gene and cell relationships
Genes are expressed in a cell through the synthesis of mRNA molecules, and the current state of a cell can be summarized by its gene expression profile. RNA-seq is a technology for measuring gene expression in a bulk sample of cells. The recent advent of single-cell RNA sequencing (scRNA-seq) makes it possible to measure gene expression in individual cells, opening doors to new insights into cellular heterogeneity. However, measurements of gene expression, particularly from scRNA-seq data, are often noisy and sparse. In this talk, I will propose several methods for incorporating prior knowledge to aid in the analysis of gene expression data. First, I will discuss netNMF-sc, an algorithm for analyzing scRNA-seq data with high rates of missing data. netNMF-sc takes advantage of prior knowledge of conserved and cell-type-specific gene coexpression from bulk tissue. It uses this prior knowledge, in the form of a gene interaction network, to perform network-regularized matrix factorization for dimensionality reduction and imputation of sparse scRNA-seq data. We demonstrate that incorporating gene interaction networks improves recovery of cell-type and gene-gene correlations.
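To make the idea of network-regularized matrix factorization concrete, here is a minimal sketch in Python with NumPy. It is not the netNMF-sc implementation. It assumes a generic graph-regularized NMF with a squared-error loss and multiplicative updates, and it treats zeros as observed values rather than as missing data. The function name, the toy matrices, and the parameters (k, lam, n_iter) are all hypothetical choices made for illustration.

```python
import numpy as np

def graph_regularized_nmf(X, A, k=2, lam=1.0, n_iter=200, seed=0, eps=1e-10):
    """Illustrative graph-regularized NMF (an assumption, not netNMF-sc itself).

    Minimizes ||X - W H||_F^2 + lam * trace(W^T L W) with W, H >= 0,
    where X is genes x cells, A is a gene-gene adjacency matrix,
    and L = D - A is the graph Laplacian of that network.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k))   # gene loadings
    H = rng.random((k, n_cells))   # low-dimensional cell representation
    D = np.diag(A.sum(axis=1))     # degree matrix of the gene network
    L = D - A
    history = []
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        W *= (X @ H.T + lam * (A @ W)) / (W @ H @ H.T + lam * (D @ W) + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        loss = np.linalg.norm(X - W @ H) ** 2 + lam * np.trace(W.T @ L @ W)
        history.append(loss)
    return W, H, history

# Toy data (hypothetical): 4 genes x 6 cells with dropout-like zeros.
X = np.array([
    [5.0, 4.0, 0.0, 0.0, 0.0, 1.0],
    [4.0, 0.0, 5.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 6.0, 5.0, 0.0],
    [1.0, 0.0, 0.0, 5.0, 0.0, 6.0],
])
# Hypothetical prior network: genes 0-1 are linked, and genes 2-3 are linked.
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

W, H, history = graph_regularized_nmf(X, A, k=2, lam=1.0)
X_imputed = W @ H  # dense reconstruction serves as the imputed matrix
```

In this sketch, the penalty term pulls the loadings of linked genes toward each other, so the reconstruction W @ H can fill in zeros using information from network neighbors, while H gives a reduced representation of each cell.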
I will also introduce ongoing work to identify copy number alterations (CNAs) and build phylogenies from spatial RNA-seq of tumors, using the spatial relationships of cells as prior knowledge. Identification of CNAs will facilitate a better understanding of how a tumor developed over time. Finally, I will present an application of a new algorithm, NetMix, to 945 differential gene expression experiments from the Expression Atlas database. We show that NetMix identifies biologically meaningful subsets of differentially expressed genes using gene interaction networks as prior knowledge.