Deep Learning Methods for Fine Mapping and Discovery in Genomic Association Studies

Nonlinear genetic effects have been proposed as key contributors to missing heritability – the proportion of the heritability of a trait that is not explained by the top associated additive variants in genome-wide association (GWA) studies. To this end, probabilistic machine learning approaches have been shown to yield substantial performance gains in genomic selection-based analyses. This is often attributed to the fact that popular kernel regression functions and deep neural networks offer scalable implementations that implicitly enumerate all possible polynomial interaction effects among the variables in the data. Recently, however, these same algorithms have also been criticized as “black box” techniques. Understanding how genetic features are ranked within machine learning methods remains an important, yet open, interpretability problem. Here, we propose to develop a suite of novel methodological approaches that make probabilistic machine learning and deep neural networks fully amenable to fine mapping and discovery in genomic sequencing studies (i.e., opening up the black box). Our efforts will lead to unified frameworks that produce interpretable summaries detailing associations on multiple genomic scales (e.g., SNPs, genes, signaling pathways). The first aim of this project is to develop an interpretable significance measure for probabilistic machine learning. The second aim is to develop a unified deep learning framework for gene-level and pathway enrichment analysis in genome-wide association studies. The third aim is to create distributable software and use it to characterize nonlinear genetic effects at multiple genomic scales in real data applications.

This study will provide the first unified deep learning frameworks for association mapping in array- and sequence-based GWA studies. These methods are essential for a more comprehensive understanding of the genetic architecture of human traits and diseases, a question of central importance to human health.