Marginal Epistasis Tests for Dichotomous Traits Using Generalized Linear Models

Epistasis, commonly defined as the interaction between genetic loci, has long been hypothesized to play a key role in defining the genetic architecture underlying complex traits. However, despite recent strong evidence of pervasive epistasis in many array- and sequence-based genome-wide association studies, statistical methods for powerfully mapping epistatic effects remain in their infancy. Existing epistatic mapping methods explicitly search over all pairwise or higher-order interactions when identifying significant nonlinear effects among genome-wide variants. Consequently, because epistatic loci are not known a priori and the combinatorial search space these methods must explore is extremely large, existing computational approaches often suffer from low statistical power. Here, we propose to build upon an alternative statistical strategy for detecting epistatic effects, known as the “MArginal ePistasis Test”, or MAPIT. Instead of focusing on identifying pairwise or higher-order interactions, MAPIT estimates and tests for marginal epistatic effects: the combined pairwise interaction effects between a given variant and all other variants. By testing the marginal epistatic effects, MAPIT can identify variants that are involved in epistasis without the need to explicitly search over all possible interactions. This greatly alleviates the statistical burden associated with epistasis mapping. The first aim of this project is to integrate the marginal epistasis method within a generalized linear model framework to analyze dichotomous traits. Preliminary simulation results suggest that our approach is more powerful than standard exhaustive search methods when detecting epistatic SNPs in case-control studies. The second aim is to make the model amenable to the use of summary statistics. This will allow our method to be applied to many consortium studies where individual-level genotypes and phenotypes are not accessible. The third aim is to perform rigorous data analyses on several large-scale association studies. To maximize our method's impact on the research community, we will also produce, test, document, and distribute user-friendly software for its implementation.