DELISHUS: Deletion Inference from SNP Data

DELISHUS: A Fast and Accurate Algorithm for Computing Genomic Deletion Variation

Our understanding of the genetic determinants of complex disease is undergoing a paradigm shift. Genetic heterogeneity of rare mutations with deleterious effects is increasingly viewed as a major component of disease. Autism is an excellent example: research is actively seeking to match its phenotypic heterogeneity to its genomic heterogeneity. A substantial portion of autism appears to be correlated with copy number variation, which single nucleotide polymorphism (SNP) array technologies do not directly probe. Identifying the genetic heterogeneity of small deletions remains a major unresolved computational problem, due in part to the inability of existing algorithms to detect them.

We present an algorithmic framework, which we term DELISHUS, that implements three highly efficient algorithms for inferring genomic deletions of all sizes and frequencies from SNP array data. First, we implement a polynomial-time backtracking algorithm that computes all potential deletions in a dataset; it finishes on a 1-billion-entry genome-wide association study (GWAS) SNP matrix in a few minutes. Second, given a set of called deletions, we give a polynomial-time algorithm for detecting regions that contain multiple recurrent deletions. Third, we give an algorithm for detecting de novo deletions. Because our algorithms consider all individuals in the sample at once, they achieve significantly lower false-positive rates and higher power than previously published single-individual algorithms. Our method can be used to identify the deletion spectrum in GWAS datasets in which deletion polymorphism was not previously analyzed.
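The abstract does not spell out how a deletion leaves a trace in SNP genotypes, so the following is a toy illustration only, not the DELISHUS backtracking algorithm. It assumes (our assumption, not a detail given above) a single parent-child pair with genotypes coded 0/1/2 as alternate-allele counts and no missing calls. A hemizygous deletion makes genotypes look homozygous, so a heterozygous call in either individual rules out a shared deletion at that SNP. A parent-child pair of opposite homozygotes (0 vs. 2) is a Mendelian inconsistency that a transmitted deletion could explain. The hypothetical function `candidate_deletions` reports maximal runs of het-free SNPs that contain at least `min_evidence` such inconsistencies.

```python
from typing import List, Optional, Tuple


def candidate_deletions(parent: List[int], child: List[int],
                        min_evidence: int = 1) -> List[Tuple[int, int]]:
    """Toy sketch (hypothetical helper, not the DELISHUS method).

    Genotypes are alternate-allele counts (0, 1, 2) at consecutive SNPs.
    Returns inclusive (start, end) SNP index intervals that contain no
    heterozygous call in either individual and at least `min_evidence`
    opposite-homozygote (Mendelian-inconsistent) sites.
    """
    if len(parent) != len(child):
        raise ValueError("genotype vectors must have equal length")

    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start index of the current het-free run
    evidence = 0                 # inconsistent sites seen in the current run

    for i, (p, c) in enumerate(zip(parent, child)):
        if p == 1 or c == 1:
            # A heterozygous call is incompatible with a hemizygous deletion,
            # so close the current run, keeping it if it has enough evidence.
            if start is not None and evidence >= min_evidence:
                regions.append((start, i - 1))
            start, evidence = None, 0
            continue
        if start is None:
            start = i
        if abs(p - c) == 2:
            # Opposite homozygotes: the parent cannot have transmitted the
            # child's allele, which a transmitted deletion could explain.
            evidence += 1

    # Close a run that extends to the last SNP.
    if start is not None and evidence >= min_evidence:
        regions.append((start, len(child) - 1))
    return regions


if __name__ == "__main__":
    parent = [1, 0, 2, 2, 0, 1, 2]
    child = [1, 0, 0, 2, 0, 0, 2]
    print(candidate_deletions(parent, child))  # [(1, 4)]
```

In the example, SNPs 1 to 4 form a het-free run containing one opposite-homozygote site (index 2), so the run is reported. The final SNP starts a new run with no inconsistency and is not. Scanning runs this way is linear in the number of SNPs per pair. A method that, like the one described above, considers all individuals at once would need to combine such per-pair evidence across the whole sample.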

This material is based upon work supported by the National Science Foundation under Grant Number 1048831. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.