by Bjarni Halldorsson, Derek Aguiar, Ryan Tarpine, Sorin Istrail
Abstract:
A phase transition is taking place today. The amount of data generated by genome resequencing technologies is so large that in some cases it is now less expensive to repeat the experiment than to store the information generated by the experiment. In the next few years, it is quite possible that millions of Americans will have been genotyped. The question then arises of how to make the best use of this information and jointly estimate the haplotypes of all these individuals. The premise of this article is that long shared genomic regions (or tracts) are unlikely unless the haplotypes are identical by descent. These tracts can be used as input for a Clark-like phasing method to obtain a phasing solution for the sample. We show on simulated data that the algorithm obtains an almost perfect solution if the number of individuals being genotyped is large enough, and that the correctness of the algorithm grows with the number of individuals being genotyped. We also study a related problem that connects copy number variation with phasing algorithm success. A loss of heterozygosity (LOH) event occurs when, by the laws of Mendelian inheritance, an individual should be heterozygous but, due to a deletion polymorphism, is not. Such polymorphisms are difficult to detect using existing algorithms, but they play an important role in the genetics of disease and will confuse haplotype phasing algorithms if not accounted for. We present an algorithm for detecting LOH regions across the genomes of thousands of individuals. The design of the long-range phasing algorithm and the loss of heterozygosity inference algorithms was inspired by our analysis of the Multiple Sclerosis (MS) GWAS dataset of the International Multiple Sclerosis Genetics Consortium. We present similar results to those obtained from the MS data.
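The tract idea, the Clark-like phasing step, and the Mendelian definition of LOH in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the 0/1/2 genotype encoding, the hypothetical function names (`longest_compatible_tract`, `clark_complement`, `clark_phase`, `trio_loh_sites`), and the use of parent-offspring trios for the LOH check are all assumptions made for illustration. Here a tract counts as compatible when two genotypes never carry opposite homozygotes, the phasing step is plain Clark's inference rule, and the LOH check flags sites where Mendelian inheritance forces the child to be heterozygous but the child is typed homozygous.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding (an assumption, not taken from the paper): each site
# stores the number of copies of allele 1, so 0 and 2 are homozygous and 1 is
# heterozygous. Haplotypes are tuples of 0/1 alleles.
Genotype = List[int]
Haplotype = Tuple[int, ...]


def longest_compatible_tract(g1: Genotype, g2: Genotype) -> Tuple[int, int]:
    """Half-open interval [start, end) of the longest run of sites where g1
    and g2 are never opposite homozygotes (0 vs 2). Two individuals can share
    a haplotype only inside such a run; long runs are candidate IBD tracts."""
    best = (0, 0)
    start = 0
    for i, (a, b) in enumerate(zip(g1, g2)):
        if {a, b} == {0, 2}:
            start = i + 1  # opposite homozygotes: no shared haplotype here
        elif i + 1 - start > best[1] - best[0]:
            best = (start, i + 1)
    return best


def clark_complement(g: Genotype, h: Haplotype) -> Optional[Haplotype]:
    """Clark's inference rule: if known haplotype h fits genotype g, return
    the complementary haplotype h2 with h + h2 == g at every site, else None."""
    comp = []
    for gi, hi in zip(g, h):
        c = gi - hi
        if c not in (0, 1):
            return None
        comp.append(c)
    return tuple(comp)


def clark_phase(genotypes: List[Genotype]) -> Dict[int, Tuple[Haplotype, Haplotype]]:
    """Greedy Clark-style phasing of one window (hypothetical helper).
    Genotypes that no known haplotype explains are left out of the result."""
    known: List[Haplotype] = []
    resolved: Dict[int, Tuple[Haplotype, Haplotype]] = {}

    def remember(h: Haplotype) -> None:
        if h not in known:
            known.append(h)

    # Seed: genotypes with at most one heterozygous site are unambiguous.
    for idx, g in enumerate(genotypes):
        if g.count(1) <= 1:
            h1 = tuple(1 if x == 2 else 0 for x in g)  # het site gets allele 0
            h2 = tuple(x - y for x, y in zip(g, h1))
            resolved[idx] = (h1, h2)
            remember(h1)
            remember(h2)

    # Repeatedly resolve remaining genotypes with any known haplotype.
    changed = True
    while changed:
        changed = False
        for idx, g in enumerate(genotypes):
            if idx in resolved:
                continue
            for h in known:
                comp = clark_complement(g, h)
                if comp is not None:
                    resolved[idx] = (h, comp)
                    remember(comp)
                    changed = True
                    break
    return resolved


def trio_loh_sites(father: Genotype, mother: Genotype, child: Genotype) -> List[int]:
    """Sites where the parents are opposite homozygotes, so Mendelian
    inheritance forces a heterozygous child, yet the child is typed
    homozygous: a possible sign of a deletion (assumes complete trio calls)."""
    return [
        i
        for i, (f, m, c) in enumerate(zip(father, mother, child))
        if {f, m} == {0, 2} and c != 1
    ]


if __name__ == "__main__":
    sample = [[0, 1, 2], [1, 1, 2], [2, 1, 1]]
    for idx, (h1, h2) in sorted(clark_phase(sample).items()):
        print(idx, h1, h2)
```

Clark's rule resolves genotypes greedily, so the result depends on the order in which genotypes and haplotypes are tried. A single Mendelian-inconsistent site may simply be a genotyping error; runs of consecutive flagged sites are the more convincing deletion signal.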
Reference:
Bjarni Halldorsson, Derek Aguiar, Ryan Tarpine, Sorin Istrail, "The Clark Phaseable Sample Size Problem: Long-Range Phasing and Loss of Heterozygosity in GWAS", In Journal of Computational Biology, vol. 18, no. 3, pp. 323-333, 2011.
Bibtex Entry:
@ARTICLE{Halldorsson2011a,
author = {Halldorsson, Bjarni and Aguiar, Derek and Tarpine, Ryan and Istrail,
Sorin},
title = {The Clark Phaseable Sample Size Problem: Long-Range Phasing and Loss
of Heterozygosity in GWAS},
journal = {Journal of Computational Biology},
year = {2011},
volume = {18},
pages = {323--333},
number = {3},
abstract = {A phase transition is taking place today. The amount of data
generated by genome resequencing technologies is so large that in
some cases it is now less expensive to repeat the experiment than
to store the information generated by the experiment. In the next
few years, it is quite possible that millions of Americans will have
been genotyped. The question then arises of how to make the best
use of this information and jointly estimate the haplotypes of all
these individuals. The premise of this article is that long shared
genomic regions (or tracts) are unlikely unless the haplotypes are
identical by descent. These tracts can be used as input for a Clark-like
phasing method to obtain a phasing solution for the sample. We show
on simulated data that the algorithm obtains an almost perfect solution
if the number of individuals being genotyped is large enough, and that
the correctness of the algorithm grows with the number of individuals
being genotyped. We also study a related problem that connects copy
number variation with phasing algorithm success. A loss of heterozygosity
(LOH) event occurs when, by the laws of Mendelian inheritance, an individual
should be heterozygous but, due to a deletion polymorphism, is not.
Such polymorphisms are difficult to detect using existing algorithms,
but they play an important role in the genetics of disease and will confuse
haplotype phasing algorithms if not accounted for. We present
an algorithm for detecting LOH regions across the genomes of thousands
of individuals. The design of the long-range phasing algorithm and
the loss of heterozygosity inference algorithms was inspired by our
analysis of the Multiple Sclerosis (MS) GWAS dataset of the International
Multiple Sclerosis Genetics Consortium. We present similar results
to those obtained from the MS data.},
citeulike-article-id = {9029749},
owner = {Derek},
timestamp = {2012.05.08},
url = {http://www.brown.edu/Research/Istrail_Lab/papers/clarkphaseablejournal.pdf},
category = {Haplotype Phasing, Haplotype Analysis, Deletion Inference}
}