Monday, May 16, 2016 12:00pm - 1:00pm
CIT, Room 477 (Lubrano Room)
Demography-Aware Inference of the Strength of Natural Selection
Current studies predict that the majority of amino-acid mutations are deleterious. Levels of genetic and possibly phenotypic variation in a population are influenced by how negative selection acts to reduce the frequency of those deleterious mutations. However, the demographic history of a population influences the efficacy of negative selection in keeping deleterious variants at low frequencies. I will present two projects where we study how negative selection works in the context of different demographic histories.
In the first project, I study how population bottlenecks, inbreeding and artificial selection have influenced levels of deleterious genetic variation in dogs using 90 whole-genome sequences from breed dogs, village dogs and gray wolves. We used the ratio of heterozygosity at amino-acid changing variants to heterozygosity at silent variants to show how bottlenecks associated with domestication and breed formation in dogs have affected the efficacy of negative selection. We found that dogs have, on average, 2-3% more derived deleterious alleles than wolves. We show multiple lines of evidence indicating that bottlenecks, and not inbreeding, are driving the patterns of deleterious genetic variation we observed in dogs. Furthermore, we find that regions of the genome implicated in selective sweeps are enriched for amino-acid changing variants and Mendelian disease genes.
In the second project, I develop a novel likelihood-based method that uses the lengths of pairwise haplotype identity by state among haplotypes carrying rare variants. Our method conditions on the present-day frequency of the allele and is based on theory predicting that, under constant population sizes, alleles under negative selection are on average younger than neutral alleles at the same frequency and should therefore have higher average levels of haplotype identity among variant carriers.
We developed a computational framework to obtain the probability distribution of the lengths of pairwise haplotype identity given a certain selection coefficient, demographic scenario and present-day allele frequency. Simulations indicate that our method provides unbiased estimates of selection under both constant population sizes and realistic demographic scenarios. We show how our method can also be used to estimate the parameters that define the distribution of selection coefficients of a set of rare variants. We provide an example of how to apply this method to estimate the distribution of selection coefficients of a set of amino-acid changing variants in UK10K, a large genomic dataset of British individuals.
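As a rough illustration of the summary statistic used in the first project, the ratio of heterozygosity at amino-acid changing variants to heterozygosity at silent variants could be computed along the following lines. This is a minimal sketch and not the pipeline used in the study: the function names, the input format (derived-allele counts per variant) and the toy counts are all assumptions, and normalization by the number of callable sites in each class is left out.

```python
def heterozygosity(alt_count, n_chrom):
    """Expected heterozygosity at one biallelic site, 2p(1-p),
    with the usual n/(n-1) small-sample correction."""
    p = alt_count / n_chrom
    return 2.0 * p * (1.0 - p) * n_chrom / (n_chrom - 1)


def het_ratio(nonsyn_counts, syn_counts, n_chrom):
    """Summed heterozygosity at amino-acid changing variants divided by
    summed heterozygosity at silent variants (hypothetical helper)."""
    het_nonsyn = sum(heterozygosity(c, n_chrom) for c in nonsyn_counts)
    het_syn = sum(heterozygosity(c, n_chrom) for c in syn_counts)
    return het_nonsyn / het_syn


# Toy allele counts, invented for illustration only (10 chromosomes sampled).
nonsyn = [1, 1, 2]   # amino-acid changing variants, mostly rare
syn = [3, 5, 4]      # silent variants, more common
ratio = het_ratio(nonsyn, syn, n_chrom=10)
```

Under this sketch, a higher ratio in one population than in another would be read as a sign that negative selection has been less effective at removing amino-acid changing variants there.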