CCMB Seminar
Series 2009-2010
_______________________________________________________ Events
To receive CCMB seminar announcements by email, sign up for the computational biology mailing list by sending email to listserv@listserv.brown.edu with the message body "subscribe computational-biology".
CCMB Lecture Series
Art Covert
Michigan State University
The Hidden Lives of Deleterious Mutations: Transiting fitness valleys via sign-epistatic stepping stones
The role of deleterious mutations in evolution has been much debated. While many researchers believe that any mutation that reduces fitness must impede adaptive evolution, recent studies have shown that this is not always the case. Deleterious mutations may have their fitness effects reversed by a second, sign-epistatic mutation, which can also allow populations to pass through fitness valleys. It is unknown whether these sign-epistatic recoveries are fortuitous accidents or a driving force behind evolution. Using digital organisms, I compared adaptive evolution in treatments where every deleterious mutation was immediately reverted with control treatments where deleterious mutations were allowed to enter the population. Deleterious mutations by definition reduce fitness over the short term, and they make up the majority of mutations in populations of digital organisms, as in biological ones. In my experiments, long-term adaptive evolution was accelerated in the populations in which deleterious mutations were allowed to remain, because some of them served as stepping stones across otherwise impassable fitness valleys, thereby facilitating the evolution of complex features.
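To make the vocabulary concrete, here is a minimal sketch, assuming a hypothetical two-locus, two-allele fitness landscape (genotypes "ab", "Ab", "aB", "AB") measured on an additive fitness scale; the numbers and function names are illustrative only and are not data or code from the talk. Sign epistasis means a mutation's effect changes sign depending on the genetic background; reciprocal sign epistasis is the case where both single mutants sit in a fitness valley between "ab" and "AB".

```python
def single_mutation_effects(w):
    """Fitness effect of each single mutation on each genetic background.

    w maps the four genotypes "ab", "Ab", "aB", "AB" to fitness values
    (a hypothetical landscape; additive fitness scale assumed).
    """
    return {
        ("A", "b"): w["Ab"] - w["ab"],  # a -> A on the b background
        ("A", "B"): w["AB"] - w["aB"],  # a -> A on the B background
        ("B", "a"): w["aB"] - w["ab"],  # b -> B on the a background
        ("B", "A"): w["AB"] - w["Ab"],  # b -> B on the A background
    }


def classify_epistasis(w, tol=1e-12):
    """Label a two-locus landscape by how the two mutations interact."""
    e = single_mutation_effects(w)
    # A mutation shows sign epistasis if its effect has opposite signs on the two backgrounds.
    sign_flip_a = e[("A", "b")] * e[("A", "B")] < 0
    sign_flip_b = e[("B", "a")] * e[("B", "A")] < 0
    if sign_flip_a and sign_flip_b:
        return "reciprocal sign epistasis"
    if sign_flip_a or sign_flip_b:
        return "sign epistasis"
    # Same sign everywhere, but the size of the effect still depends on background.
    if abs(e[("A", "B")] - e[("A", "b")]) > tol:
        return "magnitude epistasis"
    return "no epistasis"
```

For example, `classify_epistasis({"ab": 1.0, "Ab": 0.9, "aB": 0.9, "AB": 1.2})` is a valley landscape: either first step is deleterious, yet the double mutant is fitter, so a population can reach "AB" only by passing through a deleterious intermediate.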
Wednesday, January 27, 2010
4:00pm
CIT Bldg, Room 241, SWIG Boardroom
Refreshments will be served at 3:45 pm
Joint CCMB/MPPB/Psychiatry Seminar
Jason Moore
Dartmouth Medical School
Bioinformatics Challenges for Genome-Wide Association Studies
Human genetics is currently dominated by the genome-wide association study (GWAS) that measures and evaluates one million or more single nucleotide polymorphisms (SNPs) for their disease associations. The current biostatistical paradigm is to analyze each SNP individually without regard to the rest of the genome or environmental exposure.
This agnostic, or unbiased, approach has not been successful in identifying SNPs with moderate or large effects on disease susceptibility. We present here an alternative bioinformatics strategy for GWAS analysis that focuses on gene-gene and gene-environment interactions and their context in biochemical pathways.
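The single-SNP paradigm criticized here can be sketched as one independent test per SNP. This is a minimal illustration, assuming biallelic SNPs summarized as genotype counts (AA, Aa, aa) in cases and controls and using a standard allelic 2x2 chi-square test; the function names are hypothetical, and this is the baseline being critiqued, not the speaker's interaction-based strategy.

```python
import math


def allelic_chi_square(case_genotypes, control_genotypes):
    """Allelic 2x2 chi-square test for one biallelic SNP.

    Each argument is a tuple (n_AA, n_Aa, n_aa) of genotype counts.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    def allele_counts(genotypes):
        n_aa_hom, n_het, n_other_hom = genotypes
        # Each individual carries two alleles: (count of A, count of a).
        return 2 * n_aa_hom + n_het, n_het + 2 * n_other_hom

    table = [allele_counts(case_genotypes), allele_counts(control_genotypes)]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip empty columns (e.g. a monomorphic SNP)
                chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def scan_snps(snps):
    """Test every SNP on its own, ignoring all other SNPs and the environment.

    snps maps a SNP id to (case_genotypes, control_genotypes).
    """
    return {snp_id: allelic_chi_square(cases, controls)
            for snp_id, (cases, controls) in snps.items()}
```

With cases (30, 50, 20) and controls (20, 50, 30), the allele table is 110/90 versus 90/110, giving a statistic of 4.0 and p of about 0.046. Because `scan_snps` never looks at two SNPs together, an effect that appears only in a gene-gene combination is invisible to it by construction.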
Wednesday, November 18, 2009
3:00pm
LMM - 70 Ship Street, Room 107
Refreshments will be served at 2:45 pm
CCMB Lecture Series
Eli Stahl
Brigham and Women's Hospital
The Present and Future of Genome-wide Association Studies in Rheumatoid Arthritis
Results and current progress of a large-scale case-control genome-wide association study (GWAS) of rheumatoid arthritis (RA) shed further light on this autoimmune disease and help to frame a broad perspective on mapping complex traits. Genotypes at over 2.5 million common single nucleotide polymorphisms (SNPs) were tested for association with RA in 5,539 cases and 20,169 controls of European descent. Eleven new RA risk alleles replicate in additional samples. Conditional and haplotype analyses refine the association signal at several loci, with evidence for multiple independent effects in autoimmunity. Still, all common-variant associations validated to date together explain relatively little of the additive genetic variance for RA, suggesting major contributions from (1) many more common variants of very small effect, (2) copy number or other kinds of variants, (3) rare variants, and/or (4) non-additive genetic, epigenetic, or non-genetic effects. A polygenic risk score analysis can allow inference of the remaining effect due to common variants en masse (scenario 1, with some implications for scenarios 2 and 3). The direct benefit of current and future common-variant GWAS is limited under all of these scenarios, but GWAS certainly inform complementary approaches, including deep re-sequencing in case-control cohorts and integrated clinical/functional and genetic analyses.
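A polygenic risk score is commonly built as a weighted sum of risk-allele counts over many SNPs. The sketch below assumes weights equal to log odds ratios from a discovery GWAS and a simple p-value threshold for including SNPs; the SNP ids, thresholds, and function names are hypothetical, and this is not necessarily the exact procedure used in this study.

```python
import math


def select_weights(discovery, p_threshold):
    """Keep SNPs whose discovery p-value is below the threshold.

    discovery maps a SNP id to (odds_ratio, p_value) from a discovery GWAS.
    Returns SNP id -> weight, using log(odds ratio) as the per-allele weight.
    """
    return {snp: math.log(odds_ratio)
            for snp, (odds_ratio, p_value) in discovery.items()
            if p_value < p_threshold}


def polygenic_risk_score(genotypes, weights):
    """Weighted sum of risk-allele dosages for one individual.

    genotypes maps a SNP id to the risk-allele count (0, 1, or 2);
    SNPs missing from genotypes are treated as carrying 0 risk alleles.
    """
    return sum(weight * genotypes.get(snp, 0) for snp, weight in weights.items())
```

In a target sample, one would then test whether scores differ between cases and controls; a lenient `p_threshold` lets many individually non-significant variants contribute, which is how such an analysis can probe scenario 1 en masse.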
Wednesday, October 28, 2009
4:00pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Daniel Weinreich
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Yosef E. Maruvka
Bar-Ilan University
Genetic polymorphism and demography: a statistical mechanics approach
Recent progress in sequencing techniques has been accompanied by exponential growth in the amount of available genetic data. Traditional methods of analysis require exact reconstruction of the phylogenetic tree and therefore cannot handle these immense databases. Given the effectiveness of the "mean field" approximation in physical systems with many particles, we apply the same techniques and concepts to genetic problems (where it turns out that 50 can be many).
Inference of past demographic parameters from current polymorphism data will be discussed for two examples:
1. Retrieval of the effective population size and its growth rate from the number of lineages as a function of time. Here the mean-field method has been found to be an unbiased estimator, unlike existing methods, and to have a smaller error range.
2. The difference between additive and multiplicative noise, a basic concept in statistical mechanics, can be used to help resolve the ongoing debate between the adaptive and the neutral (Hubbell's) theories of biodiversity.
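The first example can be illustrated with a deterministic ("mean-field") version of the coalescent, in which the number of ancestral lineages n(t) follows its expected rate of decline instead of a sampled tree. This sketch assumes a haploid Wright-Fisher population of size N(t) with time t in generations before the present, so that dn/dt = -n(n-1)/(2N(t)); the function names are hypothetical and it is not necessarily the speaker's estimator.

```python
import math


def lineages_mean_field(n0, pop_size, t_end, dt=0.01):
    """Integrate dn/dt = -n*(n-1) / (2*N(t)) backward in time with Euler steps.

    n0: number of sampled lineages at t = 0 (the present).
    pop_size: function t -> N(t), population size t generations ago.
    Returns the mean-field number of ancestral lineages at t_end.
    """
    n, t = float(n0), 0.0
    while t < t_end:
        step = min(dt, t_end - t)
        # n lineages coalesce at rate (n choose 2) / N(t) per generation.
        n -= step * n * (n - 1) / (2.0 * pop_size(t))
        t += step
    return n


def lineages_constant_size(n0, pop_size, t):
    """Closed-form solution of the same equation when N is constant."""
    c = 1.0 - 1.0 / n0
    return 1.0 / (1.0 - c * math.exp(-t / (2.0 * pop_size)))
```

For a population that has been growing at rate r, one would pass `lambda t: N0 * math.exp(-r * t)` (smaller going back in time); fitting N0 and r so that the curve matches the lineages-through-time counted from a sample is the kind of inference described in the first example.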
Wednesday, October 21, 2009
4:00pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Daniel Weinreich
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Mark A. DePristo, Ph.D.
Broad Institute of Harvard and MIT
Discovering genetic variation in 1000 Genomes: from mapping reads to putative de novo mutations
The 1000 Genomes Project aims to discover and characterize all common human genetic variation with a minor allele frequency (MAF) of at least 0.5%. The pilot phase of the project was completed in June, producing five terabases of Illumina/Solexa, SOLiD, and Roche/454 sequence: ~180 individuals from three populations sequenced to ~4x average depth genome-wide, 30-60x whole-genome sequence for two mother-father-daughter trios, and ~800 individuals with 50x+ coverage using hybrid capture of 1000 randomly selected genes.
Here we describe the sequence calibration, realignment, and analysis tools we developed at the Broad to discover, with high sensitivity and specificity, single-nucleotide polymorphisms (SNPs) and short (< 20 bp) insertion/deletion (indel) polymorphisms in all three arms of the pilot phase of the 1000 Genomes Project. We assess our approach by comparing discovered variation among technologies, across pilot arms, against population genetic expectations, and against complementary efforts from other groups participating in the 1000 Genomes Project. Finally, we subject a randomly selected subset of SNP and indel calls to experimental validation to estimate project-wide specificity rates. We highlight best practices and lessons learned in the production and analysis of next-generation sequencing data.
Wednesday, October 14, 2009
4:00pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Daniel Weinreich
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Lee A. Newberg, Ph.D.
Wadsworth Institute
Getting statistical significance and Bayesian confidence limits for your hidden Markov model or score-maximizing dynamic programming algorithm, with pairwise alignment of nucleotide sequences as an example.
Hidden Markov models and score-maximizing dynamic programming algorithms are used to evaluate sequential data in a variety of scientific fields, including linguistics, vision, and computational biology. Given a hidden Markov model, the efficient "Viterbi" and "forward" algorithms are used to evaluate the probability that the model would generate a given sequence of observations, and similar approaches are employed in dynamic programming algorithms where the focus is on finding high scores instead of high probabilities. Here we present modifications to the "forward" algorithm that allow additional computations. We can efficiently estimate statistical significance: what is the probability that a randomly generated sequence will score at least as high as the observed sequence does? (We have computed answers down to 1e-4000.) We can also compute how typical a sequence is: for every whole number d, what is the probability that a sequence generated by the hidden Markov model will have exactly d differences from the observed sequence?
Wednesday, October 7, 2009
4:00pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Charles (Chip) Lawrence
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Alexandros Stamatakis
Technische Universität München
Department of Computer Science
Mapping the Phylogenetic Likelihood Kernel to Emerging Parallel Computer Architectures
The phylogenetic likelihood function is by far the most compute-intensive part of every maximum likelihood (ML)-based phylogenetic inference algorithm. I will present several solutions for adapting this computational kernel to a variety of accelerator and supercomputer architectures, ranging from FPGAs up to massively parallel machines such as the IBM BlueGene/L (BG/L). I will also address load-balancing problems in the kernel and present a study of single- versus double-precision arithmetic trade-offs. Moreover, I will introduce a basic categorization of input datasets into well-shaped and badly-shaped alignments, which require distinct algorithmic and parallelization approaches.
Finally, I will address an algorithm for rapid phylogenetic placement/identification of short reads from environmental samples.
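To show what the phylogenetic likelihood kernel computes, here is a minimal pure-Python sketch of Felsenstein's pruning algorithm under the Jukes-Cantor (JC69) model, assuming a rooted binary tree written as nested tuples (left, left_branch_length, right, right_branch_length) with leaf names as strings, and equal base frequencies at the root. The representation and names are hypothetical; production ML programs vectorize this inner loop over alignment columns, which is the part that gets mapped to accelerators.

```python
import math

BASES = "ACGT"


def jc69(t):
    """4x4 Jukes-Cantor transition matrix for branch length t (substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partial_likelihood(node, site):
    """Conditional likelihood vector: entry x = P(data below node | node has base x)."""
    if isinstance(node, str):  # leaf: indicator vector for the observed base
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    left, t_left, right, t_right = node
    l_vec, r_vec = partial_likelihood(left, site), partial_likelihood(right, site)
    p_l, p_r = jc69(t_left), jc69(t_right)
    result = []
    for x in range(4):
        # The two subtrees evolve independently given the base x at this node.
        from_left = sum(p_l[x][y] * l_vec[y] for y in range(4))
        from_right = sum(p_r[x][y] * r_vec[y] for y in range(4))
        result.append(from_left * from_right)
    return result


def site_likelihood(tree, site):
    """Likelihood of one alignment column, averaging over root bases (frequency 1/4 each)."""
    return sum(0.25 * v for v in partial_likelihood(tree, site))


def log_likelihood(tree, alignment):
    """Sum of per-column log likelihoods; alignment maps leaf name -> sequence."""
    length = len(next(iter(alignment.values())))
    return sum(math.log(site_likelihood(tree, {name: seq[k] for name, seq in alignment.items()}))
               for k in range(length))
```

For instance, `log_likelihood((("a", 0.1, "b", 0.1), 0.05, "c", 0.3), {"a": "ACGT", "b": "ACGA", "c": "ACTT"})` evaluates a three-taxon tree. Every column is independent, which is why the kernel parallelizes across sites, and why long alignments of few taxa and short alignments of many taxa can call for different strategies.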
Wednesday, June 10, 2009
4:00pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Casey Dunn
Refreshments will be served at 3:45 pm