Brown University | Center for Computational Molecular Biology

CCMB Seminar Series 2009-2010

_______________________________________________________ Events

To receive CCMB seminar announcements by email, sign up for the computational biology mailing list by sending an email to listserv@listserv.brown.edu with the message body "subscribe computational-biology".

CCMB Lecture Series
Peter Olofsson
Associate Professor, Trinity University, San Antonio, Texas
Modeling Growth and Telomere Dynamics in Yeast

Telomeres are regions at the ends of chromosomes that serve as protective buffers against DNA damage. As chromosomes divide, telomeres shorten progressively, a process counteracted by the enzyme telomerase, which adds telomeric DNA to chromosomal ends. In the absence of telomerase, cells eventually stop dividing; in its presence, cells divide indefinitely. Telomere biology is a very active and fruitful field of research with relevance to aging and cancer. Its importance was highlighted by the 2009 Nobel Prize in Physiology or Medicine, which was awarded to three telomere biologists.

Telomeres have been extensively studied in the yeast Saccharomyces cerevisiae, which has given much insight into eukaryotic genetics. One particular observation is that some yeast cells that lack telomerase, and would therefore normally stop dividing eventually, nevertheless keep dividing, indicating that they develop alternative ways of maintaining telomere length. A general branching process is proposed to model a population of yeast cells following loss of telomerase. The model takes into account random variation in individual cell cycle times, telomere length, finite lifespan of mother cells, and survivorship. We identify and estimate crucial parameters such as the probability of an individual cell becoming a survivor, and compare our model predictions to experimental data.

Wednesday, May 19, 2010 (Note the changed date!)
4:00pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Suzanne Sindi
Refreshments will be served at 3:45 pm

CCMB Lecture Series
Cenk Sahinalp
Simon Fraser University, School of Computing Science
Structural Variation Discovery in High Throughput Sequenced Genomes and Transcriptomes

Recent studies show that along with single nucleotide polymorphisms and small indels, larger structural differences contribute significantly to human genetic diversity. The realization of new ultra-high-throughput sequencing platforms has made it feasible to detect the full spectrum of genomic variation among many individual genomes, including those between healthy tissues and those susceptible to disease with genomic origin. Conventional algorithms for identifying structural variation (SV) have not been designed to handle the short read lengths and the errors implied by the available and future high throughput sequencing technologies. In this talk we will provide combinatorial formulations for the SV detection between a reference genome and a high throughput paired-end sequenced individual genome. We will provide efficient algorithms for each of the formulations we give, which all turn out to be fast and quite reliable; they are also applicable to all currently available sequencing methods and traditional capillary sequencing technology.

Tuesday, April 20, 2010
12:00 pm (PLEASE NOTE SPECIAL DAY AND TIME!)
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Ben Raphael
Refreshments will be served at 11:45 am

CCMB Lecture Series
Juliette de Meaux
Max Planck Institute for Plant Breeding Research
Molecular Underpinning of Life-History Evolution in Arabidopsis thaliana

Abstract TBA

Monday, March 15, 2010 (Note the changed date, time, and place!)
2:00pm
505 BioMed Center

Hosted by: Dan Weinreich
Refreshments will be served at 1:45 pm

CCMB Lecture Series
Eleazar Eskin, Ph.D.
University of California, Los Angeles, Department of Computer Science
Leveraging linkage disequilibrium structure in genome-wide association studies

Variation in human DNA sequences accounts for a significant share of the genetic risk factors for common diseases such as hypertension, diabetes, Alzheimer's disease, and cancer. Identifying the human sequence variation that makes up the genetic basis of common disease will have a tremendous impact on medicine in many ways. Recent efforts to identify these genetic factors through large-scale association studies, which compare information on variation between a set of healthy and diseased individuals, have been remarkably successful. However, despite the success of these initial studies, many challenges and open questions remain on how to design association studies and analyze their results. Many of these challenges involve taking advantage of the linkage disequilibrium, or correlation structure, of human variation. In this talk, I will discuss a few of the computational and statistical challenges in the design and analysis of association studies.

Wednesday, March 3, 2010
4:00pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Ben Raphael
Refreshments will be served at 3:45 pm

CCMB Lecture Series
Art Covert
Michigan State University
The Hidden Lives of Deleterious Mutations: Transiting fitness valleys via sign-epistatic stepping stones

The role of deleterious mutations in evolution has been much debated. While many researchers believe that any mutation that reduces fitness must impede adaptive evolution, recent studies have shown that this is not always the case. Deleterious mutations may have their fitness effects reversed by a second, sign-epistatic mutation, which can also allow populations to pass through fitness valleys. It is unknown whether these sign-epistatic recoveries are fortuitous accidents or a driving force behind evolution. Using digital organisms, I compared the progress of adaptive evolution when all deleterious mutations were immediately reverted with control treatments in which they were allowed to enter the population. Deleterious mutations reduce fitness over the short term, by definition, and they comprise the majority of mutations in populations of digital organisms, as in biological ones. In my experiments, long-term adaptive evolution was accelerated in those populations in which deleterious mutations were allowed to remain, because some of them served as stepping stones across otherwise impassable fitness valleys, thereby facilitating the evolution of complex features.

Wednesday, January 27, 2010
4:00pm
CIT Bldg, Room 241, SWIG Boardroom

Refreshments will be served at 3:45 pm

Joint CCMB/MPPB/Psychiatry Seminar
Jason Moore
Dartmouth Medical School
Bioinformatics Challenges for Genome-Wide Association Studies

Human genetics is currently dominated by the genome-wide association study (GWAS) that measures and evaluates one million or more single nucleotide polymorphisms (SNPs) for their disease associations. The current biostatistical paradigm is to analyze each SNP individually without regard to the rest of the genome or environmental exposure.

This agnostic or unbiased approach has not been successful for identifying SNPs with moderate or large effects on disease susceptibility. We present here an alternative bioinformatics strategy for GWAS analysis that focuses on gene-gene and gene-environment interactions and their context in biochemical pathways.

Wednesday, November 18, 2009
3:00pm
LMM - 70 Ship Street, Room 107

Refreshments will be served at 2:45 pm

CCMB Lecture Series
Eli Stahl
Brigham and Women's Hospital
The Present and Future of Genome-wide Association Studies in Rheumatoid Arthritis

Results and current progress of a large-scale case-control genome-wide association study (GWAS) of rheumatoid arthritis (RA) shed further light on this autoimmune disease, and help to frame a broad perspective on mapping complex traits. Genotypes at over 2.5 million common single nucleotide polymorphisms (SNPs) were tested for association with RA in 5539 cases and 20169 controls of European descent. Eleven new RA risk alleles replicate in additional samples. Conditional and haplotype analyses refine the association signal in several loci with evidence for multiple independent effects in autoimmunity. Still, all common variant associations validated to date together explain relatively little of the additive genetic variance for RA, and suggest major contributions of (1) many more common variants of very small effect, (2) copy number or other kinds of variants, (3) rare variants, and/or (4) non-additive genetic, epigenetic or non-genetic effects. A polygenic risk score analysis can allow inference of the remaining effect due to common variants en masse (scenario 1, with some implications for scenarios 2 and 3). The direct benefit of current and future common-variant GWAS is limited under all of these scenarios, but GWAS certainly inform complementary approaches including deep re-sequencing in case-control cohorts, and integrated clinical/functional and genetic analyses.

Wednesday, October 28, 2009
4:00pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Daniel Weinreich
Refreshments will be served at 3:45 pm

CCMB Lecture Series
Yosef E. Maruvka
Bar-Ilan University
Genetic polymorphism and demography: a statistical mechanics approach

The recent progress in sequencing techniques has been followed by an exponential growth in the amount of available genetic data. Traditional methods of analysis require exact reconstruction of the phylogenetic tree, and therefore cannot deal with these immense databases. Given the efficiency of "mean field" approximation in physical systems with many particles, we are applying the same techniques and concepts to genetic problems (where it turns out that 50 can be many).

The inference of past demographic parameters from current polymorphism data will be discussed for two examples:
1. Retrieval of the effective population size and its growth rate using the number of lineages as a function of time. Here the mean-field method has been found to be an unbiased estimator, unlike the existing methods, and with a smaller error range.
2. The difference between additive noise and multiplicative noise, a basic concept in statistical mechanics, can be used to help resolve the ongoing debate between the adaptive and the neutral (Hubbell's) theories of biodiversity.
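
As a rough illustration of what "the number of lineages as a function of time" can look like in a mean-field treatment, the sketch below solves the textbook deterministic approximation dn/dt = -n(n-1)/(2N). It assumes a constant haploid population of size N and time measured in generations going backward. These assumptions, and the function name `expected_lineages`, are ours for illustration only; this is not the speaker's estimator.

```python
import math


def expected_lineages(n0, pop_size, t):
    """Mean-field number of ancestral lineages t generations back in time.

    Closed-form solution of dn/dt = -n(n-1)/(2N) with n(0) = n0,
    assuming a constant haploid population size N (an illustrative assumption).
    """
    if n0 < 1 or pop_size <= 0 or t < 0:
        raise ValueError("need n0 >= 1, pop_size > 0 and t >= 0")
    # Substituting u = 1 - 1/n turns the equation into du/dt = -u/(2N).
    u0 = 1.0 - 1.0 / n0
    return 1.0 / (1.0 - u0 * math.exp(-t / (2.0 * pop_size)))


if __name__ == "__main__":
    # Lineage count for a sample of 50 sequences in a population of 1000.
    for generations in (0, 100, 1000, 10000):
        print(generations, expected_lineages(50, 1000, generations))
```

Fitting such a curve to an observed lineages-through-time plot is one conceivable way to recover N; handling a growth rate would require a time-dependent N, which this sketch omits.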

Wednesday, October 21, 2009
4:00pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Daniel Weinreich
Refreshments will be served at 3:45 pm

CCMB Lecture Series
Mark A. DePristo, Ph.D.
Broad Institute of Harvard and MIT
Discovering genetic variation in 1000 Genomes: from mapping reads to putative de novo mutations

The 1000 genomes project aims to discover and characterize all common human genetic variation with a minor allele frequency (MAF) ≥ 0.5%. The pilot phase of the project was completed in June, producing five terabases of Illumina/Solexa, SOLiD, and Roche/454 sequences in ~180 individuals sequenced to ~4x average depth genome-wide in three populations, 30-60x whole-genome sequence for two mother-father-daughter trios, and ~800 individuals with 50x+ coverage using hybrid capture in 1000 randomly selected genes.

Here we describe the sequence calibration, realignment, and analysis tools we developed at the Broad to discover, with high sensitivity and specificity, single-nucleotide polymorphisms (SNPs) and short (< 20bp) insertion/deletion polymorphisms (indels) in all three wings of the pilot phase of the 1000 genomes project. We assess our approach by comparing discovered variation among technologies, across pilot arms, to population genetic expectations and to complementary efforts from other groups participating in the 1000 genomes project. Finally, we subject a randomly selected subset of SNP and indel calls to experimental validation to estimate project-wide specificity rates. We highlight best practices and lessons learned on the production and analysis of next-generation sequencer data.

Wednesday, October 14, 2009
4:00pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Daniel Weinreich
Refreshments will be served at 3:45 pm

CCMB Lecture Series
Lee A. Newberg, Ph.D.
Wadsworth Institute
Getting statistical significance and Bayesian confidence limits for your hidden Markov model or score-maximizing dynamic programming algorithm, with pairwise alignment of nucleotide sequences as an example.

Hidden Markov models and score-maximizing dynamic programming algorithms are employed for the evaluation of sequential data in a variety of scientific fields, including linguistics, vision, and computational biology. Given a hidden Markov model, efficient "Viterbi" and "forward" algorithms are used to evaluate the probability that the model would generate a given sequence of observations, and similar approaches are employed in the dynamic programming algorithms where the focus is on finding high scores instead of high probabilities. Here we present modifications to the "forward" algorithm that allow additional computations. We can efficiently estimate statistical significance: what is the probability that a randomly generated sequence will score at least as high as the observed sequence does? (We've computed answers down to 1e-4000.) We can also compute how typical a sequence is: for every whole number d, what is the probability that a sequence generated by the hidden Markov model will have exactly d differences from the observed sequence?
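
For readers unfamiliar with the "forward" algorithm mentioned above, here is a minimal sketch of the standard, unmodified version. It computes the probability that a hidden Markov model generates a given observation sequence by summing over all hidden state paths. The function name and the toy two-state parameters are illustrative assumptions; the significance and "typicality" extensions described in the abstract are not implemented here.

```python
def forward_probability(init, trans, emit, observations):
    """Return P(observations | model) using the standard forward recursion.

    init[s]     : probability of starting in hidden state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    if not observations:
        return 1.0  # the empty sequence is generated with certainty
    n_states = len(init)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [init[s] * emit[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emit[s][obs] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


if __name__ == "__main__":
    # Toy two-state model over a two-symbol alphabet (hypothetical numbers).
    init = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    print(forward_probability(init, trans, emit, [0, 1, 0]))
```

For long sequences the products underflow in floating point, so practical implementations work in log space or rescale alpha at each step. Probabilities as small as the 1e-4000 quoted above cannot be held in an ordinary double.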

Wednesday, October 7, 2009
4:00pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Charles (Chip) Lawrence
Refreshments will be served at 3:45 pm

CCMB Lecture Series
Alexandros Stamatakis
Technische Universität München, Department of Computer Science
Mapping the Phylogenetic Likelihood Kernel to Emerging Parallel Computer Architectures

The phylogenetic likelihood function is by far the most compute-intensive part of every ML-based phylogenetic inference algorithm. I will present several solutions for appropriately adapting this computational kernel to a variety of accelerator and supercomputer architectures ranging from FPGAs up to massively parallel machines like the BG/L. I will also address load-balancing problems in the kernel and a study on single- versus double-precision arithmetic trade-offs. Moreover, I will introduce a basic categorization of input datasets into well-shaped and badly-shaped alignments that require distinct algorithmic and parallelization approaches. Finally, I will address an algorithm for rapid phylogenetic placement/identification of short reads from environmental samples.

Wednesday, June 10, 2009
4:00pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Casey Dunn
Refreshments will be served at 3:45 pm
