Brown University | Center for Computational Molecular Biology

CCMB Seminar Series 2010-2011

_______________________________________________________ Events

To receive CCMB seminar announcements by email, sign up for the computational biology mailing list by sending an email to listserv@listserv.brown.edu with the message body "subscribe computational-biology". Announcements are also posted on the Facebook group page "CCMB at Brown" and on Twitter: @BrownCCMB.

CCMB Lecture Series
Jeffrey Jensen
University of Massachusetts Medical School
The Population Genetics of Adaptation

Quantifying the relative roles of adaptive and non-adaptive processes in the evolution of natural populations is perhaps the most fundamental question in evolutionary biology, and has been studied for well over a century. However, this characterization remains remarkably elusive owing to difficulties associated with uncoupling genomic signatures of selection from patterns produced by demographic factors (e.g., population size change and structure). I will summarize recent theoretical and statistical advances on this topic, as well as highlight three recent applications of this research, each utilizing a different data class:
1. Divergence: understanding human evolution in the age of Neandertal genomics
2. Polymorphism: characterizing the evolution of adaptive coat color in wild mice
3. Experimental: inferring the distribution of fitness effects by considering all possible point mutations in yeast

Wednesday, November 10, 2010
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Dan Weinreich
Refreshments will be served at 3:45 pm

CCMB Lecture Series
Michael Zhang
Cold Spring Harbor Laboratory
Defining splicing-regulatory networks of the tissue-specific splicing factors Fox-1/Fox-2

Dr. Zhang’s research interests include computational biology and bioinformatics, with a special focus on genomic and epigenomic regulation networks in normal and disease states. Since 1991, his lab has specialized in gene structure and regulation, gene control of development, cellular responses to environmental signals, and cell differentiation. He is well known in the international bioinformatics community, and serves as a distinguished guest chair professor at Tsinghua University and as a scientific advisory board member for the CAS-Max Planck Joint Institute of Computational Biology in China. He recently left Cold Spring Harbor for the University of Texas at Dallas, where he currently holds the Cecil H. and Ida Green Distinguished Chair of Systems Biology Science.

Wednesday, November 17, 2010
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: William Fairbrother
Refreshments will be served at 3:45 pm

CCMB Lecture Series
Scott Roy
Stanford Medical School
Comparative and Population Genomics of Introns and Splicing

The spliceosomal system of the eukaryotic nucleus is ubiquitous, with a vast spliceosomal machinery responsible for the splicing of up to hundreds of thousands of introns. Splicing is a major source of protein and regulatory novelty and has connections to a host of cellular processes. Splicing also shows tremendous diversity: species show striking differences in intron number (<10 to 200,000 per genome), intron length (median from 19 to >2000 nucleotides), intron sequence homogeneity (from minimal dinucleotide splice sites to strict extended motifs), and frequency and type of alternative splicing. I will discuss a variety of studies of the evolutionary history and phylogenetic diversity of spliceosomal introns, including: (i) reconstruction of early eukaryotic intron-exon structures; (ii) the molecular mechanisms of intron creation; (iii) novel classes of introns in the protists Giardia lamblia and Trichomonas vaginalis; (iv) the population genomics of intron loss and gain. I will also discuss preliminary population genomic data on prasinophyte green algae, a promising new model organism for studying intron evolution and differences in intron-exon structures.

Wednesday, October 20, 2010
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Daniel Weinreich
Refreshments will be served at 3:45 pm

CCMB Lecture Series
Alfredo Ferro, Alessandro Lagana
University of Catania, Italy
Computational Techniques for Graph Matching and miRNA Targeting with applications in Biology and Biomedicine

PART 1. Exact and Inexact Graph Matching with applications in Bioinformatics.
We briefly present two methods. The first quickly finds all exact matches of a given query graph in a (large) database of graphs (or in a single large graph). An efficient filtering technique based on localized features will be illustrated. The second method deals with the more challenging problem of finding all the "approximate" matches of a query graph in a database of graphs. In particular, we will allow a limited number of edge deletions. An efficient solution based on a version of the multi-set multi-cover problem will be presented.
Applications to PPI networks and drug design will be outlined.
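The filter-then-verify idea behind exact graph matching can be illustrated with a small sketch. It is not the speakers' method: the "localized features" used here (counts of node labels and of labelled edges) are an assumption chosen for simplicity, and the names features and candidate_graphs are hypothetical. A database graph can contain the query only if it has at least as many occurrences of every feature, so graphs that fail this test are discarded before any expensive matching step.

```python
from collections import Counter


def features(node_labels, edges):
    """Count simple local features of a labelled, undirected graph.

    node_labels: dict mapping node id -> label
    edges: iterable of (u, v) node-id pairs
    """
    feats = Counter(("node", label) for label in node_labels.values())
    for u, v in edges:
        # Sort the two end labels so that (C, O) and (O, C) are the same feature.
        pair = tuple(sorted((node_labels[u], node_labels[v])))
        feats[("edge",) + pair] += 1
    return feats


def candidate_graphs(query, database):
    """Return names of database graphs that pass the feature-count filter.

    Passing the filter is necessary but not sufficient for containing the
    query; a full subgraph-matching step would still have to verify each
    candidate.
    """
    query_feats = features(*query)
    candidates = []
    for name, graph in database.items():
        graph_feats = features(*graph)
        if all(graph_feats[f] >= count for f, count in query_feats.items()):
            candidates.append(name)
    return candidates


# Toy example: the query is a single C-O edge.
query = ({1: "C", 2: "O"}, [(1, 2)])
database = {
    "g1": ({1: "C", 2: "O", 3: "N"}, [(1, 2), (2, 3)]),
    "g2": ({1: "C", 2: "O"}, []),
    "g3": ({1: "C", 2: "N"}, [(1, 2)]),
}
print(candidate_graphs(query, database))
```

Only g1 survives the filter in this toy database: g2 has the right atoms but no C-O edge, and g3 has no O node at all.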

PART 2. MicroRNAs (miRNAs) are small non-coding RNAs responsible for post-transcriptional gene regulation. Their crucial role in several physiological and pathological processes has been demonstrated, but their mechanisms of action and functions still remain unclear.
Here we present three computational tools for miRNA analysis: miRiam, miR-Synth and miRò.
miRiam is a tool for predicting potential miRNA binding sites on a given target, which makes use of empirical constraints and thermodynamics.
miR-Synth is a tool for designing highly-specific synthetic miRNAs for regulating a set of target genes.
miRò is a web-based system which provides users with miRNA-phenotype associations in humans. The main goal of miRò is to provide users with powerful query tools for finding non-trivial associations among heterogeneous data and thereby to allow the identification of relationships among genes, processes, functions and diseases at the miRNA level.

Tuesday, September 28, 2010
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Charles Lawrence
Refreshments will be served at 3:45 pm

CCMB Lecture Series
Stephen M. Mount, Ph.D.
Dept. of Cell Biology and Molecular Genetics, Center for Bioinformatics and Computational Biology, University of Maryland
SplicePort: a transparent box for splice site prediction

I am a molecular geneticist interested in splice site selection during pre-mRNA splicing in eukaryotes. My laboratory takes genetic and computational approaches to this problem using Arabidopsis thaliana and Drosophila melanogaster. I am in the Dept. of Cell Biology and Molecular Genetics at the University of Maryland. I am also affiliated with the Center for Bioinformatics and Computational Biology, with the Dept. of Biology and with graduate programs in Molecular and Cell Biology (MOCB) and Behavior, Ecology, Evolution and Systematics (BEES).

Wednesday, September 22, 2010
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Will Fairbrother
Refreshments will be served at 3:45 pm

CCMB Lecture Series
Bjarni Halldorsson
Reykjavik University
Haplotype Phasing by Multi-Assembly of Shared Haplotypes

We consider the problem of haplotype phasing by multi-assembly of shared haplotypes. Specifically, we consider four types of results which together provide a comprehensive workflow for GWAS data sets: (1) statistics of multi-assembly of shared haplotypes, (2) graph-theoretic algorithms for haplotype assembly based on conflict graphs of sequencing reads, (3) inference of pedigree structure through haplotype sharing via tract-finding algorithms, and (4) multi-assembly of shared haplotypes. The input to the workflows that we consider is any combination of: (A) genotype data, (B) next-generation sequencing (NGS) data, (C) pedigree information.

We give a polynomial-time algorithm for haplotype assembly for a pair of individuals that share a haplotype, and heuristics for the simultaneous haplotype phasing of multiple individuals. Finally, we give statistics for the coverage of the genome with haplotypes.
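The conflict graphs of sequencing reads mentioned in the abstract can be illustrated for the simplest case: a single diploid individual with error-free reads. The sketch below is a textbook-style simplification and not the algorithm presented in the talk. Reads are assumed to be dicts mapping a variant position to the observed allele, and the function names conflicts and two_color_reads are hypothetical. Two reads conflict if they disagree at a shared position; if the conflict graph is bipartite, a 2-colouring assigns the reads to the two haplotypes.

```python
from collections import deque


def conflicts(read_a, read_b):
    """Two reads conflict if they disagree at any position they both cover."""
    shared = read_a.keys() & read_b.keys()
    return any(read_a[p] != read_b[p] for p in shared)


def two_color_reads(reads):
    """2-colour the conflict graph by breadth-first search.

    Returns a list of 0/1 haplotype labels, one per read, or None if the
    conflict graph is not bipartite (no consistent assignment to two
    haplotypes exists, for example because of sequencing errors).
    """
    n = len(reads)
    adjacency = [
        [j for j in range(n) if j != i and conflicts(reads[i], reads[j])]
        for i in range(n)
    ]
    color = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # odd cycle of conflicts
    return color


reads = [
    {0: "A", 1: "C"},
    {1: "T", 2: "G"},
    {2: "G", 3: "A"},
    {0: "G", 1: "T"},
]
print(two_color_reads(reads))
```

Reads that conflict with nothing (such as the third read above) form their own component and are labelled arbitrarily; real data would need overlap information or error handling to place them.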

Joint work with Derek Aguiar and Sorin Istrail.

Wednesday, December 8, 2010
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Sorin Istrail
Refreshments will be served at 3:45 pm

CCMB Seminar Series
Sorin Draghici
Wayne State University
A systems biology approach for the steady-state analysis of gene signaling networks

Once heralded as the holy grail, the capability of obtaining a comprehensive list of genes, proteins or metabolites that are different between disease and normal phenotypes is routine today. And yet, the holy grail of high-throughput has not delivered so far. Even though such high-throughput comparisons have become relatively easy to perform, understanding the phenomena that determine the measured changes is as challenging as ever, if not more so. We conjecture that part of the problem is related to the limitations of the current approach used for the analysis of signaling pathways. A statistical approach using various models is universally used to identify the most relevant pathways in a given experiment. In this talk, we show that despite its general adoption, this statistical analysis is unsatisfactory, and can often provide incorrect results. Using a systems biology approach, we have developed an impact analysis that includes the classical statistics, but also considers other crucial factors such as the magnitude of each gene’s expression change, their type and position in the given pathways, their interactions, etc. On several illustrative data sets, the classical analysis produces both false positives and false negatives while the impact analysis provides biologically meaningful results.

Thursday, February 3, 2011
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Sorin Istrail
Refreshments will be served at 3:45 pm

CCMB Seminar Series
Shyam Gopalakrishnan
University of Michigan
Site Frequency Spectrum estimation from low coverage sequence data

Large-scale sequencing projects, such as the 1000 Genomes Project, allow us to re-examine population genetics questions with sequence data. Sequencing study designs involve a trade-off between sample size and per-sample sequencing depth. Low-coverage sequencing study designs are well suited for medical sequencing projects but lead to high uncertainty in genotype calls. For low-frequency and common variants, population- and LD-based genotype callers alleviate this problem by combining information across samples and markers. For rare variants, this uncertainty results in biased genotype calls, leading to a biased estimate of the Site Frequency Spectrum (SFS). Biased estimates of the SFS lead to incorrect estimates of population genetics parameters. We investigate the bias in the SFS estimated using called genotypes. We present a method to recover the true SFS by integrating out the genotype uncertainty using an Expectation-Maximization framework. We evaluate the performance of our algorithm using a simulation study and compare our method to genotype-caller-based SFS estimation.
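To make the idea of integrating out genotype uncertainty with Expectation-Maximization concrete, here is a minimal sketch. It is a generic formulation and is not claimed to be the speaker's algorithm. It assumes that each site comes with per-individual genotype likelihoods P(reads | g) for g = 0, 1, 2 derived alleles, and the function names site_allele_count_likelihoods and em_sfs are hypothetical. The first function sums over all genotype configurations to get the likelihood of each sample allele count; the second alternates between computing the posterior over allele counts at each site and averaging those posteriors into a new SFS.

```python
from math import comb


def site_allele_count_likelihoods(genotype_liks):
    """Likelihood of one site's data for each sample allele count k = 0..2n.

    genotype_liks: list of (L0, L1, L2) tuples, one per diploid individual,
    where Lg = P(reads | genotype with g derived alleles).
    """
    n = len(genotype_liks)
    z = [1.0] + [0.0] * (2 * n)
    for l0, l1, l2 in genotype_liks:
        new = [0.0] * (2 * n + 1)
        for k, zk in enumerate(z):
            if zk == 0.0:
                continue
            new[k] += zk * l0
            if k + 1 <= 2 * n:
                new[k + 1] += zk * 2 * l1  # a heterozygote can arise two ways
            if k + 2 <= 2 * n:
                new[k + 2] += zk * l2
        z = new
    # Normalise by the number of ways to place k derived alleles on 2n chromosomes.
    return [zk / comb(2 * n, k) for k, zk in enumerate(z)]


def em_sfs(site_liks, n_iter=100):
    """Estimate the SFS (proportion of sites with each allele count) by EM."""
    n_cat = len(site_liks[0])
    sfs = [1.0 / n_cat] * n_cat
    for _ in range(n_iter):
        expected = [0.0] * n_cat
        for lik in site_liks:
            # E-step: posterior over allele counts at this site.
            post = [s * l for s, l in zip(sfs, lik)]
            total = sum(post)
            for k in range(n_cat):
                expected[k] += post[k] / total
        # M-step: the new SFS is the average posterior across sites.
        sfs = [e / len(site_liks) for e in expected]
    return sfs


# Toy data: two individuals, four sites, genotypes known with certainty.
sites = [
    [(1, 0, 0), (1, 0, 0)],  # allele count 0
    [(0, 1, 0), (1, 0, 0)],  # allele count 1
    [(0, 1, 0), (1, 0, 0)],  # allele count 1
    [(0, 0, 1), (0, 1, 0)],  # allele count 3
]
site_liks = [site_allele_count_likelihoods(s) for s in sites]
print(em_sfs(site_liks))
```

With certain genotypes, as in the toy data, the estimate reduces to the empirical spectrum. With low-coverage data the genotype likelihoods are spread over several genotypes, and the EM estimate differs from the spectrum obtained by first calling genotypes.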

Wednesday, February 9, 2011
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Sohini Ramachandran
Refreshments will be served at 3:45 pm

CCMB Seminar Series
Matteo Fumagalli
Politecnico di Milano University
Natural Selection in the Human Genome and Implications for Susceptibility to Disease

Detecting natural selection in the human genome has a double meaning. First, we can make inferences about past processes that occurred during human evolutionary history. Second, signals of non-neutral evolution may unveil important functional information, especially when targeted genes are associated with disease.

In the first part, I will show an example of how population genetics approaches represent a valuable opportunity to address the evolutionary history of modern complex diseases. I will describe the evolutionary pattern in five human populations of an antiviral response gene. The data provide robust evidence of long-standing balancing selection, with a coding trans-specific polymorphism being the putative causal variant. I will then look for the underlying mechanism responsible for the maintenance of the coding SNP and report a significant difference in heterozygosity between subjects suffering from Multiple Sclerosis and healthy individuals, supporting a model whereby heterozygotes are protected from the disease.

In the second part, I will describe a statistical framework to search for signatures of genetic adaptation to local environments. I will formally show that pathogens have been the main selective pressure through human evolution. Results also indicate an enrichment of pathogen-driven selected genes which have been previously associated with autoimmune diseases, in line with the view whereby a portion of susceptibility alleles might be maintained in human populations due to past selective processes.

Thursday, February 24, 2011
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Sohini Ramachandran
Refreshments will be served at 3:45 pm

CCMB Seminar Series
David Kelley
University of Maryland
Quality-aware detection and correction of sequencing errors

Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering together reads that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects. To improve analysis of such environments, I developed SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method that achieves greater accuracy than previous unsupervised approaches. I will also examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and the supervised learning method Phymm, called PHYSCIMM, that performs better when evolutionarily close training genomes are available.

Massively parallel DNA sequencing has become a prominent tool in biological research. The high throughput and low cost of second-generation sequencing technologies have allowed researchers to address an ever-larger set of biological and biomedical problems. Although sequence fidelity is high, the primary errors are substitution errors that rise in frequency at the 3' ends of reads. Sequencing errors complicate analysis, which normally requires that reads be aligned to each other (for genome assembly) or to a reference genome (for detection of mutations). I developed a program called Quake to detect and correct errors in DNA sequencing reads based on the coverage of k-mers. Using a maximum likelihood approach incorporating quality values and nucleotide-specific miscall rates, Quake achieves the highest accuracy on realistically simulated reads. We further demonstrate substantial improvements in de novo assembly and SNP detection after using Quake.
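The intuition behind detecting errors from k-mer coverage can be shown with a short sketch. This is not Quake's implementation: it ignores quality values, reverse complements and the maximum likelihood correction step, and the function names count_kmers and untrusted_positions, as well as the fixed cutoff, are assumptions made for illustration. A k-mer seen fewer than cutoff times across all reads is treated as untrusted, and read positions covered by no trusted k-mer are flagged as likely errors.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def untrusted_positions(read, counts, k, cutoff):
    """Return read positions that are not covered by any trusted k-mer.

    A k-mer is trusted if it was seen at least `cutoff` times.
    """
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] >= cutoff:
            for j in range(i, i + k):
                covered[j] = True
    return [i for i, is_covered in enumerate(covered) if not is_covered]


# Toy data: five copies of a correct read and one read with an error
# in its last base (T -> A), mimicking an error at the 3' end.
reads = ["ACGTACGT"] * 5 + ["ACGTACGA"]
counts = count_kmers(reads, k=4)
print(untrusted_positions("ACGTACGA", counts, k=4, cutoff=2))
print(untrusted_positions("ACGTACGT", counts, k=4, cutoff=2))
```

In the toy data only the final base of the erroneous read is flagged, because the single k-mer containing it (ACGA) occurs once while every other k-mer is well supported.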

Monday, February 28, 2011
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Ben Raphael
Refreshments will be served at 3:45 pm
