CCMB Seminar Series 2010-2011
_______________________________________________________ Events
To receive CCMB seminar announcements by email, sign up for the computational biology mailing list by sending email to listserv@listserv.brown.edu with the message body "subscribe computational-biology".
Announcements are also made on the Facebook group page CCMB at Brown as well as on Twitter: @BrownCCMB.
CCMB Lecture Series
Jeffrey Jensen
University of Massachusetts Medical School
The Population Genetics of Adaptation
Quantifying the relative roles of adaptive and non-adaptive processes in the evolution of natural populations is perhaps the most fundamental question in evolutionary biology, and has been studied for well over a century. However, this characterization remains remarkably elusive owing to difficulties associated with uncoupling genomic signatures of selection from patterns produced by demographic factors (e.g., population size change and structure). I will summarize recent theoretical and statistical advances on this topic, as well as highlight three recent applications of this research, each utilizing a different data class:
1. Divergence: understanding human evolution in the age of Neandertal genomics
2. Polymorphism: characterizing the evolution of adaptive coat color in wild mice
3. Experimental: inferring the distribution of fitness effects by considering all possible point mutations in yeast.
Wednesday, November 10, 2010
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Dan Weinreich
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Michael Zhang
Cold Spring Harbor Laboratory
Defining splicing-regulatory networks of the tissue-specific splicing factors Fox-1/Fox-2
Dr. Zhang’s research interests include computational biology and bioinformatics, with a special focus on genomic and epigenomic regulation networks in normal and disease states. Since 1991, his lab has specialized in gene structure and regulation, gene control of development, cellular responses to environmental signals, and cell differentiation. He is well known in the international bioinformatics community, and serves as a distinguished guest chair professor at Tsinghua University and as a scientific advisory board member for the CAS-Max Planck Joint Institute of Computational Biology in China. He recently moved from Cold Spring Harbor to the University of Texas at Dallas, where he currently holds the Cecil H. and Ida Green Distinguished Chair of Systems Biology Science.
Wednesday, November 17, 2010
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: William Fairbrother
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Scott Roy
Stanford Medical School
Comparative and Population Genomics of Introns and Splicing
The spliceosomal system of the eukaryotic nucleus is ubiquitous, with a vast spliceosomal machinery responsible for the splicing of up to hundreds of thousands of introns. Splicing is a major source of protein and regulatory novelty and has connections to a host of cellular processes. Splicing also shows tremendous diversity: species show striking differences in intron number (<10 to 200,000 per genome), intron length (median from 19 to >2000 nucleotides), intron sequence homogeneity (from minimal dinucleotide splice sites to strict extended motifs), and frequency and type of alternative splicing. I will discuss a variety of studies of the evolutionary history and phylogenetic diversity of spliceosomal introns, including: (i) reconstruction of early eukaryotic intron-exon structures; (ii) the molecular mechanisms of intron creation; (iii) novel classes of introns in the protists Giardia lamblia and Trichomonas vaginalis; and (iv) the population genomics of intron loss and gain. I will also discuss preliminary population genomic data on prasinophyte green algae, a promising new model system for studying intron evolution and differences in intron-exon structures.
Wednesday, October 20, 2010
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Daniel Weinreich
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Alfredo Ferro
Alessandro Lagana
University of Catania, Italy
Computational Techniques for Graph Matching and miRNA Targeting with applications in Biology and Biomedicine
PART 1. Exact and Inexact Graph Matching with applications in Bioinformatics.
We briefly present two methods. The first quickly finds all exact matches of a given query graph in a (large) database of graphs (or in a single large graph); an efficient filtering technique based on localized features will be illustrated. The second method deals with the more challenging problem of finding all the "approximate" matches of a query graph in a database of graphs. In particular, we will allow a limited number of edge deletions. An efficient solution based on a version of the multi-set multi-cover problem will be presented.
Applications to PPI networks and drug design will be outlined.
PART 2. microRNAs (miRNAs) are small non-coding RNAs responsible for post-transcriptional gene regulation. Their crucial role in several physiological and pathological processes has been demonstrated, but their mechanisms of action and functions still remain unclear.
Here we present three computational tools for miRNA analysis: miRiam, miR-Synth and miRò.
miRiam is a tool for predicting potential miRNA binding sites on a given target, which makes use of empirical constraints and thermodynamics.
miR-Synth is a tool for designing highly specific synthetic miRNAs for regulating a set of target genes.
miRò is a web-based system which provides users with miRNA-phenotype associations in humans. The main goal of miRò is to provide users with powerful query tools for finding non-trivial associations among heterogeneous data and thereby to allow the identification of relationships among genes, processes, functions and diseases at the miRNA level.
Tuesday, September 28, 2010
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Charles Lawrence
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Stephen M. Mount, Ph.D.
Dept. of Cell Biology and Molecular Genetics
Center for Bioinformatics and Computational Biology
University of Maryland
SplicePort: a transparent box for splice site prediction
I am a molecular geneticist interested in splice site selection during pre-mRNA splicing in eukaryotes. My laboratory takes genetic and computational approaches to this problem using Arabidopsis thaliana and Drosophila melanogaster. I am in the Dept. of Cell Biology and Molecular Genetics at the University of Maryland. I am also affiliated with the Center for Bioinformatics and Computational Biology, with the Dept. of Biology, and with graduate programs in Molecular and Cell Biology (MOCB) and Behavior, Ecology, Evolution and Systematics (BEES).
Wednesday, September 22, 2010
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Will Fairbrother
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Bjarni Halldorsson
Reykjavik University
Haplotype Phasing by Multi-Assembly of Shared Haplotypes
We consider the problem of haplotype phasing by multi-assembly of shared haplotypes. Specifically, we consider four types of results which together provide a comprehensive workflow for GWAS data sets: (1) statistics of multi-assembly of shared haplotypes; (2) graph-theoretic algorithms for haplotype assembly based on conflict graphs of sequencing reads; (3) inference of pedigree structure through haplotype sharing via tract-finding algorithms; and (4) multi-assembly of shared haplotypes.
The input to the workflows that we consider is any combination of: (A) genotype data, (B) next-generation sequencing (NGS), and (C) pedigree information.
We give a polynomial-time algorithm for haplotype assembly for a pair of individuals that share a haplotype and heuristics for the simultaneous haplotype phasing of multiple individuals. Finally, we give statistics for the coverage of the genome with haplotypes.
Joint work with Derek Aguiar and Sorin Istrail.
Wednesday, December 8, 2010
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Sorin Istrail
Refreshments will be served at 3:45 pm
CCMB Seminar Series
Sorin Draghici
Wayne State University
A systems biology approach for the steady-state analysis of gene signaling networks
Once heralded as the holy grail, the capability of obtaining a comprehensive list of genes, proteins or metabolites that are different between disease and normal phenotypes is routine today. And yet, the holy grail of high-throughput has not delivered so far. Even though such high-throughput comparisons have become relatively easy to perform, understanding the phenomena that determine the measured changes is as challenging as ever, if not more so. We conjecture that part of the problem is related to the limitations of the current approach used for the analysis of signaling pathways. A statistical approach using various models is universally used to identify the most relevant pathways in a given experiment. In this talk, we show that despite its general adoption, this statistical analysis is unsatisfactory, and can often provide incorrect results. Using a systems biology approach, we have developed an impact analysis that includes the classical statistics, but also considers other crucial factors such as the magnitude of each gene’s expression change, their type and position in the given pathways, their interactions, etc. On several illustrative data sets, the classical analysis produces both false positives and false negatives while the impact analysis provides biologically meaningful results.
Thursday, February 3, 2011
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Sorin Istrail
Refreshments will be served at 3:45 pm
CCMB Seminar Series
Shyam Gopalakrishnan
University of Michigan
Site Frequency Spectrum estimation from low coverage sequence data
Large-scale sequencing projects, such as the 1000 Genomes Project, allow us to re-examine population genetics questions with sequence data. Sequencing study designs involve a trade-off between sample size and per-sample sequencing depth. Low-coverage sequencing study designs are well suited for medical sequencing projects but lead to high uncertainty in genotype calls. For low-frequency and common variants, population- and LD-based genotype callers alleviate this problem by combining information across samples and markers. For rare variants, this uncertainty results in biased genotype calls, leading to a biased estimate of the Site Frequency Spectrum (SFS). Biased estimates of the SFS lead to incorrect estimates of population genetics parameters. We investigate the bias in the SFS estimated using called genotypes. We present a method to recover the true SFS by integrating out the genotype uncertainty using an Expectation-Maximization framework. We evaluate the performance of our algorithm using a simulation study and compare our method to the genotype-caller-based SFS estimation.
Wednesday, February 9, 2011
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Sohini Ramachandran
Refreshments will be served at 3:45 pm
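For readers unfamiliar with the term, the genotype-caller-based SFS mentioned in the abstract above can be sketched as a simple tally. This is only an illustration of the baseline quantity, not the speaker's Expectation-Maximization method; the function name, the 0/1/2 genotype coding, and the toy data are assumptions made for the example.

```python
def site_frequency_spectrum(genotypes):
    """Tally an unfolded SFS from called diploid genotypes.

    genotypes: list of sites, each a list with one entry per individual,
    coded as the number of non-reference alleles (0, 1 or 2).
    Returns a list where entry j is the number of sites carrying
    exactly j non-reference alleles in the sample.
    """
    if not genotypes:
        return []
    n_individuals = len(genotypes[0])
    sfs = [0] * (2 * n_individuals + 1)
    for site in genotypes:
        sfs[sum(site)] += 1
    return sfs


# Toy data (hypothetical): 5 sites, 3 diploid individuals.
called = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 2],
    [0, 0, 1],
    [0, 0, 0],
]
sfs = site_frequency_spectrum(called)
```

Because the tally treats every call as certain, any miscalled rare variant shifts counts between the low-frequency bins, which is the bias the abstract describes.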
CCMB Seminar Series
Matteo Fumagalli
Politecnico di Milano University
Natural Selection in the Human Genome and Implications for Susceptibility to Disease
Detecting natural selection in the human genome has a double meaning. First, we can make inferences about past processes that occurred during human evolutionary history. Second, signals of non-neutral evolution may unveil important functional information, especially when targeted genes are associated with disease.
In the first part, I will show an example of how population genetics approaches represent a valuable opportunity to address the evolutionary history of modern complex diseases. I will describe the evolutionary pattern of an antiviral response gene in five human populations. The data provide robust evidence of long-standing balancing selection, with a coding trans-specific polymorphism being the putative causal variant. I will then look for the underlying mechanism responsible for the maintenance of the coding SNP and report a significant difference in heterozygosity between subjects suffering from Multiple Sclerosis and healthy individuals, supporting a model whereby heterozygotes are protected from the disease.
In the second part, I will describe a statistical framework to search for signatures of genetic adaptation to local environments. I will formally show that pathogens have been the main selective pressure throughout human evolution. Results also indicate an enrichment of pathogen-driven selected genes which have been previously associated with autoimmune diseases, in line with the view whereby a portion of susceptibility alleles might be maintained in human populations due to past selective processes.
Thursday, February 24, 2011
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Sohini Ramachandran
Refreshments will be served at 3:45 pm
CCMB Seminar Series
David Kelley
University of Maryland
Quality-aware detection and correction of sequencing errors
Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering together reads that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects. To improve analysis of such environments, I developed SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method that achieves greater accuracy than previous unsupervised approaches. I will also examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and the supervised learning method Phymm, called PHYSCIMM, that performs better when evolutionarily close training genomes are available.
Massively parallel DNA sequencing has become a prominent tool in biological research. The high throughput and low cost of second-generation sequencing technologies have allowed researchers to address an ever-larger set of biological and biomedical problems. Although sequence fidelity is high, the primary errors are substitution errors that rise in frequency at the 3' ends of reads. Sequencing errors complicate analysis, which normally requires that reads be aligned to each other (for genome assembly) or to a reference genome (for detection of mutations). I developed a program called Quake to detect and correct errors in DNA sequencing reads based on the coverage of k-mers. Using a maximum likelihood approach incorporating quality values and nucleotide-specific miscall rates, Quake achieves the highest accuracy on realistically simulated reads. We further demonstrate substantial improvements in de novo assembly and SNP detection after using Quake.
Monday, February 28, 2011
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Ben Raphael
Refreshments will be served at 3:45 pm
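As a rough illustration of the k-mer coverage idea named in the abstract above, the sketch below counts k-mers across a set of reads and flags those seen fewer than a threshold number of times. It is not Quake itself: the abstract describes a maximum likelihood approach that also uses quality values and miscall rates, whereas this example uses plain counts with a fixed cutoff. The function names, the cutoff, and the toy reads are assumptions made for the example.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_coverage_kmers(reads, k, min_count):
    """Return k-mers seen fewer than min_count times.

    With deep coverage, k-mers from the true genome recur many times,
    so rare k-mers are candidates for containing a sequencing error.
    """
    counts = kmer_counts(reads, k)
    return {kmer for kmer, count in counts.items() if count < min_count}


# Toy reads (hypothetical): the last read differs at a single base.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
suspect = low_coverage_kmers(reads, k=4, min_count=2)
```

In this toy input, the k-mers that overlap the single differing base in the last read are the ones that fall below the cutoff.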