Brown University Center for Computational Molecular Biology

Events

CCMB Seminar Series 2008-2009


To receive CCMB seminar announcements by email, sign up for the computational biology mailing list by sending an email to listserv@listserv.brown.edu with the message body "subscribe computational-biology".

CCMB Seminar Series

Solomon Marcus


Professor Emeritus
University of Bucharest

Brown University Lectures – August 2008

  1. THE LONELINESS OF THE MATHEMATICIAN
  2. TO PROVE OR NOT TO PROVE, THAT IS THE QUESTION!
  3. SCIENCE TODAY VERSUS SCIENCE YESTERDAY
  4. WE ARE SURROUNDED BY HIDDEN CONFLICTS
  5. INFORMATION: WHAT DOES IT MEAN?
  6. MATHEMATICAL MISTAKES AS A SOURCE OF CREATIVITY
  7. THE BIOLOGICAL CELL IN SPECTACLE
  8. TWO NEIGHBORS IGNORING EACH OTHER: INFINITE WORDS AND FORMAL LANGUAGES

The first two lectures, followed by a reception, will take place in the CIT SWIG Boardroom, 1:00pm - 4:30pm, August 14, 2008. The time and place of the remaining lectures are to be announced (TBA).

Professor Solomon Marcus is a member of the Romanian Academy of Sciences and Emeritus Professor of the University of Bucharest. His publications span mathematical analysis, mathematical and computational linguistics, computer science, poetics, linguistics, semiotics, the philosophy and history of science, education, and the relations between science, the humanities, philosophy, and religion. Marcus has published about 50 books in Romanian, English, French, German, Italian, Spanish, Russian, Greek, Hungarian, Czech, and Serbo-Croatian, and about 400 research articles in specialized journals in almost all European countries and in the United States, Canada, South America, Japan, India, and New Zealand, among others; more than 1,000 authors have cited his work.

He is recognized as one of the initiators of mathematical linguistics and of mathematical poetics, and is a member of the editorial board of several international scientific journals.

Marcus wrote a paper together with Paul Erdős ("Sur la décomposition de l'espace euclidien en ensembles homogènes", Acta Math. Acad. Sci. Hungar 8 (1957), 443–452); this gives him an Erdős number of 1. If we write "x -> y" for "x had y as PhD advisor", then his PhD advisor phylogeny gives him an Euler number of 8 and a Leibniz number of 11: Solomon Marcus -> Miron Nicolescu -> Paul Montel -> Émile Borel and Henri Lebesgue -> Gaston Darboux -> Michel Chasles -> Siméon Poisson -> Joseph Lagrange -> Leonhard Euler -> Johann Bernoulli -> Jacob Bernoulli -> Gottfried Leibniz

CCMB Lecture Series

T.M. Murali

Virginia Polytechnic Institute and State University

Network Legos: Building Blocks of Cellular Wiring Diagrams


Molecular interaction networks generated by high-throughput whole-genome biological assays are highly intricate and difficult to interpret. Since cellular functions are carried out by modules of interacting molecules, reverse-engineering the modular structure of cellular interaction networks has the promise of significantly easing their analysis.

We develop a top-down computational approach to identify building blocks of molecular interaction networks by
(i) integrating gene expression measurements for a particular disease state (e.g., leukaemia) or experimental condition (e.g., treatment with growth serum) with molecular interactions to reveal an active network, which is the network of interactions perturbed in the cell in that disease state or condition and
(ii) systematically combining active networks computed for different experimental conditions using set-theoretic formulae to reveal network legos, which are modules of coherently interacting genes and gene products in the wiring diagram.

We propose methods to compute active networks, systematically mine candidate legos, assess the statistical significance of these candidates, arrange them in a directed acyclic graph (DAG), and exploit the structure of the DAG to identify true network legos. We assess the stability of our computations to changes in the input and our ability to recover active networks by composing network legos.
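As a rough illustration of what "combining active networks using set-theoretic formulae" can look like, the sketch below treats each active network as a set of undirected edges and forms intersections and differences of those sets. The condition names, gene names, and the `combine` helper are hypothetical, and this is not the authors' implementation; it only shows the kind of set algebra the abstract refers to.

```python
def edge(a, b):
    """Represent an undirected interaction as an order-independent pair."""
    return frozenset((a, b))

# Hypothetical active networks, one per experimental condition.
active = {
    "condition_A": {edge("g1", "g2"), edge("g2", "g3"), edge("g3", "g4")},
    "condition_B": {edge("g1", "g2"), edge("g2", "g3"), edge("g5", "g6")},
    "condition_C": {edge("g2", "g3"), edge("g5", "g6")},
}

def combine(active, include, exclude=()):
    """Edges present in every `include` network and in no `exclude` network."""
    result = set.intersection(*(active[name] for name in include))
    for name in exclude:
        result -= active[name]
    return result

# Interactions active in both A and B.
shared_ab = combine(active, ["condition_A", "condition_B"])

# Interactions active in both A and B but not in C: a candidate building block.
only_ab = combine(active, ["condition_A", "condition_B"], exclude=["condition_C"])
```

Each distinct formula over the conditions yields one candidate edge set; the abstract's remaining steps (significance testing and arranging candidates in a DAG) are not modeled here.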

We analyse two human datasets using our method. A comparison of three leukaemias demonstrates how a biologist can use our system to identify specific differences between these diseases. A larger-scale analysis of 13 distinct stresses illustrates our ability to compute the building blocks of the interaction networks activated in response to these stresses and to use these building blocks to identify differences in the response of fibroblasts and HeLa cells to endoplasmic reticulum stress.

Friday, September 19th, 2008
12:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Will Fairbrother
Refreshments will be served at 11:45 am

CCMB Lecture Series

Daniel Aalberts

Williams College
Department of Computer Science

Finding with Binding and Ranking: splicing mRNA


Abstract: Gene expression is often regulated by the binding of small RNAs or proteins to messenger RNA, and splicing of mRNA is one important example. We have developed physical-chemical models of binding which can be computed efficiently with our new oligo-binding algorithm BINDIGO. We have also developed a statistical method (Primary Sequence Ranking) which outperforms other tools for identifying splice sites.

Daniel Aalberts is currently Associate Professor of Physics at Williams College. He has the distinction of having supervised two of the past ten Apker Award winners for outstanding undergraduate physics research. His RNA splicing research is supported by NIH; RNA pseudoknots, by NSF. Aalberts received his SB and PhD degrees from MIT, and did postdoctoral research at Leiden Univ. in the Netherlands and at the Center for Studies in Physics and Biology at Rockefeller Univ. in NYC.

Wednesday, September 24th, 2008
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Will Fairbrother
Refreshments will be served at 3:45 pm

CCMB Lecture Series

Simon Kasif

Boston University
Children's Hospital, Boston
Center for Advanced Genomic Technology
Computational Genomics Laboratory


Systems Macro-Biology: From Parts and Genomes to Network Signatures of Disease


Abstract: Computer systems specification, analysis and diagnosis are arguably among the most practically important and essential topics in computer science and engineering. This research has resulted in tools and environments that allow engineers to specify and analyze hardware and networks as well as recognize and correct unexpected behaviors, network intrusions or anomalies.

Medicine and biology, on the surface, present similar problems. Our cells are provided with basic instructions that, when executed properly, enable living organisms to function properly. As a result of genetic, epigenetic or environmental perturbations, our cells exhibit aberrations in their functional behaviors, leading to major diseases such as cancer or diabetes and causing inordinate suffering for patients and their families.

We focus on insulin signaling and related processes such as inflammation and glucose metabolism. We describe our on-going projects aimed towards identification of the full dictionary of parts and their cellular interactions that are involved in these important biological functions using both network and evolutionary approaches. This work leads to better genomic annotation of newly sequenced genes and a number of novel predictions. Manipulations of several genes in these pathways have been shown to extend life in model organisms or have predicted associations with diabetes in the human population.

Biology is very complex, and it is often difficult to formalize the entire repertoire of "normal" or "abnormal" clinical phenotypes of age-associated diseases such as diabetes or Alzheimer's. We describe the new paradigm of network signatures of disease, which allows us to recognize anomalies leading to disease at the molecular network level. Specifically, we introduce several concepts including Gene Network Enrichment Analysis (GNEA) and show how it enables biomedical researchers to identify and confirm dysregulated molecular processes in diabetes and insulin resistance that elude recognition by standard methods. This work has the potential to lead to new diagnostic or prognostic biomarkers as well as new drug targets.

This work describes joint research performed at Boston University, Harvard Medical School, Joslin Diabetes Center, Harvard School of Public Health and the National Center for Biomedical Computing (I2B2) at Harvard Partners.

Wednesday, October 22nd, 2008
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Refreshments will be served at 3:45 pm

CCMB Lecture Series

Roded Sharan

Tel-Aviv University
School of Computer Science

"A systems-level approach for mapping the telomere length maintenance gene circuitry"


Abstract: The ends of eukaryotic chromosomes are protected by telomeres, nucleoprotein structures that are essential for chromosomal stability and integrity. Understanding how telomere length is controlled has significant medical implications, especially in the fields of aging and cancer. Two recent systematic genome-wide surveys measuring the telomere length of deletion mutants in the yeast Saccharomyces cerevisiae have identified hundreds of telomere length maintenance (TLM) genes, which span a large array of functional categories and different localizations within the cell.

In my talk I will describe two recent works on integrating large-scale screening mutant data with protein-protein interaction information to rigorously chart the cellular subnetwork underlying the function investigated. I will show their application to the yeast telomere length control data, identifying pathways that connect the TLM proteins to the telomere-processing machinery, and predicting new TLM genes and their effect on telomere length.

Wednesday, October 29th, 2008
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

CCMB Lecture Series

Sebastien Roch

Microsoft Research

“Phylogeny reconstruction: Are distance methods as accurate as maximum likelihood?”


Abstract: Among the many popular techniques for reconstructing evolutionary trees from molecular sequences, distance-matrix methods such as UPGMA and Neighbor Joining are known to be the fastest. This speed stems from a straightforward, intuitive approach: the repeated agglomeration of the closest clusters of species. However, unlike more elaborate techniques such as maximum likelihood, distance methods only exploit correlations between pairs of sequences (a.k.a. the distance matrix). This limited use of the data is often cited as a serious weakness, as it is thought to affect the convergence rate in the large-sample limit. In this talk, I will discuss recent surprising results shedding some light on this question.
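The "repeated agglomeration of the closest clusters" can be made concrete with a small UPGMA sketch. The function below and its toy distance matrix are illustrative assumptions, not material from the talk: it merges the two closest clusters, records the merge height (half the distance), and replaces their distances to every other cluster with the size-weighted average.

```python
def upgma(dist):
    """Cluster leaves with UPGMA.

    dist maps frozenset({x, y}) -> distance between leaf labels x and y.
    Returns (root_label, merges), where merges lists
    (cluster_a, cluster_b, merge_height) in the order performed.
    """
    d = dict(dist)
    size = {leaf: 1 for pair in dist for leaf in pair}
    merges = []
    while len(size) > 1:
        # Pick the closest pair of current clusters (ties broken by label).
        pair = min(d, key=lambda p: (d[p], sorted(p)))
        a, b = sorted(pair)
        new = f"({a},{b})"
        merges.append((a, b, d[pair] / 2))
        # The distance from the merged cluster to any other cluster is the
        # size-weighted average of the two old distances.
        for c in size:
            if c in (a, b):
                continue
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((new, c))] = (size[a] * dac + size[b] * dbc) / (size[a] + size[b])
        del d[pair]
        size[new] = size.pop(a) + size.pop(b)
    (root,) = size
    return root, merges

# Hypothetical ultrametric distances between four taxa.
toy = {
    frozenset(("A", "B")): 2.0,
    frozenset(("C", "D")): 4.0,
    frozenset(("A", "C")): 8.0,
    frozenset(("A", "D")): 8.0,
    frozenset(("B", "C")): 8.0,
    frozenset(("B", "D")): 8.0,
}
root, merges = upgma(toy)
```

Note that the procedure only ever reads pairwise distances, which is exactly the limited use of the data that the abstract contrasts with maximum likelihood.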

Bio: Sebastien Roch is a postdoctoral researcher at Microsoft Research. He earned his Ph.D. from the University of California, Berkeley under the guidance of Elchanan Mossel. His research interests include Markov models on trees, Markov chains, interacting particle systems, random graphs, and randomized algorithms -- with an emphasis on biological applications.

Wednesday, November 5th, 2008
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Sorin Istrail

Refreshments will be served at 3:45 pm

CCMB Lecture Series

Jerome Waldispuhl and Charles W. O'Donnell

MIT

“Modeling Structural Ensembles of Transmembrane Proteins and Beyond”


Abstract: Computational protein structure modeling plays an important role in molecular biological research. In addition to well-established algorithms for the interpretation of experimental data (such as X-ray crystal diffraction and NMR), homology-based protein structure prediction tools have become accurate enough to significantly contribute to the understanding of a protein's structure, function, and interactions. Unfortunately, many protein families, such as transmembrane beta-barrels (found in the outer membrane of Gram-negative bacteria, mitochondria, and chloroplasts), are experimentally difficult to study using crystallography or NMR, and few homologues are fully characterized, rendering existing methods insufficient. In this talk, Jerome Waldispuhl and Charles O'Donnell introduce a new family of algorithms, implemented as the tool "partiFold," for investigating the folding landscape of transmembrane beta-barrel proteins based only on sequence information, broad investigator knowledge, and a statistical-mechanical approach using the Boltzmann partition function. This provides predictions of all possible structural conformations that might arise in vivo, along with their relative likelihood of occurrence. Using a parameterizable grammatical model, these algorithms incorporate high-level information, such as membrane thickness, with an energy function based on stacked amino-acid pair statistical potentials to predict ensemble properties, such as the likelihood of two residues pairing in a beta-sheet, or the per-residue X-ray crystal structure B-value. Complete conformations can also be sampled from the ensemble, providing a good picture of the subset of low-energy structures [1]. This framework has also been extended in more recent work to combine these same ensemble predictions with classical sequence alignment algorithms to obtain high-quality alignments for non-homologous transmembrane beta-barrel protein pairs [2].
To conclude the talk, ongoing research is presented which generalizes this methodology to incorporate more expansive sets of beta-sheet forming proteins, such as amyloid fibrils and prions. By broadening the grammatical models, and incorporating additional energetic functions and features, this work better incorporates experimentalist knowledge, and provides more tangible hypotheses that can help guide experimentation in a semi-automated manner.
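To make the role of the Boltzmann partition function concrete, the sketch below assigns each conformation in a tiny, explicitly enumerated ensemble the probability exp(-E/kT)/Z and then sums over conformations to get the probability that a given residue pair is formed. The conformations, energies, and function names are hypothetical; this is not partiFold's algorithm or energy model, and explicit enumeration is only feasible for toy cases like this one.

```python
import math

def boltzmann_probabilities(energies, kT=1.0):
    """Map each state to exp(-E/kT) / Z for a finite ensemble."""
    # Shift by the minimum energy for numerical stability; the shift cancels in Z.
    e_min = min(energies.values())
    weights = {s: math.exp(-(e - e_min) / kT) for s, e in energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}

def pair_probability(probs, pair):
    """Probability that `pair` is formed: sum over conformations containing it."""
    return sum(p for conformation, p in probs.items() if pair in conformation)

# Hypothetical ensemble: each conformation is a set of paired residue indices,
# mapped to a made-up energy in units of kT.
conformations = {
    frozenset({(1, 10), (2, 9)}): -2.0,
    frozenset({(1, 10)}): -1.0,
    frozenset({(2, 8)}): -1.0,
    frozenset(): 0.0,
}

probs = boltzmann_probabilities(conformations)
p_1_10 = pair_probability(probs, (1, 10))
p_2_8 = pair_probability(probs, (2, 8))
```

Lower-energy conformations receive higher probability, and a pairing that appears in several low-energy conformations, such as (1, 10) here, accumulates a higher ensemble probability than one that appears in a single conformation.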

Thursday, November 6th, 2008
1:30 pm
CIT Bldg, Room 241, SWIG Boardroom

Refreshments will be served at 1:15 pm

CCMB Lecture Series

Christine Heitsch, Ph.D.
Georgia Institute of Technology

“Analysis, prediction, and design of viral RNA secondary structures”


Abstract: Understanding how biological sequences encode structural and functional information is a fundamental scientific challenge. For RNA viral genomes, the information encoded in the sequence extends well beyond their protein coding role to the role of intra-sequence base pairing in viral packaging, replication, and gene expression. Working with the Pariacoto virus as a model sequence, we investigate the compatibility of predicted base pairings with the dodecahedral cage known from crystallographic studies. To build a putative secondary structure, we first analyze different possible configurations using a combinatorial model of RNA folding. We give results on the trade-offs among types of loop structures, the asymptotic degree of branching in typical configurations, and the characteristics of stems in "well-determined" substructures. These mathematical results yield insights into the interaction of local and global constraints in RNA secondary structures, and suggest new directions in understanding the folding of RNA viral genomes.

Wednesday, November 12th, 2008
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Benjamin Raphael

Refreshments will be served at 3:45 pm

CCMB Lecture Series

Iuliana Ionita-Laza

Harvard University
Department of Biostatistics

“Estimating the number of unseen genetic variants in the human genome: a capture-recapture approach”

Abstract: The various genetic variation discovery projects (The SNP Consortium, the HapMap Project, the 1000 Genomes Project, etc.) aim to identify as much as possible of the underlying genetic variation in different human populations. The question we address in this talk is how many new (not yet seen) genetic variants remain to be found.

We regard this question as an instance of the species problem in ecology, where the goal is to estimate the number of unseen species in a closed population. Using a parametric empirical Bayes model, we propose a method to estimate the number of new variants with a desired minimum frequency to be discovered in future studies, based on observed sequence data for a small number of individuals. The approach can also be used to predict the number of individuals necessary to sequence in order to capture all (or a fraction of) the variation with a specified minimum frequency.
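For readers unfamiliar with the species problem, the sketch below shows a much simpler classical estimator than the parametric empirical Bayes model described in the abstract: the bias-corrected Chao1 lower bound, which estimates the total number of classes from how many were seen exactly once and exactly twice. It is a deliberately swapped-in technique used only to illustrate the idea of estimating what has not been seen; the counts are made up.

```python
from collections import Counter

def chao1(counts):
    """Bias-corrected Chao1 estimate of the total number of classes.

    counts: number of times each observed class (e.g. each variant) was seen.
    """
    observed = [c for c in counts if c > 0]
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]  # singletons and doubletons
    # The bias-corrected form stays finite even when there are no doubletons.
    return len(observed) + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical sample: 8 variants seen, 4 of them once and 2 of them twice.
sightings = [1, 1, 1, 1, 2, 2, 3, 5]
estimated_total = chao1(sightings)
estimated_unseen = estimated_total - len(sightings)
```

Many singletons relative to doubletons signal that sampling is far from saturated, so the estimate of unseen classes grows accordingly.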

We show results based on sequence data from four human populations, and discuss applications of these methods in the context of disease association studies with rare variants.

4:30pm, Thursday, Feb. 12th 2009
CIT Bldg, Room 241, SWIG Boardroom

Refreshments will be served at 4:15 pm

CCMB Lecture Series

Elhanan Borenstein

Postdoctoral Fellow
Stanford University and Santa Fe Institute

“Reverse Ecology: From Large-Scale Analysis of Metabolic Networks, Growth Environments and Seed Sets to Species Interaction and Metagenomics”

Abstract: The topology of metabolic networks may provide important insights not only into the metabolic capacity of species, but also into the habitats in which they evolved. In this talk I will present several analyses of metabolic networks and show how various ecological insights can be obtained from genomic-based data.

I will first introduce various factors that affect the structure of metabolic networks, and specifically the environmental and genetic determinants that affect network modularity. I will then present the first large-scale computational reconstruction of metabolic growth environments, analyzing the metabolic networks of hundreds of species and using a graph-theory based algorithm to identify for each species a set of seed compounds that must be exogenously acquired. Such seed sets form ecological "interfaces" between metabolic networks and their surroundings, approximating the effective biochemical environment of each species. The seed sets' composition significantly correlates with several properties characterizing the species' environments and agrees with biological observations concerning major adaptations. Computational reconstruction of metabolic networks of ancestral species and phylogenetic analysis of the seed sets reveal the complex dynamics governing gain and loss of biosynthetic capacity across the phylogenetic tree.
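One natural graph-theoretic reading of "seed compounds that must be exogenously acquired" is sketched below: in a directed graph where an edge u -> v means compound v can be produced from compound u, a seed set is a strongly connected component with no incoming edge from outside itself. This formulation, the function names, and the toy network are assumptions for illustration; the abstract does not spell out the algorithm, and the simple reachability-based component search here is only suitable for small graphs.

```python
def reachable(graph, start):
    """All nodes reachable from `start` (including itself) by directed paths."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def seed_sets(graph):
    """Strongly connected components that receive no edge from outside."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    reach = {n: reachable(graph, n) for n in nodes}
    # Two nodes share a component when each can reach the other.
    components = {frozenset(m for m in reach[n] if n in reach[m]) for n in nodes}
    seeds = []
    for comp in components:
        has_outside_input = any(
            v in comp for u in nodes - comp for v in graph.get(u, ())
        )
        if not has_outside_input:
            seeds.append(comp)
    return sorted(seeds, key=sorted)

# Hypothetical metabolic network: A feeds a B/C cycle; E and F form a cycle
# that nothing else can produce; D is a downstream product.
toy_network = {
    "A": ["B"],
    "B": ["C"],
    "C": ["B", "D"],
    "E": ["F"],
    "F": ["E", "D"],
}
seeds = seed_sets(toy_network)
```

In the toy network the seed sets are {A} and {E, F}: A cannot be made from anything else, and the E/F cycle can only be entered by acquiring one of its members from the environment.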

I will further present an extension of this framework, accounting for interactions between species, by introducing a pair-wise, topology-based measure of biosynthetic support, which reflects the extent to which the nutritional requirements of one species could be satisfied by the biosynthetic capacity of another. I will show that this measure is aligned with host-parasite interactions and facilitates successful prediction of such interactions on a large scale.

Finally, I will discuss the application of this approach to the analysis of microbial communities and metagenomic data of the human microbiota and outline future research directions. The "reverse ecology" approach demonstrated in these analyses lays the foundations for further studying the complex web of interactions characterizing various ecosystems and the evolutionary interplay between organisms and their habitats on a large scale.

4:00pm, Wednesday, Feb. 18th 2009
CIT Bldg, Room 241, SWIG Boardroom

Refreshments will be served at 3:45 pm

CCMB Lecture Series

Sohini Ramachandran

Harvard University

“The spatial distribution of human genetic variation across continents and chromosomes”

Abstract: Theoretical population genetics offers us the ability to make inferences about past evolutionary forces from genetic variation observed in natural populations. However, the models we apply to data often make very simplistic assumptions about both demographic processes and the strength of natural selection over time. Such assumptions can lead to spurious conclusions about the likelihood of past evolutionary events.

I will discuss two spatial aspects of human genetic variation: the geographic distribution of genetic variation, and genetic differences within the genome due to chromosomal location on either the X chromosome or the autosomes. These studies highlight the role human demographic history has played in shaping the distribution of human genetic variability, and lead to interesting questions about the appropriate spatial and temporal scales at which we might study the signatures of evolutionary forces in the genomic era.

4:00pm, Wednesday, Feb. 25th 2009
CIT Bldg, Room 241, SWIG Boardroom

Refreshments will be served at 3:45 pm

CCMB Lecture Series

Ting Wang

University of California, Santa Cruz

“Mobile elements shape transcriptional regulatory networks and the UCSC Cancer Genomics Browser”

Abstract: Mobile elements shape transcriptional regulatory networks

The evolutionary forces that establish and hone the network of target genes for transcription factors are largely unknown. We report here new evidence that mobile-element-mediated transcriptional regulation is an ongoing process. Recent, species-specific mobile elements, in particular endogenous retroviruses (ERVs), can actively shape transcriptional networks in a species-specific manner. Using the tumor suppressor and transcription factor p53 as an example, we show that its binding sites are highly enriched in Long Terminal Repeats (LTRs) of a few ERV subfamilies active only in primates. These p53 site-containing LTRs are in vivo binding sites for p53 and account for more than 30% of p53 genomic sites. Experimental validation confirmed that LTRs with a p53 site possess p53-dependent regulatory potential and regulate nearby gene expression. A similar phenomenon is observed for the transcription factor Stat1. Our study indicates that invaluable treasures are hidden in the “junk” DNA, that these once foreign genetic materials are important evolutionary forces that impact our genome in a systematic manner, and that the connections one can make between mis-regulated mobile elements and human diseases provide a new avenue for disease investigation.

The UCSC Cancer Genomics Browser

We describe the UCSC Cancer Genomics Browser, a suite of web-based tools to integrate, visualize and analyze cancer genomics and clinical data. This browser displays a whole-genome-oriented view of genome-wide experimental measurements for individual samples and sets of samples alongside their associated clinical information. The browser also enables investigators to order, filter, aggregate, classify and display data interactively based on any given feature set including clinical features, annotated biological pathways, and user-edited collections of genes. Standard statistical tools are integrated to provide quantitative analysis of whole genomic data or any of its subsets. We demonstrate the capability of this browser with data from several published large cancer genomics studies. The browser is also being used on confidential, prepublication data by multiple groups. This browser is an extension of the UCSC Genome Browser; thus it inherits and integrates the Genome Browser’s existing rich set of human biology and genetics data to enhance the interpretability of the cancer genomics data.

4:00pm, Wednesday, March 25th 2009
CIT Bldg, Room 241, SWIG Boardroom

Refreshments will be served at 3:45 pm

Chalk Talk: Noon Thursday, March 26th 2009
CIT Bldg, Room 241, SWIG Boardroom
