CCMB Seminar Series 2008-2009
_______________________________________________________ Events
To receive CCMB seminar announcements by email, sign up for the computational biology mailing list by sending an email to listserv@listserv.brown.edu with the message body "subscribe computational-biology".
CCMB Seminar Series
Solomon Marcus
Professor Emeritus
University of Bucharest
Brown University
Lectures – August 2008
- THE LONELINESS OF THE MATHEMATICIAN
- TO PROVE OR NOT TO PROVE, THAT IS THE QUESTION!
- SCIENCE TODAY VERSUS SCIENCE YESTERDAY
- WE ARE SURROUNDED BY HIDDEN CONFLICTS
- INFORMATION: WHAT DOES IT MEAN?
- MATHEMATICAL MISTAKES AS A SOURCE OF CREATIVITY
- THE BIOLOGICAL CELL IN SPECTACLE
- TWO NEIGHBORS IGNORING EACH OTHER: INFINITE WORDS AND FORMAL LANGUAGES
The first two lectures, followed by a reception, will take place in the CIT SWIG Boardroom from 1:00 pm to 4:30 pm on August 14, 2008. The time and place of the remaining lectures are TBA.
Professor Solomon Marcus is a member of the Romanian Academy of Sciences and Emeritus Professor at the University of Bucharest. His publications span mathematical analysis, mathematical and computational linguistics, computer science, poetics, linguistics, semiotics, the philosophy and history of science, education, and the relations among science, the humanities, philosophy and religion. Marcus has published about 50 books in Romanian, English, French, German, Italian, Spanish, Russian, Greek, Hungarian, Czech and Serbo-Croatian, and about 400 research articles in specialized journals in almost all European countries, the United States, Canada, South America, Japan, India and New Zealand, among others; more than 1,000 authors have cited his work.
He is recognized as one of the initiators of mathematical linguistics and of mathematical poetics, and he serves on the editorial boards of several international scientific journals.
Marcus wrote a paper together with Paul Erdős ("Sur la décomposition de l'espace euclidien en ensembles homogènes", Acta Math. Acad. Sci. Hungar. 8 (1957), 443–452), which gives him an Erdős number of 1. If we write "x -> y" for "x had y as PhD advisor", his PhD advisor phylogeny gives him an Euler number of 8 and a Leibniz number of 11:
Solomon Marcus -> Miron Nicolescu -> Paul Montel -> Émile Borel and Henri Lebesgue -> Gaston Darboux -> Michel Chasles -> Siméon Poisson -> Joseph Lagrange -> Leonhard Euler -> Johann Bernoulli -> Jacob Bernoulli -> Gottfried Leibniz
CCMB Lecture Series
T.M. Murali
Virginia Polytechnic Institute and State University
Network Legos: Building Blocks of Cellular Wiring Diagrams
Molecular interaction networks generated by high-throughput whole-genome biological assays are highly intricate and difficult to interpret. Since cellular functions are carried out by modules of interacting molecules, reverse-engineering the modular structure of cellular interaction networks promises to ease their analysis significantly.
We develop a top-down computational approach to identify building blocks of molecular interaction networks by
(i) integrating gene expression measurements for a particular disease state (e.g., leukaemia) or experimental condition (e.g., treatment with growth serum) with molecular interactions to reveal an active network, which is the network of interactions perturbed in the cell in that disease state or condition, and
(ii) systematically combining active networks computed for different experimental conditions using set-theoretic formulae to reveal network legos, which are modules of coherently interacting genes and gene products in the wiring diagram.
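A minimal sketch of step (ii), assuming each active network is represented as a set of undirected interactions. Here a candidate lego for a chosen group of conditions is taken to be the interactions active in every condition of the group and in none of the others, and enumerating all groups mines the candidates systematically. The condition labels, gene names, and this particular set formula are hypothetical illustrations, not the speaker's published formulae; significance testing and the DAG step are omitted.

```python
from itertools import combinations

def edge(u: str, v: str) -> frozenset:
    """An undirected interaction, stored as an unordered pair of genes."""
    return frozenset((u, v))

def candidate_lego(active: dict, include, exclude) -> set:
    """Edges active in every 'include' condition and in no 'exclude' condition."""
    # set.intersection returns a new set, so the inputs are never mutated.
    result = set.intersection(*(active[c] for c in include))
    for c in exclude:
        result -= active[c]
    return result

def mine_candidates(active: dict) -> dict:
    """Apply the (hypothetical) formula above to every non-empty group of conditions."""
    names = sorted(active)
    candidates = {}
    for r in range(1, len(names) + 1):
        for include in combinations(names, r):
            exclude = [n for n in names if n not in include]
            edges = candidate_lego(active, include, exclude)
            if edges:  # keep only non-empty candidates
                candidates[include] = edges
    return candidates

# Hypothetical active networks for three conditions (labels are placeholders).
active = {
    "ALL": {edge("A", "B"), edge("B", "C"), edge("C", "D")},
    "AML": {edge("A", "B"), edge("B", "C"), edge("D", "E")},
    "MLL": {edge("A", "B"), edge("E", "F")},
}
legos = mine_candidates(active)
```

With these toy inputs, the interaction B-C is the only one shared by the first two conditions and absent from the third, so it forms the candidate for that group.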
We propose methods to compute active networks,
systematically mine candidate legos, assess the
statistical significance of these candidates, arrange
them in a directed acyclic graph (DAG), and exploit
the structure of the DAG to identify true network
legos. We assess the stability of our computations
to changes in the input and our ability to recover
active networks by composing network legos.
We analyse two human datasets using our method.
A comparison of three leukaemias demonstrates how
a biologist can use our system to identify specific
differences between these diseases. A larger-scale
analysis of 13 distinct stresses illustrates our
ability to compute the building blocks of the interaction
networks activated in response to these stresses
and to use these building blocks to identify differences
in the response of fibroblasts and HeLa cells to
endoplasmic reticulum stress.
Friday, September 19th, 2008
12:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Will Fairbrother
Refreshments will be served at 11:45 am
CCMB Lecture Series
Daniel Aalberts
Williams College
Department of Computer Science
Finding with Binding and Ranking: splicing mRNA
Abstract: Gene expression is often regulated by the binding of small RNAs or proteins to messenger RNA, and the regulation of mRNA splicing is one important example. We have developed physical-chemical models of binding that can be computed efficiently with our new oligo-binding algorithm BINDIGO. We have also developed a statistical method (Primary Sequence Ranking) that outperforms other tools for identifying splice sites.
Daniel Aalberts is currently Associate Professor of Physics at Williams College. He has the distinction of having supervised two of the past ten Apker Award winners for outstanding undergraduate physics research. His research on RNA splicing is supported by the NIH, and his research on RNA pseudoknots by the NSF. Aalberts received his SB and PhD degrees from MIT, and did postdoctoral research at Leiden University in the Netherlands and at the Center for Studies in Physics and Biology at Rockefeller University in New York City.
Wednesday, September 24th, 2008
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Will Fairbrother
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Simon Kasif
Boston University
Children's Hospital, Boston
Center for Advanced Genomic Technology
Computational Genomics Laboratory
Systems Macro-Biology: From Parts and Genomes to Network Signatures of Disease
Abstract: Computer systems specification, analysis and diagnosis are arguably among the most practically important topics in computer science and engineering. Research in this area has produced tools and environments that allow engineers to specify and analyze hardware and networks, and to recognize and correct unexpected behaviors, network intrusions or anomalies.
On the surface, medicine and biology present similar problems. Our cells are provided with basic instructions that, when executed correctly, enable living organisms to function properly. As a result of genetic, epigenetic or environmental perturbations, our cells can exhibit aberrant functional behaviors that lead to major diseases such as cancer or diabetes, causing inordinate suffering for patients and their families.
We focus on insulin signaling and related processes such as inflammation and glucose metabolism. We describe our ongoing projects aimed at identifying the full dictionary of parts, and their cellular interactions, involved in these important biological functions, using both network and evolutionary approaches. This work leads to better genomic annotation of newly sequenced genes and to a number of novel predictions. Several genes in these pathways have been shown, when manipulated, to extend lifespan in model organisms, or have predicted associations with diabetes in the human population.
Biology is very complex, and it is often difficult to formalize the entire repertoire of "normal" or "abnormal" clinical phenotypes of age-associated diseases such as diabetes or Alzheimer's. We describe the new paradigm of network signatures of disease, which allow us to recognize anomalies leading to disease at the molecular network level. Specifically, we introduce several concepts, including Gene Network Enrichment Analysis (GNEA), and show how GNEA enables biomedical researchers to identify and confirm dysregulated molecular processes in diabetes and insulin resistance that elude recognition by standard methods. This work has the potential to lead to new diagnostic or prognostic biomarkers as well as new drug targets.
This work describes joint research performed at
Boston University, Harvard Medical School, Joslin
Diabetes Center, Harvard School of Public Health
and the National Center for Biomedical Computing
(I2B2) at Harvard Partners.
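The abstract does not spell out how GNEA scores a network signature. As a hedged illustration only, the sketch below computes a one-sided hypergeometric p-value, a standard statistic for asking whether a gene set overlaps a predicted disease subnetwork more than chance would suggest. Whether and how GNEA uses this particular test is an assumption here, and the counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the whole network
    K: genes in the gene set of interest (e.g., a pathway)
    n: genes in the predicted subnetwork
    k: genes found in both
    """
    upper = min(K, n)
    # comb(a, b) is 0 when b > a, so impossible overlaps contribute nothing.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: a 40-gene subnetwork in a 2,000-gene network
# shares 8 genes with a 100-gene pathway (about 2 would be expected by chance).
p_value = hypergeom_upper_tail(N=2000, K=100, n=40, k=8)
```

Exact integer binomials keep the tail sum precise even for genome-scale counts; the final division is the only floating-point step.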
Wednesday, October 22nd, 2008
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Roded Sharan
Tel-Aviv University
School of Computer Science
"A systems-level approach for mapping the telomere length maintenance gene circuitry"
Abstract: The ends of eukaryotic chromosomes are protected by telomeres, nucleoprotein structures that are essential for chromosomal stability and integrity. Understanding how telomere length is controlled has significant medical implications, especially in the fields of aging and cancer. Two recent systematic genome-wide surveys measuring the telomere length of deletion mutants in the yeast Saccharomyces cerevisiae have identified hundreds of telomere length maintenance (TLM) genes, which span a large array of functional categories and different localizations within the cell.
In my talk I will describe two recent works on integrating large-scale mutant screening data with protein-protein interaction information to rigorously chart the cellular subnetwork underlying the function investigated. I will show their application to the yeast telomere length control data, identifying pathways that connect the TLM proteins to the telomere-processing machinery, and predicting new TLM genes and their effect on telomere length.
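One simple, generic way to connect screen hits to a target machinery through a protein-protein interaction network is to search for shortest paths. The breadth-first search below illustrates that idea only; the speaker's methods are more elaborate, and the protein labels and interactions are hypothetical placeholders.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from a list of interactions."""
    graph: Dict[str, Set[str]] = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def shortest_path(graph: Dict[str, Set[str]], source: str,
                  targets: Set[str]) -> Optional[List[str]]:
    """Breadth-first search from `source`; return a shortest path to any target, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in sorted(graph.get(node, ())):  # sorted for reproducible tie-breaking
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None

# Hypothetical interactions: tlm* are screen hits, telo* are machinery proteins.
ppi = build_graph([
    ("tlm1", "p1"), ("p1", "p2"), ("p2", "telo1"),
    ("tlm1", "p3"), ("p3", "telo2"), ("tlm2", "p4"),
])
machinery = {"telo1", "telo2"}
```

Here `shortest_path(ppi, "tlm1", machinery)` reaches telo2 through p3, while tlm2 has no connection to the machinery in this toy network.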
Wednesday, October 29th, 2008
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
CCMB Lecture Series
Sebastien Roch
Microsoft Research
“Phylogeny reconstruction: Are distance methods as accurate as maximum likelihood?”
Abstract: Among the many popular techniques for reconstructing evolutionary trees from molecular sequences, distance-matrix methods such as UPGMA and Neighbor Joining are known to be the fastest. This speed stems from a straightforward, intuitive approach: the repeated agglomeration of the closest clusters of species. However, unlike more elaborate techniques such as maximum likelihood, distance methods exploit only correlations between pairs of sequences (a.k.a. the distance matrix). This limited use of the data is often cited as a serious weakness, as it is thought to degrade the convergence rate in the large-sample limit. In this talk, I will discuss recent, surprising results that shed some light on this question.
Bio: Sebastien Roch is a postdoctoral researcher at Microsoft Research. He earned his Ph.D. from the University of California, Berkeley, under the guidance of Elchanan Mossel. His research interests include Markov models on trees, Markov chains, interacting particle systems, random graphs, and randomized algorithms, with an emphasis on biological applications.
Wednesday, November 5th, 2008
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Sorin Istrail
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Jerome Waldispuhl and Charles W. O'Donnell
MIT
“Modeling Structural Ensembles of Transmembrane Proteins and Beyond”
Abstract: Computational protein structure modeling plays an important role in molecular biological research. In addition to well-established algorithms for the interpretation of experimental data (such as X-ray crystal diffraction and NMR), homology-based protein structure prediction tools have become accurate enough to contribute significantly to the understanding of a protein's structure, function, and interactions. Unfortunately, many protein families, such as transmembrane beta-barrels (found in the outer membrane of Gram-negative bacteria, mitochondria, and chloroplasts), are experimentally difficult to study using crystallography or NMR, and few homologues are fully characterized, rendering existing methods insufficient. In this talk, Jerome Waldispuhl and Charles O'Donnell introduce a new family of algorithms, implemented as the tool "partiFold," for investigating the folding landscape of transmembrane beta-barrel proteins based only on sequence information, broad investigator knowledge, and a statistical-mechanical approach using the Boltzmann partition function. This provides predictions of all possible structural conformations that might arise in vivo, along with their relative likelihood of occurrence. Using a parameterizable grammatical model, these algorithms combine high-level information, such as membrane thickness, with an energy function based on stacked amino-acid pair statistical potentials to predict ensemble properties, such as the likelihood of two residues pairing in a beta-sheet, or the per-residue X-ray crystal structure B-value. Complete conformations can also be sampled from the ensemble, providing a good picture of the subset of low-energy structures [1]. More recent work extends this framework to combine the same ensemble predictions with classical sequence alignment algorithms, yielding high-quality alignments for non-homologous transmembrane beta-barrel protein pairs [2]. The talk concludes with ongoing research that generalizes this methodology to more expansive sets of beta-sheet-forming proteins, such as amyloid fibrils and prions. By broadening the grammatical models and incorporating additional energetic functions and features, this work better incorporates experimentalist knowledge and provides more tangible hypotheses that can help guide experimentation in a semi-automated manner.
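On a toy scale, the statistical-mechanical idea works as follows: given free energies for a handful of enumerated conformations, the Boltzmann partition function Z (the sum over conformations s of exp(-E_s / RT)) normalizes them into probabilities, and an ensemble property such as the probability that two residues pair is the summed probability of the conformations containing that pair. partiFold computes such quantities over a grammar rather than by explicit enumeration; the conformations, residue indices, energies, and temperature below are hypothetical.

```python
import math

R = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_probabilities(energies: dict, temp_k: float = 310.15) -> dict:
    """Return P(s) = exp(-E_s / RT) / Z for each conformation s (E in kcal/mol)."""
    rt = R * temp_k
    weights = {s: math.exp(-e / rt) for s, e in energies.items()}
    z = sum(weights.values())  # the partition function Z
    return {s: w / z for s, w in weights.items()}

def pair_probability(probs: dict, pairs_of: dict, pair: frozenset) -> float:
    """Sum the probabilities of the conformations in which `pair` is formed."""
    return sum(p for s, p in probs.items() if pair in pairs_of[s])

# Hypothetical ensemble of three conformations of one strand pair.
energies = {"s1": -5.0, "s2": -4.0, "s3": 0.0}
pairs_of = {
    "s1": {frozenset((3, 10)), frozenset((4, 9))},
    "s2": {frozenset((3, 10))},
    "s3": set(),
}
probs = boltzmann_probabilities(energies)
p_3_10 = pair_probability(probs, pairs_of, frozenset((3, 10)))
```

Lower-energy conformations receive exponentially larger weight, so the ensemble probability of a pair reflects every conformation that contains it, not just the single lowest-energy structure.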
Thursday, November 6th, 2008
1:30 pm
CIT Bldg, Room 241, SWIG Boardroom
Refreshments will be served at 1:15 pm
CCMB Lecture Series
Christine Heitsch, Ph.D.
Georgia Institute of Technology
“Analysis, prediction, and design of viral RNA secondary structures”
Abstract: Understanding how biological sequences encode structural and functional information is a fundamental scientific challenge. For RNA viral genomes, the information encoded in the sequence extends well beyond its protein-coding role to the role of intra-sequence base pairing in viral packaging, replication, and gene expression. Working with the Pariacoto virus as a model sequence, we investigate the compatibility of predicted base pairings with the dodecahedral cage known from crystallographic studies. To build a putative secondary structure, we first analyze different possible configurations using a combinatorial model of RNA folding. We give results on the trade-offs among types of loop structures, the asymptotic degree of branching in typical configurations, and the characteristics of stems in "well-determined" substructures. These mathematical results yield insights into the interaction of local and global constraints in RNA secondary structures, and suggest new directions in understanding the folding of RNA viral genomes.
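As a small illustration of what "a combinatorial model of RNA folding" can mean, the sketch below counts the secondary structures available to a sequence under assumptions chosen here for simplicity: base pairs are non-crossing, only Watson-Crick and G-U pairs are allowed, and hairpin loops contain at least three unpaired bases. This is a textbook recurrence, not necessarily the model analyzed in the talk, and the sequences are toy examples.

```python
from functools import lru_cache

# Allowed pairs (assumption): Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def count_structures(seq: str, min_loop: int = 3) -> int:
    """Number of non-crossing secondary structures of `seq` (including the open chain)."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        if i >= j:  # empty interval or a single base: only the unpaired option
            return 1
        total = count(i + 1, j)  # case 1: base i stays unpaired
        # case 2: base i pairs with k, splitting into inside (i+1..k-1) and outside (k+1..j)
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                total += count(i + 1, k - 1) * count(k + 1, j)
        return total

    return count(0, len(seq) - 1)
```

For example, "GAAAC" admits exactly two structures (the open chain and one hairpin), while "GGAAACC" admits six, because the two G-C pairs can also nest.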
Wednesday, November 12th, 2008
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Benjamin Raphael
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Iuliana Ionita-Laza
Harvard University
Department of Biostatistics
“Estimating the number of unseen genetic variants in the human genome: a capture-recapture approach”
Abstract: The various genetic variation discovery projects (The SNP Consortium, the HapMap Project, the 1000 Genomes Project, etc.) aim to identify as much as possible of the underlying genetic variation in different human populations. The question we address in this talk is how many new (not yet seen) genetic variants remain to be found.
We regard this question as an instance of the species problem in ecology, where the goal is to estimate the number of unseen species in a closed population. Using a parametric empirical Bayes model, we propose a method to estimate, from observed sequence data for a small number of individuals, the number of new variants with a desired minimum frequency that future studies will discover. The approach can also be used to predict the number of individuals that must be sequenced in order to capture all (or a fraction of) the variation with a specified minimum frequency.
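The talk's estimator is a parametric empirical Bayes model, which is not reproduced here. For intuition about the species problem itself, the sketch below swaps in a classical, simpler nonparametric estimator from ecology, Chao1, which estimates total richness from the numbers of variants observed exactly once (f1) and exactly twice (f2). Treating each variant's observation count as its "abundance" is an assumption made for illustration, the minimum-frequency aspect is ignored, and the counts are hypothetical.

```python
from collections import Counter
from typing import List

def chao1(counts: List[int]) -> float:
    """Classic Chao1 estimate of total richness: S_obs + f1**2 / (2 * f2).

    counts: how many times each observed variant was seen (zeros are ignored).
    f1 / f2: numbers of variants seen exactly once / exactly twice.
    """
    observed = [c for c in counts if c > 0]
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]
    if f2 == 0:
        raise ValueError("this simple form needs at least one doubleton (f2 > 0)")
    return len(observed) + f1 * f1 / (2 * f2)

def estimated_unseen(counts: List[int]) -> float:
    """Estimated number of variants not yet observed."""
    return chao1(counts) - sum(1 for c in counts if c > 0)

# Hypothetical observation counts for nine variants.
counts = [1, 1, 1, 1, 2, 2, 3, 5, 8]
```

Many singletons relative to doubletons signal that much variation remains unseen; with the toy counts above (f1 = 4, f2 = 2), the estimate adds four unseen variants to the nine observed.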
We show results based on sequence data from four human populations, and discuss applications of these methods in the context of disease association studies with rare variants.
4:30 pm, Thursday, Feb. 12th 2009
CIT Bldg, Room 241, SWIG Boardroom
Refreshments will be served at 4:15 pm
CCMB Lecture Series
Elhanan Borenstein
Postdoctoral Fellow
Stanford University and Santa Fe Institute
“Reverse Ecology: From Large-Scale Analysis of Metabolic Networks, Growth Environments and Seed Sets to Species Interaction and Metagenomics”
Abstract: The topology of metabolic networks may provide important insights not only into the metabolic capacity of species, but also into the habitats in which they evolved. In this talk I will present several analyses of metabolic networks and show how various ecological insights can be obtained from genome-based data.
I will first introduce various factors that affect the structure of metabolic networks, and specifically the environmental and genetic determinants that affect network modularity. I will then present the first large-scale computational reconstruction of metabolic growth environments, analyzing the metabolic networks of hundreds of species and using a graph-theory based algorithm to identify for each species a set of seed compounds that must be exogenously acquired. Such seed sets form ecological "interfaces" between metabolic networks and their surroundings, approximating the effective biochemical environment of each species. The seed sets' composition significantly correlates with several properties characterizing the species' environments and agrees with biological observations concerning major adaptations. Computational reconstruction of metabolic networks of ancestral species and phylogenetic analysis of the seed sets reveal the complex dynamics governing gain and loss of biosynthetic capacity across the phylogenetic tree.
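The seed-set idea can be illustrated with a standard graph decomposition, assumed here as one simple way to formalize it: treat the metabolic network as a directed graph with an edge from compound u to compound v when some reaction converts u into v, collapse strongly connected components (groups of mutually reachable compounds), and report the components that no other component feeds into. Such components cannot be produced from anything else in the network, so at least one compound from each must be acquired exogenously. This is a simplified sketch, not the speaker's algorithm, and the compounds c1 to c6 are hypothetical.

```python
from typing import Dict, List, Set

def strongly_connected_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Tarjan's algorithm (recursive, adequate for small illustrative networks)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    components: List[Set[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)  # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return components

def seed_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Components with no incoming edge from any other component."""
    components = strongly_connected_components(graph)
    comp_of = {v: i for i, comp in enumerate(components) for v in comp}
    fed = {comp_of[v] for u, targets in graph.items() for v in targets
           if comp_of[u] != comp_of[v]}
    return [comp for i, comp in enumerate(components) if i not in fed]

# Hypothetical network: an edge u -> v means some reaction converts u into v.
metabolism = {
    "c1": {"c2"},
    "c2": {"c3"},
    "c3": {"c2", "c4"},  # c2 <-> c3: a reversible step
    "c4": {"c5"},
    "c6": {"c5"},
}
seeds = seed_components(metabolism)
```

In this toy network, c1 and c6 are the only compounds that nothing else produces, so they form the seed set; c2 and c3 collapse into one component because each can be made from the other.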
I will further present an extension of this framework that accounts for interactions between species by introducing a pairwise, topology-based measure of biosynthetic support, which reflects the extent to which the nutritional requirements of one species could be satisfied by the biosynthetic capacity of another. I will show that this measure is aligned with host-parasite interactions and facilitates successful prediction of such interactions on a large scale.
Finally, I will discuss the application of this approach to the analysis of microbial communities and metagenomic data of the human microbiota, and outline future research directions. The "reverse ecology" approach demonstrated in these analyses lays the foundations for further studying the complex web of interactions characterizing various ecosystems and the evolutionary interplay between organisms and their habitats on a large scale.
4:00 pm, Wednesday, Feb. 18th 2009
CIT Bldg, Room 241, SWIG Boardroom
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Sohini Ramachandran
Harvard University
“The spatial distribution of human genetic variation across continents and chromosomes”
Abstract: Theoretical population genetics offers us the ability to make inferences about past evolutionary forces from genetic variation observed in natural populations. However, the models we apply to data often make very simplistic assumptions about both demographic processes and the strength of natural selection over time. Such assumptions can lead to spurious conclusions about the likelihood of past evolutionary events.
I will discuss two spatial aspects of human genetic variation: the geographic distribution of genetic variation, and genetic differences within the genome due to chromosomal location on either the X chromosome or the autosomes. These studies highlight the role human demographic history has played in shaping the distribution of human genetic variability, and lead to interesting questions about the appropriate spatial and temporal scales at which we might study the signatures of evolutionary forces in the genomic era.
4:00 pm, Wednesday, Feb. 25th 2009
CIT Bldg, Room 241, SWIG Boardroom
Refreshments will be served at 3:45 pm
CCMB Lecture Series
Ting Wang
University of California, Santa Cruz
“Mobile elements shape transcriptional regulatory networks and the UCSC Cancer Genomics Browser”
Abstract: Mobile elements shape transcriptional regulatory networks
The evolutionary forces that establish and hone the network of target genes for transcription factors are largely unknown. We report here new evidence that mobile-element-mediated transcriptional regulation is an ongoing process. Recent, species-specific mobile elements, in particular endogenous retroviruses (ERVs), can actively shape transcriptional networks in a species-specific manner.
Using the tumor suppressor and transcription factor p53 as an example, we show that its binding sites are highly enriched in the Long Terminal Repeats (LTRs) of a few ERV subfamilies active only in primates. These p53 site-containing LTRs are in vivo binding sites for p53 and account for more than 30% of p53 genomic sites. Experimental validation confirmed that LTRs with a p53 site possess p53-dependent regulatory potential and regulate nearby gene expression. A similar phenomenon is observed for the transcription factor Stat1.
Our study indicates that invaluable treasures are hidden in the “junk” DNA, that these once foreign genetic materials are important evolutionary forces that impact our genome in a systematic manner, and that the connections one can make between mis-regulated mobile elements and human diseases provide a new avenue for disease investigation.
The UCSC Cancer Genomics Browser
We describe the UCSC Cancer Genomics Browser, a suite of web-based tools to integrate, visualize and analyze cancer genomics and clinical data. This browser displays a whole-genome-oriented view of genome-wide experimental measurements for individual samples and sets of samples alongside their associated clinical information. The browser also enables investigators to order, filter, aggregate, classify and display data interactively based on any given feature set, including clinical features, annotated biological pathways, and user-edited collections of genes. Standard statistical tools are integrated to provide quantitative analysis of whole-genome data or any of its subsets. We demonstrate the capability of this browser with data from several published large cancer genomics studies. The browser is also being used on confidential, prepublication data by multiple groups.
This browser is an extension of the UCSC Genome Browser; thus it inherits and integrates the Genome Browser’s existing rich set of human biology and genetics data to enhance the interpretability of the cancer genomics data.
4:00 pm, Wednesday, March 25th 2009
CIT Bldg, Room 241, SWIG Boardroom
Refreshments will be served at 3:45 pm
Chalk Talk: Noon Thursday, March 26th 2009
CIT Bldg, Room 241, SWIG Boardroom