Brown University Center for Computational Molecular Biology

Events

CCMB Seminar Series 2005-2006

Center for Computational Molecular Biology Seminar Series

Luis E. Ortiz, Postdoctoral Lecturer
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Computer Science and Artificial Intelligence Laboratory (CSAIL)

Game Theory, Biology and the DNA Binding Game

Abstract:
We propose a game-theoretic approach to learn and predict coordinate binding of multiple DNA binding regulators. The framework implements resource-constrained allocation of proteins to local neighborhoods as well as to sites themselves, and explicates coordinate and competitive binding relations among proteins with affinity to the site or region. Our model permits us to make numerical predictions genome-wide under different perturbations.

This talk will emphasize the mathematical and computational foundations of the new modeling approach. I will start by formally presenting our proposed model: the DNA Binding game. I will establish its ability to make predictions under any perturbation by showing that an equilibrium exists in any instantiation of the game. I will present in some detail a simple iterative algorithm that monotonically converges to an equilibrium of the game (thus providing a constructive proof of existence). Time permitting, I will show a small-scale illustration of our approach on a well-known biological subsystem: lambda-phage. I will conclude by briefly discussing work in progress on learning games from data to address large-scale biological problems.

Joint work with Luis Perez-Breva, Chen-Hsiang Yeang and Tommi Jaakkola.

Wednesday, May 17, 2006
4:00 - 5:00 pm
Applied Mathematics Building
182 George Street ~ Room 110

Host: Professor Charles E. Lawrence

_____________________________________________

Eran Halperin
International Computer Science Institute, Berkeley

New Applications of DNA Pools for Disease Association Studies

Abstract:
The recent release of the Haplotype Mapping project (Nature, Oct. 26, 2005) and the rapid reduction in genotyping costs open new directions and opportunities in the study of complex genetic diseases such as cancer or Alzheimer's disease. The datasets collected for many of these studies include data on Single Nucleotide Polymorphisms (SNPs), which are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome is altered.

Even though technological improvements have recently reduced genotyping costs considerably, the genotyping burden on disease association studies is still heavy. One technique that may be able to reduce this burden is the use of DNA pools. In DNA pools, the DNA samples of a group of individuals are pooled, and the resulting pool is then genotyped, yielding a measure of the allele frequency in the pool. In this talk, I will describe new methods that use DNA pools for association studies involving unrelated individuals or mother-father-child trios. I will show how some combinations of DNA pools can reduce the genotyping burden considerably, or alternatively, can serve as "error detecting codes". I will also describe some wet lab experiments that support these results.

Thursday, May 11, 2006
3:00 - 4:00 pm
CIT Building, Room 368
115 Waterman Street

Host: Professor Sorin Istrail

_________________________________________________

Russell J. Turner
Johns Hopkins University
Applied Physics Laboratory

Visualizing Comparative Genomics Data

Abstract:
Since the sequencing of the human genome in 2000, over 25 large eukaryotic genomes have been assembled. Many of these species are closely related to humans, ranging in evolutionary distance from 5 million (chimpanzee) to 400 million (fish) years. In the current post-genomic era, much of the focus of genomic research has shifted to comparative genomics, the study of the similarities and differences between the entire genomes of related species. Comparative genomics can not only shed light on the evolutionary relationships among species, but also be used as a tool to annotate genes on newly sequenced genomes by projecting known gene locations from similar species.

Visualizing comparative genomics data presents a challenge due to the complexity, range of scale, and discontinuities in the differences between genomes. These can vary in size from single nucleotide polymorphisms to rearrangements of major portions of entire chromosomes. In this talk, we will present some of the techniques we have developed and experimented with to visualize comparative genomic data at Applied Biosystems Corporation, and discuss their implementation in a visualization tool we have developed called Atavist.

Biography:
Dr. Turner is a Senior Computer Scientist at Johns Hopkins University Applied Physics Laboratory. His research interests include bioinformatics visualization, interactive 2D and 3D graphics, object-oriented software design and 3D character animation. Before working at APL, he was a member of the Informatics Research group at Applied Biosystems and technical lead for development of the Celera Genome Browser at Celera Genomics.

Friday, April 28, 2006
3:00 - 4:00 pm
CIT Building, Room 368

Host: Professor Sorin Istrail

________________________________________

Hagit Shatkay
School of Computing
Queen's University

Hairpins in Bookstacks: Information Retrieval for Biomedical Text Mining

Abstract:
Current advances in high-throughput biology are accompanied by a tremendous increase in the number of related publications. Much biomedical information is reported in the abundant literature. The ability to rapidly and effectively survey the literature can support both the design and the interpretation of large-scale experiments, and the curation of structured biomedical knowledge in public databases. In an effort to meet these goals, a variety of text-mining methods are being applied to the biomedical literature.

This talk will briefly survey such methods, and will focus on two applications in which we use information retrieval, in non-traditional ways, to directly support biomedical discovery.

Tuesday, April 18, 2006
11:00 am
Lubrano Conference Room (CIT 4th floor)


Host: Professor Sorin Istrail

____________________________________________

Speakers:
Ruan, Ph.D. and Chia Lin Wei, Ph.D.
Genome Institute of Singapore

Genome Sequencing After the Human Genome Sequencing

Abstract:
Our primary interest is to elucidate the structures and dynamics of all functional DNA elements in complex genomes through transcriptome characterizations. To facilitate such understanding we have been developing highly efficient and accurate tag-based DNA sequencing and mapping methodologies to characterize transcripts and transcription regulatory elements in the human genome. We are also pushing to apply these technologies to address complex biological questions such as how cancer cells progress and how stem cells maintain their unique properties. Another major interest in our lab is to discover previously uncharacterized viruses and bacteria that reside in human body cavities. To this end, we have developed a metagenome analysis capability that uses shotgun sequencing and genome sequence assembly techniques to uncover genomes from uncultured microorganisms. We are currently focusing on characterizing the microbiota in the human gastrointestinal (GI) system.

Tuesday, February 14, 2006
2:00 - 3:00 pm
LMM Room 107
70 Ship Street

Host: Professor Charles E. Lawrence

____________________________________________

Computational Analysis of ChIP-chip on Affymetrix Tiled Arrays

Xiaole Shirley Liu
Department of Biostatistics
Harvard School of Public Health

Abstract:
Chromatin immunoprecipitation coupled with DNA microarray analysis (ChIP-chip) has evolved as a popular technique to study the genome-level in vivo binding of transcription factors and chromatin remodeling and modifying proteins. Recently, genome tiled microarrays have been developed that allow biologists to conduct unbiased genome-wide ChIP-chip experiments in mammalian genomes. However, they also generate massive amounts of data and pose challenges for the development of effective analysis algorithms. I will present an approach to analyze ChIP-chip on Affymetrix tiled arrays, which are the cheapest, yet the most difficult to analyze. The low-level analysis pools data from multiple samples to estimate probe behavior, then uses a hidden Markov model to detect genomic regions bound by the transcription factor. The high-level analysis finds common sequence patterns in regions enriched by the transcription factor ChIP-chip, thus characterizing the binding of the transcription factor and its cooperative binding partners. I will present our analysis of p53 and estrogen receptor ChIP-chip on chr21/22 tiled arrays.
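
As a rough illustration of the hidden Markov model step mentioned in the abstract, the sketch below labels probes along a chromosome as background or bound using Viterbi decoding. It is not the speaker's model: the two-state layout, the Gaussian emissions, and every parameter value (`means`, `sd`, `stay`) are assumptions chosen for illustration, and the input is assumed to be one already-normalized score per probe.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_two_state(scores, means=(0.0, 2.0), sd=1.0, stay=0.9):
    """Most likely state path for probe scores (0 = background, 1 = bound).

    All parameters are illustrative assumptions, not fitted values.
    """
    if not scores:
        return []
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Equal prior on the two states for the first probe.
    best = [math.log(0.5) + gaussian_logpdf(scores[0], means[s], sd) for s in (0, 1)]
    back = []
    for x in scores[1:]:
        new_best, pointers = [], []
        for s in (0, 1):
            # Score of arriving in state s from each previous state p.
            cand = [best[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            p_best = 0 if cand[0] >= cand[1] else 1
            pointers.append(p_best)
            new_best.append(cand[p_best] + gaussian_logpdf(x, means[s], sd))
        best = new_best
        back.append(pointers)
    # Trace back from the better final state.
    state = 0 if best[0] >= best[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def bound_regions(path):
    """Collapse a state path into half-open (start, end) runs of state 1."""
    regions, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s != 1 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(path)))
    return regions
```

For example, `bound_regions(viterbi_two_state(scores))` returns probe index ranges; mapping those indices back to genomic coordinates would depend on the array design.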

Wednesday, November 2, 2005
4:00 - 5:00 pm
BMC 291

____________________________________________

Structure, Function, and Evolution of Transient and Obligate Protein-protein Interactions

Zhiping Weng
Bioinformatics Program
Biomedical Engineering Department
Boston University

Abstract:
Recent analyses of high-throughput protein interaction data coupled with large-scale investigations of evolutionary properties of interaction networks have left some unanswered questions. To what extent do protein interactions act as constraints during evolution of the protein sequence? How does the type of interaction, specifically transient or obligate, play into these constraints? Are the mutations in the binding site of an interacting protein correlated with mutations in the binding site of its partner? We address these and other questions by relying on a carefully curated dataset of protein complex structures. Results point to the importance of distinguishing between transient and obligate interactions. We conclude that residues in the interfaces of obligate complexes tend to evolve at a relatively slower rate, allowing them to coevolve with their interacting partners. In contrast, the plasticity inherent in transient interactions leads to an increased rate of substitution for the interface residues and leaves little or no evidence of correlated mutations across the interface.

Wednesday, October 26, 2005
2:30 pm
BMC 291

__________________________________________

Space-efficient Whole Genome Comparisons with Burrows-Wheeler Transforms

Ross A. Lippert
Massachusetts Institute of Technology
Department of Mathematics

Abstract:

Many genome-scale search or comparison projects require the creation of a data structure that supports the efficient location of nucleotide or amino acid words. Such indices can, for example, provide the seeds for genome alignments (to proteins, ESTs, or other genomes) or an initial set of "overlaps" for assembly. These indices tend to be space intensive. For example, the suffix tree, a popular data structure for this purpose, requires more than an order of magnitude more space than the original sequence. This requires at least some part of the project to be run on a "big memory" machine or a cluster of computers, a significant obstacle for resource-poor researchers. With a recent data structure, the compressed suffix array (CSA) implemented via the Burrows-Wheeler transform, we can trade time efficiency for space efficiency, taking equal or logarithmically more time, but typically taking up less space than the indexed sequence itself. This is more than an order of magnitude trade between the run time and the memory required. If space is more expensive than time, this is an appropriate approach to consider. I developed a space-efficient implementation of the CSA on nucleotide data requiring less than 5 bits per nucleotide character to build, and less than 2.5 bits per character once built. I will present a description of this data structure and how it can be used to obtain matches. My implementation was demonstrated by aligning two mammalian genomes on a modest workstation equipped with under 2 GB of free RAM in time superior to that of the implementations of other data structures. I will also give rough comparisons to a few publicly available indexing structures.
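
To make the idea of matching through the Burrows-Wheeler transform concrete, here is a minimal sketch of backward search over a BWT-based index. It is not the speaker's implementation and is deliberately not space efficient: it keeps the full suffix array and uncompressed occurrence tables, whereas a real compressed suffix array would sample or compress both. The function names (`build_index`, `find`) and the `$` sentinel convention are assumptions made for this example.

```python
def suffix_array(text):
    """Naive suffix array construction; adequate only for short examples."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text):
    """Return (BWT string, suffix array). text must end with a unique '$'."""
    sa = suffix_array(text)
    # The BWT character for suffix i is the character preceding it
    # (text[-1], the sentinel, when i == 0).
    return "".join(text[i - 1] for i in sa), sa


def build_index(text):
    """Build the tables needed for backward search (uncompressed sketch)."""
    assert text.endswith("$") and text.count("$") == 1
    last, sa = bwt(text)
    counts = {}
    for ch in last:
        counts[ch] = counts.get(ch, 0) + 1
    # c_table[c] = number of characters in text strictly smaller than c.
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][i] = number of occurrences of c in last[:i].
    occ = {ch: [0] * (len(last) + 1) for ch in counts}
    for i, ch in enumerate(last):
        for k in occ:
            occ[k][i + 1] = occ[k][i] + (1 if k == ch else 0)
    return c_table, occ, sa


def find(pattern, index):
    """Return sorted start positions of pattern in the indexed text."""
    c_table, occ, sa = index
    lo, hi = 0, len(sa)  # current half-open range of suffix array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))
```

The search touches the pattern one character at a time from right to left, so the number of steps grows with the pattern length rather than the genome length; the space savings described in the abstract come from compressing the tables this sketch stores in full.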

Wednesday, October 19, 2005
BMC 291

_________________________________________

Structural Analysis of Protein-DNA Complexes: Insights into the Mechanism and Evolution of Transcriptional Regulation

Alexandre Morozov
Rockefeller University
Center for Studies in Physics and Biology

Abstract:
Structural modeling of protein-DNA complexes is complementary to genomic sequence-based bioinformatics methods; the two can be used together to understand transcriptional regulatory networks. Using structural analysis, evolution of transcription factor binding sites due to mutations at the protein-DNA binding interface can be characterized. I will demonstrate how genome-wide sequence-structure threading can be used to study the degree of protein-DNA interface conservation across multiple genomes. Focusing on protein-DNA interfaces provides classification of transcription factors by their binding specificity, and allows us to find orthologs and paralogs in related species, complementing existing algorithms based on overall sequence similarity. When a suitable structural template for modeling a transcription factor is available, transcription factor binding sites and energies can be directly predicted by computational modeling, and compared with experimental data.

Wednesday, October 12, 2005
4:00 pm
BMC 291

______________________________________________

Coding SNPs, Evolution and Disease Phenotype: Genome-wide Bioinformatics Predictions and Experimental Functional Studies

Paul D. Thomas
Computational Biology
Applied Biosystems

Abstract:
Most human variation is selectively neutral, but a large number of both rare and common allelic variants are associated with human disease. Predicting which allelic variants may be causative for disease is an open problem. Many diseases, both Mendelian and complex, have been associated with single-nucleotide changes (SNPs) that lead to an amino acid substitution in the encoded protein (nonsynonymous SNPs, or nsSNPs). Because of ascertainment bias, nsSNPs may not necessarily be the dominant cause of human disease. Nevertheless, nsSNPs provide an excellent testing ground for using evolutionary analysis to predict the functional effects of genetic variation, as computational methods for inferring selective pressure in protein-coding sequences are well-established. We have applied models of both negative and positive selection. One signature of negative selection is that in groups of related protein sequences, many positions in the protein are “conserved”; for instance, all serine proteases must possess the catalytic serine residue. To quantify this negative selection, we developed a “substitution position-specific evolutionary conservation” (subPSEC) score. We then analyzed a large number of nsSNPs from a number of data sets: “normal” variation, Mendelian disease associated mutations and complex disease associated variation. We find that while Mendelian disease-associated nsSNPs tend to occur at highly conserved positions in proteins, complex disease nsSNPs do not. In contrast, applying a method for estimating positive selection, we show that genes involved in complex disease tend to have relatively large Ka/Ks ratios between human and mouse orthologs, suggesting that measures of recent positive selection may be useful in identifying complex disease-associated genetic variation.

In collaboration with Dr. M.R. Hayden and colleagues at UBC we have experimentally and computationally characterized amino acid substitutions in one disease-associated gene, ABCA1, to assess evolutionary prediction methods in detail. The ABCA1 transporter has been implicated in both Mendelian and complex disease. We find that evolutionary conservation is, in most cases, an excellent predictor of functional importance of an amino acid in ABCA1. However, we also find that measures of positive selection are critical for predicting some of the mutational effects.

Wednesday, September 28, 2005
4:00 pm
BMC 291
