Science Vol. 291 Number 5507

The Human Genome

Genome Assemblies

The Sea Urchin Genome

cis-regulatory code

Welcome to the Istrail Laboratory!

The Istrail Lab is a computational biology and computer science research group in the Department of Computer Science and the Center for Computational Molecular Biology at Brown University. Sorin Istrail is the Julie Nguyen Brown Professor of Computational and Mathematical Sciences and Professor of Computer Science at Brown University.

News

October 2013

The alpha release for IBD inference software Tractatus is available for download from the Software page.

May 2013

Doug McErlean's honors thesis, "One Constraint to Rule Them All: How to simplify optimizations under constant variable sum, with applications to maximum likelihood," ranks top among the 2013 computer science department honors theses. Doug graduated this May magna cum laude with honors.

July 27, 2012

"Researchers from Brown University have developed a method that they say can generate more accurate haplotype assemblies for genome-wide and whole-exome studies than current methods." Read the full story here or here.

Bioinformatics Workflows & Tools

Haplotype Analysis

Workflow: Infer haplotypes from sequence reads or genotypes

Software tool: Alpha versions of HapCompass (haplotype assembly) and DELISHUS (deletion inference in genotypes) are available for download.

Genome-wide association studies

Workflow: Infer deletions in genome-wide SNP data

Software tool: The DELISHUS software tool implements an algorithm for inferring genomic deletions in SNP genotype data of pairs or trios. After deletions are inferred in genome-wide data, each deletion can be tested for association with disease using family-based statistical tests.
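
As a rough illustration of the kind of evidence trio data can carry, the sketch below flags SNP sites where a child's called genotype cannot be formed from one allele of each parent. Such Mendelian inconsistencies are one commonly used signal of a hemizygous deletion, since a deleted allele makes a carrier look homozygous for the remaining one. This is not the DELISHUS algorithm; the genotype encoding (0, 1, 2 = number of copies of allele "B") and all function names are assumptions made for this example.

```python
from itertools import product

# Assumed encoding: genotype = number of copies of allele "B" (0, 1, or 2).
GENOTYPE_ALLELES = {0: ("A", "A"), 1: ("A", "B"), 2: ("B", "B")}


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can be built from one allele of each parent."""
    child_alleles = tuple(sorted(GENOTYPE_ALLELES[child]))
    return any(
        tuple(sorted((f, m))) == child_alleles
        for f, m in product(GENOTYPE_ALLELES[father], GENOTYPE_ALLELES[mother])
    )


def inconsistent_sites(trio_genotypes):
    """Indices of sites whose (father, mother, child) calls break Mendelian inheritance."""
    return [
        i
        for i, (father, mother, child) in enumerate(trio_genotypes)
        if not is_mendelian_consistent(father, mother, child)
    ]


# Hypothetical data: four SNP sites for one trio.
sites = [(0, 2, 1), (0, 2, 0), (1, 1, 2), (2, 2, 0)]
flagged = inconsistent_sites(sites)
```

At the second site the father is AA, the mother is BB, and the child is called AA. That call is impossible without error unless, for example, the child inherited a deleted copy from the mother. A run of such sites in a contiguous region would be a candidate deletion to pass on to family-based association tests.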

Regulatory genomics

Workflow: Analysis of regulatory genomics data

Software tool: The cis-Browser, cis-Lexicon, and CLOSE software tools provide a genomics workbench to analyze carefully annotated cis-regulatory data from several species and the relevant data-mined literature.

Research Problems

Importance of haplotype reconstruction algorithms

Phasing, Imputation, Assembly, GWAS, Diploid, Haploid, Metagenomics, Epigenomics, HIV patient metagenome.

"Improving data quality is crucial, because if a human genome cannot be independently assembled then the sequence data cannot be sorted into the two sets of parental chromosomes, or haplotypes. This process, haplotype phasing, will become one of the most useful tools in genomic medicine. Establishing the complete set of genetic information that we received from each parent is crucial to understanding the links between heritability, gene function, regulatory sequences and our predisposition to disease." J. C. Venter, "Multiple personal genomes await," Nature, April 2010

The regulatory genome and the computer

The definitive feature of the many thousand cis-regulatory control modules in an animal genome is their information processing capability. In the “genomic computer” intra-machine communication occurs by means of diffusion (of transcription factors), while in electronic computers it occurs by electron transit along pre-organized wires. From this follow fundamental differences in design principle with respect to the meaning of time, speed, multiplicity of processors, memory, robustness of computation, and hardware and software. The genomic computer controls spatial gene expression in the development of the body plan, and its appearance in remote evolutionary time must be considered to have been a founding requirement for animal grade life.

Genomics of disease: genetic heterogeneity and missing heritability

The genetic architecture of a disorder, i.e., the number and frequency of susceptibility alleles, is shaped by evolution. The majority of variants detected by Genome-Wide Association Studies (GWAS) have no demonstrated biological significance.

"The general failure to confirm common risk variants is not due to a failure to carry out GWAS properly. The problem is underlying biology, not the operationalization of study design. The common disease–common variant model has been the primary focus of human genomics over the last decade. ... If common alleles influenced common diseases, many would have been found by now. The issue is not how to develop still larger studies, or how to parse the data still further, but rather whether the common disease–common variant hypothesis has now been tested and found not to apply to most complex human diseases." Jon McClellan and Mary-Claire King, "Genetic Heterogeneity in Human Disease," Cell, 2010

Protein folding: social life of amino acids in protein structures

The search for combinatorial algorithms for lattice protein folding that construct the lowest-energy fold with mathematically guaranteed error bounds is illuminating the elusive structure of optimal folds. Although protein folding has been proved NP-complete for almost every model, the search is under way for an algorithm that finds the optimal fold on lattice models for real protein sequences from the PDB.
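
To make the optimization problem concrete, here is a minimal sketch that assumes the hydrophobic-polar (HP) model on a 2D square lattice, one widely studied lattice model; the text above does not say which models the lab works with. A fold is given as a string of unit moves, and its energy is minus the number of H-H pairs that are lattice neighbors but not adjacent in the chain. The function names and the brute-force search are illustrative only. Exhaustive enumeration is feasible just for very short chains, which is why algorithms with guaranteed error bounds matter.

```python
from itertools import product

# Unit steps on the 2D square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_coordinates(moves):
    """Lattice coordinates of each residue, starting at the origin."""
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def hp_energy(sequence, moves):
    """Energy of a fold in the HP model: -1 per non-bonded H-H lattice contact."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    coords = fold_coordinates(moves)
    if len(set(coords)) != len(coords):
        raise ValueError("fold is not self-avoiding")
    index_at = {coord: i for i, coord in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts each neighboring pair once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return -contacts


def best_fold(sequence):
    """Exhaustive search over all move strings; exponential in chain length."""
    best = None
    for candidate in product(MOVES, repeat=len(sequence) - 1):
        moves = "".join(candidate)
        try:
            energy = hp_energy(sequence, moves)
        except ValueError:
            continue  # skip folds that revisit a lattice site
        if best is None or energy < best[0]:
            best = (energy, moves)
    return best
```

For example, `hp_energy("HPPH", "RUL")` folds the chain around a unit square so that the two H residues touch, giving energy -1, while the straight fold `"RRR"` has energy 0.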

A major unresolved problem is the inference of the protein energy function. We used methods from economics and political science, namely voting theory, to infer, from the individual preferences of amino acids in PDB protein structures, the social choice postulated by the thermodynamic hypothesis: the existence of a universal energy function.