Science Vol. 291 Number 5507

The Human Genome


The Sea Urchin Genome


cis-regulatory genomics

Welcome to the Istrail Laboratory!

The Istrail Lab is a computational biology and computer science research group in the Department of Computer Science at Brown University. It is part of the Center for Computational Molecular Biology at Brown University. Sorin Istrail is the Julie Nguyen Brown Professor of Computational and Mathematical Sciences and Professor of Computer Science at Brown University.

Sorin is teaching two courses in Fall 2017

CSCI1810: "Computational Molecular Biology"

The aim of this course is to provide an introduction to Computational Molecular Biology. The course is organized into five chapters: Sequence Alignment, Combinatorial Pattern Matching, Phylogenetic Trees, Hidden Markov Models, and Genome Assembly. Each chapter is devoted to a class of basic computational problems related to the analysis of DNA, RNA and protein sequences and their molecular biological function. Our journey through each chapter is driven by a set of the most beautiful algorithms. A "beautiful" algorithm here means one that is rigorous, practical, and of an elegant simplicity that also makes it easy to implement.

New research problems 2017

  • Hyperbolic Graph Theory and Hyperbolic Groups

  • Genomic Privacy Algorithms

  • Maximum Likelihood and Graph Theoretic Symmetries of the Likelihood Functions

  • Algorithmic Strategies for Somatic Mosaicism Haplotype Reconstruction

  • Bioinformatics Workflows & Tools

    Haplotype Analysis

    Workflow: Infer haplotypes from sequence reads or genotypes

    Software tool: Alpha versions of the HapCompass haplotype assembly software and the DELISHUS software for inferring deletions in genotypes are available for download.

    The alpha release of Tractatus, software for identity-by-descent (IBD) inference, is available for download from the Software page.

    Genome-wide association studies

    Workflow: Infer deletions in genome-wide SNP data

    Software tool: The DELISHUS software tool implements an algorithm for inferring genomic deletions in SNP genotype data of pairs or trios. After deletions are inferred in genome-wide data, each deletion can be tested for association to disease by using family-based statistical tests.
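As a rough illustration of why trio genotypes carry a deletion signal, here is a minimal Python sketch. It is not the DELISHUS algorithm; it assumes genotypes are sorted two-letter strings such as "AB", and the function names (transmissible, observed, consistent, deletion_candidates) are hypothetical. The idea: a parent carrying a deletion looks homozygous on a SNP array, and a child who inherits the deleted copy also looks homozygous, so some apparent Mendelian errors disappear once one parent is allowed to transmit a null allele.

```python
from itertools import product


def transmissible(genotype, with_deletion=False):
    """Alleles a parent with this observed genotype could pass on.

    With with_deletion=True, an apparently homozygous parent is assumed to be
    hemizygous: one real allele plus a deleted copy, written '-'.
    """
    alleles = set(genotype)
    if with_deletion and len(alleles) == 1:
        alleles.add("-")
    return alleles


def observed(a1, a2):
    """Genotype a SNP array would report for a child carrying alleles a1/a2."""
    real = sorted(a for a in (a1, a2) if a != "-")
    if not real:
        return None  # homozygous deletion: treated here as a failed call
    if len(real) == 1:
        real = real * 2  # a hemizygous child looks homozygous
    return "".join(real)


def consistent(father, mother, child, del_f=False, del_m=False):
    """True if some pair of transmitted alleles reproduces the child's call."""
    for a, b in product(transmissible(father, del_f), transmissible(mother, del_m)):
        if observed(a, b) == child:
            return True
    return False


def deletion_candidates(trio_genotypes):
    """Indices of SNPs whose Mendelian error is explained by a deletion in one parent."""
    hits = []
    for i, (f, m, c) in enumerate(trio_genotypes):
        if consistent(f, m, c):
            continue  # ordinary Mendelian transmission already explains it
        if consistent(f, m, c, del_f=True) or consistent(f, m, c, del_m=True):
            hits.append(i)
    return hits


trios = [("AA", "AB", "AA"), ("AA", "BB", "AA"), ("AB", "AB", "BB"), ("AA", "AA", "BB")]
print(deletion_candidates(trios))
```

In the toy data only the second SNP (father AA, mother BB, child AA) is flagged: the child looks homozygous for A, which fits if the mother is really B plus a deletion and passed on the deleted copy. A single flagged SNP could equally be a genotyping error, so a practical tool would presumably look for supporting evidence across neighboring SNPs.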

    Regulatory genomics

    Workflow: Analysis of regulatory genomics data

    Software tool: The cis-Browser, cis-Lexicon, and CLOSE software tools provide a genomics workbench to analyze carefully annotated cis-regulatory data from several species and the relevant data-mined literature.

    Research Problems

    Algorithms for Haplotypes Inference and Analysis

    Phasing, Imputation, Assembly, GWAS, Diploid, Haploid, Metagenomics, Epigenomics, HIV patient metagenome.

    "Improving data quality is crucial, because if a human genome cannot be independently assembled then the sequence data cannot be sorted into the two sets of parental chromosomes, or haplotypes. This process, haplotype phasing, will become one of the most useful tools in genomic medicine. Establishing the complete set of genetic information that we received from each parent is crucial to understanding the links between heritability, gene function, regulatory sequences and our predisposition to disease." J. C. Venter, "Multiple personal genomes await," Nature, April 2010
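The quote is about sorting sequence data into the two parental haplotypes. HapCompass approaches this by assembling haplotypes from sequence reads; the simpler sketch below instead phases a child from the parents' genotypes by Mendelian transmission, a different technique chosen only because it fits in a few lines. Genotypes are assumed to be two-letter strings, one per SNP, and phase_child is a hypothetical name; sites that transmission cannot resolve are left as '?'.

```python
def phase_child(father, mother, child):
    """Split a child's genotypes into paternal and maternal haplotypes.

    Each argument is a list of two-letter genotype strings, one per SNP.
    A site gets '?' when transmission is ambiguous or Mendelian-inconsistent.
    """
    paternal, maternal = [], []
    for f, m, c in zip(father, mother, child):
        # All (paternal allele, maternal allele) pairs that reproduce the child.
        options = {(p, q) for p in set(f) for q in set(m) if sorted(p + q) == sorted(c)}
        pat = {p for p, _ in options}
        mat = {q for _, q in options}
        paternal.append(pat.pop() if len(pat) == 1 else "?")
        maternal.append(mat.pop() if len(mat) == 1 else "?")
    return "".join(paternal), "".join(maternal)


print(phase_child(["AG", "CC", "TT"], ["AA", "CT", "GT"], ["AG", "CT", "GT"]))
```

Sites where the child and both parents are heterozygous stay '?', since either parental assignment fits the genotypes.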

    The Regulatory Genome and the Computer

    The defining feature of the many thousands of cis-regulatory control modules in an animal genome is their information-processing capability. In the “genomic computer”, intra-machine communication occurs by diffusion (of transcription factors), whereas in electronic computers it occurs by electron transit along pre-organized wires. From this follow fundamental differences in design principles with respect to the meaning of time, speed, multiplicity of processors, memory, robustness of computation, and hardware and software. The genomic computer controls spatial gene expression in the development of the body plan, and its appearance in remote evolutionary time must be considered to have been a founding requirement for animal-grade life.

    Genomics and Disease: Genetic Heterogeneity and the Missing Heritability Problem

    The genetic architecture of a disorder, i.e., the number and frequency of susceptibility alleles, is shaped by evolution. The majority of variants detected by Genome-Wide Association Studies (GWAS) have no demonstrated biological significance.

    "The general failure to confirm common risk variants is not due to a failure to carry out GWAS properly. The problem is underlying biology, not the operationalization of study design. The common disease—common variant model has been the primary focus of human genomics over the last decade... If common alleles influenced common diseases, many would have been found by now. The issue is not how to develop still larger studies, or how to parse the data still further, but rather whether the common disease—common variant hypothesis has now been tested and found not to apply to most complex human diseases." Jon McClellan and Mary-Claire King, "Genetic Heterogeneity in Human Disease," Cell, 2010

    Protein Folding Algorithms

    The search for combinatorial algorithms for lattice protein folding that construct the lowest-energy fold with mathematically guaranteed error bounds is illuminating the elusive structure of optimal folds. Although protein folding has been proved NP-complete for almost every model, the search is under way for the optimal algorithm that would find the optimum fold of lattice proteins with real protein sequences from the PDB.
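To make "lattice protein" and "lowest-energy fold" concrete, here is a sketch using the 2D hydrophobic-polar (HP) model on the square lattice, one standard lattice model chosen as an assumption (the paragraph does not name a model). A fold is a self-avoiding walk written as a string of moves, its energy is minus the number of non-consecutive H-H lattice contacts, and best_fold simply tries all 4^(n-1) move strings. That exhaustive search is exponential and only works for toy sequences; it is not one of the approximation algorithms with guaranteed error bounds that the paragraph refers to. All function names are hypothetical.

```python
from itertools import product

# Unit steps on the 2D square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def embed(moves):
    """Turn a sequence of moves into lattice coordinates; None if the walk self-intersects."""
    coords = [(0, 0)]
    for step in moves:
        dx, dy = MOVES[step]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in coords:
            return None
        coords.append(nxt)
    return coords


def hp_energy(seq, coords):
    """HP-model energy: -1 per pair of H residues that are lattice neighbors
    but not consecutive in the chain (each pair counted once, from the lower index)."""
    pos = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = pos.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                contacts += 1
    return -contacts


def best_fold(seq):
    """Exhaustive search over all move strings; exponential, so tiny sequences only."""
    best = (1, None)  # any real fold has energy <= 0, so this is replaced immediately
    for moves in product(MOVES, repeat=len(seq) - 1):
        coords = embed(moves)
        if coords is None:
            continue
        e = hp_energy(seq, coords)
        if e < best[0]:
            best = (e, "".join(moves))
    return best


print(best_fold("HPPH"))
```

For the toy sequence HPPH, the two H residues touch when the chain folds into a unit square (for example with the moves RUL), so best_fold("HPPH") returns an energy of -1.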

    A major unresolved problem is inferring the protein energy function. We used methods from Economics and Political Science, namely Voting Theory, to infer, from the individual preferences of amino acids in PDB protein structures, the social choice postulated by the thermodynamic hypothesis: the existence of a universal energy function.
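The paragraph does not say which voting rule was used. As a sketch of what aggregating individual preferences into a single social choice means, here is the Borda count, one classical rule, chosen purely for illustration. The ballots below (three hypothetical voters ranking three hypothetical contact types, HH, HP and PP) are invented and are not derived from PDB data.

```python
from collections import defaultdict


def borda(rankings):
    """Aggregate ranked ballots with the Borda count.

    Each ballot lists every candidate from most to least preferred; a candidate
    in position k of an m-candidate ballot earns m - 1 - k points.
    Returns the candidates in descending score order (ties broken by name)
    together with the score table.
    """
    scores = defaultdict(int)
    for ballot in rankings:
        m = len(ballot)
        for k, candidate in enumerate(ballot):
            scores[candidate] += m - 1 - k
    order = sorted(scores, key=lambda c: (-scores[c], c))
    return order, dict(scores)


# Hypothetical ballots, for illustration only.
ballots = [["HH", "HP", "PP"], ["HH", "PP", "HP"], ["HP", "HH", "PP"]]
print(borda(ballots))
```

Here HH wins with 5 points, ahead of HP (3) and PP (1). Different voting rules can disagree on the same ballots, so the choice of rule is itself a modeling decision.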

    Universal Traversal Sequences and the L = NL Problem

    Statistical Mechanics, Three-Dimensionality and NP-completeness

    Computational Complexity of Models in Physical and Biological Sciences