Relevant Course Offerings

Spring 2006-2011: CSCI1950L Algorithmic Foundations of Computational Biology

This course is devoted to computational and statistical methods, and to software tools, for DNA, RNA, and protein sequence analysis. The focus is on understanding the algorithmic and mathematical foundations of the methods, the design of the associated genomics tools, and their applications. A comprehensive set of programming assignments gives students hands-on experience with the complexities of real genomic data. The assignments include basic components of a genome assembler; mapping sets of sequences to the genome, such as those generated by high-throughput sequencing platforms like Illumina/Solexa and 454; a BLAST-like search tool; HMM algorithms for gene prediction; suffix trees; motif prediction for transcription factor promoters; and genome mapping of genetic variation, including SNPs, haplotypes, and copy number. The course has several unifying themes: alignment, comparative genomics, protein structure, the newly unveiled role of RNA in the regulatory genome, and the intertwining of statistics and algorithmics in the design of powerful genomic tools.

The course is open to students in the computer and mathematical sciences as well as students in the biological and medical sciences. Both advanced undergraduates and graduate students are welcome. Biomedical students replace the programming assignments with comparable work on a final project. Graduate credit requires a final project devoted to a research problem. Two grassroots projects are being built gradually from the final projects of students in this class: Genomathica, a library of biologist-friendly genomic tools written in Mathematica and designed for code tinkering, and Cellarium, a programming language framework for bioinformatics workflows. The instructor has taught evolving versions of this course in Departments of Biology, Computer Science, and Biochemistry and Cell Biology (in the Medical School).

Fall 2009-2013: CSCI2950L Medical Bioinformatics: Disease Associations, Protein Folding and Immunogenomics

This course is devoted to computational problems and methods in the emerging field of Medical Bioinformatics where genomics, computational biology and bioinformatics impact medical research. We will present challenging problems and solutions in three areas: Disease Associations, Protein Folding and Immunogenomics.

Genome-wide disease association studies (GWAS) present major computational challenges. The goal is to identify inherited genetic variation and its critical role in human disease. Huge datasets containing billions of SNPs, such as the Multiple Sclerosis Consortium GWAS data, will be a subject of our investigations. We will also discuss GWAS analyses for type 2 diabetes (common variants) and mental diseases (rare variants).
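As a small illustration of the kind of computation behind a single-SNP association test, here is a minimal sketch of a 1-degree-of-freedom chi-square test on a 2x2 table of allele counts in cases and controls. The function name, argument names, and example counts are hypothetical and are not taken from the course materials; real GWAS pipelines add quality control, covariates, and multiple-testing correction.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 df) on a 2x2 allele-count table.

    Arguments are counts of the alternate and reference allele in cases
    and in controls. Returns (chi2, p_value).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table sums to zero")
    # Closed form of the Pearson statistic for a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    chi2, p = allelic_chi_square(10, 20, 20, 10)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

With millions of SNPs tested, a per-SNP p-value like this one has to be compared against a much stricter genome-wide threshold than the usual 0.05.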

The computational “protein folding problem,” the classical grand challenge of biotechnology, can be stated as follows: can we computationally predict the 3-D native structure of a protein from its 1-D amino acid sequence? Lattice models of protein folding, although unrealistic, contain longstanding unsolved combinatorial and algorithmic problems of exceeding difficulty. Alzheimer's disease has been linked to protein (mis)folding.
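One widely studied lattice model is the HP (hydrophobic-polar) model, in which a protein is a self-avoiding walk on a grid and the energy is minus the number of H-H contacts between residues that are not adjacent in the chain. The sketch below scores a given 2-D fold under that model. It is an illustration only: the choice of the HP model, the move alphabet `U/D/L/R`, and the function name are assumptions, not details from the course description.

```python
# Unit steps on the 2-D square lattice (hypothetical move alphabet).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def hp_energy(sequence, moves):
    """Energy of a fold in the 2-D HP model.

    sequence: string over {'H', 'P'}.
    moves: string over {'U', 'D', 'L', 'R'}, one move per chain bond.
    Returns minus the number of H-H lattice contacts between residues
    that are not consecutive in the chain.
    """
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly len(sequence) - 1 moves")
    positions = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = positions[-1]
        positions.append((x + dx, y + dy))
    if len(set(positions)) != len(positions):
        raise ValueError("fold is not self-avoiding")
    index_at = {p: i for i, p in enumerate(positions)}
    contacts = 0
    for i, (x, y) in enumerate(positions):
        if sequence[i] != "H":
            continue
        # Look only right and up so each lattice edge is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return -contacts


if __name__ == "__main__":
    # A 4-residue chain folded into a unit square: the two end H's touch.
    print(hp_energy("HPPH", "RUL"))
```

Scoring a fold is easy; the hard combinatorial problem is finding the fold of minimum energy among all self-avoiding walks.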

Do pathogens evolve their proteomes to avoid the surveillance of the human immune system? Killer T-cells, the “special forces” of the human immune system, travel throughout the body and eliminate cells that “display” on their surfaces short pieces of pathogen proteins called epitopes, which are difficult to predict computationally. We will search for epitopes in the immunopeptidomes of H. sapiens, M. musculus, D. melanogaster, HIV, vaccinia, herpesviruses and M. tuberculosis.

This course is open to graduate students and advanced undergraduates with Computational or Life Science backgrounds. Prior background in Biology is not required.

Fall 2007-2008: CSCI2950L Algorithmic Foundations of Computational Biology II

This course focuses on population genetics models, SNP and haplotype analysis, and disease associations. It presents the state of the art of the research area after the HapMap Project. The following topics will be covered: basic models of population genetics; linkage disequilibrium (LD), LD measures, LD theory and genetic determinants of disease; the empirical state of LD patterns across populations; SNP challenges to genome assembly; haplotype blocks and block-free methods; haplotype phasing (expectation maximization (EM) algorithms, the Clark algorithm, parsimony algorithms, Bayesian methods, perfect phylogeny algorithms); proofs of NP-completeness for the haplotype phasing problem (EM, parsimony, Clark-type parsimony); SNP selection and the minimum informative subset; hypothesis testing and associations; disease association tests of significance; Sir R.A. Fisher and likelihood; genome-wide association studies for cardiovascular disease, diabetes, and cancer; uses and misuses of tests of statistical significance; sample size and power calculations; haplotypes in association analysis; the common disease common variant hypothesis; and coalescent theory and the ancestral recombination graph problem.
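To make the LD measures in the topic list concrete, here is a minimal sketch that computes three standard pairwise measures (D, D', and r^2) from phased haplotypes at two biallelic SNPs. The 0/1 allele encoding, the function name, and the example data are assumptions made for illustration; the sketch also assumes phase is already known, which is exactly what the haplotype phasing algorithms above are meant to provide.

```python
def ld_measures(haplotypes):
    """Pairwise LD between two biallelic SNPs.

    haplotypes: list of (a, b) pairs, where a and b are 0 or 1 and give
    the allele carried at SNP 1 and SNP 2 on one phased chromosome.
    Returns (D, D_prime, r2); D_prime keeps the sign of D.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(a for a, _ in haplotypes) / n             # freq of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n             # freq of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    var = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / var if var > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Made-up data: the two SNPs are perfectly correlated.
    print(ld_measures([(1, 1), (1, 1), (0, 0), (0, 0)]))
```

When either SNP is monomorphic in the sample, the sketch returns 0 for D' and r^2 rather than dividing by zero; other conventions (for example, reporting the value as undefined) are equally reasonable.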