
HapCompass: A Cycle-Basis Algorithm for Haplotype Assembly

Download algorithm implementation
hapcompass v0.7.7 [zip] [tgz]

Download documentation [pdf] [html]

Note: HapCompass requires Java 1.6 or higher.

We are currently investigating an issue that causes HapCompass to produce suboptimal results for ploidy k > 2.

New in version 0.7.7 - June 14, 2014

HapCompass now computes haplotypes for blocks of 2 SNPs instead of outputting "Phasing not computed for this block".

A new option now allows users to input Fragment files instead of BAM/SAM. See manual for documentation.

New in version 0.7.6 - June 4, 2014

Changed the default number of iterations for the diploid algorithm to 10. Change it with -z (see manual).

New in version 0.7.5 - March 10, 2014

Faster polyploid algorithm

New in version 0.7.1 - Feb. 24, 2014

HapCompass now correctly handles variants with more than 2 alleles in the polyploid case. The diploid algorithm still expects at most 2 alleles per variant site.

New in version 0.7 - Feb. 23, 2014

HapCompass can now handle large ploidies, orders of magnitude faster than older versions.

See previous version notes.

Related HapCompass utilities

hc2vcf.jar v0.2: converts a phased haplotype output file from HapCompass to VCF file format.

Input: the output phased haplotype file from HapCompass, the input VCF file, and a ploidy

java -jar hc2vcf.jar [solution_file] [input_vcf_file] [ploidy]

Example usage: java -jar hc2vcf.jar hc_MWER_solution.txt input.vcf 2

Output: The program will produce a file named [solution_file].vcf
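The invocation above can also be scripted. The following is a minimal Python sketch, not part of HapCompass: the helper names (hc2vcf_command, expected_output, run_hc2vcf) are hypothetical, and it assumes that hc2vcf.jar is in the working directory and that the output name is formed by appending ".vcf" to the full solution file name.

```python
import subprocess


def hc2vcf_command(solution_file, input_vcf, ploidy, jar="hc2vcf.jar"):
    """Build the argument list for: java -jar hc2vcf.jar [solution_file] [input_vcf_file] [ploidy]."""
    return ["java", "-jar", jar, solution_file, input_vcf, str(ploidy)]


def expected_output(solution_file):
    """Name of the file hc2vcf is documented to produce: [solution_file].vcf.

    Assumption: ".vcf" is appended to the full file name (extension included).
    """
    return solution_file + ".vcf"


def run_hc2vcf(solution_file, input_vcf, ploidy, jar="hc2vcf.jar"):
    """Run the converter and return the expected output file name.

    check=True raises CalledProcessError if Java exits with a non-zero status.
    """
    subprocess.run(hc2vcf_command(solution_file, input_vcf, ploidy, jar), check=True)
    return expected_output(solution_file)
```

For the example usage above, run_hc2vcf("hc_MWER_solution.txt", "input.vcf", 2) would execute the same command line.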

HAPCOMPASS: A fast cycle basis algorithm for accurate haplotype assembly of sequence data

HapCompass for polyploid genomes can currently be used to create accurate pairwise SNP phasings. However, the full haplotype assembly it produces is only guaranteed to be consistent with the data, not to be the best one. We are working on modifying the algorithm to produce the best haplotype assembly given the resolved compass graph. File formats for representing polyploid sequences are not standardized, so if you wish to run HapCompass on polyploid genomes please contact us (derek_aguiar at brown dot edu).

Genome assembly methods produce haplotype phase-ambiguous assemblies due to limitations in current sequencing technologies. Determining the haplotype phase of an individual is computationally challenging and experimentally expensive, but haplotype phase information is crucial in many bioinformatics workflows such as genetic association studies and genomic imputation. Current methods of determining haplotype phase from sequence data, known as haplotype assembly, have difficulty producing accurate results for large (1000 Genomes-type) data or operate on restricted optimizations that are unrealistic for high-throughput sequencing technologies.

We present a novel algorithm, HAPCOMPASS, for haplotype assembly of densely sequenced human genome data. The HAPCOMPASS algorithm operates on a graph where SNPs are nodes and edges are defined by the sequencing reads and viewed as supporting evidence of co-occurring SNP alleles in a haplotype. In our graph model, haplotype phasings correspond to spanning trees and each spanning tree uniquely defines a cycle basis. We define a global optimization on this graph and translate it into local optimization moves of the cycle basis using rules for resolving conflicting evidence. We estimate the amount of sequencing required to produce a complete haplotype assembly of a chromosome. Using metrics borrowed from genome assembly and haplotype phasing, we compare the accuracy of HAPCOMPASS, the Genome Analysis Toolkit, and HapCut for 1000 Genomes and simulated data. We show that HAPCOMPASS performs significantly better for a variety of data and metrics.
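The graph model described above can be illustrated with a small Python sketch. This is a simplified diploid toy under stated assumptions, not the HapCompass implementation: the read representation (a dict mapping SNP index to allele 0 or 1), the edge score (reads supporting the same-allele phasing minus reads supporting the opposite one), the greedy maximum spanning tree, and the function names are all illustrative choices. It shows how a spanning tree fixes a phasing and how each non-tree edge closes a cycle that either agrees or conflicts with it; it does not perform the cycle-basis optimization moves.

```python
from collections import defaultdict, deque


def build_compass_graph(reads):
    """Score each SNP pair seen together in a read.

    +1 if the read carries the same allele at both SNPs (supports 00/11),
    -1 if the alleles differ (supports 01/10).
    """
    score = defaultdict(int)
    for read in reads:
        snps = sorted(read)
        for a, i in enumerate(snps):
            for j in snps[a + 1:]:
                score[(i, j)] += 1 if read[i] == read[j] else -1
    return dict(score)


def spanning_tree_phasing(score):
    """Phase SNPs along a greedy maximum spanning tree (by |score|).

    Returns (hap, conflicts): hap maps SNP -> allele on one haplotype, and
    conflicts lists non-tree edges whose evidence disagrees with the tree,
    i.e. edges that close a conflicting cycle.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = defaultdict(list)
    non_tree = []
    for (i, j), s in sorted(score.items(), key=lambda kv: -abs(kv[1])):
        if s == 0:
            continue  # no net evidence for either phasing
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree[i].append((j, s))
            tree[j].append((i, s))
        else:
            non_tree.append((i, j, s))

    # Propagate alleles from an arbitrary root of each connected block.
    hap = {}
    for root in sorted(tree):
        if root in hap:
            continue
        hap[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, s in tree[u]:
                if v not in hap:
                    hap[v] = hap[u] if s > 0 else 1 - hap[u]
                    queue.append(v)

    conflicts = [(i, j) for i, j, s in non_tree if (hap[i] == hap[j]) != (s > 0)]
    return hap, conflicts


# Hypothetical reads over three SNPs (0, 1, 2).
reads = [
    {0: 0, 1: 0}, {0: 1, 1: 1},
    {1: 0, 2: 1}, {1: 1, 2: 0},
    {0: 0, 2: 1}, {0: 0, 1: 0, 2: 1},
]
hap, conflicts = spanning_tree_phasing(build_compass_graph(reads))
# hap == {0: 0, 1: 0, 2: 1}; conflicts == [] (the single cycle is consistent)
```

The second haplotype is the complement of hap at every SNP. A non-empty conflicts list marks cycles where the read evidence cannot all be satisfied, which is the situation the rules for resolving conflicting evidence are meant to handle.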

This material is based upon work supported by the National Science Foundation under Grant Numbers 1048831 and 1321000. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.