DELISHUS: Deletion Inference from SNP Data

Input Files

DELISHUS takes two input files that are very similar to the input of PLINK.

The MAP file is exactly the same as in PLINK. It is a tab-delimited file in which each line represents one SNP, with columns 1-3 giving the chromosome, SNP ID, and position, respectively. The genetic distance column may be present, but DELISHUS does not use it. The SNPs must appear in the MAP file in the same order as in the PED file, and they must be sorted by chromosome and then by position.
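The ordering requirement above can be checked before running DELISHUS. The following is a minimal sketch in Python, not part of DELISHUS itself; the function names read_map and is_sorted are hypothetical. It assumes that the position is the last column on each line (so that both a 3-column file and a 4-column PLINK file with genetic distance are accepted), and it treats "sorted" as meaning that each chromosome forms one contiguous block with non-decreasing positions, without assuming any particular order among chromosomes.

```python
def read_map(lines):
    """Parse MAP lines into (chromosome, snp_id, position) tuples."""
    snps = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        # Assumption: the position is the last column, so both the 3-column
        # layout and the 4-column PLINK layout (with genetic distance) parse.
        snps.append((fields[0], fields[1], int(fields[-1])))
    return snps


def is_sorted(snps):
    """Return True if every chromosome is one contiguous block of
    SNPs whose positions never decrease."""
    seen = set()
    prev_chrom, prev_pos = None, None
    for chrom, _snp_id, pos in snps:
        if chrom != prev_chrom:
            if chrom in seen:
                return False  # chromosome reappears after another one
            seen.add(chrom)
        elif pos < prev_pos:
            return False  # position goes backwards within a chromosome
        prev_chrom, prev_pos = chrom, pos
    return True
```

For a file on disk, the lines can be supplied with something like read_map(open("study.map")), where study.map is a placeholder file name.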

The PED file is a tab-delimited file with 7 columns. The first six columns are the same as in PLINK input: family ID, individual ID, father ID, mother ID, sex, and phenotype. The phenotype is encoded as 2 for affected individuals, 1 for unaffected individuals, and 0 for missing data. The seventh column differs from PLINK: it holds all of the individual's biallelic genotypes as a single string, one character per SNP, where 0 is homozygous for the major allele, 1 is homozygous for the minor allele, 2 is heterozygous, and 3 is missing data. There is no whitespace between SNPs in the genotype string, e.g. 010200312.
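As an illustration of the PED layout described above, here is a minimal Python sketch that splits one line into its seven columns and decodes the genotype string. It is not DELISHUS code; the function name parse_ped_line, the dictionary keys, and the genotype labels are hypothetical names chosen for this example.

```python
# Genotype codes as described for the seventh PED column.
GENOTYPE_CODES = {
    "0": "hom_major",  # homozygous for the major allele
    "1": "hom_minor",  # homozygous for the minor allele
    "2": "het",        # heterozygous
    "3": "missing",    # missing data
}


def parse_ped_line(line):
    """Split one tab-delimited PED line and decode its genotype string."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError("expected 7 tab-delimited columns, got %d" % len(fields))
    family_id, individual_id, father_id, mother_id, sex, phenotype, genotypes = fields
    unknown = set(genotypes) - set(GENOTYPE_CODES)
    if unknown:
        raise ValueError("unexpected genotype codes: %s" % sorted(unknown))
    return {
        "family_id": family_id,
        "individual_id": individual_id,
        "father_id": father_id,
        "mother_id": mother_id,
        "sex": sex,
        "phenotype": int(phenotype),  # 2 affected, 1 unaffected, 0 missing
        "genotypes": [GENOTYPE_CODES[code] for code in genotypes],
    }
```

The length of the decoded genotype list should equal the number of SNPs in the MAP file, which gives a quick consistency check between the two inputs.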

If you experience problems running the software please contact Derek_Aguiar at brown dot edu.

Output Files

The output files are written to the directory specified in the -o option. They are as follows:

lohMap*png
A heat map of deletions. Each row in this image represents a parent(s)-child pair or trio, and each column is the inheritance pattern for one SNP in the input PED file. Each pixel is colored black, red, or white. Black pixels indicate that no deletion was detected for the trio-SNP. Red pixels indicate strong evidence for a deletion at the trio-SNP. White pixels mark sites that do not show strong evidence for a deletion but may still lie within one (there is not enough evidence to commit to a deletion call). Deleted intervals therefore appear as runs of red and white pixels, where the number of red pixels is proportional to the strength of the deletion call.
deletion_list_threshold_[threshold].txt
A tab delimited list of deletions. Each line is a deletion call with columns:
column 1: chromosome
column 2: start position, i.e. the first evidence-of-deletion site in the clique. This is often a conservative start, because the actual deletion may begin before this position.
column 3: stop position, same caveat as start
column 4: number of SNPs spanned by the deletion
column 5: total number of evidence-of-deletion sites in the deletion
column 6: total number of trios/pairs in the deletion
column 7: the number of Mendelian errors that are not evidence of a deletion. This column may be used for QC: if this number is high, the SNP might simply be poorly genotyped.
column 8: currently not used on most input data
column 9: the sib-TDT p-value for this deletion. Only valid for families with at least one affected and one unaffected child.
column 10: the number of deletions transmitted from the father
column 11: the number of deletions transmitted from the mother
columns 12-15: the number of affected individuals with a deletion, affected without a deletion, unaffected with a deletion, and unaffected without a deletion, within the families called for deletions. These columns are only meaningful for multiplex families and may not be enabled in this build of the algorithm.
column 16: four comma separated lists each separated by semi-colons. The lists are: children with a deletion; parents with a deletion; children without a deletion; and parents without a deletion. These lists only include families with at least one child deleted.
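The column layout above can be read with a few lines of Python. The sketch below is an illustration only, not part of DELISHUS; the function name parse_deletion_line and the dictionary keys are hypothetical. It assumes each line has at least 16 tab-delimited columns, that columns 2 and 3 are integers, and that an empty list in column 16 is written as an empty string between semicolons.

```python
# Hypothetical names for the four lists in column 16, in the documented order.
GROUP_NAMES = (
    "children_with_deletion",
    "parents_with_deletion",
    "children_without_deletion",
    "parents_without_deletion",
)


def parse_deletion_line(line):
    """Extract the interval (columns 1-3) and the ID lists (column 16)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 16:
        raise ValueError("expected at least 16 columns, got %d" % len(fields))
    record = {
        "chromosome": fields[0],
        "start": int(fields[1]),
        "stop": int(fields[2]),
    }
    parts = fields[15].split(";")
    parts += [""] * (len(GROUP_NAMES) - len(parts))  # pad if lists are missing
    for name, part in zip(GROUP_NAMES, parts):
        # Split on commas and drop empty entries so "" becomes an empty list.
        record[name] = [item.strip() for item in part.split(",") if item.strip()]
    return record
```

The remaining columns can be pulled from fields in the same way; they are left as raw strings here because the exact formatting of unused or disabled columns is not specified above.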

This material is based upon work supported by the National Science Foundation under Grant Number 1048831. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.