DarkHorse HGT Candidate Resource

A database of phylogenetically atypical microbial proteins

What is DarkHorse?

DarkHorse is a bioinformatic algorithm for rapid, automated identification and ranking of phylogenetically atypical proteins within assembled genomic or metagenomic data sets. It works by selecting taxonomically classified potential ortholog matches from a reference database of amino acid sequences, then using these matches to calculate a lineage probability index (LPI) score for each unknown query protein.

LPI scores are inversely proportional to the phylogenetic distance between query sequences and database matches. DarkHorse is configurable for varying degrees of phylogenetic granularity and protein sequence conservation, which makes it possible to evaluate the historical ages associated with putative horizontal gene transfer (HGT) events. Because the program does not require exact matches, it is particularly useful for classifying sequences from novel taxa that have only distant relatives in the database. Users should consult the references cited below for a complete explanation of parameter selection and result interpretation. A brief tutorial page is also available online.
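The general idea described above, selecting high-scoring database matches and then asking how closely their taxonomy agrees with that of the query organism, can be illustrated with a toy Python sketch. It does not implement the published LPI formula (see the references cited below for that). The `Hit` record, the `window` and `max_ranks` parameters, the `toy_lineage_score` function, and the example data are all hypothetical names chosen for illustration; they are not DarkHorse options or output.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """One database match for a query protein (hypothetical record layout)."""
    query_id: str
    bitscore: float
    lineage: Tuple[str, ...]  # root-to-leaf taxonomy of the matched sequence


def select_candidates(hits: List[Hit], window: float = 0.1) -> List[Hit]:
    """Keep hits whose bit score lies within a fraction `window` of the best hit.

    A wider window admits less-conserved matches. `window` is an
    illustrative parameter, not a documented DarkHorse setting.
    """
    if not hits:
        return []
    best = max(h.bitscore for h in hits)
    return [h for h in hits if h.bitscore >= best * (1.0 - window)]


def toy_lineage_score(query_lineage: Tuple[str, ...],
                      match_lineage: Tuple[str, ...],
                      max_ranks: Optional[int] = None) -> float:
    """Fraction of leading lineage terms shared by query and match (0.0 to 1.0).

    This is a simplified stand-in, NOT the published LPI calculation.
    `max_ranks` truncates both lineages to mimic coarser granularity.
    """
    q = query_lineage[:max_ranks]
    m = match_lineage[:max_ranks]
    if not q:
        return 0.0
    shared = 0
    for q_term, m_term in zip(q, m):
        if q_term != m_term:
            break
        shared += 1
    return shared / len(q)


def rank_atypical(hits_by_query: Dict[str, List[Hit]],
                  query_lineage: Tuple[str, ...],
                  window: float = 0.1,
                  max_ranks: Optional[int] = None) -> List[Tuple[str, float]]:
    """Score each query by its closest candidate match; lowest scores first."""
    scored = []
    for query_id, hits in hits_by_query.items():
        candidates = select_candidates(hits, window)
        score = max(
            (toy_lineage_score(query_lineage, h.lineage, max_ranks)
             for h in candidates),
            default=0.0,
        )
        scored.append((query_id, score))
    return sorted(scored, key=lambda item: item[1])


# Made-up example data: two proteins from a hypothetical enterobacterial genome.
query_lineage = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
                 "Enterobacterales")
hits_by_query = {
    "protA": [
        Hit("protA", 500.0, ("Bacteria", "Proteobacteria",
                             "Gammaproteobacteria", "Vibrionales")),
        Hit("protA", 300.0, ("Archaea", "Euryarchaeota")),
    ],
    "protB": [
        Hit("protB", 200.0, ("Archaea", "Euryarchaeota", "Halobacteria")),
        Hit("protB", 190.0, ("Archaea", "Euryarchaeota", "Methanomicrobia")),
    ],
}

ranked = rank_atypical(hits_by_query, query_lineage)
print(ranked)  # [('protB', 0.0), ('protA', 0.75)]
```

With this example data, protB ranks first as the most atypical protein because its best matches are all archaeal. Passing `max_ranks=2` compares lineages only down to the phylum level, which raises the score of protA to 1.0 and mimics a coarser phylogenetic granularity.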

Although running the software now requires significantly more computational resources (e.g. 128 GB RAM) than when it was first introduced in 2007, the program remains useful for taxonomic classification of metagenomic contigs from novel organisms and for identifying horizontal gene transfer candidates in large eukaryotic genomes; see recent references on Google Scholar.

Download Program

A stand-alone Unix command-line version of DarkHorse (version 2) is available on GitHub for local installation.

DarkHorse version 2 allows for the incorporation of custom, non-NCBI data into DarkHorse reference databases, including private, unpublished sequences. The installation procedure has been revised to accommodate extremely large input files more efficiently and provide a set of informative reference sequences guaranteed to match the database for use in BLASTP searches. Optional tools are provided for dividing these validated reference search sequences into smaller, taxonomically focused subsets.

Database Search

Unfortunately, pre-calculating LPI scores is no longer practical for the large numbers of microbial genomes currently available, so search functions have been discontinued on this website.


References

  1. Podell, S and Gaasterland, T (2007). DarkHorse: A method for genome-wide prediction of horizontal gene transfer. Genome Biology 8(2):R16

  2. Podell, S, Gaasterland, T, and Allen, EE (2008). A database of phylogenetically atypical genes in archaeal and bacterial genomes, identified using the DarkHorse algorithm. BMC Bioinformatics 9:419

  3. Examples of recent citations on Google Scholar.

Those desiring to incorporate the DarkHorse algorithm, software, associated HGT candidate database, or information downloaded from the database into commercial products, or to use any of these materials for commercial purposes, should contact Technology Transfer & Intellectual Property Services, University of California, San Diego, 9500 Gilman Drive, Mail Code 0910, La Jolla, CA 92093-0910, Ph: (858) 534-5815, E-MAIL: invent at ucsd.edu.