DarkHorse HGT Candidate Resource

A database of phylogenetically atypical microbial proteins


What is DarkHorse?

DarkHorse is a bioinformatic method for rapid, automated identification and ranking of phylogenetically atypical proteins on a genome-wide basis. It works by selecting potential ortholog matches from a reference database of amino acid sequences, then using these matches to calculate a lineage probability index (LPI) score for each genome protein.
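The match-selection step can be pictured with the rough sketch below. For a single query protein, it keeps BLASTP hits whose bit scores fall within a fractional threshold of the best hit that does not come from the query's own species. The `Hit` record, the parameter name `filter_threshold`, and the rule of dropping same-species hits are all assumptions made for illustration. The references describe DarkHorse's actual selection criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One BLASTP match for a query protein (hypothetical record layout)."""
    subject_id: str
    species: str
    bit_score: float


def select_candidate_matches(hits: List[Hit], query_species: str,
                             filter_threshold: float = 0.1) -> List[Hit]:
    """Keep non-self hits scoring within filter_threshold of the best non-self hit.

    filter_threshold is a fraction: 0.1 keeps hits with bit scores >= 90% of
    the top non-self score. Both the name and the rule are assumptions.
    """
    # Drop matches from the query's own species (assumed self-match rule).
    others = [h for h in hits if h.species != query_species]
    if not others:
        return []
    best = max(h.bit_score for h in others)
    cutoff = best * (1.0 - filter_threshold)
    # Strongest candidates first.
    return sorted((h for h in others if h.bit_score >= cutoff),
                  key=lambda h: h.bit_score, reverse=True)


if __name__ == "__main__":
    hits = [
        Hit("p1", "Escherichia coli", 500.0),      # same species: excluded
        Hit("p2", "Salmonella enterica", 480.0),
        Hit("p3", "Klebsiella pneumoniae", 440.0),
        Hit("p4", "Bacillus subtilis", 300.0),     # below the 432.0 cutoff
    ]
    for h in select_candidate_matches(hits, "Escherichia coli"):
        print(h.subject_id, h.species, h.bit_score)
```

A relative threshold like this adapts to each protein's own score range, so short and long proteins are treated comparably. Whether DarkHorse uses exactly this form should be checked against the papers.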

LPI scores are inversely related to the phylogenetic distance between database match sequences and the query genome: matches from close relatives yield high scores, and matches from distant taxa yield low ones. These scores are useful not only for large-scale de novo prediction of horizontally transferred proteins, but also as an independent quality-control test for potential horizontal transfer candidates identified by alternative methods, especially those based on nucleic acid signatures. Candidates with high LPI scores are unlikely to have been horizontally transferred, since they are highly conserved among closely related organisms.
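The inverse relationship between score and phylogenetic distance can be illustrated with a simplified stand-in score. For each selected match, it counts how many leading ranks of the match's taxonomic lineage agree with the query's lineage, scales that count to the range 0 to 1, and averages over all matches. This is not the published LPI formula, which the references describe. The function name `toy_lineage_score` and the rank-by-rank comparison are assumptions chosen only to show the trend: proteins whose matches come from close relatives score high, and proteins matching only distant taxa score low.

```python
from typing import List, Sequence


def shared_prefix_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading taxonomic ranks (domain first) on which two lineages agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def toy_lineage_score(query_lineage: Sequence[str],
                      match_lineages: List[Sequence[str]]) -> float:
    """Average fraction of the query lineage shared by each match (0.0 to 1.0).

    Hypothetical simplification for illustration only; NOT the DarkHorse LPI formula.
    """
    if not match_lineages or not query_lineage:
        return 0.0
    depth = len(query_lineage)
    fractions = [shared_prefix_length(query_lineage, m) / depth
                 for m in match_lineages]
    return sum(fractions) / len(fractions)


if __name__ == "__main__":
    query = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
             "Enterobacterales", "Enterobacteriaceae", "Escherichia"]
    # A match from a close relative shares all ranks except the genus.
    close = [["Bacteria", "Proteobacteria", "Gammaproteobacteria",
              "Enterobacterales", "Enterobacteriaceae", "Salmonella"]]
    # A match from another phylum shares only the domain.
    distant = [["Bacteria", "Firmicutes", "Bacilli",
                "Bacillales", "Bacillaceae", "Bacillus"]]
    print(round(toy_lineage_score(query, close), 3))    # 0.833
    print(round(toy_lineage_score(query, distant), 3))  # 0.167
```

Truncating the lineages to fewer ranks would coarsen the comparison. That loosely mirrors the idea that phylogenetic granularity is configurable, though the real DarkHorse parameters work as described in the references.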

One unique and powerful feature of the DarkHorse HGT Candidate database is the opportunity to explore the phylogenetic background of potential HGT donors as well as recipients. Because the database is so broad, not only query sequences but also their database match partners can be evaluated for sequence similarity or novelty relative to taxonomically related organisms.

DarkHorse is configurable for varying degrees of phylogenetic granularity and protein sequence conservation. Users should consult the references cited below for a complete explanation of parameter selection and result interpretation. A brief tutorial page is also available on-line.

  • Search Database

    The DarkHorse database has been expanded to include more genomes!

    Pre-calculated DarkHorse results are now available for 1456 bacterial and archaeal genomes, updated April 1, 2009. The database can be queried by genome name, annotation keywords, protein sequence, or LPI score. Results are returned on-screen in summary form, with the option to download a detailed, tab-delimited file of raw data.
  • Download Program

    DarkHorse version 2.0 (beta) is now available!

    DarkHorse2 is compatible with recent format changes in sequence identifiers for the NCBI GenBank nr database, and it now allows custom, non-NCBI data, including private, unpublished sequences, to be incorporated into DarkHorse reference databases. The installation procedure has been revised to handle extremely large input files more efficiently and to provide a set of informative reference sequences, guaranteed to match the database, for use in BLASTP searches. Optional tools are available for dividing these validated reference search sequences into smaller, taxonomically focused subsets.
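The detailed results offered through the database search are downloaded as a tab-delimited file, so standard tools can filter them. The sketch below lists proteins whose LPI falls below a chosen cutoff, the low-scoring end where horizontal transfer candidates are expected. It assumes a header row with columns named `query_id` and `LPI`. These column names are hypothetical, so check the header of an actual download and adjust `ID_COLUMN` and `LPI_COLUMN` to match.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Hypothetical column names; confirm them against the header of a real download.
ID_COLUMN = "query_id"
LPI_COLUMN = "LPI"


def low_lpi_proteins(lines: Iterable[str], cutoff: float) -> List[Tuple[str, float]]:
    """Return (protein id, LPI) pairs with LPI below cutoff, lowest score first."""
    reader = csv.DictReader(lines, delimiter="\t")
    rows = []
    for row in reader:
        value = (row.get(LPI_COLUMN) or "").strip()
        if not value:
            continue  # skip rows without a score
        lpi = float(value)
        if lpi < cutoff:
            rows.append((row[ID_COLUMN], lpi))
    return sorted(rows, key=lambda pair: pair[1])


if __name__ == "__main__":
    # In-memory stand-in for a downloaded file; for a real file, use e.g.
    # with open("darkhorse_results.tab", newline="") as fh:  (hypothetical name)
    sample = "query_id\tLPI\nprotA\t0.91\nprotB\t0.12\nprotC\t0.35\n"
    for protein, lpi in low_lpi_proteins(io.StringIO(sample), cutoff=0.5):
        print(protein, lpi)
```

Reading with `csv.DictReader` keys each value by its header name, so the filter keeps working if column order differs between downloads.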


  1. Podell, S and Gaasterland, T (2007). DarkHorse: A method for genome-wide prediction of horizontal gene transfer. Genome Biology 8(2):R16
  2. Podell, S, Gaasterland, T, and Allen, EE (2008). A database of phylogenetically atypical genes in archaeal and bacterial genomes, identified using the DarkHorse algorithm. BMC Bioinformatics 9:419

Those desiring to incorporate the DarkHorse algorithm, software, associated HGT candidate database, or information downloaded from the database into commercial products, or to use any of these materials for commercial purposes, should contact Technology Transfer & Intellectual Property Services, University of California, San Diego, 9500 Gilman Drive, Mail Code 0910, La Jolla, CA 92093-0910, Ph: (858) 534-5815, E-MAIL: invent at ucsd.edu.