What is DarkHorse?

DarkHorse is a bioinformatic method for rapid, automated identification and ranking of phylogenetically atypical proteins on a genome-wide basis. It works by selecting potential ortholog matches from a reference database of amino acid sequences, then using these matches to calculate a lineage probability index (LPI) score for each genome protein.

LPI scores are inversely proportional to the phylogenetic distance between database match sequences and the query genome. These scores are useful not only for large-scale de novo predictions of horizontally transferred proteins, but also as an independent quality control test for potential horizontal transfer candidates identified by alternative methods, especially those based on nucleic acid signatures. Candidates with high LPI scores are unlikely to have been horizontally transferred, since they are highly conserved among closely related organisms.
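
The quality control use described above can be sketched in a few lines of Python. This is only an illustration and not part of DarkHorse itself: the function name, the protein identifiers, and the 0.6 cutoff are all hypothetical, and a sensible cutoff depends on the genome and the parameters chosen (see the references cited below).

```python
def split_by_lpi(candidates, lpi_scores, cutoff=0.6):
    """Sort HGT candidates from another method into three groups.

    candidates -- iterable of protein identifiers flagged by the other method
    lpi_scores -- dict mapping protein identifier to its LPI score
    cutoff     -- hypothetical threshold; scores at or above it are treated
                  as "high", meaning horizontal transfer is unlikely
    """
    supported, doubtful, unscored = [], [], []
    for protein_id in candidates:
        score = lpi_scores.get(protein_id)
        if score is None:
            # No LPI score is available, so no statement can be made.
            unscored.append(protein_id)
        elif score >= cutoff:
            # High LPI: conserved among close relatives, so the HGT call is doubtful.
            doubtful.append(protein_id)
        else:
            # Low LPI: consistent with the candidate being phylogenetically atypical.
            supported.append(protein_id)
    return supported, doubtful, unscored


# Made-up identifiers and scores, for illustration only.
example_scores = {"protA": 0.95, "protB": 0.12, "protC": 0.60}
supported, doubtful, unscored = split_by_lpi(
    ["protA", "protB", "protC", "protD"], example_scores
)
```

Keeping unscored candidates in their own group avoids silently treating a missing score as evidence for or against transfer.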

One unique and powerful feature of the DarkHorse HGT Candidate database is the opportunity to explore the phylogenetic background of potential HGT donors as well as recipients. The breadth of the database allows not only query sequences, but also their database match partners to be evaluated for sequence similarity or novelty compared to taxonomically related organisms.

DarkHorse is configurable for varying degrees of phylogenetic granularity and protein sequence conservation. Users should consult the references cited below for a complete explanation of parameter selection and result interpretation. A brief tutorial page is also available on-line.

  Search Database

    The DarkHorse database has been expanded to include more genomes!

    Pre-calculated DarkHorse results are now available for 1456 bacterial and archaeal genomes, updated April 1, 2009. The database can be queried by genome name, annotation keywords, protein sequence, or LPI score. Results are returned on-screen in summary form, with the option to download a detailed, tab-delimited file of raw data.
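
A downloaded tab-delimited file can be filtered and ranked by LPI score with the Python standard library. The sketch below assumes a header row and uses hypothetical column names (`protein_id`, `lpi`); the actual column layout of the downloaded file is not described here, so adjust the names to match the real header.

```python
import csv
import io


def rank_by_lpi(handle, lpi_column="lpi", max_lpi=None):
    """Read tab-delimited rows and return them sorted by ascending LPI.

    handle     -- open text file (or any iterable of lines) with a header row
    lpi_column -- name of the LPI column (hypothetical; check the real header)
    max_lpi    -- if given, keep only rows with LPI at or below this value
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        row[lpi_column] = float(row[lpi_column])
        if max_lpi is None or row[lpi_column] <= max_lpi:
            rows.append(row)
    # Lowest LPI first, so the most atypical proteins head the list.
    rows.sort(key=lambda r: r[lpi_column])
    return rows


# Made-up data standing in for a downloaded file.
sample = "protein_id\tlpi\nprotA\t0.95\nprotB\t0.12\nprotC\t0.40\n"
ranked = rank_by_lpi(io.StringIO(sample), max_lpi=0.5)
ranked_ids = [row["protein_id"] for row in ranked]
```

For a real file, pass the result of `open(path, newline="")` in place of the `io.StringIO` object.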

  Download Program

    DarkHorse2 is coming soon!

    DarkHorse is currently being revised to accommodate the new format of NCBI GenBank nr sequence identifiers. DarkHorse2 will also offer the option of using alternative, non-GenBank sequences as reference data sets (including private, unpublished sequences), and provide a faster, more efficient database installation process. Anticipated release date is mid-December 2016. Thanks for your patience!


  1. Podell, S and Gaasterland, T (2007). DarkHorse: A method for genome-wide prediction of horizontal gene transfer. Genome Biology 8(2):R16
  2. Podell, S, Gaasterland, T, and Allen, EE (2008). A database of phylogenetically atypical genes in archaeal and bacterial genomes, identified using the DarkHorse algorithm. BMC Bioinformatics 9:419

Those desiring to incorporate the DarkHorse algorithm, software, associated HGT candidate database, or information downloaded from the database into commercial products, or to use any of these materials for commercial purposes, should contact Technology Transfer & Intellectual Property Services, University of California, San Diego, 9500 Gilman Drive, Mail Code 0910, La Jolla, CA 92093-0910, Ph: (858) 534-5815, E-MAIL: invent at ucsd.edu.