All mRNAs of a species from GenBank were aligned to the genome using BLASTZ. Those mRNAs that aligned twice in the genome (once with introns and once without) were selected in an initial screen. Next, a series of features was scored to identify candidate retrotransposition events. These features include the position and length of the polyA tail, degree of synteny with mouse, coverage by repetitive elements, number of exons that can still be aligned to the retroGene, and degree of divergence from the parent gene. The features were combined and scored against a training set of known pseudogenes using AdaBoost. RetroGenes in the final set have an AdaBoost confidence above 50%, less than 50% overlap with simple repeats, greater than 65% identity, and a BLASTZ alignment axtScore greater than 10000.
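The final-set criteria can be illustrated with a minimal sketch. This is not the pseudoMasker code: the record type, field names, and example values below are hypothetical, and the sketch assumes confidence, repeat overlap, and identity are stored as fractions between 0 and 1. Only the four thresholds are taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class RetroCandidate:
    """Hypothetical record for one scored candidate (field names are assumptions)."""
    name: str
    adaboost_confidence: float    # AdaBoost confidence, as a fraction (0..1)
    simple_repeat_overlap: float  # fraction of the candidate overlapping simple repeats
    identity: float               # fractional identity reported for the candidate
    axt_score: int                # BLASTZ alignment axtScore


def passes_final_filter(c: RetroCandidate) -> bool:
    """Apply the four thresholds stated for the final retroGene set."""
    return (
        c.adaboost_confidence > 0.50
        and c.simple_repeat_overlap < 0.50
        and c.identity > 0.65
        and c.axt_score > 10000
    )


# Made-up candidates, for illustration only.
candidates = [
    RetroCandidate("retroA", 0.92, 0.05, 0.88, 45000),  # meets all four thresholds
    RetroCandidate("retroB", 0.40, 0.05, 0.88, 45000),  # confidence too low
    RetroCandidate("retroC", 0.92, 0.70, 0.88, 45000),  # too much simple repeat
    RetroCandidate("retroD", 0.92, 0.05, 0.88, 9000),   # axtScore too low
]

final_set = [c.name for c in candidates if passes_final_filter(c)]
```

All comparisons are strict, so a candidate sitting exactly on a threshold (for example, a confidence of exactly 50%) is excluded, matching the "above", "less than", and "greater than" wording.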
The "type" field has four possible values:
Kent WJ, Baertsch R, Hinrichs A, Miller W, and Haussler D (2003). Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20):11484-11489.
Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, and Miller W (2003). Human-Mouse Alignments with BLASTZ. Genome Res. 13(1):103-7.
Schapire RE (1999). Theoretical views of boosting and applications. In Tenth International Conference on Algorithmic Learning Theory.
The pseudoMasker program and browser track were developed by Robert Baertsch at UCSC.