The primary and secondary structures of selenocysteine tRNAs differ from those of canonical tRNAs in several respects, most notably an eight-base-pair acceptor stem, a long variable region arm, and substitutions at several well-conserved base positions. These differences make detection and accurate secondary structure prediction difficult for tRNA search programs geared towards canonical tRNAs. tRNAscan 1.3 fails to detect most selenocysteine tRNAs; the Pavesi algorithm incorporates a separate routine specifically for eukaryotic selenocysteine tRNAs; and the TRNA2.cm covariance model barely detects selenocysteine tRNAs, giving scores just over the minimum cutoff of 20 bits and, in two cases, below the cutoff. tRNAscan-SE addresses this problem in the first-pass stage with modifications to EufindtRNA, and in the second stage with selenocysteine tRNA-specific covariance models.
The first-pass scanner EufindtRNA implements a specialized subroutine described by Pavesi et al. [Pavesi et al., 1994] for identifying eukaryotic selenocysteine tRNAs. It requires a B box score between -2.2 and -3.6 and the motif GGTC(C/T)G(G/T)GGT appearing 36 nucleotides upstream of the B box. To identify prokaryotic selenocysteine tRNAs in a similar way, a subroutine was added to EufindtRNA that detects tRNAs with B box scores between -2.2 and -4.9 and a conserved sequence motif found in the anticodon loop of all known prokaryotic selenocysteine tRNAs: GG(A/T)(C/T)TTCAAA(A/T)CC, in which TCA is the anticodon. It is unclear whether this motif will generalize well to newly discovered selenocysteine tRNAs, but it is conserved among the closely related Escherichia coli (Y00299), Proteus vulgaris (X14255), Haemophilus influenzae (U32753), and Desulfomicrobium baculatus (X75790) tRNAs, and in the more distant Clostridium thermoaceticum (Z26950) tRNA.
After EufindtRNA has identified a candidate selenocysteine tRNA, the candidate is passed to a eukaryotic or prokaryotic selenocysteine-specific covariance model. These two covariance models were developed by aligning selenocysteine tRNAs with inferred secondary structure information. Another program in the covariance model program suite, coveb, was then used to build the covariance models from the structure-annotated RNA sequence alignments. The five prokaryotic tRNAs noted above were used to build the prokaryotic selenocysteine model. Seven selenocysteine tRNAs from Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, chicken, mouse, bovine, and human were used to build the eukaryotic model.
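The two first-pass selenocysteine filters described above can be illustrated with a minimal Python sketch. It is not the EufindtRNA source code: the function and constant names are hypothetical, the B box score and B box start position are assumed to be supplied by an upstream scanner, the score windows are assumed to be inclusive, and "36 nucleotides upstream" is assumed to mean that the motif starts 36 positions before the first base of the B box. Because the text gives no fixed offset for the prokaryotic motif, the sketch simply searches for it anywhere upstream of the B box.

```python
import re

# Motifs quoted in the text, translated to regular expressions.
EUK_MOTIF = re.compile(r"GGTC[CT]G[GT]GGT")
PROK_MOTIF = re.compile(r"GG[AT][CT]TTCAAA[AT]CC")

# B box score windows from the text (inclusive bounds are an assumption).
EUK_BBOX_RANGE = (-3.6, -2.2)
PROK_BBOX_RANGE = (-4.9, -2.2)

# Assumed meaning: the motif begins 36 nt before the B box start.
EUK_MOTIF_OFFSET = 36


def in_range(score, bounds):
    """Return True if score lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= score <= high


def classify_selcys_candidate(seq, bbox_start, bbox_score):
    """Classify a candidate as 'eukaryotic', 'prokaryotic', or None.

    seq        -- DNA sequence of the candidate region
    bbox_start -- 0-based index of the first B box base (hypothetical input)
    bbox_score -- B box score from the upstream scanner (hypothetical input)
    """
    seq = seq.upper()

    # Eukaryotic rule: score window plus motif at a fixed upstream offset.
    if in_range(bbox_score, EUK_BBOX_RANGE):
        motif_start = bbox_start - EUK_MOTIF_OFFSET
        if motif_start >= 0 and EUK_MOTIF.match(seq, motif_start):
            return "eukaryotic"

    # Prokaryotic rule: wider score window plus anticodon-loop motif
    # anywhere upstream of the B box.
    if in_range(bbox_score, PROK_BBOX_RANGE):
        if PROK_MOTIF.search(seq, 0, bbox_start):
            return "prokaryotic"

    return None
```

In this sketch the eukaryotic rule is tried first, so a candidate that satisfies both rules is reported as eukaryotic; that ordering is an arbitrary choice for illustration. A candidate that passes either rule would then be handed to the corresponding selenocysteine-specific covariance model.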