The most sensitive and selective tRNA detection method that we are aware of uses probabilistic RNA covariance models [Eddy & Durbin, 1994], which are based on stochastic context-free grammar techniques. However, searching with covariance models has two drawbacks. First, it is extremely CPU-intensive, requiring days to weeks of processor time to scan megabase-size genomic data from higher eukaryotes. Second, the general nature of the approach hampers output of tRNA-specific feature information such as anticodon, isotype, and intron position. Our goal in developing tRNAscan-SE was to produce a practical (i.e., fast) application of stochastic context-free grammar-based RNA analysis methods, with sensitivity and selectivity as close as possible to those of native covariance model searches. tRNAscan-SE achieves this goal.
tRNAscan-SE increases tRNA covariance model search speed by 1,000- to 3,000-fold while offering nearly equal sensitivity and slightly improved selectivity. Selenocysteine tRNA detection features are built into tRNAscan-SE, including modifications to EufindtRNA and the use of selenocysteine tRNA covariance models. With these additions, tRNAscan-SE correctly identifies both of the selenocysteine tRNAs in the Sprinzl database that are not detected by normal covariance model analysis. The GenBank version of one of these two selenocysteine tRNA sequences, CTTRSEL from C. thermoaceticum, was also detected within the GenBank tRNA subset (the other selenocysteine tRNA was not in the GenBank subset).
tRNAscan-SE also extends the maximum detectable tRNA length almost without limit. In covariance model analysis, search time increases as the square of the maximum tRNA length, so the search window has typically been limited to 150 bp. In tRNAscan-SE, the first-pass scanners define the approximate bounds of a tRNA, and for tRNAs with very long introns, intervening sequences can be cut out based on the first-pass analysis. This allows detection of rare, abnormally long tRNAs without greatly increasing the overall average search time. In the GenBank subset, tRNAscan-SE detected four tRNAs (HALTGW plus three detected with the -L option) whose introns, ranging from 104 to 850 bp, exceeded the normal length limit for covariance model detection.
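The length argument above can be illustrated with a minimal Python sketch. It is not the tRNAscan-SE implementation: the function names, the 0-based end-exclusive intron coordinates, and the toy sequence are assumptions made for illustration only. The sketch assumes exactly the quadratic scaling stated in the text and shows why excising a long intron, once a first-pass scanner has located it, keeps the candidate within the usual 150 bp window.

```python
def relative_search_cost(max_len: int, baseline_len: int = 150) -> float:
    """Search cost relative to a baseline window.

    Assumes cost grows as the square of the maximum tRNA length,
    as stated in the text; the 150 bp baseline is the typical window.
    """
    if max_len <= 0 or baseline_len <= 0:
        raise ValueError("lengths must be positive")
    return (max_len / baseline_len) ** 2


def excise_intron(seq: str, intron_start: int, intron_end: int) -> str:
    """Return seq with seq[intron_start:intron_end] removed.

    Coordinates are hypothetical first-pass estimates
    (0-based, end-exclusive), not tRNAscan-SE output.
    """
    if not 0 <= intron_start <= intron_end <= len(seq):
        raise ValueError("intron bounds fall outside the sequence")
    return seq[:intron_start] + seq[intron_end:]


# Toy candidate: 37 nt of 5' exon, an 850 nt intron, 38 nt of 3' exon.
candidate = "A" * 37 + "C" * 850 + "G" * 38
trimmed = excise_intron(candidate, 37, 37 + 850)

# Widening the window to cover the full candidate is costly;
# the trimmed candidate fits inside the usual 150 bp window.
cost_full_window = relative_search_cost(len(candidate))
fits_default_window = len(trimmed) <= 150
```

Under these assumptions, doubling the window from 150 bp to 300 bp quadruples the search cost, whereas excising the intron leaves a 75 nt candidate that needs no wider window at all.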