|
tRNAscan-SE was shown to be more sensitive than tRNAscan 1.3 by
several measures, the first being a search of the Sprinzl and Genbank
database subsets (Table 2.2). In the Sprinzl
test set, tRNAscan-SE detected 586 of 589 known tRNAs (99.5%), versus
560 of 589 (95.1%) for tRNAscan 1.3. Of all 1144 non-organellar tRNAs
in the complete Sprinzl database, tRNAscan-SE failed to recognize
seven. One was a eukaryotic sequence from Trypanosoma brucei (Sprinzl
ID DT6050, Genbank TBTRNA3), which was previously noted by Pavesi
et al. (1994) as being missed by both tRNAscan 1.3 and the Pavesi search
algorithm. The other six tRNAs missed by tRNAscan-SE were from various
eubacteria (Sprinzl IDs: DA1543, DE2180, DG1351, DG1482, DS1250,
RG1380). Several of these undetected tRNAs appear to be irregular in
source or function. DE2180 is derived from DNA from the cyanelle (a
photosynthetic organelle) of the unicellular eukaryote Cyanophora
paradoxa and is thus misclassified as eubacterial in the
database. DG1482 and RG1380 both contain substitutions of four highly
conserved bases within the TΨC loop, an indication that these tRNAs are
probably used in peptidoglycan synthesis rather than protein
translation (29). All seven of these atypical tRNAs were detected
using covariance model analysis. The tRNA covariance model search does
miss two tRNAs within the 1144-member Sprinzl database subset, both
selenocysteine tRNAs (Sprinzl IDs DZ1430 and DZ7742) that fall below the
20.0 bit cutoff, scoring 0.60 and 14.19 bits, respectively. EufindtRNA,
designed to search eukaryotic sequences exclusively, shows improved
sensitivity for eukaryotic tRNAs (98.6%) over tRNAscan 1.3 (95.0%),
but is still slightly less sensitive than tRNAscan-SE (100%). Over
the three phylogenetic domains, tRNA covariance model analysis appears
to be the most sensitive detection method, yet tRNAscan-SE trails by
as little as one third of one percentage point.
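
The detection rates quoted above follow directly from the counts. As a minimal sketch, the arithmetic and the 20.0 bit cutoff check can be reproduced as shown below. The function names `sensitivity` and `passes_cutoff` are illustrative and are not part of tRNAscan-SE, and treating the cutoff as inclusive is an assumption.

```python
def sensitivity(detected: int, total: int) -> float:
    """Return the percentage of known tRNAs that a method detected."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * detected / total


def passes_cutoff(score_bits: float, cutoff_bits: float = 20.0) -> bool:
    """Return True if a score meets the cutoff (assumed inclusive here)."""
    return score_bits >= cutoff_bits


# Counts for the 589-member Sprinzl test set, as quoted in the text.
print(f"tRNAscan-SE:  {sensitivity(586, 589):.1f}%")
print(f"tRNAscan 1.3: {sensitivity(560, 589):.1f}%")

# The two selenocysteine tRNA scores quoted in the text, checked against 20.0 bits.
print([passes_cutoff(score) for score in (0.60, 14.19)])
```
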
In a search of the Genbank subset sequences, which contain less reliable tRNA annotation, tRNAscan-SE detects 98.5% of the 1462 tRNAs verified by either covariance model analysis or visual inspection, whereas tRNAscan 1.3 has a 93.4% detection rate (Table 2.2). All prediction discrepancies were visually inspected. The 18 tRNAs detected by covariance model analysis but missed by all three other methods all had scores over 36 bits and were annotated in the Genbank entries. The two tRNAs detected by tRNAscan-SE but missed by covariance model analysis were a selenocysteine tRNA (CTTRSEL; the same as the previously noted Sprinzl DZ1430 tRNA) and a long tRNA from Haloferax volcanii (HALTGW) whose 104 bp intron caused the tRNA to exceed the 150 bp maximum total length limit for normal tRNA covariance model analysis. Of the nine sequences annotated as tRNAs but missed by all four detection methods, four have large group I or group II introns of 241 bp or larger (ANATGL, SSU10482, PHU29955, SYOTRNLUAA), and five appear to have either sequencing errors or modified bases that appear in the Genbank annotation but not in the sequence (the corresponding tRNAs within the Sprinzl database were identified correctly by all four detection methods). Because of these sequence discrepancies between the Genbank sequences and the corresponding Sprinzl entries, these five Genbank tRNAs were not included in the 1462-member test set.
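
The discrepancy analysis described above amounts to comparing the sets of tRNAs reported by each method. The following is a minimal sketch of that comparison, not the procedure actually used. The helper `missed_by` and the placeholder ID `SHARED_1` are hypothetical; only CTTRSEL and HALTGW are taken from the text.

```python
def missed_by(detected_by_a: set, detected_by_b: set) -> list:
    """Return IDs found by method A but not by method B, sorted for stable output."""
    return sorted(detected_by_a - detected_by_b)


# Toy detection sets. CTTRSEL and HALTGW are the two Genbank entries named in
# the text; SHARED_1 is a hypothetical placeholder for a tRNA found by both.
trnascan_se_hits = {"CTTRSEL", "HALTGW", "SHARED_1"}
covariance_model_hits = {"SHARED_1"}

# Entries reported by tRNAscan-SE that covariance model analysis did not report.
print(missed_by(trnascan_se_hits, covariance_model_hits))
```

Each entry returned by such a comparison would then be a candidate for the visual inspection described in the text.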