

Table 2.2: tRNA prediction within annotated database subsets.

Detection rates for the Sprinzl tRNA database are broken down by phylogenetic domain. The Sprinzl subset tested contains only non-organellar, non-viral tRNAs that were not used in training the tRNA covariance model. For the Sprinzl rows, numbers in parentheses are the percentage of correct tRNA identifications relative to the total reported in the literature. The Genbank subset was selected by retrieving non-organellar, non-viral, full-length tRNA sequences with "tRNA" indicated in the feature field of the entry. Because Genbank tRNA annotation is less reliable, the numbers in parentheses for that row are the percentage of correct tRNA identifications relative to all tRNAs verified by either covariance model analysis or visual inspection.

Sequence Source            Literature   tRNAscan 1.3    EufindtRNA      tRNA CM         tRNAscan-SE
                           tRNAs        Tot (%)         Tot (%)         Tot (%)         Tot (%)

Sprinzl db (Archaea)         70           69 (98.6)       43 (61.4)       70 (100)        70 (100)
Sprinzl db (Eubacteria)     240          226 (94.2)      205 (85.4)      239 (99.6)      237 (98.7)
Sprinzl db (Eukarya)        279          265 (95.0)      275 (98.6)      279 (100)       279 (100)
Sprinzl db (total)          589          560 (95.1)      523 (88.8)      588 (99.8)      586 (99.5)
Genbank tRNA subset        1462         1366 (93.4)      760 (52.0)     1456 (99.6)     1440 (98.5)

tRNAscan-SE was shown to be more sensitive than tRNAscan 1.3 by several measures, the first being a search of the Sprinzl and Genbank database subsets (Table 2.2). In the Sprinzl test set, tRNAscan-SE detected 586 of 589 known tRNAs (99.5%), versus 560 of 589 (95.1%) for tRNAscan 1.3.

Of all 1144 non-organellar tRNAs in the complete Sprinzl database, tRNAscan-SE failed to recognize seven. One was a eukaryotic sequence from Trypanosoma brucei (Sprinzl ID DT6050, Genbank TBTRNA3), previously noted by Pavesi et al. (1994) as being missed by both tRNAscan 1.3 and the Pavesi search algorithm. The other six were from various eubacteria (Sprinzl IDs DA1543, DE2180, DG1351, DG1482, DS1250, RG1380). Several of these undetected tRNAs appear to be irregular in source or function. DE2180 is derived from DNA of the cyanelle (a photosynthetic organelle) of the unicellular eukaryote Cyanophora paradoxa and is thus misclassified as eubacterial in the database. DG1482 and RG1380 both contain substitutions of four highly conserved bases within the TΨC loop, an indication that these tRNAs are probably used in peptidoglycan synthesis rather than protein translation (29). All seven of these atypical tRNAs were detected by covariance model analysis. The tRNA covariance model search itself missed two tRNAs within the 1144-member Sprinzl database subset, both selenocysteine tRNAs (Sprinzl IDs DZ1430 and DZ7742), which score below the 20.0 bit cutoff at 0.60 and 14.19 bits, respectively.

EufindtRNA, designed to search eukaryotic sequences exclusively, shows improved sensitivity for eukaryotic tRNAs (98.6%) over tRNAscan 1.3 (95.0%), but is still slightly less sensitive than tRNAscan-SE (100%). Over the three phylogenetic domains, tRNA covariance model analysis appears to be the most sensitive detection method, yet tRNAscan-SE trails it by only about one third of a percentage point on the Sprinzl test set (99.5% versus 99.8%).
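The Sprinzl detection rates quoted above follow directly from the counts in Table 2.2. The short Python sketch below recomputes them as a consistency check; the counts are transcribed from the table, while the names `SPRINZL`, `detection_rate`, and `totals` are illustrative and are not part of tRNAscan-SE or any other program discussed here.

```python
# Counts transcribed from Table 2.2 (Sprinzl test subset).
# Per-domain entry: (tRNAs known from the literature, detections per method).
# Detection order matches METHODS.
METHODS = ("tRNAscan 1.3", "EufindtRNA", "tRNA CM", "tRNAscan-SE")

SPRINZL = {
    "Archaea": (70, (69, 43, 70, 70)),
    "Eubacteria": (240, (226, 205, 239, 237)),
    "Eukarya": (279, (265, 275, 279, 279)),
}


def detection_rate(found, known):
    """Percentage of known tRNAs that a method identified."""
    return 100.0 * found / known


def totals(table):
    """Sum known tRNAs and per-method detections over all domains."""
    known = sum(n for n, _ in table.values())
    found = tuple(sum(col) for col in zip(*(hits for _, hits in table.values())))
    return known, found


known, found = totals(SPRINZL)
rates = {m: detection_rate(f, known) for m, f in zip(METHODS, found)}

for method in METHODS:
    print(f"{method:12s} {rates[method]:5.1f}%")

# Gap between covariance model analysis and tRNAscan-SE, in percentage points.
gap = rates["tRNA CM"] - rates["tRNAscan-SE"]
print(f"CM minus tRNAscan-SE: {gap:.2f} percentage points")
```

The per-domain counts sum to the totals row of the table, and the gap between covariance model analysis and tRNAscan-SE corresponds to two tRNAs out of 589.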

In the Genbank subset, whose tRNA annotation is less reliable, tRNAscan-SE detected 98.5% of the 1462 tRNAs verified by either covariance model analysis or visual inspection, whereas tRNAscan 1.3 had a 93.4% detection rate (Table 2.2). All prediction discrepancies were visually inspected. Of the 18 tRNAs that covariance model analysis detected but all three other methods missed, all scored over 36 bits and were annotated in the Genbank entries. The two tRNAs detected by tRNAscan-SE but missed by covariance model analysis were a selenocysteine tRNA (CTTRSEL; the same as the previously noted Sprinzl DZ1430 tRNA) and a long tRNA from Haloferax volcanii (HALTGW), whose 104 bp intron caused the tRNA to exceed the maximum total length limit for normal tRNA covariance model analysis (150 bp). Of the 9 sequences annotated as tRNAs but missed by all four detection methods, four have large group I or group II introns of 241 bp or larger (ANATGL, SSU10482, PHU29955, SYOTRNLUAA), and five appear to have either sequencing errors or modified bases that appear in the Genbank annotation but not in the sequence (the corresponding tRNAs within the Sprinzl database were identified correctly by all four detection methods). Because of these sequence discrepancies between the Genbank sequences and the corresponding Sprinzl entries, these five Genbank tRNAs were not included in the 1462-member test set.
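The covariance model misses described in this section come down to two thresholds: the 20.0 bit score cutoff and the 150 bp maximum total length. The Python sketch below illustrates both checks with the values given in the text. The function names are hypothetical, and treating a score exactly at the cutoff as passing is an assumption; the text does not say how ties are handled.

```python
CM_SCORE_CUTOFF_BITS = 20.0  # score cutoff quoted in the text
CM_MAX_LENGTH_BP = 150       # maximum total length for normal CM analysis


def passes_cm_cutoff(score_bits, cutoff=CM_SCORE_CUTOFF_BITS):
    """True if a covariance model score reaches the cutoff.

    Assumption: a score equal to the cutoff counts as a hit.
    """
    return score_bits >= cutoff


def exceeds_cm_length(exon_bp, intron_bp, limit=CM_MAX_LENGTH_BP):
    """True if exons plus intron are longer than the CM length limit."""
    return exon_bp + intron_bp > limit


def max_exon_bp_within_limit(intron_bp, limit=CM_MAX_LENGTH_BP):
    """Largest combined exon length that still fits under the limit."""
    return limit - intron_bp


# The two selenocysteine tRNAs score 0.60 and 14.19 bits (from the text).
for sprinzl_id, score in (("DZ1430", 0.60), ("DZ7742", 14.19)):
    print(sprinzl_id, "reported" if passes_cm_cutoff(score) else "missed")

# With a 104 bp intron (HALTGW), only 46 bp remain for the exons.
print(max_exon_bp_within_limit(104))
```

With a 104 bp intron, any tRNA whose exons total more than 46 bp exceeds the 150 bp limit, which is consistent with HALTGW being missed by normal covariance model analysis.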

Todd M. Lowe