
tRNA False Positives & Pseudogenes

Of the 5,591 total false positives identified by tRNAscan 1.4 in 15 gigabases of simulated human sequence (Table 2.4), only six were also falsely identified as tRNAs by EufindtRNA (relaxed parameters). The majority of false positives found by tRNAscan 1.4 appear to have tRNA-like secondary structure but lack similarity to conserved tRNA primary sequence. EufindtRNA, on the other hand, identifies correctly spaced primary sequence promoter elements, yet tends to err because it does not check for proper tRNA secondary structure.
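The agreement count between the two programs can be illustrated with a minimal sketch that treats each false positive as an interval on a named sequence and counts the intervals from one program that overlap any interval from the other. The record layout, function names, and toy coordinates are assumptions made for illustration; strand is ignored, and this is not the procedure actually used in the analysis.

```python
from typing import Iterable, Tuple

# Hypothetical hit record: (sequence name, start, end), 1-based inclusive.
Hit = Tuple[str, int, int]


def overlaps(a: Hit, b: Hit) -> bool:
    """True if two hits lie on the same sequence and share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_shared_hits(hits_a: Iterable[Hit], hits_b: Iterable[Hit]) -> int:
    """Count hits in hits_a that overlap at least one hit in hits_b."""
    hits_b = list(hits_b)
    return sum(1 for a in hits_a if any(overlaps(a, b) for b in hits_b))


# Toy coordinates for illustration only (not data from the analysis).
structure_based = [("sim1", 100, 172), ("sim1", 500, 572), ("sim2", 10, 82)]
promoter_based = [("sim1", 150, 222), ("sim2", 200, 272)]
shared = count_shared_hits(structure_based, promoter_based)
```

A small shared count relative to either list, as in the toy input, is what one would expect when the two programs err for unrelated reasons.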

These observations also hold for false positives from actual C. elegans genomic sequence. Most of the 29 false positives identified by tRNAscan 1.3 were discarded by covariance model analysis because they lack primary sequence similarity to the general tRNA model. EufindtRNA, on the other hand, more commonly identifies pseudogene tRNA fragments, SINE-like repetitive elements, or other tRNA-like sequences containing A and B boxes (Table 2.3). Pseudogenes are recognizable because part of the sequence is very similar to other intact tRNAs, in spite of truncations or large insertions elsewhere in the pseudogene. However, tRNA secondary structure in pseudogenes and SINE-like elements tends to be lost more quickly than primary sequence promoter elements. This may not be surprising, since portions of tRNA sequences are thought to help provide mobility for some tRNA-derived repetitive elements [Keeney et al., 1995]. Because EufindtRNA (relaxed parameters) looks only for canonical promoter regions, it is prone to finding pseudogenes and repetitive elements that carry tRNA promoters but lack structural tRNA features.

To some extent, covariance model analysis is also apt to identify truncated tRNAs and other tRNA-derived sequence elements. The minimum cutoff score of 20 bits was set to include outlying tRNAs with low overall homology to the general tRNA model. However, if part of a high-scoring tRNA is truncated, the score may be much lower yet still exceed the 20 bit threshold. The most extreme example of this occurs with a tRNA in the C. elegans cosmid W03A3. The tRNA is 100% identical to tRNAs on at least four other cosmids, except for a truncation of the first 16 bases that removes the 5' side of the aminoacyl acceptor stem and the first half of the A box promoter sequence (part of the D-loop). tRNAscan 1.3 did not detect this pseudogene because of the lost base pairings in the D-loop and aminoacyl stems, whereas EufindtRNA could not locate the A box promoter sequence. Covariance model analysis similarly identified three other pseudogenes that neither tRNAscan 1.3 nor EufindtRNA found: one appears to have a 13 bp truncation relative to tRNAs in two other cosmids; one has a peculiar 21 bp insertion in the middle of the A box promoter sequence that makes three near-perfect repeats of the 7-mer "GTCGCGA"; and one is a pseudo-tRNA containing a 55 bp insert in the anticodon loop that does not appear to be a true intron. Because tRNAscan-SE applies covariance model analysis only to candidates reported by the first-pass programs, and none of these were identified by either tRNAscan 1.3 or EufindtRNA, tRNAscan-SE does not detect them.
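The reason a truncated tRNA can score above the cutoff and still go unreported can be seen in a minimal sketch of a two-stage search, in which only first-pass candidates are passed to the covariance model. All names and scores below are hypothetical placeholders rather than measured values, and treating a score of exactly 20 bits as passing is an assumption.

```python
from typing import Dict, Set

CM_CUTOFF_BITS = 20.0  # minimum covariance model score quoted in the text


def two_stage_hits(first_pass_a: Set[str], first_pass_b: Set[str],
                   cm_scores: Dict[str, float]) -> Set[str]:
    """Keep first-pass candidates (union of both programs) that pass the cutoff."""
    candidates = first_pass_a | first_pass_b
    return {c for c in candidates
            if cm_scores.get(c, float("-inf")) >= CM_CUTOFF_BITS}


def direct_cm_hits(cm_scores: Dict[str, float]) -> Set[str]:
    """What an exhaustive covariance model scan would report on its own."""
    return {name for name, score in cm_scores.items() if score >= CM_CUTOFF_BITS}


# Placeholder loci and scores, chosen only to illustrate the logic.
cm_scores = {"intact_tRNA": 70.0, "truncated_copy": 30.0, "promoter_only_repeat": 5.0}
structure_hits = {"intact_tRNA"}                          # secondary-structure search
promoter_hits = {"intact_tRNA", "promoter_only_repeat"}   # A/B box search

reported = two_stage_hits(structure_hits, promoter_hits, cm_scores)
missed = direct_cm_hits(cm_scores) - reported
```

In this toy case the truncated copy ends up in `missed`: it clears the score cutoff but was never proposed by either first-pass program, while the promoter-only repeat is proposed and then rejected by its low score.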

tRNAscan-SE does, however, detect 19 other tRNA-like sequences that are identified by EufindtRNA and "confirmed" by covariance model analysis (scores greater than 20 bits). These may or may not be pseudogenes. Nine of these involve 5' truncations of 3 to 15 nucleotides relative to other tRNAs in the nematode. It is impossible to determine by computational analysis alone whether these are functional tRNAs or inactive pseudogenes. In either case, it is important to be aware of these potential tRNA pseudogenes as candidates for further experimental or computational study. Whether a common transpositional mechanism accounts for the preferential loss of the 5' end of these tRNAs remains a question of interest.
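A 5' truncation relative to an intact tRNA can be measured, in the simplest case, by checking whether the candidate matches the 3' end of the reference exactly. The sketch below assumes exact identity over the retained region and uses a hypothetical function name and toy sequences; real comparisons would need an alignment that tolerates mismatches.

```python
from typing import Optional


def five_prime_truncation(reference: str, candidate: str) -> Optional[int]:
    """Return the number of 5' bases missing from candidate relative to
    reference, or None if candidate is not an exact 3' portion of it."""
    reference = reference.upper()
    candidate = candidate.upper()
    if candidate and reference.endswith(candidate):
        return len(reference) - len(candidate)
    return None


# Toy sequences for illustration only (not real tRNA sequences).
intact = "ACGTACGTACGGTTCAAGCT"
truncated = intact[3:]   # first 3 bases removed
missing = five_prime_truncation(intact, truncated)
```

A result of 0 means the candidate is full length, and `None` means the simple exact-match assumption does not hold for that pair.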


Todd M. Lowe
2000-03-31