Created on: 2004 December 06
After hearing Mike talk about his work looking for tRNA genes in N. equitans, I decided to try running Aragorn on the N. equitans genome, since its pattern-matching approach sounded similar to his. Aragorn provides an option -i that can be used to specify the maximum intron length. I tried intron lengths of 10000 and 20000, and Aragorn still reported 38 tRNA genes, the same as tRNAscan-SE (using the -A option).
Later, Todd suggested that I try specifying a maximum intron length equal to the genome length. The genome length is 490885 bases. I ran Aragorn specifying a maximum intron length of 500000 and still only 38 tRNA genes were found. What I found suspicious was the execution time: no matter what maximum intron length I specified, Aragorn seemed to run equally fast.
I looked at the C code for Aragorn and found that the program allows a maximum intron length of 3000. If the -i option specifies a value greater than 3000, then 3000 is used. No warning message is provided. The value 3000 will appear near the beginning of the Aragorn output, but the "configuration" appearing near the end lists the user-specified value.
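To make the effect of the cap concrete, here is a minimal sketch in Python (not Aragorn's actual C code) of the clamping described above. The names MAX_INTRON_LIMIT and effective_max_intron are hypothetical; only the limit of 3000 comes from the source code I read. Unlike Aragorn, this version warns when it lowers the value.

```python
import warnings

# Cap described in the text; the constant name is hypothetical.
MAX_INTRON_LIMIT = 3000


def effective_max_intron(requested, limit=MAX_INTRON_LIMIT):
    """Return the intron length actually searched for a requested -i value."""
    if requested > limit:
        # Aragorn lowers the value silently; this sketch reports it instead.
        warnings.warn(
            f"maximum intron length {requested} exceeds {limit}; using {limit}"
        )
        return limit
    return requested


# Every value I tried ends up as the same effective setting.
for requested in (10000, 20000, 500000):
    print(requested, "->", effective_max_intron(requested))
```

This would explain why the run time did not change: all three runs searched with the same effective maximum.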
I made a new version of Aragorn called xaragorn that allows introns up to length 20000, and ran it on the N. equitans genome with various maximum intron lengths. For lengths up through 5000, xaragorn finds 38 tRNA genes. With a maximum intron length of 6000, 7000, 8000, or 9000, it finds 39 tRNA genes. When I tried 10000, xaragorn's run time exceeded my patience.
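A sweep like this can be scripted with a time limit so that a slow run is recorded rather than waited for. The following is only a sketch: time_runs and make_command are hypothetical names, and the commented command line (binary path, the attached -i<length> form, and the genome file name) is an assumption, not something verified here. The demo uses the Python interpreter itself as a stand-in command.

```python
import subprocess
import sys
import time


def time_runs(make_command, lengths, timeout_s):
    """Run one command per intron length; map length -> seconds, or None on timeout."""
    results = {}
    for length in lengths:
        start = time.perf_counter()
        try:
            subprocess.run(
                make_command(length),
                check=True,
                capture_output=True,
                timeout=timeout_s,
            )
            results[length] = time.perf_counter() - start
        except subprocess.TimeoutExpired:
            # The run exceeded the time limit and was killed.
            results[length] = None
    return results


# Hypothetical real use (binary, flag form, and file name are assumptions):
#   make_command = lambda n: ["./xaragorn", f"-i{n}", "genome.fa"]
# Stand-in command so the sketch runs anywhere:
demo = time_runs(lambda n: [sys.executable, "-c", "pass"], [5000, 6000], timeout_s=60)
for length, seconds in demo.items():
    print(length, "timed out" if seconds is None else f"{seconds:.2f} s")
```

Run times that stay flat across very different intron lengths would be a hint that the setting is not taking effect.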
What is problematic about the result is that the additional tRNA gene that xaragorn finds (relative to tRNAscan-SE -A) does not have an intron.
Additionally, at two locations xaragorn and tRNAscan-SE both find a tRNA gene but report different amino acids. To make sure this difference was not introduced by my modifications to allow longer maximum intron lengths, I compared the results of running xaragorn and Aragorn on the N. equitans genome and got identical results (except for the extra tRNA gene found by xaragorn).
The extra gene that xaragorn finds has exactly the same sequence as the other Leu(gag) gene.
[Aragorn cloverleaf secondary-structure diagram; layout not recoverable]
tRNA-Leu(gag)
89 bases, %GC = 69.7
Sequence c[479670,479758]
tgcggccgtgcccgagcggacaaaggggccaggttgaggtcctggtgggg
tagtccctgccggggttcgaatccccgcggccgcactat
As noted earlier, this gene is suspicious because it contains no intron even though it is found only when a long maximum intron length is specified. I searched the genome file for the reverse complement of (part of) the sequence and did not find it. Perhaps raising the constant for the maximum intron length exposes a bug in the code.
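The strand search can be sketched as follows. This is a minimal illustration, not the procedure I actually used; reverse_complement and find_both_strands are hypothetical helper names, and the genome is assumed to be already loaded as one string with the FASTA header and line breaks removed. A plain text search of a FASTA file can miss a match that spans a line break, which is why the sketch works on the joined sequence.

```python
# Translation table for complementing DNA, preserving case.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_both_strands(genome, query):
    """Return (strand, 1-based start) for every exact match of query.

    "+" hits match the query as given; "-" hits match its reverse
    complement. Start positions refer to the forward-strand string.
    """
    genome = genome.lower()
    query = query.lower()
    hits = []
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        pos = genome.find(pattern)
        while pos != -1:
            hits.append((strand, pos + 1))
            pos = genome.find(pattern, pos + 1)
    return hits


# Toy example: "aacc" occurs directly, and its reverse complement "ggtt" too.
print(find_both_strands("aaccggttacgatc", "aacc"))
```

Searching with a fragment of the reported sequence in both orientations removes any doubt about which strand the coordinates refer to.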
Aragorn and tRNAscan-SE differ in their identification of two tRNA genes.
Here are the tRNAscan-SE results for the first of the two genes:
Nanoarchaeum_equitans_Kin4-M 7 225624 225716 Ile TAT 225662 225680 86.96
This is the information from tRNAscan-SE Analysis of the Nanoarchaeum equitans Genome:
Nanoarchaeum_equitans_Kin4-M.trna7 (225624-225716)  Length: 93 bp
Type: Ile  Anticodon: TAT at 35-37 (225658-225660)  Score: 86.96
Possible intron: 39-57 (225662-225680)
HMM Sc=77.68  Sec struct Sc=9.28
         *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |
Seq: GGGCCCGTGGCTCAGCCtGGGAGAGCGCCGGCCTTATAtggcggcctctcctaagaaAGCCGGAGGtCCCGGGTTCGAATCCCGGCGGGCCCA
Str: >>>>>>>..>>>>.........<<<<.>>>>>..........................<<<<<.....>>>>>.......<<<<<<<<<<<<.
Here are the Aragorn results:
[Aragorn cloverleaf secondary-structure diagram; layout not recoverable]
tRNA-Ser(tga)
79 bases, %GC = 70.9
Sequence [225623,225720]
tgggcccgtggctcagcctgggagagcgccggccttatatggcggcctct
cctaagaaagccggaggtcccgggttcgaatcccggcgggcccacttt
Intron from Nanoarchaeum_equitans_Kin4-M gi|38349555|ref|NC_005213.1| Nanoarchaeum equitan
atatggcggcctctcctaa
Intron Length: 19
Intron Insertion Position(37): gcctt-Intron-gaaag
Here are the tRNAscan-SE results for the second gene:
Nanoarchaeum_equitans_Kin4-M 12 327362 327500 Met CAT 327399 327464 76.97
This is the information from tRNAscan-SE Analysis of the Nanoarchaeum equitans Genome:
Nanoarchaeum_equitans_Kin4-M.trna12 (327362-327500)  Length: 139 bp
Type: Met  Anticodon: CAT at 34-36 (327395-327397)  Score: 76.97
Possible intron: 38-103 (327399-327464)
HMM Sc=64.30  Sec struct Sc=12.67
         *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *
Seq: GCCGCCGTAGCTCAGCGGTcAGAGCGCCCGGCTCATAgcatgggctatgagctctgacccgaaaggggatgatctcgggggctcttatgccccctcgtgagaaACCGGGAGGtCGCGGGTTCGAATCCCGCCGGCGGCA
Str: >>>>>>>..>>>>........<<<<.>>>>>.........................................................................<<<<<.....>>>>>.......<<<<<<<<<<<<.
Here are the Aragorn results:
[Aragorn cloverleaf secondary-structure diagram; layout not recoverable]
tRNA-Ser(aga)
76 bases, %GC = 71.1
Sequence [327362,327503]
gccgccgtagctcagcggtcagagcgcccggctcatagcatgggctatga
gctctgacccgaaaggggatgatctcgggggctcttatgccccctcgtga
gaaaccgggaggtcgcgggttcgaatcccgccggcggcatca
Intron from Nanoarchaeum_equitans_Kin4-M gi|38349555|ref|NC_005213.1| Nanoarchaeum equitan
catagcatgggctatgagctctgacccgaaaggggatgatctcgggggct
cttatgccccctcgtg
Intron Length: 66
Intron Insertion Position(34): cggct-Intron-agaaa
The two programs identify introns at different locations. One program's anticodon is (at least partially) in the other program's intron, and vice versa.
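The overlap can be checked by putting both programs' features on genome coordinates. The sketch below makes one assumption: that Aragorn's bracketed range is 1-based and inclusive, with the first printed base at the first coordinate (the printed sequence lengths, 98 and 142, agree with this). Rather than interpreting Aragorn's "Intron Insertion Position" number, it locates each intron by finding the printed intron string in the printed sequence. locate_intron and overlap are hypothetical helper names; the tRNAscan-SE coordinates are copied from the reports above.

```python
def locate_intron(seq, seq_start, intron):
    """Return (start, end) genome coordinates, 1-based inclusive, of intron in seq."""
    offset = seq.find(intron)
    if offset < 0:
        raise ValueError("intron not found in sequence")
    start = seq_start + offset
    return (start, start + len(intron) - 1)


def overlap(a, b):
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# First gene: Aragorn sequence [225623,225720] and its reported intron.
aragorn_seq_1 = ("tgggcccgtggctcagcctgggagagcgccggccttatatggcggcctct"
                 "cctaagaaagccggaggtcccgggttcgaatcccggcgggcccacttt")
aragorn_intron_1 = locate_intron(aragorn_seq_1, 225623, "atatggcggcctctcctaa")
trnascan_anticodon_1 = (225658, 225660)
trnascan_intron_1 = (225662, 225680)

# Second gene: Aragorn sequence [327362,327503] and its reported intron.
aragorn_seq_2 = ("gccgccgtagctcagcggtcagagcgcccggctcatagcatgggctatga"
                 "gctctgacccgaaaggggatgatctcgggggctcttatgccccctcgtga"
                 "gaaaccgggaggtcgcgggttcgaatcccgccggcggcatca")
aragorn_intron_2 = locate_intron(
    aragorn_seq_2, 327362,
    "catagcatgggctatgagctctgacccgaaaggggatgatctcgggggctcttatgccccctcgtg")
trnascan_anticodon_2 = (327395, 327397)
trnascan_intron_2 = (327399, 327464)

for name, a_intron, t_anticodon, t_intron in (
    ("gene 1", aragorn_intron_1, trnascan_anticodon_1, trnascan_intron_1),
    ("gene 2", aragorn_intron_2, trnascan_anticodon_2, trnascan_intron_2),
):
    print(name, "Aragorn intron:", a_intron)
    print("  bases of tRNAscan-SE anticodon inside it:", overlap(a_intron, t_anticodon))
    print("  bases shared with tRNAscan-SE intron:", overlap(a_intron, t_intron))
```

Under that assumption, Aragorn's intron covers two of the three bases of the tRNAscan-SE anticodon in the first gene and all three in the second. Aragorn's own anticodon coordinates are not printed in its output, so the reverse direction is not computed here.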
Aragorn reports no tRNA genes for methionine.