Aragorn Results on Nanoarchaeum equitans

Created on: 2004 December 06


Experiments

After hearing Mike talk about his work looking for tRNA genes in N. equitans, I decided to run Aragorn on the N. equitans genome, since his pattern-matching approach sounded similar to Aragorn's. Aragorn provides an option, -i, that specifies the maximum intron length. I tried maximum intron lengths of 10000 and 20000, and Aragorn still reported 38 tRNA genes, the same number as tRNAscan-SE (run with the -A option).

Later, Todd suggested that I try a maximum intron length equal to the genome length, which is 490885 bases. I ran Aragorn with a maximum intron length of 500000, and it still found only 38 tRNA genes. What I found suspicious was the execution time: no matter what maximum intron length I specified, Aragorn seemed to run equally fast.
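The "equally fast" symptom can be checked systematically by timing one run per maximum intron length. The following is a minimal Python sketch, assuming the program accepts its options before the genome file name. The options -t, -seq and -i<max> are the ones used in these notes, while `build_command` and `time_runs` are hypothetical helper names.

```python
import subprocess
import time


def build_command(aragorn, genome, max_intron):
    """Aragorn command line: -t (tRNA genes only), -seq (print sequences),
    -i<max> (maximum intron length). Options-before-file order is assumed."""
    return [aragorn, "-t", "-seq", f"-i{max_intron}", genome]


def time_runs(aragorn, genome, max_introns, runner=subprocess.run):
    """Run once per maximum intron length; return {max_intron: seconds}.

    Timings that stay flat as max_intron grows suggest the value is being
    ignored or capped rather than actually widening the search."""
    timings = {}
    for max_intron in max_introns:
        start = time.perf_counter()
        runner(build_command(aragorn, genome, max_intron),
               stdout=subprocess.DEVNULL, check=True)
        timings[max_intron] = time.perf_counter() - start
    return timings
```

For example: `time_runs("/projects/lowelab/users/dmng/bin/aragorn", "/projects/lowelab/db/Bacteria/Nanoarchaeum_equitans/Nano_equi.fa", [3000, 10000, 20000, 500000])`. The `runner` parameter exists only so the loop can be exercised without the binary.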

I looked at the C code for Aragorn and found that the program caps the maximum intron length at 3000: if the -i option specifies a value greater than 3000, 3000 is used instead, and no warning message is printed. The value 3000 appears near the beginning of the Aragorn output, but the "configuration" listed near the end shows the user-specified value.
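The behaviour described above amounts to a silent clamp. The Python below only illustrates that logic. Aragorn itself is written in C, and this is not its source. The name `effective_max_intron` and the warning text are hypothetical, and the warning is the message the program does not print.

```python
import warnings

# Upper bound found in the Aragorn C source (per these notes).
ARAGORN_MAX_INTRON = 3000


def effective_max_intron(requested, cap=ARAGORN_MAX_INTRON):
    """Return the maximum intron length a search would actually use.

    Values above `cap` are replaced by `cap`, as Aragorn does. Unlike
    Aragorn, this sketch also warns so the substitution is not silent."""
    if requested > cap:
        warnings.warn(
            f"maximum intron length {requested} exceeds limit {cap}; using {cap}",
            stacklevel=2,
        )
        return cap
    return requested
```

With the default cap, `effective_max_intron(500000)` returns 3000, which matches the unchanged run times and gene counts observed above.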

I made a new version of Aragorn, called xaragorn, that allows introns up to length 20000, and ran it on the N. equitans genome with various maximum intron lengths. For maximum intron lengths up to and including 5000, xaragorn finds 38 tRNA genes. At 6000 it finds 39, and it also finds 39 at 7000, 8000, and 9000. At 10000, xaragorn's run time exceeded my patience.

What is problematic about the result is that the additional tRNA gene that xaragorn finds (relative to tRNAscan-SE -A) does not have an intron.

Additionally, xaragorn and tRNAscan-SE find two tRNA genes at the same locations but assign them different amino acids. To make sure this difference was not introduced by my changes to allow longer introns, I ran both xaragorn and Aragorn on the N. equitans genome and compared the results: they were identical except for the extra tRNA gene found by xaragorn.

Details

  1. For the N. equitans genome I used /projects/lowelab/db/Bacteria/Nanoarchaeum_equitans/Nano_equi.fa.
  2. The standard version of the Aragorn program is /projects/lowelab/users/dmng/bin/aragorn.
  3. The modified version of the Aragorn program allowing introns up to length 20000 is /projects/lowelab/users/dmng/xaragorn/xaragorn.
  4. Options for Aragorn (including xaragorn): -t (search for tRNA genes only) and -seq (print the sequence). By default, the program assumes a circular, double-stranded genome.
  5. Output files in /projects/lowelab/users/dmng/xaragorn:
    1. trnascanse.out: output of running tRNAscan-SE on the N. equitans genome
    2. Ne6000.out: output of running xaragorn on the N. equitans genome with option -i6000
    3. Ne6000.batch.out: same as Ne6000.out, but with batch output (-w)
    4. aragorn.out: output of running Aragorn on the N. equitans genome with option -i
    5. aragorn.batch.out: same as aragorn.out, but with batch output (-w)
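The Aragorn/xaragorn comparison can be repeated on the batch files listed above (aragorn.batch.out and Ne6000.batch.out) with Python's standard difflib. This is a minimal sketch, and `compare_outputs` is a hypothetical helper. The two runs used different -i settings, so any line that echoes that setting may also appear in the diff.

```python
import difflib
from pathlib import Path


def compare_outputs(path_a, path_b):
    """Unified diff between two text output files; an empty list means identical."""
    a = Path(path_a).read_text().splitlines(keepends=True)
    b = Path(path_b).read_text().splitlines(keepends=True)
    return list(difflib.unified_diff(a, b, fromfile=str(path_a), tofile=str(path_b)))
```

For example, `print("".join(compare_outputs("aragorn.batch.out", "Ne6000.batch.out")))` run from the output directory would list the lines that differ, which should include the extra gene.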

Additional tRNA Gene Found by xaragorn

This gene has exactly the same sequence as the other Leu(gag) gene.

                 at                     
                t                       
               c                        
             t-a                        
             g-c                        
             c-g                        
             g-c                        
             g-c                        
             c-g                        
             c-g                        
             g-c     ta                 
            t   gcccc  a                
    cgag   g    !!!!!  g                
   g    ccc     cgggg  c                
   g    !!!      c   tt                 
   a    ggg       g                     
    caaa   g       t                    
            c-gt    c                   
            c-g g   c                   
            a-t  g   c                  
            g-c   g   t                 
            g-c    g  g                 
           t   t    ta                  
           t   g                        
            gag                         
                                        
                                        
     tRNA-Leu(gag)                      
     89 bases, %GC = 69.7               
     Sequence c[479670,479758]          
                                        


1   .   10    .   20    .   30    .   40    .   50
tgcggccgtgcccgagcggacaaaggggccaggttgaggtcctggtgggg
tagtccctgccggggttcgaatccccgcggccgcactat
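The length and GC content in the header can be checked against the printed sequence. Aragorn's coordinates are inclusive, so c[479670,479758] spans 479758 - 479670 + 1 = 89 bases. A short Python check follows, in which `gc_percent` is a hypothetical helper.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA string (case-insensitive)."""
    s = seq.lower()
    return 100.0 * (s.count("g") + s.count("c")) / len(s)


# The extra tRNA-Leu(gag) sequence exactly as printed above.
EXTRA_LEU = ("tgcggccgtgcccgagcggacaaaggggccaggttgaggtcctggtgggg"
             "tagtccctgccggggttcgaatccccgcggccgcactat")

start, end = 479670, 479758                     # c[479670,479758], inclusive
print(end - start + 1, len(EXTRA_LEU))          # 89 89
print(f"%GC = {gc_percent(EXTRA_LEU):.1f}")     # %GC = 69.7
```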

Discussion

As noted earlier, this gene is suspicious because it contains no intron even though it is found only when a long maximum intron length is specified. I searched the genome file for the reverse complement of (part of) the sequence and did not find it. Perhaps increasing the constant for the maximum intron length triggers a bug in the code.
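A search like the one described above can be scripted so that both strands are checked over the whole genome. This is a minimal Python sketch, and all function names are hypothetical. It joins the FASTA lines before searching, because FASTA files wrap at a fixed width and a plain text search of the file can miss a match that spans a line break.

```python
# Complement table for unambiguous bases plus N; other IUPAC codes are left as-is.
COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta_sequence(path):
    """Concatenate the sequence lines of a single-record FASTA file (lower case)."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle
                       if not line.startswith(">")).lower()


def find_positions(text, pattern):
    """All 1-based start positions of pattern in text (overlaps allowed)."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i + 1)
        i = text.find(pattern, i + 1)
    return hits


def search_both_strands(genome, query):
    """Forward-strand hits of query and of its reverse complement.

    Matches that wrap around the origin of a circular genome are not found."""
    q = query.lower()
    return {"forward": find_positions(genome, q),
            "reverse_complement": find_positions(genome, reverse_complement(q))}
```

The gene is reported on the complementary strand (c[479670,479758]). If its coordinates and sequence were both right, the reverse complement of the printed 89-base sequence would be expected to start at forward-strand position 479670.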


tRNA Genes Identified with Different Anticodons

Aragorn and tRNAscan-SE differ in their identification of two tRNA genes.

Gene 1: Ile (TAT) vs. Ser (TGA)

Here are the tRNAscan-SE results:

Sequence Name	tRNA #	Begin	End	Type	Anticodon	Intron Begin	Intron End	Score
Nanoarchaeum_equitans_Kin4-M	7	225624	225716	Ile	TAT	225662	225680	86.96
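Each tabular row contains enough information to recover the gene and intron lengths given in the detailed view (93 bp and a 19-base intron here). This is a small Python parser that assumes the standard tRNAscan-SE column order (name, tRNA #, begin, end, type, anticodon, intron begin, intron end, score). `TrnascanHit` and `parse_trnascan_line` are hypothetical names.

```python
from typing import NamedTuple


class TrnascanHit(NamedTuple):
    """One row of tRNAscan-SE tabular output (standard column order assumed)."""
    seq_name: str
    trna_number: int
    begin: int
    end: int
    trna_type: str
    anticodon: str
    intron_begin: int   # 0 when no intron is predicted
    intron_end: int
    score: float

    def length(self):
        # Coordinates are inclusive; minus-strand hits have begin > end.
        return abs(self.end - self.begin) + 1

    def intron_length(self):
        if self.intron_begin == 0:
            return 0
        return abs(self.intron_end - self.intron_begin) + 1


def parse_trnascan_line(line):
    f = line.split()  # tolerates tabs or runs of spaces between columns
    return TrnascanHit(f[0], int(f[1]), int(f[2]), int(f[3]), f[4], f[5],
                       int(f[6]), int(f[7]), float(f[8]))
```

The same parser gives 139 bp and a 66-base intron for the Gene 2 row.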

This is the information from tRNAscan-SE Analysis of the Nanoarchaeum equitans Genome:

Nanoarchaeum_equitans_Kin4-M.trna7 (225624-225716)	Length: 93 bp
Type: Ile	Anticodon: TAT at 35-37 (225658-225660)	Score: 86.96
Possible intron: 39-57 (225662-225680)
HMM Sc=77.68	Sec struct Sc=9.28
         *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |
Seq: GGGCCCGTGGCTCAGCCtGGGAGAGCGCCGGCCTTATAtggcggcctctcctaagaaAGCCGGAGGtCCCGGGTTCGAATCCCGGCGGGCCCA
Str: >>>>>>>..>>>>.........<<<<.>>>>>..........................<<<<<.....>>>>>.......<<<<<<<<<<<<.
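In the Str line, each `>` opens a base pair and the matching `<` closes it, nested like parentheses. A `.` marks an unpaired position, and this includes the intron. A stack recovers the pairs. This is a minimal Python sketch, and `pair_positions` is a hypothetical name.

```python
def pair_positions(structure):
    """Return 1-based (i, j) base pairs from '>'/'<'/'.' notation, sorted by i."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure, start=1):
        if ch == ">":
            stack.append(pos)
        elif ch == "<":
            if not stack:
                raise ValueError(f"unmatched '<' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '>' at positions {stack}")
    return sorted(pairs)
```

When applied to the Gene 1 Str line (93 characters, with the "Str: " prefix removed), this pairs positions 28-32 with 59-63, which is the anticodon stem. The TAT anticodon at 35-37 and the intron at 39-57 both lie in the loop closed by that stem.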

Here are the Aragorn results:

                 tt                     
                t                       
               c                        
             t-a                        
             g-c                        
             g-c                        
             g-c                        
             c-g                        
             c-g                        
             c-g                        
             g-c     ta                 
            t   ggccc  a                
    cga    g    !!!!!  g                
   c   ctcg     ccggg  c                
   t   !!!!    c     tt                 
   g   gagc     t                       
    gga    g     g                      
            c-gag                       
            c-g                         
            g-c                         
            g-c                         
            c-g                         
           c   a                        
           t   a                        
            tga                         
                                        
                                        
     tRNA-Ser(tga)                      
     79 bases, %GC = 70.9               
     Sequence [225623,225720]           
                                        


1   .   10    .   20    .   30    .   40    .   50
tgggcccgtggctcagcctgggagagcgccggccttatatggcggcctct
cctaagaaagccggaggtcccgggttcgaatcccggcgggcccacttt

Intron from Nanoarchaeum_equitans_Kin4-M  gi|38349555|ref|NC_005213.1| Nanoarchaeum equitan
1   .   10    .   20    .   30    .   40    .   50
atatggcggcctctcctaa

Intron Length: 19
Intron Insertion Position(37): gcctt-Intron-gaaag
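Aragorn's reported values can be reproduced by cutting the intron out of the printed 98-base sequence. The remaining 79 bases give the reported %GC. This also shows where Aragorn's TGA anticodon comes from: it is assembled from the bases on either side of its intron (gcctt | gaaag). The Python sketch below reads the insertion position as 1-based within the printed sequence, an inference from the sequence itself. `splice` and `gc_percent` are hypothetical helpers.

```python
def splice(seq, insertion_pos, intron_len):
    """Remove an intron starting at 1-based position insertion_pos."""
    i = insertion_pos - 1
    return seq[:i] + seq[i + intron_len:]


def gc_percent(seq):
    s = seq.lower()
    return 100.0 * (s.count("g") + s.count("c")) / len(s)


# Gene 1 as printed by Aragorn (Sequence [225623,225720], intron included).
GENE1 = ("tgggcccgtggctcagcctgggagagcgccggccttatatggcggcctct"
         "cctaagaaagccggaggtcccgggttcgaatcccggcgggcccacttt")
INTRON1 = "atatggcggcctctcctaa"   # Intron Insertion Position(37), length 19

assert GENE1[36:36 + len(INTRON1)] == INTRON1   # intron sits where Aragorn says
mature = splice(GENE1, 37, len(INTRON1))
print(len(mature), f"{gc_percent(mature):.1f}")  # 79 70.9
print(mature[33:40])                             # cttgaaa (anticodon loop, tga in the middle)
```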

Gene 2: Met (CAT) vs. Ser (AGA)

Here are the tRNAscan-SE results:

Sequence Name	tRNA #	Begin	End	Type	Anticodon	Intron Begin	Intron End	Score
Nanoarchaeum_equitans_Kin4-M	12	327362	327500	Met	CAT	327399	327464	76.97

This is the information from tRNAscan-SE Analysis of the Nanoarchaeum equitans Genome:

Nanoarchaeum_equitans_Kin4-M.trna12 (327362-327500)	Length: 139 bp
Type: Met	Anticodon: CAT at 34-36 (327395-327397)	Score: 76.97
Possible intron: 38-103 (327399-327464)
HMM Sc=64.30	Sec struct Sc=12.67
         *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *
Seq: GCCGCCGTAGCTCAGCGGTcAGAGCGCCCGGCTCATAgcatgggctatgagctctgacccgaaaggggatgatctcgggggctcttatgccccctcgtgagaaACCGGGAGGtCGCGGGTTCGAATCCCGCCGGCGGCA
Str: >>>>>>>..>>>>........<<<<.>>>>>.........................................................................<<<<<.....>>>>>.......<<<<<<<<<<<<.

Here are the Aragorn results:

                 ca                     
                t                       
               a                        
             g-c                        
             c-g                        
             c-g                        
             g-c                        
             c-g                        
             c-g                        
             g-c     ta                 
            t   cgccc  a                
     ga    a    !!!!!  g                
    c  ctcg     gcggg  c                
   g   !!!!    c     tt                 
   g   gagc     t                       
    tca    g     g                      
            c-gag                       
            c-g                         
            c-g                         
            g-c                         
            g-c                         
           c   a                        
           t   a                        
            aga                         
                                        
                                        
                                        
     tRNA-Ser(aga)                      
     76 bases, %GC = 71.1               
     Sequence [327362,327503]           
                                        


1   .   10    .   20    .   30    .   40    .   50
gccgccgtagctcagcggtcagagcgcccggctcatagcatgggctatga
gctctgacccgaaaggggatgatctcgggggctcttatgccccctcgtga
gaaaccgggaggtcgcgggttcgaatcccgccggcggcatca

Intron from Nanoarchaeum_equitans_Kin4-M  gi|38349555|ref|NC_005213.1| Nanoarchaeum equitan
1   .   10    .   20    .   30    .   40    .   50
catagcatgggctatgagctctgacccgaaaggggatgatctcgggggct
cttatgccccctcgtg

Intron Length: 66
Intron Insertion Position(34): cggct-Intron-agaaa

Discussion

For both genes, the two programs place the intron at different locations, and each program's anticodon lies at least partially within the intron predicted by the other.
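The anticodon/intron conflict can be checked numerically. First, convert Aragorn's intron to genomic coordinates (sequence start + insertion position - 1, which is how the printed sequences line up). Then intersect it with tRNAscan-SE's anticodon. This covers one direction of the claim. `aragorn_intron_span` and `overlap` are hypothetical helpers.

```python
def aragorn_intron_span(seq_start, insertion_pos, intron_len):
    """Genomic [first, last] of an Aragorn intron, taking insertion_pos as
    1-based within the printed sequence that starts at seq_start."""
    first = seq_start + insertion_pos - 1
    return first, first + intron_len - 1


def overlap(a, b):
    """Number of bases shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Gene 1: tRNAscan-SE anticodon TAT at 225658-225660;
# Aragorn sequence starts at 225623, intron at position 37, length 19.
g1 = aragorn_intron_span(225623, 37, 19)
print(g1, overlap((225658, 225660), g1))   # (225659, 225677) 2

# Gene 2: tRNAscan-SE anticodon CAT at 327395-327397;
# Aragorn sequence starts at 327362, intron at position 34, length 66.
g2 = aragorn_intron_span(327362, 34, 66)
print(g2, overlap((327395, 327397), g2))   # (327395, 327460) 3
```

So two of the three TAT bases (Gene 1) and all three CAT bases (Gene 2) fall inside Aragorn's intron. In the other direction, Aragorn's spliced anticodons map to 225658 plus 225678-225679 (Gene 1) and to 327461-327463 (Gene 2), which overlap tRNAscan-SE's introns at 225662-225680 and 327399-327464.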

Aragorn reports no tRNA genes for methionine.