Wed Jun 28 19:00:38 PDT 2000 Kevin Karplus

T0104 Hypothetical protein HI0065, Haemophilus influenzae
swissprot YJEE_HAEIN

No hits with blast or double-blast.
29 sequences in t2k alignment.
2ry structure alternates helices and strands.

Target model gets moderate hits:
    ID        E-value  FSSP   SCOP
    1ckeA     0.0067   1ckeA  3.31.1.1.4
    2cmkA     0.0068   1ckeA  3.31.1.1.4
    1skyE     0.14     1skyE  1.6.1,2.46.1,3.31.1.10.5
    1d2mA     0.20     1d2mA  ? (UVRB)
    1qbkC     0.25     1byuA  3.31.1.7.7
    1byu[AB]  0.26     1byuA  3.31.1.7.6
    1shkB     0.26     1shkA  3.31.1.2.1
    ...
All agree on the superfamily, but disagree on the family.

The template models also get moderate hits:
    5p21      0.0018   1ctqA  3.31.1.7.1
    1ctqA     0.0019   1ctqA  3.31.1.7.1
    1kao      0.0153   1ctqA  3.31.1.7.4
    3vtk      0.023    1qhiA  3.31.1.1.6
    3rabA     0.0255   1ctqA  3.31.1.7.5
    3tmkA     0.030    3tmkA  3.31.1.1.15
    1ek0A     0.050    1ctqA  3.31.1.1.6
    1kimA     0.056    1qhiA  3.31.1.1.6
    ...
Again, all agree on 3.31.1, but disagree on the family.

This looks like a strong prediction, with 3.31.1.1 and 3.31.1.7 as the main families to consider.

28 June 2000 Kevin Karplus

The 3.031.001 domain is a common one, with 421 instances in SCOP v 50. It often comes in pairs, with 83 of the 421 being second domains. Families 3.31.1.[7-9,12-13] seem to be the ones with the tandem duplication. If we have tandem duplication here, we should favor one of these families (probably 3.31.1.7); otherwise 3.31.1.1 seems most likely.

The best-scoring of the alignments generated so far is
    1d2mA/T0104-1d2mA-global.pw
which gets an E-value for 1d2mA of 1.3e-09. 1d2mA probably has a 3.31.1 domain (maybe 2), though it isn't in version 50 of SCOP.
Next best is
    1ckeA/T0104-1ckeA-global    3.6e-08
Third best is a template model
    5p21/5p21-T0104-vit.pw or 5p21/5p21-T0104-local.pw    4.1e-07  6.e-07
Next different chain is 1byuA
    1byuA/T0104-1byuA-global.pw    5.5e-06
All these are 3.31.1.7.
The next is 3.31.1.10: 1skyE
    1skyE/T0104-1skyE-local    1.9e-05
Then the first 3.31.1.1: 3vtk
    3vtk/T0104-3vtk-global.pw    2.0e-05
Best fssp model is
    1skyE/1skyE-T0104-fssp-global.pw    6.0e-03

Thu Jun 29 11:17:20 PDT 2000 Kevin Karplus

The 1d2mA/T0104-1d2mA-global alignment is not particularly compact.

The 1ckeA/T0104-1ckeA-global alignment is missing a couple of helices, making a gap that would be difficult to bridge (though conceivable, if the connection goes over the beta sheet, rather than under).

The 5p21/5p21-T0104-local alignment has only one 1-residue gap, but is missing the final helix that the beta sheet curls around. Possibly something from the unmatched beginning of the target sequence could fill in there.

1byuA/T0104-1byuA-global is missing an interior beta strand, which would be hard to fix.

1skyE/T0104-1skyE-local only matches a few residues.

3vtk/T0104-3vtk-global has few conserved residues and many gaps.

1skyE/1skyE-T0104-fssp-global looks pretty good, but is missing one long helix, making a hard-to-close gap.

The 3.31.1 superfamily seems to have a rather diverse collection of structures, so picking the right template may be difficult.

Wed Jul 5 12:12:05 PDT 2000 Kevin Karplus

CAFASP summaries not available yet, but SAM-T99 gets 5p21 as top hit and 1ckeA as only bi-directional hit.

Tue Jul 11 11:45:17 PDT 2000

CAFASP top hits:
    37  3.31.1
    22  3.56.1
So most agree with us on 3.31.1, but we should look at 3.56.1, just to be sure:
    CAFASP hit  FSSP   SCOP
    1hgx[AB]    1tc1A  3.56.1
    1d6nB       1qk3A  3.56.1
    1tc1[AB]    1tc1A  3.56.1
    1nulA       1nulA  3.56.1
    1a3c        1a3c   3.56.1
    1opr        1opr   3.56.1
    1hmpA       1qk3A  3.56.1

There is no evidence of a tandem repeat (using DOTLET http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html), so 3.31.1.1 seems like the most likely family (8 FSSP reps: 1ckeA, 1zin, 1nksA, 3tmkA, 1dekA, 1gky, 5tmpA, 1qhiA), though the 3.31.1.7 family is scoring slightly better.
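The "agree on the superfamily, disagree on the family" bookkeeping can be done mechanically. The following is only a minimal sketch, not part of the original prediction pipeline: it truncates SCOP codes to a chosen depth and counts hits per prefix. The hit list is copied from the target-model hit table at the top of these notes; the function names `scop_prefix` and `tally` are made up for illustration, and counting a multi-domain chain once per distinct prefix is an assumption.

```python
from collections import Counter

# (hit ID, SCOP code string) pairs copied from the target-model hit table.
# Multi-domain chains list several comma-separated codes; "?" means unclassified.
HITS = [
    ("1ckeA", "3.31.1.1.4"),
    ("2cmkA", "3.31.1.1.4"),
    ("1skyE", "1.6.1,2.46.1,3.31.1.10.5"),
    ("1d2mA", "?"),
    ("1qbkC", "3.31.1.7.7"),
    ("1byu[AB]", "3.31.1.7.6"),
    ("1shkB", "3.31.1.2.1"),
]


def scop_prefix(code, depth):
    """Return the first `depth` dot-separated fields of a SCOP code,
    or None if the code is shorter than that or is not numeric."""
    fields = code.split(".")
    if len(fields) < depth or not all(f.isdigit() for f in fields[:depth]):
        return None
    return ".".join(fields[:depth])


def tally(hits, depth):
    """Count hits per SCOP prefix (assumption: a chain counts once
    for each distinct prefix among its domains)."""
    counts = Counter()
    for _hit_id, codes in hits:
        prefixes = {scop_prefix(c.strip(), depth) for c in codes.split(",")}
        prefixes.discard(None)
        counts.update(prefixes)
    return counts


superfamilies = tally(HITS, 3)  # depth 3 = superfamily level, e.g. 3.31.1
families = tally(HITS, 4)       # depth 4 = family level, e.g. 3.31.1.7

if __name__ == "__main__":
    print("superfamily:", superfamilies.most_common())
    print("family:     ", families.most_common())
```

On this small table the superfamily tally is dominated by 3.31.1, while the family tally splits between 3.31.1.1 and 3.31.1.7, which is the same picture the hand inspection gives.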
Top-scoring alignments in the 3.56.1 set are
    1tc1A/1tc1A-T0104-global       T0104  5.9e-04
    1tc1A/1tc1A-T0104-fssp-global  T0104  2.0e-03
    1opr/1opr-T0104-fssp-global    T0104  2.4e-03
    1nulA/1nulA-T0104-global       T0104  5.8e-03
    1tc1A/1tc1A-T0104-vit          T0104  4.6e-02
    1opr/1opr-T0104-global         T0104  4.7e-02
    1a3c/1a3c-T0104-fssp-global    T0104  6.3e-02
    1tc1A/1tc1A-T0104-local        T0104  7.0e-02
    ...
These are not nearly as strong as for 3.31.1
    3tmkA/3tmkA-T0104-global       T0104  2.7e-10
    1ctqA/1ctqA-T0104-global       T0104  8.5e-10
    1d2mA/T0104-1d2mA-global       1d2mA  1.3e-9
    1kao/1kao-T0104-global         T0104  2.1e-9
    5tmpA/5tmpA-T0104-global       T0104  1.0e-08
    1ckeA/T0104-1ckeA-global       1ckeA  3.6e-08
    1byuA/1byuA-T0104-global       T0104  3.9e-08
    1ctqA/1ctqA-T0104-vit          T0104  9.2e-08
    1ckeA/1ckeA-T0104-global       T0104  1.3e-07
    1nksA/1nksA-T0104-global       T0104  1.6e-07
    1ckeA/T0104-1ckeA-global       1ckeA  1.6e-07

Looking at 3.56.1 alignments:

1tc1A/1tc1A-T0104-global is missing 2 of the 4 beta strands.

1tc1A/1tc1A-T0104-fssp-global is missing one helix and half of two strands; the gap is easily closed to make a sheet with one less strand.

1tc1A/1tc1A-T0104-vit is too short to be useful.

1tc1A/1tc1A-T0104-local is also too short.

1opr/1opr-T0104-fssp-global is missing two strands in mid sheet.

1opr/1opr-T0104-global is missing a mid-sheet strand and a helix.

1nulA/1nulA-T0104-global is missing a mid-sheet strand and a helix.

1a3c/1a3c-T0104-fssp-global is missing a mid-sheet strand and a big helix.

Of the 3.56.1 alignments I've looked at so far, only 1tc1A/1tc1A-T0104-fssp-global looks feasible, and that only if closing the gap makes sense.

Looking at some of the 3.31.1.1 alignments:

3tmkA/3tmkA-T0104-global is missing an edge strand, and has low conservation in the sheet, but if we erase the YDDAR segment near the end, the gaps in the alignment are all closable. (We are missing a fair chunk of the helices in 3tmkA.) One of the binding sites seems to be conserved.

1ctqA/1ctqA-T0104-global is also missing the final strand, and conserves the binding site.
The final segment ELIAQTNLGKNIIS is probably misaligned. (Note: this is 3.31.1.7)

1d2mA/T0104-1d2mA-global is missing 2 mid-sheet strands and has some alignment into the next domain. (Note: not in SCOP)

1kao/1kao-T0104-global is missing a helix in the middle, leaving a hard-to-close gap, but there is pretty good conservation around the binding site.

5tmpA/5tmpA-T0104-global is a rather gappy alignment, and some of the gaps seem hard to close. (Again a pretty good binding site.)

1ckeA/T0104-1ckeA-global has conservation around the binding site, but some hard-to-close gaps, as does 1ckeA/1ckeA-T0104-global.

1nksA/1nksA-T0104-global has some awkward gaps.

Right now, I favor 3tmkA/3tmkA-T0104-global, but I could be talked into 1tc1A/1tc1A-T0104-fssp-global, if someone has a good reason to favor the 3.56.1 superfamily.

Sat Aug 26 00:23:22 PDT 2000
Remade 2track predictions

Sat Aug 26 14:23:03 PDT 2000
Remade 2track predictions

Kevin Karplus

A few 2track weak hits:
    % Sequence ID  Length  Simple  Reverse  E-value  SCOP
    1d2nA          272     -31.34  -17.62   1.4e-04  3.31.1.12.4
    3tgl           269     -26.86  -14.27   2.8e-03  3.64.1
    1vom           762     -26.04  -12.28   2.1e-02  2.32.3,3.31.1.8.3
    1hgxA          183     -26.64  -11.26   5.7e-02  3.56.1
    1do0A          442     -24.73  -11.11   5.7e-02  3.31.1.12.5
    1qorA          327     -24.93  -11.07   5.7e-02  2.33.1,3.2.1
These mostly pick up the 3.31.1 superfamily. I added 1d2nA, 1do0A, and 1vom as possible hits. We should look at them as well.

Tue Sep 5 10:35:22 PDT 2000 Kevin Karplus

Remaking 2track to get latest additions to model library. No real change in top 2track hits. I noticed that we do have a 3.56.1 hit. I added 1hgxA and 3tgl to the list of alignments to check.

Swissprot lists target and homologs as being part of
    Uncharacterized Protein Family: UPF0079
    Taxonomic range: Eubacteria
    PROSITE entry: None
    Comments: Probable ATP-binding protein.
    YJEE_ECOLI (P31805), YJEE_HAEIN (P44492), Y843_AQUAE (O67011),
    YDIB_BACSU (O05515), Y186_BORBU (O51204), YY22_MYCLE (Q49864),
    YY22_MYCTU (Q50706), Y013_RICPR (Q9ZED0), Y257_SYNY3 (P74415),
    Y875_TREPA (O83845), YJEE_STRCO (O86788), YJEE_ANASP (O52749)
    http://www.expasy.ch/cgi-bin/lists?upflist.txt

We have annotation of the potential ATP binding site as 53-60 of MOXR_METEX, which translates to 40-57 and 222-229 of SUG2_YEAST, which also translates to 40-47. This is the highly conserved GdlGAGKT motif.

Top alignments are now:

1hgxA/T0104-1hgxA-2track-global  1hgxA  164  -23.47  -33.92  1.4e-14
    Pretty good secondary structure matches, but residue identity low, and doesn't have good match for GdlGAGKT motif. Fixed up alignment a bit to get 1hgxA/T0104-1hgxA-karplus1.a2m

1a3c/T0104-1a3c-2track-global  1a3c  181  -18.03  -28.76  2.1e-12
    Missing beta strand makes hard-to-close gap. Doesn't have good match for GdlGAGKT motif.

1tc1A/T0104-1tc1A-2track-global  1tc1A  175  -14.95  -27.52  5.6e-12
    Good 2ry match, but missing 2 beta strands, making hard-to-close gaps. Doesn't have good match for GdlGAGKT motif.

3tmkA/3tmkA-T0104-global  T0104  158  -17.77  -23.13  2.7e-10
    Has GKTT conserved at proposed ATP-binding site, but 2ry match is poor and there are several tough gaps.

1ctqA/1ctqA-T0104-global  T0104  158  -15.39  -21.98  8.5e-10
    Excellent conservation at proposed ATP-binding site. With a little hand editing, we can get a lot of conservation near the active site, though the alignment is a bit gappy.
    1ctqA/1ctqA-T0104-karpus.a2m MAY BE POSSIBLE PREDICTION.

1d2mA/T0104-1d2mA-global  1d2mA  665  -176.73  -21.57  1.3e-09
    Good conservation at motif. Somewhat non-compact, but can be improved by realigning, for example 1d2mA/T0104-1d2mA-karplus1.a2m or, better, 1d2mA/T0104-1d2mA-karplus2.a2m
    MAY BE POSSIBLE PREDICTION.
1kao/1kao-T0104-global           T0104  158  -13.74  -21.07  2.1e-09
1d2nA/T0104-1d2nA-2track-global  1d2nA  246   -0.41  -21.13  2.3e-09
5tmpA/5tmpA-T0104-global         T0104  158  -15.63  -19.48  1.0e-08
1nulA/T0104-1nulA-2track-global  1nulA  142  -12.29  -19.79  1.7e-08
1ckeA/T0104-1ckeA-global         1ckeA  227  -84.37  -18.24  3.6e-08
1byuA/1byuA-T0104-global         T0104  158   -9.85  -18.17  3.9e-08
1nulA/T0104-1nulA-2track-global  1nulA  152   -9.71  -18.84  4.6e-08
1ctqA/1ctqA-T0104-vit            T0104  158  -21.03  -17.30  9.2e-08
1d2nA/T0104-1d2nA-2track-local   1d2nA  246  -31.43  -17.62  1.2e-07
1ckeA/1ckeA-T0104-global         T0104  158  -11.25  -16.95  1.3e-07
1ckeA/T0104-1ckeA-global         1ckeA  212  -80.72  -16.73  1.6e-07
1nksA/1nksA-T0104-global         T0104  158  -13.43  -16.76  1.6e-07

5 Sept 2000 Rachel Karchin

Functional information:
According to a seminar given at NIH (NIDDK), two new hypothetical proteins from H. influenzae have had crystal structures determined, and the structures imply methyltransferase function.
www-mslmb.niddk.nih.gov/midatlantic/schedule.html
One of these may (not for sure) be HI0065. At the same meeting, Alexey Teplyakov presented a talk on the "Crystal Structure of HI0065". (May 3)

General info:
TIGR has an HMM for HI0065 and they claim that "apart from the nucleotide-binding motif GXXGXGKT . . . it lacks detectable homology to other proteins".
http://www.tigr.org/tigr-scripts/hmm/hmm_report.spl?acc=TIGR00150
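
Checking whether a candidate alignment keeps the nucleotide-binding motif in register is easier if the motif positions are located first. The sketch below is an illustration only: it scans a sequence for the GXXGXGKT pattern quoted above, reading X as "any residue" (an assumption about TIGR's notation), and reports 1-based positions. The function name `find_motif` and the demo sequence are invented for the example; the demo string is NOT the T0104 sequence.

```python
import re

# GXXGXGKT from the TIGR note, with X taken to mean any single residue
# (assumption). The lookahead lets overlapping occurrences be reported too.
MOTIF = re.compile(r"(?=(G..G.GKT))")


def find_motif(seq):
    """Return a list of (start, end, matched_text) for each motif
    occurrence in `seq`, using 1-based inclusive residue positions."""
    seq = seq.upper()
    hits = []
    for m in MOTIF.finditer(seq):
        start = m.start() + 1                # convert 0-based index to 1-based
        end = m.start() + len(m.group(1))    # inclusive end position
        hits.append((start, end, m.group(1)))
    return hits


if __name__ == "__main__":
    # Invented demo sequence containing the GdlGAGKT form of the motif.
    demo = "MKTAYGDLGAGKTTLVRA"
    for start, end, text in find_motif(demo):
        print(f"{start}-{end}: {text}")
```

Running the same scan over each template sequence and comparing the reported positions with the aligned columns would flag alignments, such as the 3.56.1 ones noted above, that do not place a matching motif opposite the target's.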