12 May 2000 Kevin Karplus

No good hits. In fact, none with an E-value even as large as 10. I'll have to rerun with even larger thresholds to get even wildly speculative hits.

The t2k alignment is not very diverse---it is essentially multiple copies of the same sequence, going by slightly different names (adhesin, lectin, adhesin precursor). The target model and secondary structure prediction are likely to be rather poor.

The adhesins and lectins in PDB seem to be beta-sandwiches, and we have predicted a fair amount of beta-strand residues in t88. There is one helix predicted, which quite likely separates the two sheets (as in 1amx). Should we try forcing alignments to known adhesins and lectins?

15 May 2000 Kevin Karplus

Saira recommends looking at 1a12[ABC], which is a PSI-BLAST hit (see saira.mail). I'll try seeing what forcing the alignment of the SAM models to 1a12A (the FSSP rep) gives us.
SCOP: 2.64.4.1.1

Interestingly, the T0088-1a12A-global alignment (that is, using the T0088.t2k model) scores 1a12A at -106.41 with the simple null, but 3.43 with the reverse null. This explains why the first round of PSI-BLAST finds it and T2K rejects it. The alignment seems to be to part of a beta-propeller, which seems to me an unlikely match (if it were to the whole propeller, now...). The conserved residues do not line up with each other or cluster in any way.

The alignment T0088-1a12A-local looks much better, pulling out one pair of adjacent beta sheets from the propeller (so getting the beta sandwich that adhesins and lectins seem to be), and clustering several conserved residues. There is a mid-sheet gap that will be hard to fill, but this is still much more promising than the global alignment. (Use see-a2m 1a12A/T0088-1a12A-local.pw.a2m.gz from pce/casp4/t88 to view the alignment---press the "Kimmen" button to color conserved positions RED and other aligned positions GREEN.)
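The gap between the simple-null score (-106.41) and the reverse-null score (3.43) for 1a12A is what one expects when a sequence agrees with the model's composition rather than with its order. A minimal sketch of the idea, assuming a toy gapless profile (no insert or delete states, so not a real SAM HMM), a uniform background as the simple null, and the sign convention of these notes (scores are costs, more negative is better). The reverse-null score subtracts the model's cost for the reversed sequence, so agreement that comes only from composition cancels out. All names and sequences below are hypothetical.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def column(favored):
    """Probability dict for one profile position: residues in `favored` get the
    given probabilities; the leftover mass is split evenly over the others."""
    others = [aa for aa in AMINO if aa not in favored]
    leftover = 1.0 - sum(favored.values())
    col = {aa: leftover / len(others) for aa in others}
    col.update(favored)
    return col


def cost(seq, profile):
    """Negative log-likelihood (nats) of a gapless sequence under the profile."""
    if len(seq) != len(profile):
        raise ValueError("toy profile is gapless: lengths must match")
    return -sum(math.log(col[aa]) for aa, col in zip(seq, profile))


def simple_null_score(seq, profile):
    """Model cost minus cost under a uniform i.i.d. background (simple null)."""
    null_cost = len(seq) * math.log(len(AMINO))
    return cost(seq, profile) - null_cost


def reverse_null_score(seq, profile):
    """Model cost minus the model's cost for the reversed sequence."""
    return cost(seq, profile) - cost(seq[::-1], profile)


if __name__ == "__main__":
    # Composition-only profile: every position prefers V/I/L equally.
    # (The same dict is reused nine times; it is only read, never modified.)
    hydrophobic = [column({"V": 0.25, "I": 0.25, "L": 0.25})] * 9
    biased = "VILVILVIL"
    print(simple_null_score(biased, hydrophobic))   # strongly negative
    print(reverse_null_score(biased, hydrophobic))  # 0.0: order carries no signal

    # Position-specific profile built around one sequence.
    true_seq = "MKTAYIAKQR"
    specific = [column({aa: 0.8}) for aa in true_seq]
    print(simple_null_score(true_seq, specific))    # negative
    print(reverse_null_score(true_seq, specific))   # also strongly negative
```

A hit that only looks good against the simple null, as 1a12A-global does here, behaves like the composition-only case.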

The Viterbi alignment T0088-1a12A-vit is too short to be useful, and has too little structure for us to be confident that so small a fragment has the same structure.

Tue Jun 6 13:35:29 PDT 2000
Redid 2ry prediction with new neural net.

8 June 2000 Kevin Karplus

Looked at summary of CAFASP servers. There is some consensus among the predictors:

     #   superfamily
    14   2.28.1   ConA-like lectins/glucanases
     9   2.10.1   Crystallins/protein S/yeast killer toxin
     5   6.4.1    Outer membrane protein
     5   2.1.1    Immunoglobulin (5)
     4   2.9.1
     4   2.64.4
     3   2.17.1

We may want to try some of the 2.28.1 hits, since they are at least lectins. There are 366 SCOP sequences in this superfamily (11 families). FSSP reps include (possibly among others):
    1nlr, 1led, 1axkA, 1c1lA, 1qu0A, 1sacA, 1a8d, 1kit, 2sli, 1celA, 2nlrA
The T99 server had hits (as models 3, 4, 5) for 1ax0, 1ax1, 1ax2, which are represented by 1led.

The 2.10.1 hits are for 1wkt (yeast killer toxin). I tried doing a search for remoter homologs using the t2k alignment as a seed, but got no new hits. The alignment 1wkt-T0088-fssp-global.pw looks pretty good, with 24 conserved residues and insertions and deletions occurring in loop regions. The alignment with the target model (1wkt/T0088-1wkt-global.pw) is identical.

The alignment 1led/1led-T0088-fssp-global.pw skips some beta strands in the middle of the sheet and has only 19 conserved residues.

Mon Jun 26 09:48:55 PDT 2000
Remade 2ry predictions.

Mon Jun 26 09:52:02 PDT 2000
Remade 2ry predictions.

Sat Aug 26 15:27:47 PDT 2000
Remade 2track predictions. Still no good hits.
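An aside on the conserved-residue counts quoted earlier (24 for the 1wkt alignment, 19 for 1led): they can be approximated from a pairwise A2M file. A minimal sketch, assuming the usual A2M conventions (uppercase letters and '-' are match columns; lowercase letters and '.' are insertions) and that both rows were aligned to the same model, so they have the same number of match columns. This is not see-a2m's code; the two counts only presumably correspond to the RED (identical) and GREEN (other aligned) classes of its "Kimmen" coloring. Function names and toy rows are hypothetical.

```python
def match_columns(row):
    """Keep only A2M match-state characters: uppercase residues and '-'
    (deletions). Lowercase letters and '.' are insert-state positions."""
    return [c for c in row if c.isupper() or c == "-"]


def classify_columns(row1, row2):
    """Count (identical, other_aligned) residue pairs over shared match columns."""
    m1, m2 = match_columns(row1), match_columns(row2)
    if len(m1) != len(m2):
        raise ValueError("rows have different numbers of match columns")
    identical = other = 0
    for x, y in zip(m1, m2):
        if x == "-" or y == "-":
            continue  # deletion in either row: no residue pair here
        if x == y:
            identical += 1
        else:
            other += 1
    return identical, other


if __name__ == "__main__":
    # Hypothetical toy rows, not the real T0088/1wkt alignment.
    target = "ACdEFG"    # 'd' is an insertion relative to the model
    template = "AS.-FG"  # '.' pads the insert column; '-' deletes a match column
    print(classify_columns(target, template))  # prints (3, 1)
```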

Top 2track hits:

    % Sequence ID  Length   Simple  Reverse  E-value  SCOP
      1amuA           563   -20.42    -7.26  3.1e+00  5.20.1
      2mprA           421   -32.63    -7.16  3.1e+00  6.4.3
      1qtsA           247   -24.67    -6.64  8.5e+00  2.1.9,4.88.1
      1a1x            108   -23.69    -6.60  8.5e+00  2.59.1
      1yer            228   -17.76    -5.56  2.3e+01  4.101.1
      1byqA           228   -17.77    -5.54  2.3e+01  4.101.1
      1bza            262   -17.91    -5.52  2.3e+01  5.3.1
      1lci            550   -17.50    -5.45  2.3e+01  5.20.1
      1aly            146   -25.30    -5.39  2.3e+01  2.21.1
      1xnb            185   -27.23    -5.08  2.3e+01  2.28.1

I'll try making joints for the top 4, but I don't expect much.

28 Aug 2000 Rachel Karchin

GafD is highly homologous with the fimbria-associated F17-G and F17b-G.
cite: Saarela et al., Infection and Immunity, Jul '96, 2857-2860.
"Two lines of evidence suggest a two-domain structure for GafD . . ."

GafD, F17-G, and F17b-G are 87% identical within the first 160 residues and 100% identical within the remaining 194 residues. So all variation in the family is in the N-terminal half. The C-terminal half is absolutely conserved, suggesting "a strict structural or functional role for this part of the protein. Second, non-polar mutations in the N- and C-terminal halves of GafD affected fimbrial expression differently . . ."
cite: GafD and Fimbrial Biogenesis and Receptor Recognition, Saarela et al., Journal of Bacteriology, Mar 1995, 1477-1484.

I've put both articles in the t88/papers directory. They are named by the page on which each article starts.

Fri Sep 1 11:56:39 PDT 2000 Kevin Karplus

Note: t88 is "an engineered N-term fragment which is proteolytically stable in E.coli." So it lies in the part that is expected to vary.

Sun Sep 3 16:37:26 PDT 2000 Kevin Karplus

The papers tell us that the protein is a fimbrial lectin (or adhesin), similar in function to concanavalin A and other plant lectins. The fimbrial lectins have very high binding specificity.
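On the GafD family identities from the 28 Aug note (87% over the first 160 residues, 100% over the remaining 194): region-wise identity is easy to compute once the sequences are aligned. A minimal sketch, assuming two aligned rows of equal length and defining identity as identical pairs divided by aligned (non-gap) pairs within a column range; other common definitions divide by the length of the shorter sequence, so the numbers can differ. The function name and toy rows are hypothetical, not the real GafD/F17-G sequences.

```python
def percent_identity(a, b, start, end):
    """Percent identity between two aligned, equal-length rows over columns
    [start, end). Columns where either row has a gap ('-' or '.') are skipped."""
    if len(a) != len(b):
        raise ValueError("rows must be the same length")
    pairs = [(x, y) for x, y in zip(a[start:end], b[start:end])
             if x not in "-." and y not in "-."]
    if not pairs:
        return 0.0
    same = sum(x.upper() == y.upper() for x, y in pairs)
    return 100.0 * same / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy rows: variation only in the first half.
    a = "ACDEFGHIKL"
    b = "ACDQFGHIKL"
    print(percent_identity(a, b, 0, 5))   # N-terminal half
    print(percent_identity(a, b, 5, 10))  # C-terminal half
```

Splitting at a fixed column, as here, mirrors the N-terminal/C-terminal split in the papers; with gapped alignments the column index must first be mapped to residue numbers.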

There are 86 PDB files for concanavalin A, and 366 chains in SCOP for the lectin domain 2.28.1. The 15 FSSP representatives for 2.28.1 are:
    1nls 1led 1axkA (twice) 1lcl 1a3k 1c1lA 1quoA 1sacA 1a8d 1kit (twice) 2sli 1celA 2nlrA
I'll try alignments for all of these.

Top-scoring alignments (including ones NOT in 2.28.1) are:

    1wkt/1wkt-T0088-global        T0088  156  -54.87  -10.00  1.4e-04
    1wkt/1wkt-T0088-fssp-global   T0088  156  -54.59   -9.86  1.6e-04
    1wkt/T0088-1wkt-global        1wkt    88  -36.40   -8.12  8.9e-04
        These three alignments are essentially the same, with 24 identical residues (T0088-1wkt-global adds one more with a dubious long insertion before the N-terminus). With a tiny bit of hand editing (1wkt/1wkt-T0088-karplus), we get excellent striping on the beta sandwich. There are 2 predicted strands in an insertion, but these are on the edge of the beta sandwich, and could extend the sandwich or make it into a barrel. COULD PREDICT WITH THIS.

    1amuA/T0088-1amuA-2track-local  1amuA  509  -20.51  -7.27  2.7e-03
        Gapless alignment with only 7 identical residues. Not very convincing.

    2mprA/T0088-2mprA-2track-local  2mprA  421  -32.63  -7.16  2.7e-03
        Pulls out a small part of a large barrel, giving a flat beta sheet. Not very convincing.

  * 1led/1led-T0088-fssp-global  T0088  156  -16.95  -6.77  3.4e-03
        19 identical residues in a very gappy alignment. Misses 2.5 beta strands from the middle of the beta sandwich.

    1a12A/T0088-1a12A-vit    1a12A  401  -10.04  -6.56  4.3e-03
        10 identical residues in a short gapless alignment.

    1a12A/T0088-1a12A-local  1a12A  401  -16.24  -6.16  6.3e-03
        Extends the Viterbi alignment to get 25 identical residues. Light editing extends this to 38 identical residues, covering 2.5 blades of a 7-bladed propeller (1a12A/T0088-1a12A-karplus.a2m). Very pretty, very high residue identity, but I'm not happy with having a third of a beta propeller.

    1a1x/T0088-1a1x-2track-global  1a1x  108  -10.98  -6.31  7.4e-03
    1a1x/T0088-1a1x-2track-local   1a1x  108  -23.69  -6.60  7.4e-03
        10 identical residues and one insertion in a beta sandwich. Not very exciting.

    1qtsA/T0088-1qtsA-2track-local  1qtsA  247  -24.67  -6.64  7.4e-03
        19 identical residues with 2 short insertions. Can be extended to 20 residues and the entire domain with light editing (1qtsA/T0088-1qtsA-karplus.a2m). Nice beta sandwich---COULD PREDICT WITH THIS.

  * 1sacA/1sacA-T0088-fssp-global  T0088  156  -3.01  -5.35  2.0e-02
        19 identical residues in a rather gappy alignment. Skips some interior strands of the beta sandwich.

If I were forced to predict right now, I'd go with
    1wkt/1wkt-T0088-karplus.a2m as model 1
    1qtsA/T0088-1qtsA-karplus.a2m as model 2
It's a shame that I couldn't get any of the lectins to line up well. If I get more time, I should look at more 2.28.1 possibilities.

I'll also start a "make remote" search, to see if we can get some homologs into the t2k alignment. OOPS, did that already, and got no more hits.

Sun Sep 3 20:43:15 PDT 2000
Tried running "make remoter", but this fails because HMMSCORE's e-value computation is wrong when the reverse score is +infinity, and so everything passes the e-value test.

Mon Sep 4 12:57:13 PDT 2000
Actually, the "make remoter" did work, though it accidentally overwrote the remote alignments rather than making new "remoter" alignments. It doesn't really matter, since the HMM thresholding discarded all except the 19 sequences in the seed anyway.

Tue Sep 5 10:35:27 PDT 2000
Remaking 2track.
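On the Sep 3 HMMSCORE problem (a +infinity reverse score letting everything pass the e-value test): we don't know how HMMSCORE's e-value code goes wrong internally, so the following is only a generic illustration in Python, not SAM's own code. A threshold filter can defend itself by rejecting non-finite reverse scores before trusting any E-value derived from them. In these notes' convention more negative is better, so a +infinity reverse score is the worst possible and should never pass. The function name and arguments are hypothetical.

```python
import math


def passes_evalue_filter(reverse_score, evalue, threshold):
    """Hypothetical filter: keep a hit only if its reverse-null score is finite
    and its E-value is finite and at or below `threshold`.

    The score is checked first, so a hit with reverse score +infinity is
    rejected even if the E-value computed from it came out wrong."""
    if not math.isfinite(reverse_score):
        return False
    return math.isfinite(evalue) and evalue <= threshold


if __name__ == "__main__":
    print(passes_evalue_filter(math.inf, 0.0, 0.01))           # rejected despite E = 0
    print(passes_evalue_filter(-8.12, 8.9e-4, 0.01))          # ordinary good hit
    print(passes_evalue_filter(-5.0, float("nan"), 0.01))     # NaN E-value rejected
```

Writing the test positively (`evalue <= threshold`) rather than as `not (evalue > threshold)` also matters: every comparison with NaN is False, so the negated form would accept a NaN E-value.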