18 May 2000 Kevin Karplus

ADP-ribose pyrophosphatase seems to be similar to 1mut and 1tum nucleoside triphosphate pyrophosphohydrolase (mutt), based on double-blast results. The FSSP representative is 1tum. The 1MUT and 1TUM files are included in the T2k alignment, so prediction should be relatively straightforward. We may want to try doing some alignment trimming or adjustment.

With a quick eyeballing of the results using see-a2m, I think I like the 1tum/1tum-T0090-fssp-global.pw.a2m alignment best, but there are some suspect chunks of it (the insertion in the beta strand, for example).

After we get a good alignment, we should probably run it through SCWRL to set the sidechains. Questions: should we keep the ligand there? Will SCWRL pay attention to ligands?

The target-model search found 1tum and 1mut, of course, but also added a very weak hit for 1lvl. I see no reason to pursue the 1lvl hit. The template model search also scored 1tum and 1mut at the top, with 1tum slightly better, and added three weak hits (1puc, 1sceA, 4xis), all down at the noise level.

19 May 2000 Kevin Karplus

Saira made a recommendation for how to get a better multiple alignment for T0090 (and similar large sets of related sequences). See "mail" for details. The basic idea is to use model surgery with stringent cutoffs to get the core region, then freeze the core region (she suggests changing the model, but I think that using constraints may be more principled), and redo the model surgery with lower thresholds to get the somewhat conserved non-core regions.

I have not tried it on T0090 yet, but the T0090.t2k alignment seems to have a pretty solid core from about 65 through 180. Model surgery is not very good about adding things back on at the end, so I'm not sure how well Saira's method would work on this multiple alignment. We could try doing a tuneup on the alignment (which allows model surgery).
I tried doing "tuneup" and "noseed" alignments for T0090, and creating the pairwise alignments with 1TUM. The alignment 1tum/noseed-1tum-T0090-local.pw.a2m.gz looks pretty good, getting a conserved glutamic acid (E) that interacts with the ligand (GLU98) that is not conserved in the T0090-1tum-global alignment. The 1tum-T0090-fssp-global alignment still looks much better to me, as it not only conserves GLU98, but also several other residues in that neighborhood.

22 May 2000 Kevin Karplus

Looking at the evolutionary tree, we can see that the MUTT_ECOLI (1tum template) is almost an out-group member for the tree. It is only added in t2k_4 (not t2k_3), consistent with this view of it as a fairly distant homolog. We may want to make a family alignment (excluding subfamilies 33 and 34, the MUTT_HAEIN and MUTT_ECOLI sequences), to get a more precise secondary-structure prediction. We may even want to narrow it to subfamilies 9 through 25, though that is less clear.

Tue Jun 6 13:35:30 PDT 2000

Redid 2ry prediction with new neural net.

Fri Jun 23 10:56:10 PDT 2000 Melissa Cline

Okay, let's see about finalizing one of these alignments! First off, which alignments are starting with a good score?

    1tum-T0090-fssp-global.pw        -9.20   weak
    1tum-T0090-global.pw            -37.00   promising
    1tum-T0090-local.pw             -40.50   also promising
    1tum-T0090-vit.pw               -40.89   ""   ""
    T0090-*                       < -150     un-credible: 1tum got in the training set?
    noseed-1tum-T0090-global.pw     -82.04   looks better...
    noseed-1tum-T0090-local.pw      -83.14
    tuneup-1tum-T0090-global.pw    -114.76
    tuneup-1tum-T0090-local.pw     -112.46

Mon Jun 26 09:52:02 PDT 2000

Remade 2ry predictions

Mon Jun 26 16:40:44 PDT 2000 cline

Okay, continuing from campus, where I can *run* all of the tools...

Notes on the alignments:

1tum-T0090-vit.pw
The alignment starts with an n-terminus insert of about 50 residues. Then, there's a stretch of 25 residues with two identical columns, covering the beta strands.
There's an insert of 5-6 residues; we're at the surface of the structure here, and there seems to be plenty of room. Then, there's a high-identity stretch of about 60 residues, covering a helix and binding with a metal ion. Another insert, again at an exposed portion of the structure. Then, there's a short aligned region (20 columns, 2 identical), followed by a large c-terminus insert. The last helix in the structure is not aligned to: the alignment ends with a gap of about 20 residues.

1tum-T0090-local.pw
Same as above, except for two minor changes. The C-terminal gap starts about four residues later, at the start of the helix rather than just before the start. At the n-terminal, the first residue or two is not aligned. Other than that, the alignments are identical.

1tum-T0090-global.pw
Most of the alignment is the same as the top one. Differences are as follows. The c-terminal helix is aligned to, though the alignment of the helix shows no identical columns and several unlikely substitutions. I'd be inclined to go with the local alignment, or check the posterior decoding cost in the helix. The other difference is that rather than starting with a long n-terminal gap, the alignment starts with one residue aligned followed by a long interior gap. Again, the viterbi alignment is probably right.

1tum-T0090-fssp-global.pw
Quite a bit different from the top alignment, but looks to be much higher in identity. However, it's also got a lot more internal gaps, which is probably how it has such high identity and such a low score. Anyway, here are the differences. At the N-terminal, it starts with 5 columns aligned (2 identical), then has an insert of 10 columns. Then, there's about 10 columns aligned (4 identical), followed by an insert of about 30 characters. Then, there's another 10 aligned (5 identical), followed by an insert of about 30 characters.
These inserts are at the surface but very close to secondary structural elements, which makes this whole alignment region a bit shady. From residue 26 or 27 in 1tum, they align the same for a while, up to guide sequence residue 70. Then, the fssp alignment inserts 5 residues right before the start of a strand (right on the surface, plenty of room to grow), aligns 5 residues with one identical, inserts 20 residues (again, plenty of room there on the surface), and then aligns everything except for the first turn in the last helix. The region covering that last alignment segment seems high in identity, and could be a good alternative to the end of the other alignment - if its posterior decoding cost is reasonable.

noseed-1tum-T0090-global.pw
Starts off like the viterbi alignment. Very minor differences up to about template column 75. Then, there's a two-residue insert where a P and G are stuck into the middle of a beta strand. Since there's a beta turn right after this, right where the other alignment had them, I like the other alignment better here. Then, there's about 8 residues and a 4-residue insert at a surface loop. The viterbi alignment puts a big insert about three positions later. After that, they agree. Based on what I see, the viterbi still looks better.

noseed-1tum-T0090-local.pw
Just like the previous alignment, except it stops the alignment not at the c-terminal helix but at the segment of loop immediately adjacent to that helix. Before I say if this is a good thing, I'd like to see the posterior decoding cost in this region.

tuneup-1tum-T0090-global.pw
Only minor differences from the viterbi alignment until about template column 95. Then, where the viterbi has a short insert, this one keeps going for a while, with a small insert in a position where there's a surface beta turn in the structure - and plenty of room to grow. To the end of the alignment, it's shifted from the viterbi alignment by 4 residues. Neither alignment looks clearly better.
tuneup-1tum-T0090-local.pw
Almost identical to the global version (above). Just missing a couple positions at the very end.

Wed Jun 28 08:48:08 PDT 2000 cline

Now, I'm setting out to take a look at some of the alignments from above in terms of their posterior decoding column cost. The commands executed are shown, and all commands were executed in the directory pce/casp4/1tum/cline

1tum-T0090-vit.pw

    rebuild-align ../../T0090.seq 1tum/nostruct-align/1tum.t2k-w0.5.mod \
        1tum/nostruct-align/1tum.t2k.a2m.gz test -viterbi 1 -sw 2
    gzip test.a2m

(note: here I used measure_shift to verify that in test.a2m.gz I reproduced the alignment of 1tum-T0090-vit.pw.a2m.gz. I did.)

    build-trimming-info -align test.a2m.gz -target T0090 \
        > 1tum-T0090-vit.pw.pdoccost

On inspection of 1tum-T0090-vit.pw.pdoccost, all posteriors are very strong. The only two that are marginal are for the first two residues aligned.

1tum-T0090-local.pw

    rebuild-align ../../T0090.seq 1tum/nostruct-align/1tum.t2k-w0.5.mod \
        1tum/nostruct-align/1tum.t2k.a2m.gz test -adpstyle 5 -sw 2
    gzip test.a2m
    build-trimming-info -align test.a2m.gz -target T0090 \
        > 1tum-T0090-local.pw.pdoccost

(used measure_shift to test that we rebuilt the proper alignment)

The only position that's possibly questionable is the first aligned, with a cost of 1.08. Second and third are slightly on the high side. All this means is that if there's an unreliable piece of this alignment, that's it.

1tum-T0090-global.pw

    rebuild-align ../../T0090.seq 1tum/nostruct-align/1tum.t2k-w0.5.mod \
        1tum/nostruct-align/1tum.t2k.a2m.gz test -adpstyle 5 -sw 0
    gzip test.a2m
    measure_shift -r test.a2m.gz -c ../1tum-T0090-global.pw.a2m.gz \
        -ta T0090 -te 1tum

(note: checked out fine)

    build-trimming-info -align test.a2m.gz -target T0090 \
        > 1tum-T0090-global.pw.pdoccost

Here, the shakiest parts of the alignment are the first 2-3 positions at the beginning and the last 2-3 positions at the end. The rest looks excellent.
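
The eyeball inspection of the .pdoccost files above amounts to finding runs of columns whose posterior decoding cost is high. A minimal sketch of that step follows, in Python. It assumes the per-column costs have already been read into a list (the layout of the .pdoccost file is not described in these notes, so no parser is attempted). The function name shaky_runs and the cutoff of 1.0 are hypothetical, chosen only because a cost of 1.08 was called "possibly questionable" above, and the example costs are made up for illustration.

```python
from typing import List, Tuple


def shaky_runs(costs: List[float], threshold: float = 1.0) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs of columns with cost >= threshold.

    The threshold is an assumed cutoff, not a value taken from build-trimming-info.
    """
    runs: List[Tuple[int, int]] = []
    start = None  # first column of the run currently being extended, if any
    for column, cost in enumerate(costs, start=1):
        if cost >= threshold:
            if start is None:
                start = column
        elif start is not None:
            runs.append((start, column - 1))
            start = None
    if start is not None:
        # The last run extends to the final column.
        runs.append((start, len(costs)))
    return runs


# Made-up costs for illustration only; they are not from any real .pdoccost file.
example_costs = [1.08, 0.9, 0.8, 0.1, 0.05, 0.1, 1.3, 1.2, 0.2]
print(shaky_runs(example_costs))
```

Reporting runs rather than single columns matches how the notes describe the results ("the first 2-3 positions", "the last 2-3 positions").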
1tum-T0090-fssp-global.pw

    rebuild-align ../../T0090.seq 1tum/struct-align/1tum.fssp-w0.5.mod \
        1tum/struct-align/1tum.fssp.a2m.gz test -adpstyle 5 -sw 0
    gzip test.a2m
    measure_shift -r test.a2m.gz -c ../1tum-T0090-fssp-global.pw.a2m.gz \
        -ta T0090 -te 1tum

(checks out)

    build-trimming-info -align test.a2m.gz -target T0090 \
        > 1tum-T0090-fssp-global.pw.pdoccost

This one is quite interesting! There are many positions with suspiciously high pdoc costs. The beginning looks mostly like crap. In the middle of the alignment, where it looks okay, it's basically the same alignment as all of the above. Then, there's a four-position gap, and a region that definitely looks like crap. Then, at the end of the alignment, the region scores pretty well with the following exceptions: the first position aligned doesn't look great, and the first 5 of the last 10 positions look shaky (the last 5 positions look much better).

Tried out a new alignment: 1tum-T0090-fssp-global-fw0.5.a2m.gz

    rebuild-align ../../T0090.seq 1tum/struct-align/1tum.fssp-fw0.5.mod \
        1tum/struct-align/1tum.fssp.a2m.gz 1tum-T0090-fssp-global-fw0.5 \
        -adpstyle 5 -sw 0
    build-trimming-info -align 1tum-T0090-fssp-global-fw0.5.a2m.gz \
        -target T0090 > 1tum-T0090-fssp-global-fw0.5.pdoccost

The alignment is similar to, and not obviously better than, the last one.

A look at the fssp alignment makes it clear why the sequence-based alignments for this template are looking so much better than the structure-based alignment. The FSSP alignment for 1tum contains only two sequences, 1tum and 1mut, and these sequences are nearly identical. Their structural alignment is no different than their sequence alignment. So, the fssp alignment really has only one sequence. The sequence-based alignment has a lot more information: it has 1mut, plus it has many more homologs. So, even though the idea of aligning to that last helix (as shown in the fssp-based alignments) is interesting, it's really not founded on much information.
Wed Jun 28 15:00:06 PDT 2000 cline

Here's the gist of what Kevin, Christian, and I talked about when we talked about this target. We like 1tum as the fold (so far so good). The non-fssp alignment looks good for the beginning, but we don't like that there are two beta strands in the second half where it doesn't get anything good. The fssp-based alignment, even though it's not based on many sequences, has some good features: greater identity around the active site, stronger signal in the second half of the alignment. However, the first half of the alignment doesn't look so good.

In terms of consistency with the secondary structure prediction:

There's a strong prediction of a strand from residues 59 to 66. In the FSSP alignment (for which we like the second half but not the first), it ends up in a loop. In the 1tum-T0090-local alignment (for which we like the first half but not the second), it looks great; it covers a loop that's next to a beta strand, and looks credibly like it might be a strand itself. The 1tum secondary structure string shows a short strand there.

Another beta strand is predicted with very high probability from residues 72 to 77. This is in the segment of the sequence following shortly after the last predicted strand. The fssp alignment doesn't align these residues. The t2k alignment aligns it to a part of the structure where the rasmol window shows a loop but the secondary structure string shows a short strand. The structure even puts something in between these two segments that could plausibly pose as a beta turn.

Very strong helix prediction from residues 105 to 117. Both alignments put the helix in this region.

Strong strand prediction from residues 141 to 147. FSSP does not align this region. T2K aligns it to a strand.

All this supports the T2K alignment at the beginning (up to about 147), FSSP at the end.
In addition, both alignments have an insert at around residue 150; FSSP puts that in the middle of a strand, while T2K puts it at a beta turn (much better spot).

To try to combine the best features of the fssp and t2k alignment, we created 1tum-T0090-edited.a2m from 1tum-T0090-fssp-global.a2m. Next, the goal is to make sure it's consistent with the secondary structure prediction. Contradictions and consistencies:

Strand prediction in residues 59-66: mostly not aligned. But, the insert comes at a good spot in the structure.
Strand prediction in residues 72-77: not aligned. Perhaps the other alignment is better for this section. Saira?
Strong helix prediction from 105-117 where all alignments put a helix. This region is nailed down.
Strand prediction at 141-148: covers a strand.
Helix prediction at 189-199: not aligned. Had been aligned in the automated alignment, but we had a few datapoints against this region and figured it was an artifact of global alignment.

Wed Jun 28 22:03:19 PDT 2000 Kevin Karplus

The region we are uncertain about is

T0090 VKRTKPVLSFLASPGGTSERSSIMVGEVDATTASGIHGLA CCEEEEEEEEECCCCCCCCEEEEEEEEECCCCCCCCCCCC

which we want to align with

1tum PQHFSLFEKLEYEFP STRIDE CCEEEEEEECCEETT DSSP LBLLEEEEEELLBLS

We can try to align with the first strand, leaving no gap at the beginning

T0090 VKRTKPVLSFLASPGGTSERSSIMVGEVDATTASGIHGLA CCEEEEEEEEECCCCCCCCEEEEEEEEECCCCCCCCCCCC 1tum PQHFSLFEKLEYEFP STRIDE CCEEEEEEECCEETT DSSP LBLLEEEEEELLBLS

or we can try to align the second strand

T0090 VKRTKPVLSFLASPGGTSERSSIMVGEVDATTASGIHGLA CCEEEEEEEEECCCCCCCCEEEEEEEEECCCCCCCCCCCC 1tum PQHFSLFEKLEYEFP STRIDE CCEEEEEEECCEETT DSSP LBLLEEEEEELLBLS

Neither alignment strikes me immediately as more correct. The second-strand alignment could be moved back 1-3 positions without being any worse, so these possible alignments should be looked at.
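
One way to look at the possible placements systematically is to slide the 15-residue 1tum segment along the 40-residue T0090 segment with no internal gaps and count identical residues at each offset. The sketch below does only that, in Python. It ignores substitution quality and the secondary-structure strings, so it is a rough first filter rather than a replacement for inspecting the alignments. The two sequences are copied from the notes above; the function name scan_offsets and the 0-based offset convention are assumptions made for this example.

```python
from typing import List, Tuple

# Segments copied from the notes above.
T0090_SEQ = "VKRTKPVLSFLASPGGTSERSSIMVGEVDATTASGIHGLA"
TUM_SEQ = "PQHFSLFEKLEYEFP"


def scan_offsets(target: str, template: str) -> List[Tuple[int, int]]:
    """Slide template along target without gaps.

    Returns (offset, identities) pairs, where offset is the 0-based target
    position aligned to the first template residue.
    """
    results: List[Tuple[int, int]] = []
    for offset in range(len(target) - len(template) + 1):
        window = target[offset:offset + len(template)]
        identities = sum(1 for a, b in zip(window, template) if a == b)
        results.append((offset, identities))
    return results


for offset, identities in scan_offsets(T0090_SEQ, TUM_SEQ):
    print(f"offset {offset:2d}: {identities} identical")
```

Offset 0 corresponds to the placement with no gap at the beginning; larger offsets move the 1tum segment toward the second predicted strand.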
Thu Jun 29 10:22:48 PDT 2000 Kevin Karplus

The alignment in 1tum/1tum-T0090-edited is probably the best for the second strand alignment:

T0090 VKRTKPVLSFLASPGGTSERSSIMVGEVDATTASGIHGLA CCEEEEEEEEECCCCCCCCEEEEEEEEECCCCCCCCCCCC : . :.. 1tum PQHFSLFEKLEYEFP STRIDE CCEEEEEEECCEETT DSSP LBLLEEEEEELLBLS

since it conserves two residues and has some other reasonable substitutions. Aligning to the first strand gets no conserved residues and only one or two reasonable substitutions.