Wed Aug 7 19:34:44 PDT 2002
t0172 Homology modeling, template 1dusA, 1dl5A, ... (c.66.1)

23 Aug 2002 Kevin Karplus
try1-opt-scwrl, the best scorer, does not look too good. Let's do another run from scratch with a few more alignments and the try-heavy alignments. If that doesn't help, we'll have to add constraints to pull the sheet back together.

24 Aug 2002 00:54
The new best is try2-opt-scwrl. I haven't looked at it yet.
It does not look bad, though I think S76-R83 are somewhat misaligned, and the predicted strand L262-E267 is coiled up into a helix.
We can try forcing a better alignment of S76, L77, F78, K79 with R50 I51 I52 G53 I54 D55, perhaps by adding CB constraints:
    I52.CB  S76.CB
    G53.CA  L77.CA
    I54.CB  F78.CB
    D55.CB  K79.CB
Available for Hbonding are I51 and G53:
    I51.O  S76.N
    G53.N  S76.O
    G53.O  F78.N
With these constraints, try1.6.80 scores best.
R238-S243 looks like it should be in the parallel sheet somewhere, and R287-A291 should probably be straight. Let's add straightness constraints for them; with those, the best-scoring decoy is still try1.6.80.
I think another optimization run from just the alignments is called for; then we can start trying to figure out other constraints to pull the other beta strands together.

25 August 2002 20:56 Kevin Karplus
try3-opt-scwrl doesn't score quite as well as try1.6.80, but it beats everything else. Strand I264-T266 is detached from the sheet, and L288-R293 is also detached.
I have Hbonds for
    D98.O   R238.N
    I100.N  R238.O
    I100.O  V240.N
    M102.N  V240.O
    M102.O  I242.N
leaving I239, V241, and S243 for Hbonding on the other side.
Let's try this alignment:
     98> DGILMD
    238> RIVVIS
    263> RILTEK
    287> RLRAAE
So the Hbonds would be
    I239.N  R263.O
    I239.O  L265.N
    V241.N  L265.O
    V241.O  E267.N
    S243.N  E267.O
    I264.N  R287.O
    I264.O  R289.N
    T266.N  R289.O
    T266.O  A291.N
    K268.N  A291.O
With the try4 constraints, the best model is try1-al2, so we should probably do yet another run from the alignments.

27 Aug 2002 14:12 Kevin Karplus
The best score is now try4-opt-scwrl.
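The proposed Hbond lists follow the usual parallel-sheet pattern: when residue i on one strand sits in the same alignment column as residue j on its neighbor, the N of i bonds to the O of j-1 and the O of i bonds to the N of j+1, with every other residue acting as the pivot. A minimal Python sketch of that rule, assuming the strands are given as equal-length, column-aligned segments (the function name parallel_hbonds is hypothetical, not part of undertaker):

```python
def parallel_hbonds(seq_a, start_a, seq_b, start_b, first_pivot):
    """Backbone Hbonds implied by two column-aligned parallel beta strands.

    seq_a, seq_b: one-letter sequences of the aligned strand segments
    start_a, start_b: residue number of the first letter of each segment
    first_pivot: 0-based column of the first pivot residue on strand a
    Returns (atom on strand a, atom on strand b) label pairs; bonds whose
    partner column falls outside the aligned block are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("strands must be aligned column by column")
    n = len(seq_a)

    def label_a(k):
        return f"{seq_a[k]}{start_a + k}"

    def label_b(k):
        return f"{seq_b[k]}{start_b + k}"

    bonds = []
    for k in range(first_pivot, n, 2):  # pivots alternate along the strand
        if k - 1 >= 0:
            bonds.append((f"{label_a(k)}.N", f"{label_b(k - 1)}.O"))
        if k + 1 < n:
            bonds.append((f"{label_a(k)}.O", f"{label_b(k + 1)}.N"))
    return bonds
```

Called as parallel_hbonds("RIVVIS", 238, "RILTEK", 263, 1), it yields I239.N-R263.O through S243.N-E267.O, the same ten atoms proposed for that strand pair; the RILTEK/RLRAAE pair with first_pivot=1 gives the I264...K268 list.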
The main sheet looks pretty good, but the L104-L226 domain is almost completely disordered. Let's do a VAST alignment and try extending try4-opt-scwrl:
Run vs30838 password casp5t0172

17:56
The PDB file T0172-1dusA.vast.pdb, based on the alignment T0172-1dusA.vast.a2m, does not match the way I edited it in CN3D: the unaligned residues have been realigned (probably because the FASTA output from CN3D has no notion of unaligned residues). Hand-editing the a2m file to match the breaks I edited in the VAST file produces a good file, which is missing residues 107-221.
Let's make a subdomain for V106-L222, with constraints on V106.CA, S107.CA, S107.N, N221.C, and L222.CA to match the distances (in Å) in T0172-1dusA.vast.pdb:
    V106.CA  N221.C   8.241
    S107.CA  N221.C   9.369
    S107.N   N221.C   9.894
    V106.CA  L222.CA  7.543
    S107.CA  L222.CA  9.176
    S107.N   L222.CA  9.938

28 Aug 2002 07:44 Kevin Karplus
Let's cut-and-paste in t0172-106-222 try2-opt.
Oops--I just overwrote undertaker-try4.log; it should have been undertaker-try5.log (there should be a law against typing while tired).

Wed Aug 28 17:06:28 PDT 2002
The new best is try5-opt-scwrl, which has blown the inserted region up again and failed to connect at G237-R238. Let's do another optimization run to see if it can pack things a little better, then try adding some constraints to guide it.

29 Aug 2002 17:01 Kevin Karplus
The new best is try6-opt-scwrl. It joins up reasonably well, but is not compact. Let's try cutting at E123 and L177 and adding some weak constraints to put A168 near V184 and V172 near L181.

21:59
try7-opt-scwrl has the best score, but there is a truly horrendous break between G237 and R238. Maybe we should add some constraints putting L222 and F225/L226 near either I239 and V241 or V240 and I242, depending on which side of the sheet I think the helix should be on. There are holes on both sides; probably I239 and V241, continuing the pattern of the rest of the C terminus.
I think there is a metal-binding site at C46-C49 and the charged residues E23 D24 E25 K26.
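Reading constraint targets off a template-based model only needs the atom coordinates. A minimal sketch, assuming a single-chain PDB file in the standard fixed-column ATOM format (read_atoms and atom_distance are hypothetical helper names, not undertaker commands):

```python
import math

def read_atoms(lines):
    """Map (residue_number, atom_name) -> (x, y, z) from ATOM/HETATM records.

    Uses the fixed PDB columns: atom name 13-16, residue number 23-26,
    x/y/z in 31-38, 39-46, 47-54. Chain IDs and insertion codes are
    ignored, so this assumes one chain with unique residue numbers.
    """
    atoms = {}
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            key = (int(line[22:26]), line[12:16].strip())
            atoms[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms

def atom_distance(atoms, a, b):
    """Euclidean distance (Angstroms) between two (resnum, atom) keys."""
    return math.dist(atoms[a], atoms[b])

# Atom pairs from the 17:56 entry, in target residue numbering.
PAIRS = [
    ((106, "CA"), (221, "C")), ((107, "CA"), (221, "C")), ((107, "N"), (221, "C")),
    ((106, "CA"), (222, "CA")), ((107, "CA"), (222, "CA")), ((107, "N"), (222, "CA")),
]

# Usage (not run here):
#   with open("T0172-1dusA.vast.pdb") as f:
#       atoms = read_atoms(f)
#   for a, b in PAIRS:
#       print(a, b, f"{atom_distance(atoms, a, b):.3f}")
```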
Without constraints, try6-opt-scwrl still scores best. With the try8 constraints (the try7 constraints minus the helix-folding constraints, plus the constraints on L222 and F225), try6-opt-scwrl also scores best. Let's modify try8 to have the helix-folding constraints of try7, but weaken them, by moving the endpoints farther from the desired value, until try6-opt-scwrl and try7-opt-scwrl look about equally good.

Fri Aug 30 11:02:44 PDT 2002 Kevin Karplus
try8-opt is the new best. It still has the long helix of try6 and a terrible break between G237 and R238; that break has been in all runs from try5 on. If we go back to the try4 constraints and turn breaks way up, the best score is still for try8-opt.

Fri Aug 30 13:11:43 PDT 2002 Kevin Karplus
The new best score is try9-opt. Breaks are still bad, but there seems to be an attempt to wrap the helices around the top. Let's go back to try4.constraints but turn up pred_alpha a lot. If I do that, then try8-opt scores best!

Fri Aug 30 16:06:31 PDT 2002 Kevin Karplus
try8-opt still has the best score (with try4.constraints), but try10-opt comes close. Let's try adding some constraints to put T119-F120 parallel to L288-R293, maybe as
    rgFTFEre
    gRLRAAER

31 Aug 2002 09:47 Kevin Karplus
The best score with try11.constraints is try11-opt. The G237-R238 break is still terrible.
If I take out all constraints but add nonalpha_hydrogen_bonds to the scoring function, then T0172.try1+T0172-1xvaA-2track-...+1i9gA-T0172-vit-adpstyle1.pw.a2m.gz:1i9gA.3.80.pdb scores best. This model has part of the sheet nicely formed, but strands L288-A291 and I264-T266 are detached, and T119-F120 is wound up into a helix. Hbonds alone don't seem to find the good sheet models.
If I turn breaks down to 10 and turn nonalpha_hydrogen_bonds up from 1 to 10, the full-sheet models should be strongly favored, but the same try1 model is liked best! Perhaps the disordered helices are giving it the appearance of lots of non-alpha Hbonds.
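One way to read "weaken them, by moving the endpoints farther from the desired value" is a flat-bottomed distance restraint whose zero-penalty window gets wider. The sketch below assumes a linear penalty outside [lo, hi]; that functional form, the slope parameter, and the function names are illustrative assumptions, not undertaker's actual constraint code.

```python
def flat_bottom_penalty(d, lo, hi, slope=1.0):
    """Penalty for distance d: zero inside [lo, hi], linear outside (assumed form)."""
    if d < lo:
        return slope * (lo - d)
    if d > hi:
        return slope * (d - hi)
    return 0.0

def weaken(lo, hi, margin):
    """Widen the zero-penalty window by moving both endpoints outward by margin."""
    return max(0.0, lo - margin), hi + margin

# Example: a restraint asking for 4-7 A costs 2.0 at d = 9 A;
# after weaken(4.0, 7.0, 1.5) the window is 2.5-8.5 A and the cost drops to 0.5.
```

Widening the window, rather than lowering the weight, leaves near-satisfied conformations unpenalized while still pulling hard on grossly wrong ones.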
Turning on the try11 constraints, turning breaks up again, and setting nonalpha_hydrogen_bonds to 1 makes try8-try6.2.80 and try4.7.25 score best. The try4.7.25 model has all of the sheet except the conjectured T119-F120 strand, but the helices in the linker region are a terrible mess. The try8-try6.2.80 model also has the sheet, and T119-F120 is almost in place near the C-end of the sheet (perhaps trying to pair with R293-E295), though G236 is currently Hbonded there instead of near I239, where it belongs.
If I use all hydrogen bonds instead of only the non-alpha ones (to avoid giving bonuses for disrupting helices), the best decoys are the same two, but try8-try6.2.80 moves up.
Maybe I should go back to try11.constraints and add a lot of constraints near G237-R238, to make fixing that break more important than fixing the others.

16:05
With the try12 constraints and hydrogen_bonds, the best score is for T0172.try12-try7+T0172-1bhjA-2...+T0172-1bhjA-vit-adpstyle1.pw.a2m.gz:1bhjA.0.100.pdb, which looks like complete trash. With breaks turned down to 10, it is still best. With hydrogen_bonds turned off, it still comes out best. The second best (try12-try4-7.1.100) has also separated the parts of the sheet. Switching back to try11.constraints, the best are still these trashy ones.
I think the constraint weight is set too high, since that seems to be the ONLY way in which the try12 runs are better. Turning it down from 0.1 to 0.01.
Oops---miscounted; it's pred_alpha2 that is the problem. Turning pred_alpha2 down to 1, constraints back up to 0.1, and break back up to 20 isn't enough. Taking pred_alpha2 down to 0.1 moves try8-opt to the top. I'm confused about why pred_alpha2 is pointing in such a wrong direction, since the coloring by the "alpha" script seems to indicate that the predictions are good.
Moving break up to 30 and switching back to try12.constraints, but leaving pred_alpha2 down around 0.1, makes try10.0.40 best, then try9-opt.
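Much of this entry is about how reweighting terms (break, pred_alpha2, constraints, hydrogen_bonds) reorders the decoys. Assuming, for illustration only, that the total score is a weighted sum of per-term costs with lower being better, the effect can be sketched as below. The term values are made-up toy numbers, not scores from these runs, and undertaker's real combination of terms may differ.

```python
def total_score(terms, weights):
    """Weighted sum of per-term costs; terms without a weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in terms.items())

def rank(decoys, weights):
    """Decoy names sorted best (lowest total score) first."""
    return sorted(decoys, key=lambda name: total_score(decoys[name], weights))

# Hypothetical toy decoys: "sheety" has few breaks but a weaker helix-prediction
# term; "helixy" matches the predicted helices better but is badly broken.
TOY = {
    "sheety": {"break": 0.2, "pred_alpha2": -1.0},
    "helixy": {"break": 1.0, "pred_alpha2": -5.0},
}

# rank(TOY, {"break": 20, "pred_alpha2": 1})  -> "sheety" first
# rank(TOY, {"break": 20, "pred_alpha2": 10}) -> "helixy" first
```

A single over-weighted term can flip the ranking on its own, which is why turning pred_alpha2 down moved a different decoy to the top.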
Adding hydrogen bonds back in doesn't seem to change the order at the top.

21:36
The problem with pred_alpha2 was in the code that Jonathan added for Hbonds: it is messing up the pred_alpha2 computation somewhere, probably through a memory-management error.

22:00
The try13 run was probably useless, since it was done with Jonathan's code. Let's do it again as try14, with the old code.

22:27
The best current model is try13-try9.0.120, but I didn't include it in the try14 run, so I wonder whether try14 will find it. Oh well, in try15 I can toss it into the mix.

1 Sept 2002 08:48 Kevin Karplus
The best with the try12 constraints is now try14-opt (scwrl failed). The best without constraints is also try14-opt. The pieces are now almost connected, without destroying the beta sheet. The helix is disrupted from V215 to N234, and the long helix P146-A190 needs to be bent in a couple of places to fold down tighter.
There is a hydrophobic slot around L43-V60-I63 into which F163-I167-I171 might fit. Let's try adding weak constraints there as try15.constraints.

17:12
The best score is now try15-opt (with the try15 constraints). Maybe part of the problem I'm having is that the C-terminal strands are connected to the sheet wrong: maybe there is a parallel strand involving T119-F120 between G99-M102 and I239-S243:
     92> lgiekVDGILMDlgv
    111> lkgenrgFTFEree
    238> RIVVIS
    260> kklRILTEkpvr
    287> RLRAAE

2 Sept 2002 11:32 Kevin Karplus
The try16 run did not insert the extra strand, probably because it was starting from structures that had already assembled the other way. Even with the try16 constraints, try15-opt scores best. Let's try a run from scratch, with no initial conformations, and see what we can come up with.

2 Sept 2002 14:43
try17 ran for about an hour, then got stuck in a very slow SCWRL run. What it has produced so far has no more than 4 strands in a sheet, so it is not nearly as good as the best-scoring model (currently try15-opt). I'm not sure what to try next.
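The strand listings in the 17:12 entry pack a starting residue number and a segment into one line. Assuming, as the capitalization seems to indicate, that uppercase letters mark the residues placed in the strand and lowercase letters are flanking context, a small hypothetical helper can recover the residue numbers:

```python
import re

def strand_residues(line):
    """Parse a line like '92> lgiekVDGILMDlgv' into [(97, 'V'), (98, 'D'), ...].

    The number before '>' is the residue number of the first letter;
    only uppercase letters (assumed to be the strand residues) are returned.
    """
    m = re.fullmatch(r"\s*(\d+)>\s*([A-Za-z]+)\s*", line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    start, seq = int(m.group(1)), m.group(2)
    return [(start + i, c) for i, c in enumerate(seq) if c.isupper()]
```

For "111> lkgenrgFTFEree" this gives F118, T119, F120, E121, which brackets the T119-F120 strand named in the text, and "92> lgiekVDGILMDlgv" gives V97-D103, covering G99-M102.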
Maybe I should look at the CAFASP alignments and see if anyone got a split-domain alignment for this target. SUPERFAMILY is looking for such things, so that is a possibility.

Tue Sep 3 14:08:34 PDT 2002 Kevin Karplus
Looking at the CAFASP results:
1) For secondary-structure (2ry) prediction, there is pretty good agreement, except for residues 208-215 (helix or strand; we have a strong helix prediction) and 263-266 (helix, strand, turn, other; we have a weak strand prediction). There is even some disagreement for 286-295 (helix or strand; we predict both, with strand stronger).
2) The very first superfamily-pp alignment is split, but it does not look very different from the T99 and T02 alignments, so it is not likely to help us much. I'll try running from it anyway.

3 Sept 2002 14:43 Kevin Karplus
Started try18, using just a couple of good alignments as seeds and the try15.constraints (which are consistent with the older predictions).
Started try19, using try16.constraints, though I don't think either of the starting alignments will be able to provide that arrangement.
We may want to submit a model that does not include the middle domain.
3D-PSSM got an excellent hit to 1j4fA, which is not in our template library. It is also not in all-protein! No wonder: it is a theoretical model. Interestingly, the model was created by Bujnicki (and Rychlewski), supposedly from 1dusA. It is just a 7-strand sheet, 3^2^1^4^5^7v6^, as is 1dusA. Our best current prediction (try15) has 3^2^1^4^5^6^7^. Although strand 7 is predicted to be antiparallel, the helix before it makes the parallel arrangement we're predicting more reasonable (to me). If I'm right, it will be a good thing for our method, since 1dusA has the antiparallel arrangement that Bujnicki used. If I'm wrong, oh well.
Let me make another attempt at packing the helices in the middle into the structure. There is a slot with I54 and I52 exposed; perhaps the helix could fit in there with V187 and V184 in contact.
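The sheet-topology strings list the strands in their spatial order across the sheet, with ^ or v giving each strand's direction, so adjacent strands with opposite arrows are antiparallel. A minimal parser sketch (parse_topology and antiparallel_neighbors are hypothetical names) that finds those antiparallel neighbor pairs:

```python
import re

def parse_topology(topo):
    """Split e.g. '3^2^1^4^5^7v6^' into [(3, '^'), (2, '^'), ...] in spatial order."""
    pairs = re.findall(r"(\d+)([\^v])", topo)
    if "".join(n + d for n, d in pairs) != topo:
        raise ValueError(f"unparsable topology: {topo!r}")
    return [(int(n), d) for n, d in pairs]

def antiparallel_neighbors(topo):
    """Adjacent strand pairs whose direction arrows differ."""
    strands = parse_topology(topo)
    return [(a, b) for (a, da), (b, db) in zip(strands, strands[1:]) if da != db]
```

For 1dusA's 3^2^1^4^5^7v6^ it returns [(5, 7), (7, 6)], the only antiparallel contacts, while an all-parallel string such as 3^2^1^4^5^6^7^ returns an empty list.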
Tue Sep 3 16:25:36 PDT 2002
Oops, the try16 conformations did not get properly named (there was a typo in the NameConform command), so they came out as T0172.try15-opt.pdb.4.100.pdb instead of T0172.try16-try15.4.100.pdb.
The first iteration of try20 just finished, but it did not do very well: it has still not brought the pieces where they need to go. Perhaps I need to turn break penalties way down for a while to get pieces into reasonable places, then turn them back up to heal the gaps. try21 will try to do just this.

Tue Sep 3 17:24:28 PDT 2002
try21's first iteration is moving in a desirable direction, I think. At least the long helix has been broken up and the protein is a bit more compact, though the breaks are, as expected, terrible.
Incidentally, try19 has the 1dusA alignment of the strands, but was unable to connect up s7. The middle domain is also a mess.
Tries 18 and 20 have gotten stuck in long SCWRL runs; scwrl_each should be turned off until we start getting something modestly compact, where rotamers might matter.

3 Sept 2002 21:12
try21 has folded things up a bit, but I now think that trying to put the strand R287-R293 on the end of the sheet is a bad idea. We're not even certain this is a strand! It looks like it should be parallel to R263-K268, but that already has strands on both sides. Let's let it do whatever it likes in try22, but try to pack the helices more against the sheet. We may have to do some cut-and-paste from another file to restore helices that have been shredded, like N216-L233.

Wed Sep 4 10:49:35 PDT 2002 Kevin Karplus
The best score with the try22 constraints is try22.2.100 (= try22-opt). Helices are mostly wound back up again, and it looks like L222 and L226 should pack against I239 and V241. Let's try that.

Wed Sep 4 14:50:31 PDT 2002
In try23.0.100, the helix K223-G237 has drifted a long way from where it belongs, which is probably packed against strand R238-F244.
A constraint putting L226 near I239 might help it get back into place (or does it belong on the other side, near V240?). Currently, try23.constraints is trying to put it near V241, and L222 near I239. Perhaps OptSubtree can fix this in the superiteration??

Wed Sep 4 15:56:15 PDT 2002
I think I have the helix constraints for L222 and L226 the wrong way round in try23.constraints: the helix should run in the OPPOSITE order to the strand it packs against, so L222 belongs near V241 and L226 near I239.
We can also try to do a better job of packing S127-L150.
I think I'll go back to believing that C46-C49 is a disulphide bond rather than a metal-binding site; exposing it would be a bit difficult, since the sequence is CPGC, and the PG probably makes a tight turn.

4 Sept 2002 19:07
try24-opt: Looking better; let's up the break penalty a bit more and reoptimize.

5 Sept 2002 06:48
try25-opt: the sheet has formed nicely, but the helix between K223 and G237 has still not found its place.

9:39
try26 is making very slow improvement, so I made a minor change to OptSubtree so that it randomly decides to ignore some of the constraints derived from breaks when doing the optimization. In this way, I hope to avoid the "split-the-difference" problem that happens when there are conflicting demands on the opposite ends of a segment. try27 will use the new version of undertaker, on a shorter run than try26, to see whether it improves faster.

11:30
I plotted the improvement versus iteration for try26 and try27, and they track each other fairly closely, so the change does not seem to have helped at all.

11:49
try27-opt has made precious little improvement in the placement of the K223-G237 helix. I'll have to move it manually.

16:40
try28 looked a little better, but scored worse. Starting try29 with somewhat different packing constraints, to see how it will do. It will probably base too much on try27, so I'll start try30 from just try28.

20:54
try26-opt scores better than try29-opt with the try29.constraints.
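The 9:39 change to OptSubtree randomly ignores some break-derived constraints on each optimization pass. The notebook does not say how constraints are represented inside undertaker, so this sketch assumes each one is a dict with a hypothetical from_break flag, and drops each break-derived constraint independently with probability drop_prob while always keeping ordinary constraints:

```python
import random

def sample_active_constraints(constraints, drop_prob, rng=random):
    """Return the constraints to enforce on this optimization pass.

    Break-derived constraints (c["from_break"] is true) are each skipped
    independently with probability drop_prob; all others are always kept.
    rng can be a random.Random instance for reproducible runs.
    """
    return [c for c in constraints
            if not c["from_break"] or rng.random() >= drop_prob]
```

Dropping only one end's demands on some passes lets a segment move toward one partner at a time instead of settling halfway between two; per the 11:30 entry, though, the change did not measurably speed up improvement here.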
try30 scores much worse (worse than try25-opt and try27-opt). If we turn off break penalties but leave in the try29 constraints, the best score is still for try26-opt. Turning breaks up to 10 but removing all constraints still leaves try26-opt as the best, with try20-opt next best, but that is a very foamy protein, with the helices sticking way out.
I give up; I can't get the helices to pack nicely with undertaker in the remaining time.
    model 1: try26-opt
    model 2: try29-opt
    model 3: T0172-1dusA.pdb
    model 4: try20-opt

26 Nov 2002 Kevin Karplus
The best whole chain is try4.13.25, which was also best on domain 2.
CA RMSD (Å):
                                  whole chain   domain 2
    best (try4.13.25)               11.3483      12.1486
    model 1: try26-opt              20.8816      17.4086
    model 2: try29-opt              20.9515      17.4927
    model 3: T0172-1dusA.pdb        not tested!
    model 4: try20-opt              20.9603      19.9299

28 Nov 2002 Kevin Karplus
Model 3, the simple alignment, does the best job on domain 1, and none of the attempts at domain 2 (116-216) are really worth anything, even though the secondary-structure prediction was pretty good.
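CA RMSD of the kind tabulated above is measured after optimally superimposing corresponding CA atoms. A minimal numpy sketch of Kabsch superposition, assuming the two coordinate arrays are already paired residue by residue; it is not necessarily the exact procedure behind the numbers in the table, which also depend on which residues were compared for each domain.

```python
import numpy as np

def ca_rmsd(P, Q):
    """RMSD between paired (N, 3) CA coordinates after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays with the same (N, 3) shape")
    P = P - P.mean(axis=0)          # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                     # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q              # rotate each row of P, compare with Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

The reflection check matters: without it, a mirror-image model could be "superimposed" perfectly and report an RMSD near zero.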