T0132

6 June 2002 Kevin Karplus

This looks like a fold-recognition/homology-modeling target with 1bvqA as the template. 1bvqA is a homotetrameric 4-hydroxybenzoyl-CoA thioesterase, while the target is YCIA_HAEIN, Putative acyl-CoA thioester hydrolase HI0827.

The catalytic residue in 1bvqA is supposed to be D17, which seems to correspond to D27 in the target, which is NOT well-conserved in the t2k alignment. The Pfam family for the target is PF01662, while the template is Pfam 4HBT_PSESP (P56653). The highly-conserved part of the Pfam target alignment is P25, D27, G36, G37.

Perhaps I need to filter the t2k alignment to insist on D27, then use the filtered alignment as a seed? I'll try this in subdirectory "selected".

1bvqA is a homotetramer, with two chains joining along the C-most strands of the beta sheet to make a longer beta sheet, then the two large sheets rotated 180 degrees to bring the D17 residues close to each other.

In 1c8u there are 4 copies of the domain, two linked in each chain, but the two large sheets dimerize in the opposite way from 1bvqA, with the sheets together and the helices on the outside. There seems to be an ASP at the N-terminal end of the long helices in 1c8u also.

7 June 2002 Kevin Karplus

I'm having trouble with undertaker crashing, so I can't get structures really optimized for the score. The best ones seem to be based mainly on 1bvqA (no surprise there!).

There is a beta bulge at L86, K87 that seems to be making the turn be offset by one---I wonder if we can fix that?

There is also a difficult gap to close between R58 and V59 that may require repacking the helix against the sheet. At the other end of the helix, there is a hydrogen bond between atom 245 (N of I34) and atom 543 (O of I73) that maybe should be extended to an antiparallel sheet for DIF 33-35.

We may want to fiddle with the alignment of the strand from V59 to M67 (or G57 to F69), since that seems to be a bit problematic.
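The filtering idea from the 6 June entry (keep only the sequences that have the conserved residues, then reseed) can be sketched in Python. This is only an illustration under stated assumptions: it assumes the usual a2m convention (uppercase letters and '-' are match columns, lowercase letters and '.' are insertions) and assumes that the match columns line up one-to-one with target residue numbers. The function names are hypothetical and are not part of the SAM or undertaker tools.

```python
def read_a2m(text):
    """Parse FASTA-like a2m text into a list of (name, aligned_sequence)."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def match_columns(aligned):
    """Keep only match-column characters (uppercase letters and '-')."""
    return "".join(ch for ch in aligned if ch.isupper() or ch == "-")


def filter_by_residues(records, required):
    """Keep sequences whose match columns carry the required residues.

    required maps a 1-based column number (assumed to equal the target
    residue number) to a string of allowed residues, e.g. {27: "DE"}.
    """
    kept = []
    for name, aligned in records:
        cols = match_columns(aligned)
        if all(pos <= len(cols) and cols[pos - 1] in allowed
               for pos, allowed in required.items()):
            kept.append((name, aligned))
    return kept


# Toy alignment (not real T0132 data): require P in column 2, D or E in column 4.
toy = ">target\nAPGDK\n>s1\nAPaGEK\n>s2\nA-GDK\n>s3\nAPGNK\n"
selected = filter_by_residues(read_a2m(toy), {2: "P", 4: "DE"})
print([name for name, _ in selected])
```

For the real alignment the requirement would be something like {25: "P", 27: "DE"}, matching the selection described later in the 16 July entry.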
13 June 2002 Kevin Karplus

In try5-opt, the helix 37-54 has been changed into a beta sheet! This is a bit suspicious, to say the least, since 1bvqA DOES have a helix in that part of the alignment.

Note: T0132-try3+2bpa1+1iq0A+1bvqA+1bvqA.13.20.pdb, which scores better, does have helices, though it has stripped one of the strands off the sheet.

25 June 2002 Kevin Karplus

In try6-opt, strand 16-23 has been changed into a helix. The alignments in T0132.t2k-2track-undertaker.a2m to 1bvqA and 1c8uA have all 5 strands aligned AND have a helix nicely packed against the sheet. Perhaps I should tighten up the NUM_BEST and BEST_EVALUE again, and remake the starting alignments. (It really is a problem that the undertaker scoring function can't select out the right template---but not too surprising, as the break cost makes gaps very expensive.)

The score function still prefers T0132-try3+2bpa1+1iq0A+1bvqA+1bvqA.13.20.pdb.gz

26 June 2002 Kevin Karplus

try7-opt has a new best score, quite a bit lower than the old best. Strand 16-23 is still being wound up into a helix. Perhaps we need to add some constraints to hold the sheet together, at least until Hbond scoring is written.

12 July 2002 Kevin Karplus

I reran undertaker with the fragments from the new fragfinder. try9-opt does not score quite as well as try7-opt, and strand 16-23 is still being wound up into a helix. The long helix that the sheet wraps around is broken up.

16 July 2002

I finally looked at the results in "selected", which started with the t2k alignment with sequences with P25 and DE27 selected as the seed. This now looks like it gets MUCH stronger scores for 1bvqA, 1lo7A, 1krs, and 1c8uA.
Unfortunately, this is an illusion due to bugs in the way the template scores are computed and included---the template library was run on ALL the sequences in the seed, and the combining method did not select out just those for the target sequence, so many probabilities were multiplied together, producing a gross underestimate of the probability. I fixed the casp5/Make.main file to avoid this mistake in future and reran---the results are in fact slightly WEAKER than before.

I'll try running undertaker starting from these alignments anyway. HMM---it isn't going to work, because all those extra sequences in the alignment get put into the a2m files, and undertaker tries to interpret them as PDB files, with nasty results. This even makes a security hole, as the "pdb-get" command didn't have quotes around the file name it was looking up, and the sequence names have "|" in them!

I ran undertaker again (try11) and got ludicrous results, with all the strands wound up into helices, probably because of a typo in the undertaker.script file (Couldn't open file T0132-t2k-2track-undertaker.a2m for input).

On try12, undertaker segfaulted after 51 minutes. Although debugging the seg fault would be virtuous and would probably save time in the long run, I'm not feeling like debugging right now (particularly not if I have to wait an hour for the failure).

Maybe I should try adding a constraint to hold the first strand in place. Let's try adding a CB constraint to R20 and Q84, and to the residues before it.

    16> VLLLRTLA
          |||
    88< VKLCQGYCCWW

17 July 2002 Kevin Karplus

I showed the rest of the CASP5 team how to add constraints by adding a constraint to define-score.script to keep the helix straight

    Constraint 273 385 20 22.5 27 // CA W38 CA E53

and fixed a bug in undertaker.script that prevented all the alignments from being read correctly.

The cost function still prefers T0132.try7-opt.pdb, but T0132.try14-opt.pdb looks much better to me.
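For readers new to the Constraint lines used in these notes, here is a small Python sketch of how one might be read and checked. It assumes that the five numbers are two atom numbers followed by a minimum, a desired, and a maximum distance in Angstroms. That reading fits the values used in these notes (for example 20 / 22.5 / 27 for a 15-residue helix span), but it is an assumption, not a statement of how undertaker defines or scores constraints. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Constraint:
    atom_a: int      # first atom number
    atom_b: int      # second atom number
    d_min: float     # assumed: smallest acceptable distance
    d_ideal: float   # assumed: desired distance
    d_max: float     # assumed: largest acceptable distance
    comment: str     # text after "//", e.g. "CA W38 CA E53"


def parse_constraint(line):
    """Parse 'Constraint a b min ideal max // comment' into a Constraint."""
    body, _, comment = line.partition("//")
    fields = body.split()
    if len(fields) != 6 or fields[0] != "Constraint":
        raise ValueError(f"not a Constraint line: {line!r}")
    return Constraint(int(fields[1]), int(fields[2]),
                      float(fields[3]), float(fields[4]), float(fields[5]),
                      comment.strip())


def is_satisfied(constraint, coords):
    """True if the two atoms lie within [d_min, d_max] of each other.

    coords maps atom number -> (x, y, z); this is only a yes/no check,
    not undertaker's actual penalty function.
    """
    d = math.dist(coords[constraint.atom_a], coords[constraint.atom_b])
    return constraint.d_min <= d <= constraint.d_max


helix = parse_constraint("Constraint 273 385 20 22.5 27 // CA W38 CA E53")
print(helix)
```

A check like this is handy for confirming that a constraint really names the atoms and distances intended before spending an hour-long run on it.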
We need to look at what components of the score function contribute to the better score for try7, and re-weight appropriately.

Note: try14-opt still has a few bad breaks that need to be resolved.

18 July 2002 Kevin Karplus

We can compare the costs for try7 (which I don't like) and try14 (which I do like), and adjust the weights to be bigger where try7 has the larger cost.

name                 length  gen6.5   wet6.5   dry6.5   dry8     dry12    radius_norm  radius_fit
T0132.try7-opt.pdb   154     3.56617  0.61065  3.38836  3.69612  3.74141  3.09104      0.839003
T0132.try14-opt.pdb  154     3.34621  0.60679  3.18575  3.58241  3.69001  3.09258      0.849246
approx difference            0.22     0.004    0.20     0.11     0.05     -0.0015      -0.01

name                 sidechain  clashes  sidechain_clashes  backbone_clashes  break
T0132.try7-opt.pdb   -2.80537   1.35065  0.61039            0.0324675         0.0627997
T0132.try14-opt.pdb  -2.77252   1.75974  1.07143            0.12987           0.157677
approx difference    -0.033     -0.409   -0.46              -0.10             -0.095

name                 constraints  alpha    alpha_prev  contact_order  cost
T0132.try7-opt.pdb   22.3164      1.18207  1.25568     -1.18967       15.1907
T0132.try14-opt.pdb  28.9556      1.31104  1.31978     -1.17394       16.1675
approx difference    -6.64        -0.13    -0.06       -0.016         -0.98

The biggest differences are the constraints and the clashes (particularly sidechain clashes). The clashes might be fixed by scwrl, but I have to look at why the constraints are favoring try7---perhaps I've mis-specified a constraint!

Here are the currently defined constraints:

    // add constraints to hold first strand onto sheet
    Constraint 131 629 2 5.12 7 // CB L18 CB L86
    Constraint 139 623 2 5.12 7 // CB L19 CB C85
    Constraint 147 614 2 5.12 7 // CB R20 CB Q84
    // add constraint to keep helix straight
    Constraint 273 385 20 22.5 27 // CA W38 CA E53

The atom numbers are correct for the specified atoms.

In try14, we have H bonds

    C81 N   L19 O
    C81 O   L19 N
    G83 N   L17 O
    G83 O   L17 N

These are quite different from the constraints I put in to try to hold the first strand in place---perhaps I was trying to hold it to the wrong place.
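The try7/try14 comparison in the 18 July table can be redone mechanically. The sketch below simply subtracts the component values copied from that table (try7 minus try14), ranks the components by the size of the difference, and lists the ones where try7 has the larger cost, which are the candidates for a bigger weight under the rule stated in that entry. It works on the listed component values only; the weights that combine them into the total cost are not given in these notes, so nothing here reproduces the total. The function names are hypothetical.

```python
# Component values copied from the 18 July table.
try7 = {
    "gen6.5": 3.56617, "wet6.5": 0.61065, "dry6.5": 3.38836,
    "dry8": 3.69612, "dry12": 3.74141,
    "radius_norm": 3.09104, "radius_fit": 0.839003,
    "sidechain": -2.80537, "clashes": 1.35065,
    "sidechain_clashes": 0.61039, "backbone_clashes": 0.0324675,
    "break": 0.0627997, "constraints": 22.3164,
    "alpha": 1.18207, "alpha_prev": 1.25568, "contact_order": -1.18967,
}
try14 = {
    "gen6.5": 3.34621, "wet6.5": 0.60679, "dry6.5": 3.18575,
    "dry8": 3.58241, "dry12": 3.69001,
    "radius_norm": 3.09258, "radius_fit": 0.849246,
    "sidechain": -2.77252, "clashes": 1.75974,
    "sidechain_clashes": 1.07143, "backbone_clashes": 0.12987,
    "break": 0.157677, "constraints": 28.9556,
    "alpha": 1.31104, "alpha_prev": 1.31978, "contact_order": -1.17394,
}


def component_differences(disliked, liked):
    """Per-component difference: disliked model minus liked model."""
    return {name: disliked[name] - liked[name] for name in disliked}


def components_to_upweight(diffs):
    """Components where the disliked model has the larger (worse) cost."""
    return [name for name, d in diffs.items() if d > 0]


diffs = component_differences(try7, try14)
ranked = sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)
for name, d in ranked:
    print(f"{name:18s} {d:+.4f}")
print("try7 costs more on:", components_to_upweight(diffs))
```

Ranking by absolute difference puts constraints first, followed by the two clash terms, which agrees with the observation in the entry; the components where try7 costs more are the burial terms (gen6.5, wet6.5, dry6.5, dry8, dry12).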
Let's just remove the first set of constraints. Without them the TERRIBLE try11-opt scores best. Let's try adding the H-bond constraints that hold the first strand on in try14. Now the two best are T0132.try8.6.30.pdb and T0132.try14-opt.pdb, both of which look good. Let's do another run with the new constraints and run scwrl.

Try15 gets a new best score, and the first strand looks fine, but now the last strand is drifting away, and we still have a bad break between T121 and F122. Maybe we need to add some more constraints, to try to guide F122 to the right place and to hold on E65-I70. F122 should be double-Hbonded to I93.

    // add constraints to close gap before F122
    Constraint 678 919 2.0 2.7 3.2 // I93 N F122 O
    Constraint 684 910 2.0 2.7 3.2 // I93 O F122 N
    // add constraints to keep last strand close
    Constraint 494 870 2.0 2.7 3.2 // N68 N T116 O
    Constraint 500 865 2.0 2.7 3.2 // N68 O T116 N

This change in the score function makes T0132.try15-al6.18.25.pdb score best, and it does look pretty good, but the last strand is only attached where the constraints force it---we may want to stitch it down in some other place. I'm not sure where though, so let's see what happens if we just use this score function.

In try16-opt-scwrl, the sheet looks fairly good, though some H bonds for A63 or V64 would help get the edge strand nailed down. The helix across the middle (G36-H56) looks good. Let's try to extend the last strand back a bit by adding H-bonds

    N S66   O V119
    O S66   N V119

This score function still liked try16-opt best.

19 July 2002 Kevin Karplus

try17-opt looks almost identical to try16-opt, probably because it started with try16-opt. Let's try again WITHOUT seeding in an initial conformation.

Try18 does not score as well as try17 or try16, probably because of breaks, but I like the way it packs the short helix better. I'll try reoptimizing with it as a starting point.

Try19 still has some bad breaks, but otherwise looks good.
K8-K14 are sticking out in a badly packed loop---perhaps we'd like to encourage G10 close to C85 to fold the loop down? I think I'll wait on that---I'm not sure it's the right direction.

Try16 and Try17 still score better with the current scoring function---perhaps I should add something to get the helices to stay together? Maybe I should try to make CD1 of I149 get close to CA of M45?

We could also use constraints to try to close the gap after Y113, by continuing the Hbonding pattern of the strands.

With the new constraints and a slight increase to the penalty for breaks, the best-scoring decoy is T0132.try17-al2.4.20.pdb, which doesn't have the helices oriented the way I like them, so let's add another constraint---in fact, let's change the helix packing from

    Constraint 1129 335 0 3.1 5 // CD1 I149 CA M45

to

    Constraint 1129 355 0 3.1 5 // CD1 I149 CA I49
    Constraint 1077 376 0 3.1 5 // CD2 L142 CA K52

based on the alignment of the helices in T0132.try19.6.20.pdb, which is the best-scoring of the decoys that is not in the try17... helix conformation.

Hmm---still not enough---it is the OTHER end of the helix that is swinging around. Let's add

    Constraint 1146 336 0 4 8 // C E151 CG M45

Although T0132.try17-al2.4.20.pdb still scores best, the difference is small, and a small improvement in the breaks in try19.6.20 would make it best.

20 July 2002 Kevin Karplus

try20-opt-scwrl is looking pretty good. The dimerizing edge is free and the helices are in a good position. There is a nice cluster of sulfurs for M40, M43, and M67.

The small helix is still not as tightly packed as I'd like to see. Perhaps I should put back the constraint

    1129 CD1 I149 CA M45 334

add

    1108 CD2 L146 CB A48 351

and remove

    Constraint 1146 336 0 4 8 // C E151 CG M45

I could also try fixing up a little bit at the N terminus by adding a couple of H-bonds to tack down G10 and R11, which are predicted to be a strand.
It is a little tricky, since the current turn is twisting the end of the next strand, and I don't know how the pairing should go. I'm going to guess that R11 pairs with V16:

    Constraint 75 119 2.0 2.7 3.2 // N R11 O V16
    Constraint 84 114 2.0 2.7 3.2 // O R11 N V16

With the modifications to the score function, try20-opt-scwrl still scores the best.

20 July 2002 Kevin Karplus

try21-opt is now the best scorer (slightly better than try21-opt-scwrl, though they have the same backbone). The helices are still not tightly packed against the sheet, but I doubt that I'll be able to get them much tighter, unless I reduce clash penalties and introduce some sort of "jiggling" operators to make small movements.

We may want to trim off up to residue 9 or 10, as the N terminus is almost certainly wrong, but I'm not sure how to fix it.

23 July 2002 Kevin Karplus

Redid the scoring function using the new pred_alpha cost function. Try21-opt is still the best scoring, but I'll start a new run with the new scoring function.

OOPS---not done right---I was missing the weight after pred_alpha2. The best is still try21-opt, though, after scoring with the fixed function.

Now try22-opt is best. It has 4 breaks: 58-59, 124-125, 65-66, and 115-116 in decreasing order of severity. I'm not sure how much more effort is justified for this target---the improvements are now pretty small, and the conformation is still a bit "foamy"---not as tightly packed as I'd like.

25 July 2002 Kevin Karplus

Trying the new "JiggleSubtree" operator to see if it can pack sidechains better. If it works, I'll try to create a more directed search for subtree twiddling.

Hmm---the cost drops quite a bit even on the first iteration of try23, when using a high probability for trying JiggleSubtree. I should probably reduce the probability somewhat, as ReduceBreak and CloseGap may be needed to clean up the gaps that jiggle may make worse.
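Lists of breaks such as the one given for try22-opt can be produced automatically from backbone coordinates. The sketch below flags a break wherever the C atom of one residue is further from the N atom of the next than a peptide bond allows, and sorts the breaks by severity. The ideal C-N peptide bond length of about 1.33 Angstroms is standard; the tolerance, the input layout, and the function name are assumptions made for illustration, and this is not the break cost that undertaker computes.

```python
import math

PEPTIDE_BOND = 1.33  # approximate ideal C-N peptide bond length, in Angstroms


def find_breaks(backbone, tolerance=0.2):
    """Return (residue, next_residue, distance) for each chain break.

    backbone maps residue number -> {"N": (x, y, z), "C": (x, y, z)}.
    A break is reported when the C(i)-N(i+1) distance exceeds
    PEPTIDE_BOND + tolerance (the tolerance is a hypothetical choice).
    Results are sorted with the widest gap first.
    """
    breaks = []
    for resnum in sorted(backbone):
        following = backbone.get(resnum + 1)
        if following is None:
            continue  # last residue, or a residue missing from the model
        gap = math.dist(backbone[resnum]["C"], following["N"])
        if gap > PEPTIDE_BOND + tolerance:
            breaks.append((resnum, resnum + 1, gap))
    breaks.sort(key=lambda item: item[2], reverse=True)
    return breaks


# Toy coordinates along the x axis (not taken from any T0132 model).
toy_backbone = {
    1: {"N": (0.0, 0.0, 0.0), "C": (1.0, 0.0, 0.0)},
    2: {"N": (2.33, 0.0, 0.0), "C": (3.3, 0.0, 0.0)},
    3: {"N": (7.3, 0.0, 0.0), "C": (8.3, 0.0, 0.0)},
    4: {"N": (10.3, 0.0, 0.0), "C": (11.0, 0.0, 0.0)},
}
for first, second, gap in find_breaks(toy_backbone):
    print(f"break {first}-{second}: {gap:.2f} A")
```

Running a check like this before and after a jiggling run would show directly whether the packing operators are widening existing gaps.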
I wonder if the gap near V115 could be closed better if we replaced

    Constraint 852 754 2.0 2.7 3.2 // N C114 O V101

with a constraint on O C114 and N V101 instead. We might also want to add Hbonds for O V115 and N I70 and N V115, O I70.

try23-opt scores much better than previous runs, and the helices seem to pack quite tightly. There are still some bad breaks though, and I think the N terminus is on the wrong side of the sheet. Let's put a weak constraint to have S2 or N4 near Q42. This should be a pretty weak constraint though, as we don't know where it really should go.

The helix-packing constraints I added may have caused the small breaks in the helices. Let's try removing them again and seeing if the jiggling will still pack the helices.

With the modified scoring scheme, try23-opt-scwrl beats out try23-opt, probably as a result of removing the helix-packing constraints that pulled some sidechains into unusual positions.

25 July 2002 Kevin Karplus

Trying again with the new OptSubtree operator and the modified scoring function makes for even more improvements. The new best-scorer is try24-opt-scwrl, which is looking pretty good. I could try running again, since I've added some more packing operators, and there is still a little foaminess to the structure. Also, I think that the S2-Q42 constraint is wrong, so I should probably make another run after removing it.

26 July 2002 Kevin Karplus

Although try25-opt-scwrl is a new lowest-cost model, I don't like it much. I think there is a bug in the constraints---T116 should be close to V99, and N68 should be close to D117. After fixing these constraints, the best scorer is try25.9.40.

Adding dry5 to the score function and increasing the weight for clashes (to try to favor better packing) makes try25.4.40 score best. The sheet has curled up enough in both of these to push the shorter helix out so it doesn't pack well.

Let's try starting another run and seeing what comes out.
Looking at the first iteration of try26, I think I need to add some more Hbond constraints---like V97 A118 and I95 F120, which are getting a bit stretched in the model. I let the try26 run finish and then rescored with the "improved" constraints, which added several of the hbonds for strands and removed a few around the bulge on the edge strand. try26-opt-scwrl comes out best with both the new scoring function and the old one, and it looks pretty good. Let's do one more short run with the newest version of undertaker, which has improved operators.