T0132

6 June 2002 Kevin Karplus

This looks like a fold-recognition/homology-modeling target with 1bvqA
as the template. 

1bvqA is a homotetrameric 4-hydroxybenzoyl-CoA thioesterase, while the
target is YCIA_HAEIN Putative acyl-CoA thioester hydrolase HI0827.

The catalytic residue in 1bvqA is supposed to be D17, which seems to
correspond to D27 in the target, which is NOT well-conserved in the
t2k alignment.  The Pfam family for the target is PF01662, while the
template is Pfam 4HBT_PSESP (P56653).

The highly-conserved part of the Pfam target alignment is P25, D27,
G36, G37.  Perhaps I need to filter the t2k alignment to insist on
D27, then use the filtered alignment as a seed?  I'll try this in
subdirectory "selected".
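
A rough Python sketch of the filtering idea above.  It assumes the
alignment is in a2m format with the target defining the match columns
(uppercase letters and '-'), so that match column 27 lines up with D27
of the target.  The function names and the toy alignment are
hypothetical; this is not one of the SAM tools.

```python
def read_a2m(text):
    """Parse FASTA-like a2m text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def match_columns(seq):
    """Keep only match columns: uppercase letters and '-' (assumed a2m convention)."""
    return "".join(c for c in seq if c.isupper() or c == "-")


def select_by_residue(records, required):
    """Keep records whose match column `pos` (1-based) holds an allowed residue.

    `required` maps position -> string of allowed residues, e.g. {27: "D"}.
    """
    kept = []
    for header, seq in records:
        cols = match_columns(seq)
        if all(pos <= len(cols) and cols[pos - 1] in allowed
               for pos, allowed in required.items()):
            kept.append((header, seq))
    return kept


# Toy alignment with 4 match columns; lowercase 'a' is an insertion.
toy = ">t\nAPCD\n>a\nAPaCD\n>b\nA-CE\n>c\nGPCN\n"
kept = select_by_residue(read_a2m(toy), {2: "P", 4: "DE"})
```

For the real alignment the call would be something like
select_by_residue(records, {27: "D"}), or {25: "P", 27: "DE"} for a
looser filter.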

1bvqA is a homotetramer, with two chains joining along the C-most
strands of the beta sheet to make a longer beta sheet, then the two
large sheets rotated 180 degrees to bring the D17 residues close to
each other.

In 1c8u there are 4 copies of the domain, two linked in each chain,
but the two large sheets dimerize in the opposite way from 1bvqA, with
the sheets together and the helices on the outside.
There seems to be an ASP at the N-terminal end of the long helices in
1c8u also.

7 June 2002 Kevin Karplus

I'm having trouble with undertaker crashing, so I can't get structures
really optimized for the score.  The best ones seem to be based mainly
on 1bvqA (no surprise there!).  There is a beta bulge at L86, K87 that
seems to be making the turn be offset by one--I wonder if we can fix that?

There is also a difficult gap to close between R58 and V59 that may
require repacking the helix against the sheet.  At the other end of
the helix, there is a hydrogen bond between atom 245 (N of I34) and
atom 543 (O of I73) that maybe should be extended to an antiparallel
sheet for DIF 33-35.


We may want to fiddle with the alignment of the strand from V59 to M67
(or G57 to F69), since that seems to be a bit problematic.

13 June 2002 Kevin Karplus

In try5-opt, the helix 37-54 has been changed into a beta sheet!  This
is a bit suspicious, to say the least, since 1bvqA DOES have a helix
in that part of the alignment.  Note: 
T0132-try3+2bpa1+1iq0A+1bvqA+1bvqA.13.20.pdb, which scores better, does
have helices, though it has stripped one of the strands off the sheet.

25 June 2002 Kevin Karplus

In try6-opt, strand 16-23 has been changed into a helix.
The alignments in T0132.t2k-2track-undertaker.a2m to 1bvqA and 1c8uA
have all 5 strands aligned AND have a helix nicely packed against the sheet.

Perhaps I should tighten up the NUM_BEST and BEST_EVALUE again, and
remake the starting alignments.  (It really is a problem that the
undertaker scoring function can't select out the right template---but
not too surprising as the break cost makes gaps very expensive.)

The score function still prefers
	T0132-try3+2bpa1+1iq0A+1bvqA+1bvqA.13.20.pdb.gz

26 June 2002 Kevin Karplus

try7-opt has a new best score, quite a bit lower than the old best.

Strand 16-23 is still being wound up into a helix.  Perhaps we need to
add some constraints to hold the sheet together, at least until Hbond
scoring is written.

12 July 2002 Kevin Karplus

I reran undertaker with the fragments from the new fragfinder.
try9-opt does not score quite as well as try7-opt and strand 16-23 is
still being wound up into a helix.  The long helix that the sheet
wraps around is broken up.

16 July 2002

I finally looked at the results in "selected" which started with the
t2k alignment with sequences with P25 and DE27 selected as the seed.

This now looks like it gets MUCH stronger scores for 1bvqA, 1lo7A,
1krs, and 1c8uA.  Unfortunately, this is an illusion due to bugs in
the way the template scores are computed and included---the template
library was run on ALL the sequences in the seed, and the combining
method did not select out just those for the target sequence, so many
probabilities were multiplied together, producing a gross
underestimate of the probability.  I fixed the casp5/Make.main file to
avoid this mistake in future and reran---the results are in fact
slightly WEAKER than before.
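
A toy Python illustration of why the combined score looked too good.
The names and numbers are made up, and the real combining method in
casp5/Make.main may differ; the sketch only shows that multiplying in
the probabilities from every seed sequence shrinks the product well
below the value for the target alone.

```python
import math


def combine(pvalues):
    """Combine chance-match probabilities by naive multiplication."""
    return math.prod(pvalues)


def combine_for_target(hits, target_id):
    """Combine only the hits whose query is the target sequence."""
    return combine(p for query, p in hits if query == target_id)


# Hypothetical (query sequence, probability of a chance match) pairs
# for a single template.
hits = [("T0132", 0.02), ("seed_homolog_1", 0.05), ("seed_homolog_2", 0.03)]

buggy = combine(p for _, p in hits)        # every seed sequence included
fixed = combine_for_target(hits, "T0132")  # target sequence only
```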

I'll try running undertaker starting from these alignments anyway.
Hmm---it isn't going to work, because all those extra sequences in the
alignment get put into the a2m files, and undertaker tries to
interpret them as PDB files, with nasty results.  This even makes a
security hole, as the "pdb-get" command didn't have quotes around the
file name it was looking up, and the sequence names have "|" in them!
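
A small Python sketch of the quoting problem, assuming the lookup is
done by building a shell command line.  The sequence name below is
hypothetical, and the sketch only builds the command string; it does
not run pdb-get.

```python
import shlex


def build_lookup_command(program, name):
    """Return a shell command line with `name` quoted as a single argument."""
    return f"{program} {shlex.quote(name)}"


# Hypothetical sequence name containing the shell pipe character.
name = "gi|1234567|sp|P00000|EXAMPLE"

unsafe = f"pdb-get {name}"                   # a shell would treat '|' as a pipe
safe = build_lookup_command("pdb-get", name)  # the name stays one argument
```

Passing the arguments as a list to subprocess.run (without shell=True)
avoids the shell entirely and is usually the simpler fix.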


I ran undertaker again (try11) and got ludicrous results, with all the
strands wound up into helices, probably because of a typo in the
undertaker.script file (Couldn't open file
T0132-t2k-2track-undertaker.a2m for input).

On try12, undertaker segfaulted after 51 minutes.  Although debugging
the seg fault would be virtuous and would probably save time in the
long run, I'm not feeling like debugging right now (particularly not
if I have to wait an hour for the failure).

Maybe I should try adding a constraint to hold the first strand in place.

Let's try adding a CB constraint to R20 and Q84, and to the residues
before it.

16> VLLLRTLA
      |||
88< VKLCQGYCCWW

17 July 2002 Kevin Karplus

I showed the rest of the CASP5 team how to add constraints by adding a
constraint to define-score.script to keep the helix straight
	Constraint 273	385	20 22.5  27	// CA W38 	CA E53
and fixed a bug in undertaker.script that prevented all the alignments
from being read correctly.

The cost function still prefers T0132.try7-opt.pdb, but
T0132.try14-opt.pdb looks much better to me.  We need to look at what
components of the score function contribute to the better score for
try7, and re-weight appropriately.  Note: try14-opt still has a few bad
breaks that need to be resolved.


18 July 2002 Kevin Karplus

We can compare the costs for try7 (which I don't like) and try14
(which I do like), and adjust the weights to be bigger where try7 has
the larger cost.

name            	length	gen6.5	wet6.5	dry6.5	dry8	dry12	radius_norm	radius_fit
T0132.try7-opt.pdb	154	3.56617	0.61065	3.38836	3.69612	3.74141	3.09104 	0.839003
T0132.try14-opt.pdb	154	3.34621	0.60679	3.18575	3.58241	3.69001	3.09258 	0.849246  
approx difference		0.22	0.004	0.20	0.11	0.05	-0.0015		-0.01

name            	sidechain	clashes	sidechain_clashes	backbone_clashes   break
T0132.try7-opt.pdb	-2.80537	1.35065	0.61039         	0.0324675	0.0627997
T0132.try14-opt.pdb	-2.77252	1.75974	1.07143         	0.12987        	0.157677
approx difference	-0.033		-0.409	-0.46			-0.10		-0.095

name            	constraints	alpha	alpha_prev	contact_order	cost
T0132.try7-opt.pdb	22.3164 	1.18207	1.25568 	-1.18967	15.1907
T0132.try14-opt.pdb	28.9556 	1.31104	1.31978 	-1.17394	16.1675
approx difference	-6.64		-0.13	-0.06		-0.016		-0.98

The biggest differences are the constraints and the clashes
(particularly sidechain clashes).  The clashes might be fixed by
scwrl, but I have to look why the constraints are favoring
try7---perhaps I've mis-specified a constraint!
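
The comparison above can be redone mechanically.  A short Python
sketch using the numbers from the tables (try7 minus try14); whether
these columns are already weighted is not stated in these notes, so
the ranking is only a guide to which terms to look at first.

```python
try7 = {
    "gen6.5": 3.56617, "wet6.5": 0.61065, "dry6.5": 3.38836,
    "dry8": 3.69612, "dry12": 3.74141,
    "radius_norm": 3.09104, "radius_fit": 0.839003,
    "sidechain": -2.80537, "clashes": 1.35065,
    "sidechain_clashes": 0.61039, "backbone_clashes": 0.0324675,
    "break": 0.0627997, "constraints": 22.3164,
    "alpha": 1.18207, "alpha_prev": 1.25568, "contact_order": -1.18967,
}
try14 = {
    "gen6.5": 3.34621, "wet6.5": 0.60679, "dry6.5": 3.18575,
    "dry8": 3.58241, "dry12": 3.69001,
    "radius_norm": 3.09258, "radius_fit": 0.849246,
    "sidechain": -2.77252, "clashes": 1.75974,
    "sidechain_clashes": 1.07143, "backbone_clashes": 0.12987,
    "break": 0.157677, "constraints": 28.9556,
    "alpha": 1.31104, "alpha_prev": 1.31978, "contact_order": -1.17394,
}


def cost_differences(a, b):
    """Per-term difference a - b for two decoys scored on the same terms."""
    return {term: a[term] - b[term] for term in a}


diffs = cost_differences(try7, try14)

# Terms sorted by how strongly they separate the two decoys.
ranked = sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Terms where try7 has the larger cost: candidates for a bigger weight.
favor_try14 = sorted(term for term, d in diffs.items() if d > 0)

for term, d in ranked:
    print(f"{term:18s} {d:+.4f}")
```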

Here are the currently defined constraints:
    // add constraints to hold first strand onto sheet
    Constraint 131	629	2 5.12 7	// CB L18 	CB L86
    Constraint 139	623	2 5.12 7	// CB L19 	CB C85
    Constraint 147	614	2 5.12 7	// CB R20 	CB Q84


    // add constraint to keep helix straight
    Constraint 273	385	20 22.5  27	// CA W38 	CA E53

The atom numbers are correct for the specified atoms.
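
For checking constraint lines in bulk, a small Python parser sketch.
It assumes the three numbers are minimum, desired, and maximum
distances in Angstroms; that reading is an assumption based on the
values used here (for example, 22.5 matches 15 residues of helix at
1.5 Angstroms rise per residue for W38 to E53).  The class and
function names are hypothetical.

```python
from typing import NamedTuple


class Constraint(NamedTuple):
    atom1: int
    atom2: int
    low: float      # assumed: minimum allowed distance
    target: float   # assumed: desired distance
    high: float     # assumed: maximum allowed distance
    comment: str


def parse_constraint(line):
    """Parse one 'Constraint a b low target high // comment' line."""
    body, _, comment = line.partition("//")
    fields = body.split()
    if len(fields) != 6 or fields[0] != "Constraint":
        raise ValueError(f"not a Constraint line: {line!r}")
    return Constraint(int(fields[1]), int(fields[2]),
                      float(fields[3]), float(fields[4]), float(fields[5]),
                      " ".join(comment.split()))


def is_consistent(c):
    """Sanity check: two different atoms and ordered distance bounds."""
    return c.atom1 != c.atom2 and 0 <= c.low <= c.target <= c.high


strand = parse_constraint("    Constraint 131\t629\t2 5.12 7\t// CB L18 \tCB L86")
helix = parse_constraint("    Constraint 273\t385\t20 22.5  27\t// CA W38 \tCA E53")
```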

In try14, we have H bonds
	C81 N	L19 O
	C81 O	L19 N
	G83 N	L17 O
	G83 O	L17 N

These are quite different from the constraints I put in to try to hold
the first strand in place---perhaps I was trying to hold it to the
wrong place.  Let's just remove the first set of constraints.
Without them the TERRIBLE try11-opt scores best.

Let's try adding the H-bond constraints that hold the first strand on in
try14.  Now the two best are T0132.try8.6.30.pdb and
T0132.try14-opt.pdb, both of which look good.  Let's do another run
with the new constraints and run scwrl.

Try15 gets a new best score, and the first strand looks fine, but now
the last strand is drifting away, and we still have a bad break between
T121 and F122.  Maybe we need to add some more constraints, to try to
guide F122 to the right place and to hold on E65-I70.  F122 should be
double-Hbonded to I93.

// add constraints to close gap before F122
Constraint	678	919	2.0 2.7 3.2 	// I93 N F122 O
Constraint	684	910	2.0 2.7 3.2 	// I93 O F122 N

// add constraints to keep last strand close
Constraint	494	870	2.0 2.7 3.2 	// N68 N T116 O
Constraint	500	865	2.0 2.7 3.2 	// N68 O T116 N
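
Writing these pairs by hand is error-prone, so here is a Python sketch
that emits both lines for a double-H-bonded residue pair.  The
atom_index lookup is a hypothetical helper (filled in here with the
atom numbers quoted above); in practice it would have to come from
whatever atom numbering undertaker uses for the model.

```python
def hbond_constraints(res_a, res_b, atom_index, bounds=(2.0, 2.7, 3.2)):
    """Return the N-O and O-N constraint lines for residues res_a and res_b."""
    low, target, high = bounds
    lines = []
    for atom_a, atom_b in (("N", "O"), ("O", "N")):
        a = atom_index[(res_a, atom_a)]
        b = atom_index[(res_b, atom_b)]
        lines.append(f"Constraint\t{a}\t{b}\t{low} {target} {high}"
                     f"\t// {res_a} {atom_a} {res_b} {atom_b}")
    return lines


# Atom numbers copied from the constraints above.
atom_index = {
    ("I93", "N"): 678, ("I93", "O"): 684,
    ("F122", "N"): 910, ("F122", "O"): 919,
}

lines = hbond_constraints("I93", "F122", atom_index)
print("\n".join(lines))
```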

This change in the score function makes T0132.try15-al6.18.25.pdb
score best, and it does look pretty good, but the last strand is only
attached where the constraints force it--we may want to stitch it down
in some other place.  I'm not sure where though, so let's see what
happens if we just use this score function.

In try16-opt-scwrl, the sheet looks fairly good, though some H bonds
for A63 or V64 would help get the edge strand nailed down.  The helix
across the middle (G36-H56) looks good.

Let's try to extend the last strand back a bit by adding H-bonds
	N S66	O V119
	O S66	N V119
This score function still liked try16-opt best.

19 July 2002 Kevin Karplus

try17-opt looks almost identical with try16-opt, probably because it
started with try16-opt.  Let's try again WITHOUT seeding in an initial conformation.

Try18 does not score as well as try17 or try16, probably because of
breaks, but I like the way it packs the short helix better.  I'll try
reoptimizing with it as a starting point.  

Try19 still has some bad breaks, but otherwise looks good.  K8-K14 are
sticking out in a badly packed loop---perhaps we'd like to encourage
G10 close to C85 to fold the loop down?  I think I'll wait on
that---I'm not sure it's the right direction.

Try16 and Try17 still score better with the current scoring
function---perhaps I should add something to get the helices to stay
together?  Maybe I should try to make CD1 of I149 try to get close to
CA of M45?  We could also use constraints to try to close the gap
after Y113, by continuing the Hbonding pattern of the strands.
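
Continuing the H-bonding pattern can be sketched in Python: in a
regular antiparallel ladder the double-H-bonded pairs step by two
residues in opposite directions, so the sum of the two residue numbers
stays constant.  This assumes no bulges (which is not true everywhere
in this sheet), and the function name is hypothetical.

```python
def antiparallel_partners(i, j, n_pairs, step=2):
    """Residue pairs continuing an antiparallel ladder from the pair (i, j)."""
    return [(i + step * k, j - step * k) for k in range(n_pairs)]


# Check against the try14 H bonds: C81/L19 and G83/L17.
first_strand = antiparallel_partners(81, 19, 2)

# Continue from the I93/F122 pair along the strands.
last_strands = antiparallel_partners(93, 122, 4)
print(last_strands)
```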

With the new constraints and a slight increase to the penalty for
breaks, the best-scoring decoy is T0132.try17-al2.4.20.pdb, which
doesn't have the helices oriented the way I like them, so let's add
another constraint---in fact, let's change the helix packing from 
	Constraint 1129 335	0 3.1 5	// CD1 I149  CA M45
to
	Constraint 1129 355	0 3.1 5	// CD1 I149  CA I49
	Constraint 1077	376	0 3.1 5	// CD2 L142  CA K52
based on the alignment of the helices in T0132.try19.6.20.pdb, which
is the best-scoring of the decoys that is not in the try17... helix
conformation.  Hmm---still not enough---it is the OTHER end of the
helix that is swinging around. Let's add
	Constraint 1146	336	0 4 8	// C E151 CG M45

Although T0132.try17-al2.4.20.pdb still scores best, the difference
is small, and a small improvement in the breaks in try19.6.20 would
make it best.


20 July 2002 Kevin Karplus

try20-opt-scwrl is looking pretty good.  The dimerizing edge is free
and the helices are in a good position.  There is a nice cluster of
sulfurs for M40, M43, and M67.

The small helix is still not as tightly packed as I'd like to see.
Perhaps I should put back the constraint
	1129 	CD1 I149 CA M45	334
add	1108 CD2 L146 CB A48 351
and remove Constraint 1146	336	0 4 8	// C E151 CG M45


I could also try fixing up a little bit at the N terminus by adding a
couple of H-bonds to tack down G10 and R11, which are predicted to be
a strand.  It is a little tricky, since the current turn is twisting
the end of the next strand, and I don't know how the pairing should
go.  I'm going to guess that R11 pairs with V16:
	Constraint 75	119	2.0 2.7 3.2 	// N R11 O V16
	Constraint 84	114	2.0 2.7 3.2 	// O R11 N V16

With the modifications to the score function, try20-opt-scwrl still
scores the best.

20 July 2002 Kevin Karplus

try21-opt is now the best scorer (slightly better than
try21-opt-scwrl, though they have the same backbone).

The helices are still not tightly packed against the sheet, but I
doubt that I'll be able to get them much tighter, unless I reduce
clash penalties and introduce some sort of "jiggling" operators to
make small movements.  We may want to trim off up to residue 9 or 10,
as the N terminus is almost certainly wrong, but I'm not sure how to
fix it.

23 July 2002 Kevin Karplus

Redid the scoring function using the new pred_alpha cost function.
Try21-opt is still the best scoring, but I'll start a new run with the
new scoring function.

OOPS---not done right---I was missing the weight after pred_alpha2.
The best is still try21-opt, though, after scoring with the fixed function.

Now try22-opt is best.  It has 4 breaks: 58-59, 124-125, 65-66, and
115-116 in decreasing order of severity.  
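
A rough Python sketch of listing breaks by severity from backbone
coordinates.  It uses the peptide C-N distance between consecutive
residues (about 1.33 Angstroms in a closed chain); the 1.5 Angstrom
cutoff and the input layout are assumptions, and undertaker's own
break cost is probably computed differently.

```python
import math


def find_breaks(backbone, max_cn=1.5):
    """List (res, next_res, distance) for chain breaks, worst first.

    `backbone` maps residue number -> {"N": (x, y, z), "C": (x, y, z)}.
    """
    breaks = []
    for res in sorted(backbone):
        nxt = res + 1
        if nxt not in backbone:
            continue
        d = math.dist(backbone[res]["C"], backbone[nxt]["N"])
        if d > max_cn:
            breaks.append((res, nxt, d))
    breaks.sort(key=lambda b: b[2], reverse=True)
    return breaks


# Toy coordinates on a line: 1-2 is closed, 2-3 and 3-4 are broken.
toy = {
    1: {"N": (0.0, 0.0, 0.0), "C": (1.0, 0.0, 0.0)},
    2: {"N": (2.33, 0.0, 0.0), "C": (3.5, 0.0, 0.0)},
    3: {"N": (6.5, 0.0, 0.0), "C": (7.5, 0.0, 0.0)},
    4: {"N": (9.5, 0.0, 0.0), "C": (10.5, 0.0, 0.0)},
}
toy_breaks = find_breaks(toy)
```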

I'm not sure how much more effort is justified for this target---the
improvements are now pretty small, and the conformation is still a bit
"foamy"---not as tightly packed as I'd like.


25 July 2002 Kevin Karplus

Trying the new "JiggleSubtree" operator to see if it can pack
sidechains better.  If it works, I'll try to create a more directed
search for subtree twiddling.

Hmm---the cost drops quite a bit even on the first iteration of try23,
when using a high probability for trying JiggleSubtree.  I should
probably reduce the probability somewhat, as ReduceBreak and CloseGap
may be needed to clean up the gaps that jiggle may make worse.
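
To make the trade-off concrete, a Python sketch of picking operators
with relative probabilities.  The operator names are the ones
mentioned above, but the weights are invented and this is not how
undertaker itself is configured; it only shows that lowering the
JiggleSubtree weight leaves more tries for ReduceBreak and CloseGap.

```python
import random
from collections import Counter


def pick_operator(weights, rng):
    """Pick one operator name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical relative weights, not taken from undertaker.script.
weights = {"JiggleSubtree": 0.2, "ReduceBreak": 0.4, "CloseGap": 0.4}

rng = random.Random(0)
counts = Counter(pick_operator(weights, rng) for _ in range(1000))
print(counts)
```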


I wonder if the gap near V115 could be closed better if we replaced 
Constraint 852	754	2.0 2.7 3.2	// N C114 O V101
with a constraint on O C114 and N V101 instead.
We might also want to add Hbonds for O V115 and N I70 and
N V115, O I70.


try23-opt scores much better than previous runs, and the helices seem
to pack quite tightly.  There are still some bad breaks though, and I
think the N terminus is on the wrong side of the sheet.  Let's put a
weak constraint to have S2 or N4 near Q42.  This should be a pretty
weak constraint though, as we don't know where it really should go.

The helix-packing constraints I added may have caused the small breaks
in the helices.  Let's try removing them again and seeing if the
jiggling will still pack the helices.

With the modified scoring scheme, try23-opt-scwrl beats out try23-opt,
probably as a result of removing the helix-packing constraints that
pulled some sidechains into unusual positions.


25 July 2002 Kevin Karplus

Trying again with the new OptSubtree operator and the modified scoring
function makes for even more improvements.  

The new best-scorer is try24-opt-scwrl, which is looking pretty good.
I could try running again, since I've added some more packing
operators, and there is still a little foaminess to the structure.
Also, I think that the S2-Q42 constraint is wrong, so I should
probably make another run after removing it.


26 July 2002 Kevin Karplus

Although try25-opt-scwrl is a new lowest cost model, I don't like it much.
I think there is a bug in the constraints--T116 should be close to
V99, and N68 should be close to D117.  After fixing these constraints,
the best scorer is try25.9.40.  Adding dry5 to the score function and
increasing the weight for clashes (to try to favor better packing)
makes try25.4.40 score best.  The sheet has curled up enough in both
of these to push the shorter helix out so it doesn't pack well.

Let's try starting another run and seeing what comes out.

Looking at the first iteration of try26, I think I need to add some more
Hbond constraints---like V97 A118 and I95 F120, which are getting a
bit stretched in the model.  I let the try26 run finish and then
rescored with the "improved" constraints, which added several of the
hbonds for strands and removed a few around the bulge on the edge strand.
try26-opt-scwrl comes out best with both the new scoring function and
the old one, and it looks pretty good.

Let's do one more short run with the newest version of undertaker, which has
improved operators.