Wed Aug  4 15:14:42 PDT 2004
T0262

DUE 27 Aug 2004


Fri Aug  6 14:08:46 PDT 2004 Kevin Karplus

Because of a bug I introduced to Make.main, I had to remake this prediction. 

This looks like a fold-recognition model with 1top (a.39.1.5) as the template.
Problem: the t2k and t04 hits do not agree; t04 puts 1q0uA (c.37.1.19)
as the top hit.

We should probably use the rr constraints (when they are generated) to
help choose between models.  Could there be 2 domains (the protein is
long enough)?

I'll make extra alignments for the top t04 hits as well as the top t2k
hits. 


Fri Aug  6 21:57:07 PDT 2004 Kevin Karplus

Unfortunately the rr constraints are VERY weak, so they probably won't help.

The try1 model looks pretty bad.  The helices form, and it is fairly
compact, but the strands aren't extended, burial is pretty much
ignored, and the protein is foamy.

All the alignments in T0262.t2k.undertaker-align.pdb are very short,
so they do not provide much structure.


Thu Aug 12 11:49:17 PDT 2004	Sol Katzman

The try1 model is nearly all helix. But both t2k and t04 have lots of
strand predictions that largely agree. One severe disagreement is the
region W120-L127, which t2k.str2 predicts as helix and t04.str2 as strand.
(Note that the t04 bys, stride, alpha, and dssp predictions do give
HelixConstraints for E119-L127.)

The t2k and t04 hits for the 100-40-40-str2+CB_burial_14_7 models
include numerous SCOP domains, which we can filter if we believe the
large number of antiparallel strand predictions from both t2k and t04:

a.39.1.5  -- all helix, EF-hand  (1top,1ncx etc.)

c.37.1.19 -- parallel beta sandwich (1q0uA,1qdeA etc.)

c.51.1.1  -- mostly parallel sheets (1h4vB[326-421],1adjA[326-421] etc.)

c.94.1.1  -- mostly parallel sheets (1aljA etc.)

d.104.1.1 -- large mixed sheets (1h4vB[2-325],1adjA[2-325] etc.)

It seems that something like d.104.1.1 is the template we want.

I will set up some strand constraints, mostly from the t04.str2
prediction, and create a rasmol script (pointed to by 'strands') that
defines them:

s1	V92-L94
s2	W99-R101
s3	I122-L127	# only predicted by t04.str2, others have helix
s4	R133-R137
s5	E140-Y145
s6	I149-P155
s7	L164-H166
s8	L187-L191
s9	A210-V214
s10	K225-R228
s11	V233-V236
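To make the segment notation above concrete, here is a small sketch that parses entries like "V92-L94" (one-letter residue code plus residue number at each end) and reports strand lengths. The names parse_segment, STRANDS, and strand_lengths are hypothetical; this is not an undertaker or rasmol API, just a check on what the table says.

```python
import re

# Hypothetical parser for the segment notation used in these notes:
# "V92-L94" means residue V92 through residue L94 (one-letter code + number).
# Not an undertaker or rasmol API; illustration only.
SEGMENT = re.compile(r"^([A-Z])(\d+)-([A-Z])(\d+)$")


def parse_segment(spec):
    """Return (first_aa, first_num, last_aa, last_num) for a spec like 'V92-L94'."""
    m = SEGMENT.match(spec)
    if m is None:
        raise ValueError(f"bad segment spec: {spec!r}")
    return m.group(1), int(m.group(2)), m.group(3), int(m.group(4))


# Strand table from the notes (s3 is the questionable one).
STRANDS = {
    "s1": "V92-L94", "s2": "W99-R101", "s3": "I122-L127", "s4": "R133-R137",
    "s5": "E140-Y145", "s6": "I149-P155", "s7": "L164-H166", "s8": "L187-L191",
    "s9": "A210-V214", "s10": "K225-R228", "s11": "V233-V236",
}


def strand_lengths(strands):
    """Residue count of each strand; segments may be written in either
    direction (e.g. 'L75-M71'), so the absolute difference is used."""
    lengths = {}
    for name, spec in strands.items():
        _, first, _, last = parse_segment(spec)
        lengths[name] = abs(last - first) + 1
    return lengths


if __name__ == "__main__":
    for name, n in strand_lengths(STRANDS).items():
        print(name, STRANDS[name], n)
```

Most of these strands are only 3 to 7 residues long, which is worth keeping in mind when judging how much the sheet constraints can pin down.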

For the fairly obvious turns, I can make anti-parallel sheet constraints:

s1 ^v s2
s3 ^v s4 ^v s5 ^v s6
s10 ^v s11
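The "^v" notation can be read as two separate things: a chain like "s3 ^v s4 ^v s5 ^v s6" means each neighboring pair forms an antiparallel ladder, and within one ladder residue i on one strand pairs with residue j on the other with i + j held constant. The sketch below shows both readings. It is a hypothetical illustration and not the undertaker SheetConstraint format; the helper names are made up.

```python
# Hypothetical helpers for the "^v" (antiparallel) notation in these notes.
# Not the undertaker SheetConstraint format; a sketch only.


def expand_chain(chain):
    """'s3 ^v s4 ^v s5' -> [('s3', 's4'), ('s4', 's5')]:
    each neighboring pair of strands forms one antiparallel ladder."""
    names = [part.strip() for part in chain.split("^v")]
    return list(zip(names, names[1:]))


def antiparallel_pairs(a_first, a_last, b_first, b_last):
    """Residue pairs for one antiparallel ladder.

    a_first pairs with b_first; each step forward along strand a (+1) is a
    step backward along strand b (-1), so i + j stays constant.  This matches
    how 'K62-V66 ^v L75-M71' is written, with the second strand listed in
    pairing order.  If the strands differ in length, the extra residues of
    the longer one are left unpaired.
    """
    n = min(a_last - a_first, b_first - b_last) + 1
    return [(a_first + k, b_first - k) for k in range(n)]


if __name__ == "__main__":
    print(expand_chain("s3 ^v s4 ^v s5 ^v s6"))
    print(antiparallel_pairs(62, 66, 75, 71))
```

For the strand table, where both partners are written N-to-C (e.g. V92-L94 and W99-R101), the notes do not fix the register; using this helper would mean choosing one, for example anchoring L94 against W99 at the turn.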

These will be included in try2.

Since s3 is questionable, and since try1 made an s1 ^v s2 sheet,
I will also make another run, in parallel with try2, that eliminates
s3 and uses the try1 s1 ^v s2 constraint.

That run will be try3.

For both try2 and try3, I increased the constraint weight (10 -> 25).


Mon Aug 23 10:50:04 PDT 2004	Sol Katzman

Not getting much hbonding in the requested sheets, either in try2 or try3.

For try4, use the weights that are in vogue for the later targets
(sidechain lower, dry and wet higher), increase hbond_geom_beta*,
and increase the individual weights for the SheetConstraints.


Tue Aug 24 10:07:58 PDT 2004	Sol Katzman

At our group meeting a couple of things were suggested:
  1) break this target into several domains
  2) try to get the 4 strongly conserved histidines
     (H85,H147,H166,H190) to cluster.

I created a subdirectory 75-200 and ran the base make on
it to get 75-200/try1. As with the whole model, the top hit is:
  a.39.1.5  -- all helix, EF-hand  (1top,1ncx etc.)

Comparing 1top with 75-200/try1 is revealing.

1top consists of two domains, separated by a very long (7 turns)
straight helix. Each domain contains two separate EF Hand motifs. The
1top structure binds 2 Ca ions in one domain and an SO4 in the other
domain. I presume that this is an artifact of the crystallization, and
that each domain could bind 2 Ca ions. The neighbors within 3.0 Angstroms
of each Ca ion are several acidic residues. See Branden and Tooze, 2nd
edition figures 2.13 and 6.21. Viewed from this perspective, each of
the 4 conserved histidines in T0262 could separately participate in
one of 4 binding sites, so trying to cluster them would definitely be
erroneous.

Looking more closely at the structure of 75-200/try1, it actually
corresponds fairly well to 1top, with the 7-turn linking helix from
1top reduced to a very short segment. This was obscured in the full
T0262 tries by extra helices in the preceding (1-74) and following
(201-256) regions.

One problem with this theory is that the conserved H147 is just
about where I would like to split 75-200 into two subdomains. On
the other hand, the strongly conserved R133 could participate in
ion binding in the first such subdomain.

For 75-200 try2, use the try1 sheet constraints, and increase the cost
of breaks somewhat as there are some bad breaks in try1. Use the
t2k.dssp-ehl2 constraints, except for the strand constraints that
do not correspond to the try1 sheets.


Tue Aug 24 19:47:47 PDT 2004	Sol Katzman

Looked at 75-200/try2 with Kevin. Since the putative EF-hand binding
sites have a distinct dearth of acidic residues, it seems unlikely
that this is really the function of this protein, despite its being the
closest template family. So for 75-200/try3 I am using constraints to
keep the NE2 atoms of the four conserved histidines (H85,H147,H166,H190)
together.
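One simple way to express "keep the NE2 atoms together" as a score term is a flat-bottom quadratic penalty on every pairwise distance among the four atoms. The sketch below is an assumption for illustration: the notes do not say what functional form or distance undertaker's constraints use, and the 6.0 Angstrom threshold is a hypothetical value.

```python
import itertools
import math


def cluster_penalty(points, d_max=6.0):
    """Flat-bottom quadratic penalty on a set of (x, y, z) points.

    Zero while every pair is within d_max Angstroms; each pair that is
    farther apart adds (d - d_max)**2.  d_max=6.0 is a hypothetical choice,
    not a value taken from the undertaker constraint files.
    """
    total = 0.0
    for p, q in itertools.combinations(points, 2):
        d = math.dist(p, q)
        if d > d_max:
            total += (d - d_max) ** 2
    return total


if __name__ == "__main__":
    # Made-up NE2 coordinates for H85, H147, H166, H190.
    ne2 = {"H85": (0.0, 0.0, 0.0), "H147": (4.0, 0.0, 0.0),
           "H166": (0.0, 4.0, 0.0), "H190": (12.0, 0.0, 0.0)}
    print(cluster_penalty(ne2.values()))
```

With four histidines there are six pairs, so a single outlier (H190 in the made-up example) contributes three penalty terms and is pulled toward the other three.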

I also created two other subdirectories for domains which will overlap
with 75-200, namely 1-85 and 190-256.

For 1-85 there are no good templates; the best t2k E-value is 1.6E+01.
1-85/try1 does form a small antiparallel sheet, K62-V66 ^v L75-M71, so
I include that constraint in 1-85/try2.

For 190-256 there is also not much to go on; the best t2k E-value is 8.5E+00.
190-256/try1 does not look like much of anything. For 190-256/try2, I will include
the s10 ^v s11 constraint from the whole target.


Wed Aug 25 09:37:38 PDT 2004	Sol Katzman

75-200/try3 did group the histidines as desired, although it introduced
a number of breaks.

1-85/try2 does not look much better than try1.

190-256/try2 formed a little bit of the s10 ^v s11 sheet that we were
looking for, but is still not a great model.

Created a chimera from 1-85/try2 + 75-200/try3 + 190-256/try2:

   merge.d1-try2.d2-try3.d3-try2.under

   printAllConformPDB \
   T0262.chimer.d1-try2.d2-try3.d3-try2.pdb \
   superpose \
   atom T78.CA atom L79.CA atom A80.CA atom G81.CA \
   atom T192.CA atom S193.CA atom S194.CA atom L195.CA

The superposition is good for 75-200 + 190-256. In particular one of
the conserved histidines H190 overlaps. But 1-85 and 75-200 do not
overlap, with H85 in two completely different places. Something
wrong with my undertaker command?


Wed Aug 25 12:45:49 PDT 2004 Kevin Karplus

I'm seeing almost perfect overlap for T78-R84, so I assume that Sol
means that the 1-85+75-200 overlap was fine, but the 75-200+190-256 failed.
I'm not sure why that happened, but I suspect an undertaker bug having
to do with incomplete conformations.

I'll look into it.


Wed Aug 25 13:02:28 PDT 2004 Kevin Karplus

I found the problem, but have not fixed it---it would require an
algorithm change to the method for figuring out superposition that
will require some thought.  As a quick workaround, putting the common
part as the first conformation should fix the problem (the initial
conformation is taken from the first one, and ones that don't have any
atoms in common with it may end up locked into arbitrary positions).
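The failure mode and workaround can be illustrated by reducing each conformation to the set of atoms it contains: any piece that shares no atoms with the first conformation is at risk, and putting the piece that overlaps the most others first avoids that here. This is a hypothetical model of the behavior described above, not undertaker's superposition code, and for simplicity it uses whole CA ranges rather than the eight CA atoms named in the superpose command.

```python
# Hypothetical model of the failure mode described above (not undertaker's
# actual superposition code).  Each conformation is reduced to the set of
# atom labels it contains, e.g. "85.CA".


def residue_atoms(first, last, atom="CA"):
    """Atom labels for residues first..last inclusive."""
    return {f"{i}.{atom}" for i in range(first, last + 1)}


def unanchored(order, atoms):
    """Conformations after the first that share no atoms with the first;
    these are the ones that may end up in arbitrary positions."""
    reference = atoms[order[0]]
    return [name for name in order[1:] if not (atoms[name] & reference)]


def hub_first(atoms):
    """Reorder so the conformation overlapping the most others comes first
    (ties keep input order, since max() returns the first maximum)."""
    names = list(atoms)

    def n_overlaps(name):
        return sum(1 for other in names
                   if other != name and atoms[other] & atoms[name])

    first = max(names, key=n_overlaps)
    return [first] + [name for name in names if name != first]


if __name__ == "__main__":
    atoms = {"d1": residue_atoms(1, 85),
             "d2": residue_atoms(75, 200),
             "d3": residue_atoms(190, 256)}
    print(unanchored(["d1", "d2", "d3"], atoms))   # d3 shares nothing with d1
    order = hub_first(atoms)
    print(order, unanchored(order, atoms))
```

In this three-piece case the hub-first order is d2, d1, d3, the same order as the workaround merge. In general a single hub need not overlap every other piece, which is why the unanchored check is kept separate.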

The script merge.d2-try3.d1-try2.d3-try2.under uses the other order,
correctly creating T0262.chimer.d2-try3.d1-try2.d3-try2.pdb
which has some really bad clashes.  It may be possible to reshape it
by optimizing (after cutting and pasting to make a single chain).


Wed Aug 25 14:37:30 PDT 2004	Sol Katzman

To avoid confusion, I renamed the two (3-model) superposition files
created above as follows (and also edited the merge.*.under scripts to
use these names):

  T0262.superpose.d2-try3.d1-try2.d3-try2.pdb
  T0262.superpose.d1-try2.d2-try3.d3-try2.pdb

After cutting out the overlap, and renumbering in rasmol,
the actual (single model) chimera is here:

  decoys/T0262.chimer.d2-try3.d1-try2.d3-try2.pdb

As Kevin pointed out, there are some bad clashes:

1-85   and 75-200 do NOT clash
1-85   and 190-256 clash a lot
75-200 and 190-256 clash a lot
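A pairwise table like the one above can be produced by counting atom pairs, one from each segment, that sit closer than some cutoff. The sketch below assumes plain (x, y, z) tuples and a hypothetical 3.0 Angstrom cutoff; the notes do not say how the clashes were judged (presumably by eye in rasmol or DeepView), so this is only one simple way to quantify "clash a lot".

```python
import itertools
import math


def count_clashes(seg_a, seg_b, cutoff=3.0):
    """Number of atom pairs, one atom from each segment, closer than cutoff
    Angstroms.  Coordinates are (x, y, z) tuples.  Bonded atoms at a
    cut-and-paste junction would also be counted; a real clash check
    excludes covalently bonded neighbors."""
    return sum(1 for a in seg_a for b in seg_b if math.dist(a, b) < cutoff)


def clash_table(segments, cutoff=3.0):
    """Pairwise clash counts for a dict of named segments."""
    return {
        (x, y): count_clashes(segments[x], segments[y], cutoff)
        for x, y in itertools.combinations(segments, 2)
    }


if __name__ == "__main__":
    # Made-up coordinates standing in for the three chimera pieces.
    segments = {"1-85": [(0.0, 0.0, 0.0)],
                "75-200": [(10.0, 0.0, 0.0)],
                "190-256": [(1.0, 0.0, 0.0), (9.0, 0.0, 0.0)]}
    for pair, n in clash_table(segments).items():
        print(pair, n)
```

The all-pairs loop is quadratic in atom count, which is fine for a few thousand atoms; a grid or neighbor list would be the usual choice for anything larger.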

So the key is to move 190-256 if possible. I may pursue this
with DeepView.


Wed Aug 25 17:15:35 PDT 2004	Sol Katzman

We never used the clustered histidine constraints on the whole
protein. Turn down the previous (whole try4, etc.) weights on
SheetConstraints and add the HIS constraints for whole try5, starting
with TryAllAlign.

For whole try6, use the same constraints as try5, but read in the 3
versions of the chimera that I modified in DeepView (dv1,dv2,dv3), as
well as the unmodified (highly clashing) original chimera. (I cannot
say that I am very fond of any of these):

 T0262.chimer.d2-try3.d1-try2.d3-try2.pdb
 T0262.chimer.dv1.d2-try3.d1-try2.d3-try2.pdb
 T0262.chimer.dv2.d2-try3.d1-try2.d3-try2.pdb
 T0262.chimer.dv3.d2-try3.d1-try2.d3-try2.pdb


Wed Aug 25 18:10:24 PDT 2004 Kevin Karplus

I picked up the templates from the three subdirectories and added them
to MANUAL_TOP_HITS, and am making extra_alignments and all-align.
When that is done, I'll start try7 (no SCWRL on all-align, since it
will be so huge).

The try7 score function has only the histidine packing (no strands or
helices, since those seem to be inconsistently predicted between the
whole protein and the subdomains).  Of the current models, it likes 
T0262.chimer.dv1.d2-try3.d1-try2.d3-try2.pdb best, which appears to
have been hand-crafted to have the histidines close.


Thu Aug 26 09:52:07 PDT 2004	Sol Katzman

Regarding chimer.dv1,dv2,dv3 -- they should all have the histidines
close because they only moved residues 192-256 (and tried not to
introduce a large break between 191 and 192). The chimer produced from
the superposition (from which the DeepView models were in turn
constructed) did the cut and paste in the middle of the overlap
sections, preserving residues 80-193 intact from the 75-200/try3
model, thus including the four histidines from that model.

The try7 costfcn likes try7 best, then try6, try5, chimer.dv1, dv2, dv3.

The unconstrained costfcn (nearly the same as try7 costfcn without
the histidines, and a little higher soft-clashes and break weights)
also likes try7 best, then try1,try6,try4.

Rosetta likes the repacked models in the order try6,try7,try4,try5.


Thu Aug 26 10:29:26 PDT 2004 Kevin Karplus

I'd like to start a try8 run, like the try7 run from alignments, but
with helix and strand constraints as well as the strong
histidine-clustering constraints.

I made a rasmol script "hist" that shows the conserved histidines,
defining set "histcons" in the process.

The try8 costfcn likes try7 best then try6.


Thu Aug 26 13:59:01 PDT 2004 Kevin Karplus

After doing the try8 optimization from alignments, the try8 costfcn orders:
try7-opt2, try6-opt2, try8-opt2 (all fairly close in cost). I'm not
sure what to do next on this one---pick models and submit? Try more runs?


Thu Aug 26 14:41:46 PDT 2004 Kevin Karplus

I'll superimpose the top 3 contenders and see what I think.
(Rosetta dislikes T0262.try6-opt2.repack-nonPC least, but it hates
them all.)

I think I like try8-opt2 best of the bad lot, though it does not score
as well---at least it gets some beta sheet pieces.

The question is---do I try to polish it, or do I submit as is?
The chance of significant improvement is small.

The unconstrained costfcn likes best try7-opt2, try8-opt2, try1-opt2,
try6-opt2, try4-opt2.repack-nonPC.

try4 may have the best secondary-structure match, but does not cluster
the histidines.

I'll submit
	try8-opt2
	try7-opt2
	try6-opt2
	try4-opt2.repack-nonPC
	try1-opt2

There are no template alignments long enough to be worth submitting.


Thu Nov 18 23:46:19 PST 2004	Martina Koeva

Based on the smooth GDT scores:

best sam-t04    15.3386  (also align2)
best submit     14.8555  (model3)
model1          10.7583
auto            12.7065
align           11.6800
robetta best    21.6417  (also robetta model1)
robetta1        21.6417
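For reading the scores above, here is a rough sketch of the conventional GDT_TS idea: the average, over distance cutoffs of 1, 2, 4, and 8 Angstroms, of the percentage of target residues whose CA lies within the cutoff. It assumes the per-residue deviations are already computed under one fixed superposition. Real GDT searches for the best superposition separately at each cutoff, so this gives a lower bound on it, and it is not the smooth GDT variant used in the table.

```python
def gdt_ts_fixed(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-style score (0-100) from per-residue CA deviations in Angstroms
    under ONE fixed superposition.

    distances has one entry per target residue; pass float('inf') for
    residues the model lacks.  Because real GDT optimizes the superposition
    for each cutoff, this value is a lower bound on it.  It is not the smooth
    GDT variant reported in these notes.
    """
    n = len(distances)
    if n == 0:
        raise ValueError("need at least one residue")
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    print(gdt_ts_fixed([0.5, 1.5, 3.0, 10.0]))
```

Because missing residues count against the score, a model covering only part of the 256-residue target is capped well below 100 regardless of how accurate that part is.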