Wed Jun  9 10:20:50 PDT 2004
T0199

DUE 3 Aug 2004

Wed Jun  9 12:31:22 PDT 2004	Kevin Karplus

According to the documentation, the function of this protein is known:
	heat shock operon repressor HrcA
We'd expect to see a DNA-binding motif in a repressor.

Our best alignments are to 1mkmA, but we seem to be getting other hits
to superfamily a.4.5.* also.  None of the hits are very strong, but
they are consistent with each other.  This is a big protein though, so
that hit is probably only for one domain (probably through L130, based
on predicted helices).

We may have to split this protein into separate domains and model the
parts separately.  In addition to a domain break around 130 there may
be another one around L195-V200 (between the predicted helices), based
on the few non-full-length sequences in the t2k alignment.

Wed Jun  9 22:13:24 PDT 2004

Yes---it definitely looks like we need to split this into domains.

It looks like the first domain is about 1-114 and is a fairly easy
fold-recognition target.  The other domains may be harder.
I should set up subdirectories for each of the domains (perhaps with
small overlaps, so that we can paste domains back together).

I'll do 1-130, 111-210, 190-end.
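
A quick check of how much overlap that split leaves for pasting the
domains back together.  This is only a sketch: the function name is
made up for illustration, and the chain length of 338 is taken from
the evaluation table later in this file.

```python
def overlaps(segments):
    """Residues shared by each consecutive pair of (start, end) segments."""
    return [max(0, prev_end - next_start + 1)
            for (_, prev_end), (next_start, _) in zip(segments, segments[1:])]


CHAIN_LENGTH = 338  # assumed value of "end", from the evaluation table
segments = [(1, 130), (111, 210), (190, CHAIN_LENGTH)]
shared = overlaps(segments)
print(shared)  # [20, 21]
```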

Maybe I'll run try2 first, to see if that helps me find domain
boundaries at all.

Thu Jun 10 12:03:52 PDT 2004	Kevin Karplus

try2 has thrown away a lot of stuff that was good in the top alignment
to 1mkmA---particularly some beta sheet and helix packing from the 
second domain.  Perhaps I should copy all the beta Hbonds from that alignment.
There may also be some good hbonds in the 1q48A alignment.

Thu Jun 10 23:02:55 PDT 2004 Kevin Karplus

try3 has done a much better job of preserving the good stuff.
The hairpin at Y134-E157 probably belongs with the rest of the sheet,
and 174-180 seems to be oriented backwards.

In 1mkmA, the sheet is strictly antiparallel with order 321654, though
strand3 is not a clean edge strand, having a big kink at an RLGM sequence.
If there is something similar in the target, then the alignment I have
is wrong, and V135-N141 belongs antiparallel to G302-T308.
The sheet order would then be 4321765.

If the hairpin is built right, I already have hbonds for V135, I137, R139, N141,
leaving Y134, L136, E138, and P140 to pair---probably like
	Y134	F307
	L136	Y305
	E138	S303
	P140	I301

If we just insert the hairpin then we want Y134's partner on the
hairpin to align with Y159, L136's with I161, E138's with S163.
That is
	L151	Y159
	R149	I161
	I147	S163
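
Both pairing lists above follow the usual antiparallel register: the
residue numbers of each pair sum to a constant (441 for the first
list, 310 for the second).  A minimal sketch that regenerates the
lists from one anchor pair; the function name and arguments are
illustrative, not part of undertaker:

```python
def antiparallel_pairs(first, partner, count, step=2):
    """Pair first, first+step, ... with partner, partner-step, ...

    Stepping by 2 selects alternate residues along the strand, and the
    opposite signs keep first + partner constant (antiparallel register).
    """
    return [(first + k * step, partner - k * step) for k in range(count)]


# Y134..P140 against F307..I301 (sum 441).
sheet_pairs = antiparallel_pairs(134, 307, 4)
# L151..I147 against Y159..S163 (sum 310); the first strand runs downward.
hairpin_pairs = antiparallel_pairs(151, 159, 3, step=-2)

print(sheet_pairs)    # [(134, 307), (136, 305), (138, 303), (140, 301)]
print(hairpin_pairs)  # [(151, 159), (149, 161), (147, 163)]
```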

I've modified the Hbond constraints to try to get this packing of the
sheet for try4.costfn.  I'll see how it comes out, then try adjusting
things for a better fit.


It may not be necessary to break into domains, but if we do, it looks
like E132 would be a good breakpoint.  There are only 2 domains,
unless the second domain has an insertion in the middle.

Fri Jun 11 07:35:47 PDT 2004 Kevin Karplus

The second domain is beginning to look pretty good in try4-opt though
one strand needs work and the region G225-V267 needs to be moved, but
the first domain (which had better homology originally) has been damaged.

I see three choices:
	1) add constraints taken from the models built from alignments.
	2) superimpose on the first domain, then cut-and-paste to
		produce a model to refine further.
	3) break into domains, and model domains separately,
		combining when done.

Fri Jun 11 09:23:25 PDT 2004	Kevin Karplus

I created two subdirectories with the domains: 1-133 and 131-end.
I'm running make on each.  For try1 in each domain I'm using the try4
cost function from here, with constraints restricted to the subdomain.

I'll try to put the comments for the domains into this README file.

Domain 1 is definitely matching a.4.5.* domains, with the best match
to 1hqcA.

Fri Jun 11 13:52:18 PDT 2004 Kevin Karplus
Domain 2 has a high probability of matching domain d.17.4.3,
with 1gs3A as the top example.  We did have this as the 9th match in
the whole-chain fold recognition, but now we have many more templates
to align to.  The cost function being used may be inappropriate for
the second domain, since I was guessing on sheet construction.

(oops--I forgot to save the Template.atoms file in the subdomains)

If I get decent models for the two subdomains, I'll have to play
around a bit with how to put them back together.

Fri Jun 11 14:19:33 PDT 2004	Kevin Karplus

Hmm---it looks like the homology models for the first domain only
extend through about E97, or possibly only L84, with the last two
helices serving as a linker to the next domain.  We might want to save
what is created up to L84 for use in the whole chain.

Fri Jun 11 20:41:19 PDT 2004	Kevin Karplus

I had to fix some of the scripts for creating the .rasmol and
.constraints files for the second domain (built-in assumptions that
residue numbering started at 1).  The second domain looks like a beta
sheet is starting to form, but there isn't much in the
alignments---this may be worse than the whole-chain predictions!
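
The numbering assumption is easy to state in code.  A sketch of the
conversion between position-in-file and full-chain residue number for
a subdomain such as 131-end; these helper names are hypothetical and
are not the actual script fix:

```python
def local_to_chain(local_index, domain_start):
    """Map a 1-based position within a domain to full-chain numbering."""
    return domain_start + local_index - 1


def chain_to_local(chain_number, domain_start):
    """Map a full-chain residue number to a 1-based position in the domain."""
    local = chain_number - domain_start + 1
    if local < 1:
        raise ValueError("residue %d precedes the domain start %d"
                         % (chain_number, domain_start))
    return local


DOMAIN_START = 131  # the 131-end subdirectory
first_residue = local_to_chain(1, DOMAIN_START)  # 131, not 1
y134_local = chain_to_local(134, DOMAIN_START)   # 4
```

A script that silently uses the local index where the chain number is
needed will label Y134 as residue 4, which is the kind of error the
.rasmol and .constraints scripts had.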

Perhaps the problem is that I started with a lot of constraints.
Maybe I should try again without them.


Tue June 29  1:00   Jenny Draper

I've been doing some research on the first domain of this protein. 
The first domain, approximately 1-95ish, I believe, is the DNA-binding
domain. Its helix-turn-helix motif is formed by helices 3 & 4 (39-45 and 52-64).
Helix 4 (52-64) is the DNA "recognition helix", although both contact
the DNA. The sequence "SATIRN*M" in this helix is almost completely
conserved across all instances of this gene (it's really common in
bacteria). The residues S, T, and R in that motif have been
experimentally verified as key for DNA binding. The R will definitely
have to be freely available on the protein surface!

The protein recognizes the CIRCE DNA sequence, which is a 9-bp repeat
separated by a 9-bp spacer -- indicating that it binds as a dimer
(or higher oligomer). There is experimental evidence that it does,
indeed, bind in this fashion (although dimer vs higher is not known).
There is speculation that it is not very stable unless it is bound
to DNA, and requires a chaperone to fold; it forms large aggregates
in solution if DNA is not present to stabilize it. 

In the closest full-length homolog (1mkm), the dimerization site is
in the linker helix of the DNA-binding domain.  

From all of this, my guess is that:
* The domain of approximately 1-94 will be the DNA binding site, with
  helices 3 & 4 forming a helix-turn-helix DNA-binding motif
* The region 95-? is a linker region, possibly involved in dimerization.
* This protein will not be globular; it will most likely have an extended
  structure designed for oligomerization.


references:
--------------------------------------------------------------
Wiegert T, Schumann W. 
Analysis of a DNA-binding motif of the Bacillus subtilis HrcA repressor protein.
FEMS Microbiol Lett. 2003 Jun 6;223(1):101-6. 

Wiegert T, Hagmaier K, Schumann W.
Analysis of orthologous hrcA genes in Escherichia coli and Bacillus subtilis.
FEMS Microbiol Lett. 2004 May 1;234(1):9-17.  

Hitomi M, Nishimura H, Tsujimoto Y, Matsui H, Watanabe K.
Identification of a helix-turn-helix motif of Bacillus thermoglucosidasius HrcA essential for binding to the CIRCE element and thermostability of the HrcA-CIRCE complex, indicating a role as a thermosensor.
J Bacteriol. 2003 Jan;185(1):381-5.  
--------------------------------------------------------------


Fri Jul 16  5:30   Jenny Draper

I don't like the latest full-length model at all (try4), at least with
respect to the first domain; try4 has completely screwed up this
winged-helix DNA-binding domain. Try3 sorta has the right idea.
I'm looking for something a lot like the structure of 1mkmA, as
can be seen pretty well in the best alignment structure:

T0199-1mkmA-t04-local-str2+CB_burial_14_7-0.4+0.4-adpstyle5.a2m:1mkmA


Mon Jul 19 18:00:11 PDT 2004 Kevin Karplus

Subdirectories 1-95 and 115-end have been set up.  1-95 has completed
try1 and 115-end will soon.

The 1-95 is a comparative model with 1j5yA as the best template
(1mkmA, which Jenny likes, is the #3 template).

The 115-end is more difficult fold recognition.


Mon Jul 19 18:42:14 PDT 2004 Kevin Karplus

The "rr" contact prediction failed without error message for 115-end,
probably because the residue numbering doesn't start at 1 and
traincontactnn was not told what the starting column is.  When George
tells me what command line argument to give traincontactnn, I can fix
the Make.main file.


Tue Jul 20 	1:00 pm			Jenny Draper

I like the structure of the second domain in try4-opt2 for the full-chain
prediction. It's pretty much exactly what I'm looking for, with a good
sheet in a 5-6-7-1-2-3-4 pattern, with a long set of helices wrapping
around the sheet between strand 4 and strand 5. I've set up a "strands"
rasmol definition script, which also includes the helices:
define  s1    135-138
define  s2    146-154
define  s3    159-165
define  s4    171-177
define  s5    271-275
define  s6    289-295
define  s7    299-307

define  h1   2-13
define  h2   17-32
define  h3   39-45
define  h4   52-64
define  h5   81-91
define  h6   99-109
define  h7   116-130
define  h8   184-194
define  h9   200-210
define  h10 213-222
define  h11 247-266
define  h12 315-333    
define  wrapper 184-266  -- includes helices 8-11; wraps around the sheet

I'll hold off on this domain for a little while, try to get a good 
structure for the first domain, then try a scaffold setup for the 
full structure.


Tue Jul 20  3:30 pm         Jenny Draper

I created a merged structure of T0199.1-95.try1-opt2.pdb (res 1-95)
and T0199.try4-opt2.pdb (res 96-338) by superimposing them on the
alignment model #1 from T0199.t2k.undertaker-align.pdb.gz
(T0199-1mkmA-t04-local-str2+CB_burial_14_7-0.4+0.4-adpstyle5)
using DeepView's "Magic Fit". This is problematic, as this aligns
the helix around 80-95 with the helix 315-336... instead of with 
the helix around 95-110 (which is right next to the end helix). I
submitted this structure to VAST, hoping maybe it can provide a
better superposition. Job: VS59951  Pwd: T0199merged
 -- still running at 6:00pm


Tue Jul 20  7:45 pm         Jenny Draper
The VAST run didn't buy me much; it likes 1mkmA, and can't align
the middle linking region. I'll set up some scaffolding constraints
and see what undertaker can do with the merged file.
... which I will have to do tomorrow, since I never uploaded the
merged structure from my work desktop to compbio...


Wed Jul 21  1:00 pm         Jenny Draper
I'm preparing an Undertaker run on the superpositioned structure
(decoys/superimposed-domains.pdb). I tried a superposition using
Undertaker; it had the same result as DeepView's magic-fit. Kevin
suggested just running the straight merge -- with the horrible
overlap of the end of domain 1 with the final helix -- and let
Undertaker straighten it out. I'm working on some scaffolding
constraints now, to keep the domains in order.

Wed Jul 21  4:00 pm         Jenny Draper
Try5 is now running on croak. I only included the superimposed
structure, which scores second-best (below try4-opt2) with the
try5 cost function (a good sign...)

Thu Jul 22  12:30 pm         Jenny Draper
Try5 looks terrible; it's blown up both domains, though it holds
the helix-turn-helix and sheet together. The one thing I do like
is the two helices from 99-131. Maybe I can use these in my
superpositioning... I'll have to try this tomorrow though, since
I've got Dr appointments all day today.


Fri Jul 23 18:26:44 PDT 2004 Kevin Karplus

I made an unconstrained.costfcn, and a try6.costfcn that is similar
but adds a single domain-separation constraint.  I also picked up a
lot of the top hits from both this directory and the 1-95 and 115-end
subdirectories and added them to MANUAL_TOP_HITS.

I'm running "make extra_alignments" and "make all-align.a2m.gz" to get
a rich set of fragments for the next optimization run.

Unconstrained, the try2-opt2 model scores best, followed by try1-opt2 and try5-opt2.
The extra constraint in try6.costfcn changes the order to try2, try4,
try1, try5

Jenny had broken the "str2" script by editing under windows.  The
automatically created rasmol scripts should NOT be edited---any edits
done that way are easily lost in a remake.  I added the "dna"
definition she was adding to the hand-created "strands" script.

It doesn't look like Jenny read in the superimposed-domains.pdb file
for try5, which would explain why it did so terribly---that was an
ab-initio run.

I'll try to get try6 to do roughly what Jenny wanted.


Sat Jul 24 10:44:04 PDT 2004 Kevin Karplus

try6 is VERY bad---it pulled the whole model apart.
The center linker is about how we want it though.

Maybe I can do a superposition of the two parts of
superimposed-domains on just the linker and use that superposition.
The superimpose-domains-2.under script does that fairly successfully.
Now, I'll edit down the superimposed-domains2.pdb file to be a
single model and put it in decoys.

Wait, there is a problem! The residue numbering is messed up in the
second model, even though it was right in T0199.1-95-try1-opt2.
Y87 has somehow been changed into F121, Y88 into Y134, E89 into E188,
E90 into E202, ...

I think that the problem is in undertaker's reading of incomplete PDB
files and the crude alignment that is done: the
AlignAndSetConformation() routine in ReadPDBCommands.cc


Sat Jul 24 11:25:55 PDT 2004 Kevin Karplus

I made a crude patch to the global_align routine to use pdb numbers as
hints, which seems to have fixed the problem.
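
To illustrate the idea only (this is NOT the undertaker code; both
function names and the toy sequence are made up), here is a sketch of
why residue numbers help when a partial PDB file is read.  A
sequence-only scan places a fragment at the first matching letters,
while the number hint places it where the file says it belongs:

```python
def naive_scan(target_seq, model_residues):
    """Place each model residue at the next matching letter in the target."""
    placed, pos = {}, 0
    for i, (_, aa) in enumerate(model_residues):
        found = target_seq.find(aa, pos)
        if found < 0:
            break
        placed[i] = found + 1  # 1-based target position
        pos = found + 1
    return placed


def place_by_number_hints(target_seq, model_residues):
    """Trust the PDB residue number whenever the residue type agrees."""
    placed = {}
    for i, (pdb_num, aa) in enumerate(model_residues):
        if 1 <= pdb_num <= len(target_seq) and target_seq[pdb_num - 1] == aa:
            placed[i] = pdb_num
    return placed


target = "ACDYYEEKLYE"            # toy sequence, not T0199
fragment = [(10, "Y"), (11, "E")]  # (PDB number, residue type)

print(naive_scan(target, fragment))             # {0: 4, 1: 6}
print(place_by_number_hints(target, fragment))  # {0: 10, 1: 11}
```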


Sat Jul 24 16:54:24 PDT 2004 Kevin Karplus

The try7-opt2 model does not look too bad, though try6 scores better
with an unconstrained score file.  I'm convinced that try6 is trash,
so try7 is currently our best guess, though we could also submit
superimposed-domains2.pdb (the edited one in the decoys, not the
multiple-model one in this directory).  Maybe I'd better rename it
one-model-2domains.pdb

Rosetta really hates try7-opt2.repack-nonPC.

The try7-opt2 and one-model-2domains models are quite similar, but the
linker helix is straighter in try7 and has leaned over a bit, bringing
the two domains closer together.

I should probably do a polishing run to reduce breaks and clashes and
call it quits for this one.


Sun Jul 25 09:15:08 PDT 2004 Kevin Karplus

Other than strand s2, try8 looks pretty good.
Maybe try
N141-K146-T166
P140-I147-L165
Y134-I153-Y159

It would probably have been better to fix s2 before sticking the
decoys together (unless it was broken by subsequent RR
constraints---I'll have to go back and check).

The unconstrained cost fcn prefers try6-opt2 (which is junk) and
try7-opt2 to try8-opt2.  The difference in cost between try7 and try8
is tiny, and can be accounted for by slight differences in weighting
different components of the cost fcn.

Rosetta prefers try6-opt2.repack-nonPC also, though try8 does better
than try7.


If try9 makes a better half barrel, should I cut out 131-end and redo
the superposition to get a better first model?


Sun Jul 25  12:45 pm         Jenny Draper

I'm really unhappy with the linkage between the two domains. I think
the helix between 81-109 should be sticking out, not wrapping around
the sheet. This structure dimerizes, and I suspect that it does so
in a fashion like 1MKM, where the two linker helices cross in an "X".
The way try8 is formed, this dimerization would be impossible.
Could we try superimposing this on the dimer (1mkm), making this 
structure a dimer? Take a look at a funky superposition of the
"superimpose-domains-2.pdb" structure that I put together, for an 
idea of what I'm looking for: T0199/dimer-superimpose-domains-2.pdb

From karplus@soe.ucsc.edu  Sun Jul 25 13:49:39 2004
Date: Sun, 25 Jul 2004 13:49:38 -0700
From: Kevin Karplus <karplus@soe.ucsc.edu>
To: learithe@soe.ucsc.edu
CC: karplus@soe.ucsc.edu
Subject: unhappy dimer T0199


I'm not happy with T0199 either.

Take a look at casp6/T0199/decoys/one-model-2domains.pdb

That is the result of my superposition and meets most of your criteria.
At the moment it is our first model, but if you could fix strand s2 in
the second domain (see my notes in README)  we could re-superimpose.
It might be best to do the work in 131-end, so as not to have the
first domain and linker slowing things down.


Sun Jul 25  7:30 pm         Jenny Draper

Running 115-end try2, from the second-domain part of 
"one-model-2domains.pdb", using try9.under/costfcn, with all domain1
constraints removed.


Sun Jul 25 21:51:08 PDT 2004 Kevin Karplus

I don't see much difference between try8 and try9---strand s2 doesn't
seem to have budged.  I'd have to superimpose them to distinguish
between them.   The try9 costfcn likes try9 better, but the
unconstrained one likes try8 better (of course, it loves the terrible try6).


Mon Jul 26 12:25:45 PDT 2004 Kevin Karplus

The try2 on 115-end doesn't look much better.


Sun Sep 19 10:03:55 PDT 2004 Kevin Karplus

I put 1stzA in the Makefile as REAL_PDB and evaluated our predictions.

Our submitted models are ordered model5, model4, model1, model3, model2.
If we insert the robetta models, we get:
	model5, robetta3, model4, robetta1, robetta4, robetta2, model1, model3, model2, robetta5
None of these models are particularly good (22 Ang rmsd).
We are not doing significantly better than robetta on whole-chain
rmsd---indeed, robetta's model 1 is better than ours. 

The model5 rmsd is artificially good, because the model is incomplete.

The problem may be with the domain placement, though, rather than bad
domains, since superimposed-domains.pdb does do slightly better than model5.
It is annoying though, when the automatic methods (try4 and try5) do
better than the hand-tweaked models.

We do need an evaluation that looks at the domains separately to make
any real judgement of how well things worked here.


Wed Sep 22 05:11:28 PDT 2004 Kevin Karplus

I don't have separate domains yet, but I did look at
undertaker-computed GDT scores:
	model5	23.58%	
	model1	20.59%
	model2=model3	19.89%
	robetta3 17.49%
	robetta2 16.72%
	robetta1 15.40%
	robetta4 13.24%
	robetta5 12.62%
	model4	 10.37%

None of these are great, but we are beating robetta.
The incompleteness of model5 is still skewing the results, because I'm
not computing GDT score quite right---I was normalizing by the number
of CA atoms that were present in BOTH conformations, rather than just
the number in the real conformation.  I'll fix this and rerun.
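
A sketch of the normalization problem, assuming a single fixed
superposition and the usual 1, 2, 4, 8 Angstrom cutoffs (a real GDT
calculation searches over superpositions, and this is not the
undertaker implementation).  The distances are invented toy values:

```python
def gdt_percent(distances, n_norm, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """Average fraction of CA atoms within each cutoff, as a percentage.

    distances: CA-CA distances for residues present in both structures.
    n_norm:    the count used for normalization.
    """
    fractions = [sum(d <= t for d in distances) / n_norm for t in thresholds]
    return 100.0 * sum(fractions) / len(thresholds)


# Toy case: the real structure has 8 CA atoms, the model only has 4 of them.
distances = [0.5, 1.5, 3.0, 6.0]
n_real = 8

inflated = gdt_percent(distances, len(distances))  # normalized by atoms in BOTH
correct = gdt_percent(distances, n_real)           # normalized by the real structure

print(inflated)  # 62.5
print(correct)   # 31.25
```

Normalizing by the atoms present in both conformations rewards a
model for leaving residues out, which is what made model5 look better
than it is.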


Wed Sep 22 10:28:40 PDT 2004 Kevin Karplus

Fixing the bug led to the correct ordering:
	model1, model2=model3, robetta3, robetta2, model5 15.48%,
	robetta1, robetta4, robetta5, model4
and model1 is the best model we created.
Having model3 be slightly better than model2 on all-atom rmsd
indicates that the Rosetta repacking made a small improvement.


Fri Sep 24 12:34:41 PDT 2004 Kevin Karplus

Changing smooth_GDT leads to the following:
name			length	missing_atoms	rmsd	rmsd_ca	GDT		smooth_GDT
model1.ts-submitted	338	 0.0000		26.0666	25.4335	-20.4334	-19.4723
model2.ts-submitted	338	 0.0000		27.6092	27.0085	-20.4334	-19.2314
model3.ts-submitted	338	 0.0000		27.5642	27.0085	-20.4334	-19.2311
robetta-model3.pdb.gz	338	 0.0000		23.7348	23.1228	-17.4923	-16.6723
robetta-model2.pdb.gz	338	 0.0000		24.8752	24.3236	-16.7183	-15.6841
robetta-model1.pdb.gz	338	 0.0000		24.2513	23.6755	-16.0991	-14.8817
model5.ts-submitted	338	1034.		22.8557	21.9430	-15.7121	-14.4855
robetta-model5.pdb.gz	338	 0.0000		28.4818	27.7490	-12.6161	-12.0560
robetta-model4.pdb.gz	338	 0.0000		24.2827	23.7282	-12.6935	-12.0547
model4.ts-submitted	338	 0.0000		24.0005	23.4880	-10.8359	-10.5238

None of the models are very good, but we did beat robetta.

We probably have to evaluate this protein in separate domains.


Fri Nov 26 17:07:00 PST 2004 Kevin Karplus

The assessors broke this into three different domains, with different difficulties:
Domain : T0199_1 : CM/hard : NT=74 : 14-87
Domain : T0199_2 : FR/H : NT=134 : 116-142,230-336
Domain : T0199_3 : FR/A : NT=82 : 145-226
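
For a per-domain evaluation the range strings need to be expanded,
since T0199_2 is discontinuous.  A minimal sketch (the function names
are illustrative) that parses the ranges and confirms they agree with
the NT counts listed above:

```python
def parse_ranges(spec):
    """Turn '116-142,230-336' into [(116, 142), (230, 336)]."""
    ranges = []
    for part in spec.split(","):
        start, end = part.split("-")
        ranges.append((int(start), int(end)))
    return ranges


def residue_count(spec):
    """Number of residues covered by an inclusive range specification."""
    return sum(end - start + 1 for start, end in parse_ranges(spec))


domains = {
    "T0199_1": "14-87",
    "T0199_2": "116-142,230-336",
    "T0199_3": "145-226",
}
counts = {name: residue_count(spec) for name, spec in domains.items()}
print(counts)  # {'T0199_1': 74, 'T0199_2': 134, 'T0199_3': 82}
```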

The smooth-GDT scores for the whole chain and the 3 domains are

#Target	best	best	model1	auto	align	robetta	robetta
#	sam-t04	submit				best	1
T0199 	19.4505	19.4505	19.4505	10.5169	14.5432	16.6689	14.8804
T0199_1	73.1510	72.5353	72.5353	43.2916	53.1763	61.3333	56.4955
T0199_2	43.3686	43.0372	43.0372	16.8240	33.0152	39.9706	34.5972
T0199_3	25.7334	25.3971	24.6210	25.3972	15.5119	19.8216	19.8216

As expected, we did well on the CM domain, ok on the FR/H domain, and
not so great on the FR/A domain.  Interestingly, on T0199_3, we made
the final helix longer than the real structure, which has turns at
S214 and G206---the helix prediction was pretty strong for this
region, so the mistake is understandable.