Tue Jul 13 09:00:38 PDT 2004
T0237

DUE 16 Aug 2004

Tue Jul 13 12:03:21 PDT 2004

The t04 alignment is showing 8 hugely conserved CYS.  This sounds like
disulfide bridges to me.  We'll have to turn on the disulfide scoring.

I hope that there is enough similarity to known proteins to
disambiguate the bridges, otherwise we'll have 7*5*3 = 105 different
pairings to consider.  (We may be able to guess them separately, which
would reduce the complexity a lot.)
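
A quick check of that count, as a minimal Python sketch. The function name
count_pairings is only for illustration and is not part of any of our tools.
With n cysteines all paired, the first one has n-1 possible partners, the
next unpaired one has n-3, and so on.

```python
def count_pairings(n_cys: int) -> int:
    """Number of ways to pair up n_cys cysteines so that every one is bonded."""
    if n_cys < 0 or n_cys % 2:
        raise ValueError("need a non-negative, even number of cysteines")
    total = 1
    # With k cysteines still unpaired, the first of them can pick any of the
    # other k - 1 as a partner; that leaves k - 2 unpaired.
    for k in range(n_cys, 0, -2):
        total *= k - 1
    return total


print(count_pairings(8))  # 7*5*3 = 105
```

The same function gives the count for any even number of cysteines, so it
can be reused if the number of conserved CYS turns out to be different.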


Good, it looks like we have a strong hit on 1hn6A, so we should be ok.

Tue Jul 13 12:26:28 PDT 2004	Kevin Karplus

Not so good as all that.  1hn6A has only 3 disulfides, not 4, and it
is an NMR structure with a whole lot of variability.  Only a hairpin
and a helix near the disulfides are conserved.  Still, if the pairing
of the disulfides is right for the 1hn6A alignment, then the one
remaining pair is forced and we can use SSBond statements to
constrain the fold significantly.

The target is a 2.0 Ang X-ray crystal structure, not an NMR result, so
there won't be that much flop in the structure we are trying to match.

Tue Jul 13 14:51:02 PDT 2004	Kevin Karplus

The first model in the T0237.t2k.undertaker-align.pdb.gz file is the
1hn6A alignment, for which only a small part is reasonable.

That part suggests that the disulfide pairing is 
C409-C392, C390-C407, C402-C346.  

Wait a sec---the t2k alignment has 13 conserved cys, and this is only
3 of them (390, 402, and 407).  There are still C52, C120, C150, C166,
C178, C205, C223, C240, C312, and C321---all of which have even more
conservation than the ones we matched.   We have 9*7*5*3 = 945 ways to
pair these CYS residues---that's too many for us to try to build a
model for each, even if we automate the ssbond construction.  (OK, if
we were running on the kilokluster we could do it, but it would take
up the kluster for a day or two.)
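
For reference, automating the enumeration itself would be cheap; the cost is
in building a model per pairing. A minimal sketch in Python, assuming the ten
unmatched cysteines listed above (the helper all_pairings is hypothetical,
not an existing script):

```python
from typing import Iterator, List, Tuple

Pair = Tuple[int, int]


def all_pairings(residues: List[int]) -> Iterator[List[Pair]]:
    """Yield every way to split the residues into disjoint pairs."""
    if not residues:
        yield []
        return
    first, rest = residues[0], residues[1:]
    # Pair the first residue with each possible partner in turn, then
    # recursively pair up whatever is left over.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in all_pairings(remaining):
            yield [(first, partner)] + tail


# The ten conserved cysteines not covered by the 1hn6A alignment.
unmatched = [52, 120, 150, 166, 178, 205, 223, 240, 312, 321]
pairings = list(all_pairings(unmatched))
print(len(pairings))  # 9*7*5*3 = 945
```

Each element of pairings is a list of five (residue, residue) tuples that
could be turned into one set of ssbond constraints.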

Maybe mutual information will help disambiguate the pairings.  The cys
residues are not going to get MI values, but maybe some of their
neighbors will.

Tue Jul 13 16:42:04 PDT 2004	Kevin Karplus

The try1-opt1 model is looking pretty scattered.  There are bits and
pieces of hairpins and helices, but nothing that will help us figure
out the disulfide pairings.

Thu Jul 29 17:55:36 PDT 2004	Martina Koeva

I looked at the mutual information files and the pairs indicated above 
showed up (not directly, but through neighboring residues). Two additional
pairs seemed to appear: C150-C178 and C205-C223. I will attempt to put those
5 pairs in try2. Additionally, I have rescaled the hbond parameters, increased
the constraints weight (from 10 to 30) and have added the rr constraints from 
George's 280.rr.constraints file.

Sun Aug 1 15:11:28 PDT 2004	Martina Koeva

Something new: looked at one of the original papers solving the structure 
of 1hn6A. Here is a link to it:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12270711

There is quite a bit of information in it. The paper mostly focuses on 
domain III of the Apical membrane antigen 1, as expected. However, there
is a schematic diagram of all 3 possible subdomains of the Pf AMA1 
ectodomain, and more importantly it shows the position of exactly 8 
disulfide bonds (p.2 of paper). Assuming that the 3 pairs that Kevin 
mentioned above are correct (with that numbering scheme) and assuming that
there are no insertions or deletions, if one counts the number of residues 
from those 3 pairs of cysteine residues back and maps between structures, 
it turns out that all cysteine residue positions have been completely 
conserved (relative to each other).

The determination of the last three pairs (C346-C402, C390-C407, C392-C409)
has been documented in the paper noted above. The determination of the previous
5 cysteine residue pairs has been done in the following paper:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8910611

What this means for T0237 is that the two additional pairs that I thought 
I had found previously were incorrect and the pairing of the 16 cysteine residues 
goes as follows:

1.) C150 - C120
2.) C178 - C166
3.) C205 - C52
4.) C312 - C240
5.) C321 - C223
6.) C402 - C346
7.) C407 - C390
8.) C409 - C392

I will set these 8 pairs of residues up as SSBond constraints for try3 for now 
and see what happens. Also, I think there is a bit more information that could
be found in the literature on the structure of the AMA1 protein, so I will try 
and look for anything else useful that can give us hints in regards to T0237. 
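
A small consistency check on the pairing list above, as a hedged Python
sketch (the variable and function names are made up for illustration). It
confirms that each of the 16 cysteines is used exactly once and shows which
of the earlier mutual-information guesses disagree with the literature
pairing.

```python
# Pairing taken from the list above (literature-derived).
literature_pairs = [
    (150, 120), (178, 166), (205, 52), (312, 240),
    (321, 223), (402, 346), (407, 390), (409, 392),
]
# The two extra pairs guessed earlier from mutual information.
mi_guesses = [(150, 178), (205, 223)]


def check_pairing(pairs):
    """Return the set of residues used; fail if any cysteine appears twice."""
    seen = set()
    for a, b in pairs:
        for res in (a, b):
            if res in seen:
                raise ValueError(f"C{res} is used in more than one pair")
            seen.add(res)
    return seen


cysteines = check_pairing(literature_pairs)

# Look up each cysteine's partner in the literature pairing.
partner = {}
for a, b in literature_pairs:
    partner[a] = b
    partner[b] = a

conflicts = [(a, b) for a, b in mi_guesses if partner.get(a) != b]
print(len(cysteines), conflicts)
```

Both mutual-information guesses come out as conflicts, since C150 and C205
already have different partners in the literature pairing.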
 

Mon Aug  2 00:07:56 PDT 2004 Kevin Karplus


Don't put too much faith in my guesses at the disulphide mapping---if
you have any evidence that contradicts my guesses, go with the evidence.


From karplus@soe.ucsc.edu  Mon Aug  2 00:17:40 2004
Date: Mon, 2 Aug 2004 00:17:38 -0700
From: Kevin Karplus <karplus@soe.ucsc.edu>
To: karplus@soe.ucsc.edu, sol@soe.ucsc.edu, ggshack@soe.ucsc.edu,
        learithe@soe.ucsc.edu, martina@soe.ucsc.edu, bbarnes@ucsc.edu,
        marcias@ucsc.edu, rph@soe.ucsc.edu
Subject: correction on T0237


Oops---on T0237 the number of disulphides is 8, not 4, so the search
would have been over 15*13*11*9*7*5*3 pairings----definitely not
feasible for us.

If some of the pairings we currently are using are not supported by
experimental evidence or homology, then they should be left out of the
cost function, and better guesses made once the known SS bonds have
been inserted.

From martina@soe.ucsc.edu  Mon Aug  2 01:23:01 2004
MIME-Version: 1.0
Date: Mon, 2 Aug 2004 01:22:59 -0700 (PDT)
From: Martina Koeva <martina@soe.ucsc.edu>
To: Kevin Karplus <karplus@soe.ucsc.edu>
Subject: Re: correction on T0237
In-Reply-To: <200408020717.i727Hc4A006048@cheep.cse.ucsc.edu>

The "maps" of the 3 pairings that I started off with (which were the ones
you had suggested in the README file) were some of the ones mentioned the
most in the paper for the structure of 1hn6A. I guess this was the case,
because the disulfide bridges in subdomain III were the only ones that had
not been confirmed experimentally up to that point. If those were incorrect,
the mapping probably would not have worked as well as it did. I think we can
more or less safely assume that the experimental evidence points to those
pairings. Now, I just need to make those 3 pairs form a bridge.

I also just saw another paper that just came out yesterday that confirms 
that all of the cysteines at least in the homolog Plasmodium f. AMA1 
protein form disulfide bonds (also experimental evidence).


--------------------------------------------------

Mon Aug  2 11:03:17 PDT 2004 Kevin Karplus

Try3 forms the 8 ss bonds, but has not folded the rest compactly.

I'll start a try4 run on cheep that reduces the number of strand and
sheet constraints to just the ones from t04.dssp-ehl2 and adds the rr
constraints, in an attempt to pack this a little better.  Martina might
want to start an independent run with guesses about sheet topology, as
I have not had time to think about that.


Mon Aug  2 14:04:14 PDT 2004 Martina Koeva

I spent some more time going through the same original paper and trying
to gather information about structure that we can use in T0237. Here is
what I think we can use or conclude (directly taken from the paper):

1. All that follows below is only relevant to residues E339 (approx.) - end.
This segment of residues is classified to be subdomain III of this protein.
All the information is given for the structure of the template, but I have
mapped the residue numbers to those of T0237.

2. The N-terminal region of this subdomain is not structured over the first
4 residues (E339-L342) and starts to become more ordered after E343.

3. The structure has a turn of a helix between C346 and K350, followed by
a type I beta-turn, centered on E354 and R355 and stabilized by a backbone
hydrogen bond.

4. The structured regions E343-R355 and F380-N408 are separated by a largely
disordered loop of approximately 25-26 residues.

5. There is a completely conserved sequence P377-S382 (...PRIFIS...) between
all Plasmodium sequences. The paper indicates that this stretch adopts an 
extended (sheet-like) structure, but does not interact with the beta-hairpin.

6. An antiparallel beta-sheet is found between residues E395-S398 and 
N403-V406:	NFYV 
		SIRE
The residues between P394 and C407 form a beta-hairpin with a distorted type
I beta-turn centered on residues S400 and T401.

7. The last residues (after N408) are largely unstructured.

8. There is another conserved region of sequence, namely S(E)NNEV between 
residues 418-422. In the case of T0237, the sequence is ENNQV. It is supposed
to adopt a bent structure that contains some features of a reverse turn, but
does not cause a chain reversal.

9. Surface residues include: D384, S387, S400, T401 and possibly N403.

10. One face of this subdomain is highly charged and has a cluster of negative
potential towards the disulphide core. There is a basic cleft centered on:
K359, residue 360 (which used to be an R, but in T0237 is a Q), R362, R378
and K389, where the residues come in from both the loop and the structured 
region. The opposite side of the subdomain is less charged.


What does all of this mean for T0237: There is pretty much no structural 
information apart from the disulfide bonds for the rest of the protein 
(subdomain I and II). I tried using VAST on try3-opt2, which was a long shot 
anyway. 
VAST ID: VS60344
Password: T0237try3

As I discovered, VAST was smart about the search and separated T0237 into 
4 subdomains. There were no hits at all for the latter 3 subdomains, and there
were only a few hits for subdomain 1, which looked pretty trashy to me. I will 
go back and take a look at the aligned regions just to double-check, but I am 
sceptical if that is going to lead anywhere.

As far as subdomain III, one of the recurring themes in the paper has been the 
lack of secondary structure in the majority of this small subdomain. I am still
a little averse to believing that. My concern is (could be completely irrational)
that such a conclusion is correlated to the nature of the method used (NMR) and the 
not-so-high resolution. Is that possible?

Otherwise, I do believe and like the beta-hairpin element, even though 
we do not even predict a strand between 395 and 398. The strand there seems to want
to form itself. I will put in explicit strand and sheet constraints for those
two antiparallel strands. The template also seems to have a turn of a helix, while 
our predictions seem to like the idea of extending that helix to another couple of 
turns (a helix of about 12 residues).  Finally, I am inclined to like the idea of 
the cleft of basic residues, which in T0237 does not show up as a cluster yet in 
try3. 

I will include those in try5 tonight. 
 

Mon Aug  2 17:05:38 PDT 2004 Martina Koeva

Try4 looks like it just finished and it strikes me as somewhat more compact, but 
it could be because I have been staring at this structure for too long. I am not 
seeing the cleft residues clustering yet. The goal for try5 will be to include 
explicit sheet constraints for the beta hairpin, as well as attempt to cluster the
basic residues noted above. 

However, I do need more secondary structure (possible sheet) conjectures, so I 
will focus next on subdomain I, since we have a couple of more or less strongly 
predicted strands. If I get some more sheet conjectures tonight, I will start try5
with them too. 

 
Mon Aug 2 21:49:32 PDT 2004	Martina Koeva

I started try5 with a few sheet constraints. It's a little bit of a shot in the 
dark, but I will have to wait and see what the structures show. I have not included
yet the cluster of positively charged residues in the small cleft in subdomain III.
Might need to include that in try6. 
 
 
Wed Aug  4 02:11:35 PDT 2004 Kevin Karplus

T0237.try4 looks rather horrible, lacking even the hairpins of try1.
Maybe I should pick up the sheet constraints from try1-opt2.


Thu Aug  5 13:56:01 PDT 2004 Martina Koeva

I can't see anything that I was really looking for. Try5-opt2 scores worse than
both try4 and try3 (with try3 scoring the best) with the try5 cost function.
Try4-opt2 also scores worse than try3 with the try4 cost function. 

One thing that I noticed in the robetta models is a couple of hairpins that I 
quite like and that fit with our ehl2 predictions. I am putting in those as
sheet constraints for try6. I am also including the sheet constraints from 
try1-opt2:

SheetConstraint R207 K208	N212 G211 	hbond K208
SheetConstraint N301 D306 	N316 K311 	hbond W302
SheetConstraint L322 N324	I329 N327	hbond I323
SheetConstraint E395 S398	V406 N403	hbond E395

Hmm, never mind! The sheet constraints from the robetta model 3 and 
from try1-opt2 actually turn out to overlap, so the 4 constraints
above are the only ones included for try6.
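
The four SheetConstraint lines above can be sanity-checked with a short
Python sketch. This rests on an assumption about the line format that is my
reading of the examples, not documented syntax: the first two residues
delimit one strand, the next two delimit its partner in pairing order (so a
descending second range means antiparallel), and the residue after "hbond"
is ignored here. The helper names are hypothetical.

```python
CONSTRAINTS = """\
SheetConstraint R207 K208	N212 G211 	hbond K208
SheetConstraint N301 D306 	N316 K311 	hbond W302
SheetConstraint L322 N324	I329 N327	hbond I323
SheetConstraint E395 S398	V406 N403	hbond E395
"""


def resnum(token: str) -> int:
    """'K208' -> 208 (one-letter residue code followed by the number)."""
    return int(token[1:])


def hairpin_summary(line: str):
    """Return (strand_a, strand_b, antiparallel, loop_length) for one line."""
    fields = line.split()  # split() copes with the mixed tabs and spaces
    a_start, a_end, b_start, b_end = (resnum(t) for t in fields[1:5])
    strand_a = (min(a_start, a_end), max(a_start, a_end))
    strand_b = (min(b_start, b_end), max(b_start, b_end))
    # Assumed convention: opposite directions of the two ranges = antiparallel.
    antiparallel = (a_end - a_start) * (b_end - b_start) < 0
    # Residues strictly between the two strands form the connecting turn.
    loop_length = strand_b[0] - strand_a[1] - 1
    return strand_a, strand_b, antiparallel, loop_length


summaries = [hairpin_summary(line) for line in CONSTRAINTS.splitlines()]
for strand_a, strand_b, anti, loop in summaries:
    print(strand_a, strand_b, "antiparallel" if anti else "parallel", loop)
```

Under that reading, all four constraints describe hairpins with short
connecting loops, and the last one leaves residues 399-402 as the turn,
which agrees with the turn centered on S400 and T401 described in the paper.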

I've also increased the constraints weight from 10 to 30. 

 
Sat Aug  7 15:45:07 PDT 2004 Kevin Karplus

try6 has some hairpins and the disulphides.  Perhaps we need to
increase the break costs so that the backbone is not so
shattered---there are some truly horrific breaks (like 16 before K113,
19 before C150, 21 before F179, 49 before C205, 51 before P206, ...).

I'll also include the T0237.t04.many.frag file in try7.under, after
redoing the make to create it.  This will mean re-creating the
Template.atoms file also, so after try7 we have to remember to comment
out the output of Template.atoms again.

I also added a few more of the very weak hits to MANUAL_TOP_HITS and
remade "extra_alignments" and "all-align.*" to try to get some more
long fragments to use.


Sun Aug  8 09:38:32 PDT 2004 Kevin Karplus

Although try7 scores better than try6, it still doesn't look much
like a protein. Several of the helices have unwound, and nothing is compact.
There are still bad breaks, though none as horrendous as in try6.
Perhaps the helix constraints, dry12, and phobic_fit parameters should
be increased. 

I'll leave this one for Martina to work on.


Mon Aug  9 16:43:05 PDT 2004 Martina Koeva

This is probably going to be the last attempt before maybe breaking the 
protein into subdomains. I have raised all strand, helix, sheet and rr
constraints in the try8 cost function. I have increased the break weight 
even further, turned down sidechain weight, turned up wet6.5 and all dry 
weights. Finally, I have also increased the phobic_fit weight.

I have again commented out the output of the Template.atoms file.


Wed Aug 11 17:00:52 PDT 2004 Martina Koeva

I have decided to split the subdomains in the following way:

P1-V219
F215-E339
L334-L445(end)

As a first attempt, I have made the starter subdirectory for the 
first subdomain. If everything works out fine, I will need to do 
the other two later tonight. 
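
One thing worth checking for this split is that no disulfide is cut by a
subdomain boundary. A minimal Python sketch, using the residue ranges above
and the 8 literature pairs from the Aug 1 entry (names such as containing
are hypothetical):

```python
# Subdomain ranges from the split above (inclusive residue numbers).
subdomains = {"I": (1, 219), "II": (215, 339), "III": (334, 445)}

# The 8 disulfide pairs from the literature-based pairing.
pairs = [
    (120, 150), (166, 178), (52, 205), (240, 312),
    (223, 321), (346, 402), (390, 407), (392, 409),
]


def containing(pair):
    """Names of the subdomains that contain both cysteines of a pair."""
    return [
        name
        for name, (lo, hi) in subdomains.items()
        if all(lo <= res <= hi for res in pair)
    ]


assignment = {pair: containing(pair) for pair in pairs}
for pair, names in assignment.items():
    print(f"C{pair[0]}-C{pair[1]}: subdomain {', '.join(names) or 'NONE'}")

# Residues shared by neighboring subdomains (the overlaps in the split).
overlap_I_II = subdomains["I"][1] - subdomains["II"][0] + 1
overlap_II_III = subdomains["II"][1] - subdomains["III"][0] + 1
print(overlap_I_II, overlap_II_III)
```

With these ranges every pair falls inside exactly one subdomain (three in
I, two in II, three in III), so each subdomain run can carry its own SSBond
constraints without needing residues from a neighbor.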

 
Thu Aug 12 00:28:22 PDT 2004 Martina Koeva

All of the subdirectories have been created and the initial runs 
have been started with the disulphide constraints already put in
for try1. I have rescaled the hbond parameters in try1.costfcn, 
as well as have included 'known_ssbond'.

From now on I will be commenting both in the main README file, as 
well as the subdirectory README files.  


Thu Aug 12 15:47:34 PDT 2004 Martina Koeva

I started all try2s in each subdirectory and looked at the initial models.
It seems that in each subdirectory the models are showing improvement in
secondary structure formation, but as a downside both subdomain I and III
are not forming the disulphide bonds, even though I had already specified
explicit SSBond constraints in try1.costfcn for each subdomain. 

From martina@soe.ucsc.edu  Thu Aug 12 20:30:51 2004
MIME-Version: 1.0
Date: Thu, 12 Aug 2004 20:30:50 -0700 (PDT)
From: Martina Koeva <martina@soe.ucsc.edu>
To: Kevin Karplus <karplus@soe.ucsc.edu>
Subject: T0237
In-Reply-To: <200408130256.i7D2uM2L021083@cheep.cse.ucsc.edu>

I was wondering whether you can take a look at T0237 (the big new fold
with the 16 Cys) at some point. I have split it into subdomains and I am
pretty happy with subdomains II and III, given that I've done two tries on
each. There is quite a bit to work on in subdomain I. I was wondering though
if you would have any suggestions on it?


Thank you!

Martina

From karplus@soe.ucsc.edu  Fri Aug 13 16:02:34 2004
Date: Fri, 13 Aug 2004 16:02:32 -0700
From: Kevin Karplus <karplus@soe.ucsc.edu>
To: martina@soe.ucsc.edu
CC: karplus@soe.ucsc.edu
In-reply-to: <Pine.LNX.4.44.0408122027390.13464-100000@bark.cse.ucsc.edu>
	(message from Martina Koeva on Thu, 12 Aug 2004 20:30:50 -0700 (PDT))
Subject: Re: T0237


In domain 1,
    If you are having trouble forming disulfides in the T0237 subdomains,
    it may be because you still have 

    InitMethodProbs ...
	    InsertSSBond 0 \
	    ImproveSSBond 0

    The "InsertSSBond" operator is almost certainly the one that caused
    the ssbonds to be formed (at the expense of almost everything else) in
    the main directory.  Set their initial weights to 1 or 2, and they
    should start being used.

In domain2, 
    it looks like you might want to add a hairpin:
    SheetConstraint A287 N289	K295 N292	hbond N289

    You might also want to strengthen your strand constraints relative to
    the helix constraints. 

    You could probably drop known_ssbond back down to 1, but increase the
    wet and dry weights, and reduce the sidechain weight to 1
	SetCost wet6.5 10 near_backbone 5 way_back 5 dry5 15 dry6.5 25 dry8 15 dry12 5 ...


In domain3,
    I don't see any constraints to add, but you might want to tweak
    the weights as for domain 2.
   
 
Sat Aug 14 00:06:29 PDT 2004 Martina Koeva 

I have incorporated all of the above suggestions into the try3s for the
appropriate subdomains. If I manage to get better packing on the first 
subdomain, I will try to make try4 an optimization run from existing 
models, so that I can have enough time to put the protein back together 
and optimize the whole structure.

Sat Aug 14 02:09:53 PDT 2004 Martina Koeva

Wow, that was quite fast. The third try for subdomain III is done and the
other two have already generated their opt1 models, so I will be able to 
start an optimization run from the existing models in the morning. As
far as I can see in the structures that have already been generated for
try3, all disulphides in each subdomain are forming.


Sat Aug 14 14:46:31 PDT 2004 Martina Koeva

I have started try4 on each subdomain from previous models. Once those are 
done I can try putting the structure back together and optimizing.
Subdomain 1 still doesn't look very structured, but we are getting some 
strands into sheets. Subdomain II and III look pretty decent for being
separate pieces of a whole structure (except for foaminess). 

I still need to put in as constraints the cluster of positively charged 
residues in subdomain III.

Sat Aug 14 21:31:12 PDT 2004 Martina Koeva

Hmm, try4 for subdomain I is not done yet. The other two have finished and 
they do not look very polished. Both subdomains II and III try4-opt2 models
look quite foamy, but at this point there isn't much time left, so I will 
wait for try4 in subdomain I to finish and superpose them.


Sun Aug 15 05:11:33 PDT 2004 Martina Koeva

Ok, so all try4s finished. I superimposed them and through cutting and pasting
made a chimera model. It has some terrible clashes, but I am hoping that 
the try9 run will be able to fix that. I am starting only from the chimera
model. With its own cost function the chimera model scores the worst right
now, but I am keeping fingers crossed that try9 (running on peep) will 
score reasonably well.

 
Once the undertaker run finishes, I will put in the README file the suggestions
for models to submit.


Sun Aug 15 13:25:49 PDT 2004 Martina Koeva

Try9 is finished and even though the structure still looks pretty bad, it 
looks better than before. Try9-opt2 now scores the best with its own cost 
function, as well as the unconstrained function (I wonder whether the 
unconstrained cost function looks all right. Do I keep at least the 
SSBonds explicitly in? After looking at the unconstrained function for T0238,
I think I need to keep the SSBond constraints explicitly in.)

For try10, I will start from all existing models. It looks like the dry weights
are already up, I will raise near_backbone and way_back too. Phobic_fit also
seems to be quite high, but maybe I can raise it a little more. I can also 
raise the dry weights a little more.

Finally, I am also including a few constraints for the cluster of basic 
residues.

Sun Aug 15 13:50:13 PDT 2004 Martina Koeva

Try10 is running on peep. 

I think we should submit:

try10-opt2 (when it is done, it should be scoring better than try9 with the
	unconstrained function)

try1-opt2 (fully automated one)  

T0237-1hn6A-t2k-local-str2+CB_burial_14_7-0.4+0.4-adpstyle5

try8-opt2 (most compact model from before splitting into subdomains...it does
	not score as well as try7, try3, try4 with the unconstrained function, but
	it does have a more compact core)

try9-opt2 (scores best with unconstrained function, but it probably will 
	be scoring slightly worse than try10-opt2 when it is done. However,
	it is the first model after putting the subdomains back together.)

----------------------------------------------------------------------------
From karplus@soe.ucsc.edu  Sun Aug 15 14:47:06 2004
Date: Sun, 15 Aug 2004 14:47:05 -0700
From: Kevin Karplus <karplus@soe.ucsc.edu>
To: martina@soe.ucsc.edu
CC: karplus@soe.ucsc.edu
Subject: T0237


I'd like to submit T0237 in the next few hours.

You've given me a list:

	try10-opt2	best unconstrained
	try8-opt2	most compact before splitting into subdomains
	try1-opt2	full auto
	T0237-1hn6A-t2k-local-str2+CB_burial_14_7-0.4+0.4-adpstyle5

You also gave me try9-opt2, but that is expected to be very similar to try10-opt2.
Is there another possibility that gives us more diversity---a
distinctly different prediction, even if it is not quite as good?

---------------------------------------------------------------------------- 

Date: Sun, 15 Aug 2004 14:57:12 -0700 (PDT)
From: Martina Koeva <martina@soe.ucsc.edu>
To: Kevin Karplus <karplus@soe.ucsc.edu>
Cc: martina@soe.ucsc.edu
Subject: Re: T0237

Try10-opt2 should be done hopefully in the next few hours.

Try7-opt2 is the one that scores the best (with the unconstrained
function) among the earlier ones (before the split into subdomains).
It is not as compact and it has unwound some of the predicted helices,
but it does not have as many or as bad breaks as all the models before it.
So maybe try7-opt2, instead of try9-opt2?

--------------------------------------------------------------------------


Sun Aug 15 16:34:34 PDT 2004 Martina Koeva

Try10-opt1 has been generated and as expected, it is already scoring a little
better than try9-opt2 with the unconstrained cost function. 


Sun Aug 15 17:35:34 PDT 2004 Kevin Karplus

When try10 finishes, I'll do a submission, but I think I'll want to do
another run with just the ssbond constraints. I'm particularly worried
about the enormous weight on the rr constraints, which are not THAT
reliable.

Also, the .under file should include the alignments and fragments
from the subdomains, not just the whole protein.  I'm unlikely at this
point to find anything new, but perhaps we can get a better packing of
the current domains.

Before I do that, I'll add all the "reasonable" hits from the subdomains
to the MANUAL_TOP_HITS lists, and make extra_alignments.


Sun Aug 15 21:52:07 PDT 2004 Kevin Karplus

try11-opt1 over try10-opt2 made a much bigger difference in the
unconstrained costfcn than try10-opt2 over try9-opt2.
Unfortunately, I put too many jobs on abyss, so they are all running a
bit slowly.  I'll have to hope that try11-opt2 is ready in the morning!
Otherwise I'll have to submit try11-opt1 (which is still better than try10-opt2.)


From baertsch@soe.ucsc.edu  Mon Aug 16 14:30:03 2004
MIME-Version: 1.0
Date: Mon, 16 Aug 2004 14:30:02 -0700 (PDT)
From: Robert Baertsch <baertsch@soe.ucsc.edu>
To: Kevin Karplus <karplus@cse.ucsc.edu>
cc: Martina Koeva <martina@cse.ucsc.edu>
Subject: malaria

Kevin,
I think Res 409-417 on Target 237 is possibly the active site for the 
protein. It is a hydrophilic alpha helix and seems unlikely to add to the 
structure. Perhaps it slips into some cleft in the human protein. What 
type of surface would bind to it strongly?

-Robert