Thu May 11 08:40:44 PDT 2006
T0287
Make started Thu May 11 08:47:40 PDT 2006
Running on lopez.cse.ucsc.edu

Thu May 11 09:08:23 PDT 2006 Kevin Karplus

This target is not easy comparative modeling, since none of the
iterated searches hit a PDB structure directly.  It is an
ORFan protein, found only in Helicobacter pylori.

I sent email to Karen Ottemann, asking if she knows anything about
this protein.

I'm getting weak hits to 1hzgA (d.194.1.1) in the initial searches.

Date: Thu, 11 May 2006 09:15:46 -0700
From: Kevin Karplus
To: ottemann
Subject: H. pylori protein as CASP target


CASP7 target T0287 is CagS (HP0534), an ORFan protein found only in
Helicobacter pylori (CAG pathogenicity island protein).

If you know anything about this protein, it could be useful to us in
trying to predict its structure.  (ORFan proteins are the worst for
structure-prediction methods.)

Kevin Karplus

------------------------------------------------------------

Make started Thu May 11 11:49:03 PDT 2006
Running on lopez.cse.ucsc.edu

The best e-value is a terrible 11.9 (for 1b0b).  There are predicted
to be a bunch of helices, a 3-strand anti-parallel sheet, and more
helices.  The try1-opt2 prediction is awful, somehow conjuring up a
six-strand anti-parallel barrel.

This target is going to take some work!


Thu May 11 22:55:07 PDT 2006 Kevin Karplus

None of the sheet constraints from the initial alignments are worth anything.
The predicted strands are
	s1	I129-M133
	s2	I142-L144
	s3	L152-M157

The pair from 1josA are strands s1 ^v s3 ^v s2, but only by putting
s1 and s2 together into a single strand on one side of s3! (1josA has
a mixed sheet with a hairpin and later parallel strand).

The ones from 1dc1A are more plausible, but include a somewhat dubious
strand s0 (P123-F124) and do not include strand s3.

For try1 we were not using the str2 constraints, but rather the weaker
dssp-ehl2 constraints, which show only a single strand.

For try2, I'll stick in more of the helix and strand constraints from
the neural nets, and leave out the sheet constraints and rr constraints.
The rr constraints are just based on propensity and separation, so are
not worth much.

Fri May 12 06:58:24 PDT 2006 Kevin Karplus

try2-opt2 is even worse than try1-opt2.  Instead of too many strands
there are now none, and the helices created don't pack at all.

I think we're going to have to come up with some sheet constraints by hand.


Sat May 13 12:27:37 PDT 2006 Kevin Karplus

I downloaded the robetta models and scored them.  I think there is
something wrong, since both the phobic_fit and sidechain scores are enormous.
Ah---sidechains aren't reported, only backbone and CB.
I wonder if Baker's group knows this.

Perhaps I should add a PatchConform command to undertaker to put
sidechains on?  In any case I should improve undertaker not to compute
sidechain costs for missing sidechains.


Wed May 17 14:04:11 PDT 2006 George Shackelford

I have made a new rr prediction using the new 449a_45 contact predictor.
There are considerably more contacts predicted than found by 352_28. The new
predictions are also unexpectedly strong; there are a number of them >.60
probability. I worry that they have focused in on a family that has
contaminated the t04/t06 alignments. I may retry just using the t2k
alignment.

However, when I plot the predictions using the str2 logo, I
find they indicate that the helices are broken into smaller helices that
form a bundle. I am looking for existing helical bundles that it could
resemble; the shortness of the suggested helices seems unlikely; I don't see
how they stay stable. Well, there are some examples of stable tight helices.
DNA bindings, globins, and others (1aow).

From: George Shackelford
To: Kevin Karplus
Subject: How does T0287 try3 look?
Date: Wed, 17 May 2006 20:02:12 -0700

I've generated a new try for T0287 (the ORFan) and it looks nice to me but I
would like your input/reaction.

- George

P.S. I'd really like to get 449a_45 as the new predictor in place of 352_28.
It apparently is a lot better.

--------------------------------------------------------------------------------

Wed May 17 20:06:59 PDT 2006 Kevin Karplus

I assume George is talking about decoys/T0287.try3-opt2.pdb

It is clear that he did not use the T0287.do3 target, as this pdb file
has not been gzipped, and the rosetta and gromacs optimizations of it
were not done.  I'll gzip the pdb files and run the T0287.do3 target
to finish the job.

Wed May 17 20:13:08 PDT 2006 Kevin Karplus

Rosetta likes try3 better than try2, but the try3 costfcn still
prefers try2.

I can see why George prefers try3 to try2, but try3 is very similar to try1.
I'm not sure I know what improvement George is seeing.

Wed May 17 20:58:53 PDT 2006 Kevin Karplus

I superimposed the models, and try1 and try3 are almost identical.
The only significant difference is which way the N-terminal helix points.

--------------------------------------------------------------------------------

Thu May 18 00:43:01 PDT 2006 George Shackelford

I thought to run a try using constraints generated by the 449a_45 predictor just
to see what results I would get.

The 449a_45 is like the 352 but has a window size of 5 for the local structure
predictions rather than 3, includes 'ent', the joint entropy rank which I find
adds about 1-2% to the results, and uses the z-value instead of the actual
value for MI e-values; that change seems to add about 1% better results. Most of
the improvement comes from the wider local structure predictions window.

added:
include T0287.449a_45.rr.constraints
to try3.costfcn

The results do look a lot better than try2 (which I thought would be an
improvement over try1). I had not yet taken a look at try1, so I didn't realize
that the new rr constraints might produce results similar to try1.

I was bothered by the fact that the results look so "good"/"protein-like". I
checked the structures for the list of possible templates to see if we had
duplicated one of them, but we had not.

The newer
449a_45.rr.constraints have been calibrated, but the values are still too high
and are likely to overwhelm other constraints. Specifically the burial /
phobic_fit is clearly wrong. I need to retry with higher values to phobic_fit,
and near-backbone. I'm going to do a new try4 with:
near-backbone 15
phobic_fit 5
constraints 7

Thu May 18 11:23:30 PDT 2006 George Shackelford

The results of try4 do succeed in burying some of the hydrophobics but are foamier
than ever. I think it is really trying to pick up on the earlier models. I want
a really fresh start; I'm going to comment out the TryAllAligns and see what I
get. First I'll check on the README's to make sure that is what I need to do.

Thu May 18 14:56:50 PDT 2006 George Shackelford

I talked to Kevin and I am using T0290 try2.under as a model on how to exclude a
template that appears to be taking over. I noted that the rr.449a_45.constraints
were the only difference between try2 and try4. Given the score in try4.log.gz,
I am excluding 2a9kB from the considered templates. I will also reduce the break
cost to 20 to allow for a more 'creative' decoy from undertaker.


Thu May 18 22:21:30 PDT 2006 Kevin Karplus

Looking at what alignments were used by TryAllAlign, we see
	try1	2a9kB
	try2	1josA
	try3	2a9kB
	try4	2a9kB
	try5	1g24A

Unfortunately, try1, 2, 4, and 5 are almost identical.
Probably 1g24A and 2a9kB are similar structures.  1g24A is scop class
d.166.1.1, but 2a9kB is not in SCOP.

According to VAST's precomputed neighbors for 2a9kB, there are lots of
neighbors.
http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml
Here are the top 60 (which all need to be excluded if you want a
chance of a different structure):

PDB C D 	 Ali. Len. 	 SCORE 	 P-VAL 	 RMSD 	 %Id 	 MMDB Date 	 Description

1UZI B 		205 	23.0 	10e-26.3 	0.7 	100.0 	08/2004
	C3 Exoenzyme From Clostridium Botulinum, Tetragonal Form


1UZI A 		204 	23.0 	10e-25.9 	0.7 	100.0 	08/2004
	C3 Exoenzyme From Clostridium Botulinum, Tetragonal Form


1G24 B 		205 	23.0 	10e-25.8 	0.8 	100.0 	03/2001
	The Crystal Structure Of Exoenzyme C3 From Clostridium
	Botulinum


1G24 D 		205 	22.8 	10e-25.2 	0.8 	100.0 	03/2001
	The Crystal Structure Of Exoenzyme C3 From Clostridium
	Botulinum


2A78 B 		205 	22.8 	10e-25.0 	0.6 	100.0 	11/2005
	Crystal Structure Of The C3bot-Rala Complex Reveals A Novel
	Type Of Action Of A Bacterial Exoenzyme


1GZE C 		205 	22.4 	10e-23.0 	0.9 	99.5 	09/2002
	Structure Of The Clostridium Botulinum C3 Exoenzyme (L177c
	Mutant)


2BOV B 		205 	22.2 	10e-22.1 	0.7 	100.0 	05/2005
	Molecular Recognition Of An Adp-Ribosylating Clostridium
	Botulinum C3 Exoenzyme By Rala Gtpase


1GZE A 		201 	22.0 	10e-21.9 	0.8 	99.5 	09/2002
	Structure Of The Clostridium Botulinum C3 Exoenzyme (L177c
	Mutant)


1GZF C 		205 	22.0 	10e-21.8 	0.7 	100.0 	09/2002
	Structure Of The Clostridium Botulinum C3 Exoenzyme
	(Wild-Type) In Complex With Nad


1R45 D 		199 	21.8 	10e-21.7 	0.9 	65.3 	01/2005
	Adp-Ribosyltransferase C3bot2 From Clostridium Botulinum,
	Triclinic Form


1GZF D 		200 	21.7 	10e-21.5 	0.8 	100.0 	09/2002
	Structure Of The Clostridium Botulinum C3 Exoenzyme
	(Wild-Type) In Complex With Nad


1GZF B 		201 	21.6 	10e-21.4 	0.7 	100.0 	09/2002
	Structure Of The Clostridium Botulinum C3 Exoenzyme
	(Wild-Type) In Complex With Nad


1GZE D 		200 	21.7 	10e-21.0 	0.9 	99.5 	09/2002
	Structure Of The Clostridium Botulinum C3 Exoenzyme (L177c
	Mutant)


1GZE B 		200 	21.6 	10e-20.9 	0.8 	99.5 	09/2002
	Structure Of The Clostridium Botulinum C3 Exoenzyme (L177c
	Mutant)


1G24 C 		204 	21.9 	10e-20.8 	0.8 	100.0 	03/2001
	The Crystal Structure Of Exoenzyme C3 From Clostridium
	Botulinum


1R4B B 		200 	21.3 	10e-20.6 	1.0 	65.0 	01/2005
	Adp-Ribosyltransferase C3bot2 From Clostridium Botulinum,
	Monoclinic Form


1OJZ A 		197 	21.4 	10e-20.6 	1.8 	36.0 	09/2003
	The Crystal Structure Of C3stau2 From S. Aureus In With Nad


1OJQ A 		195 	21.4 	10e-20.6 	2.0 	36.4 	09/2003
	The Crystal Structure Of C3stau2 From S. Aureus


1R4B A 		200 	21.4 	10e-20.4 	1.0 	65.0 	01/2005
	Adp-Ribosyltransferase C3bot2 From Clostridium Botulinum,
	Monoclinic Form


1GIQ A 2		186 	21.0 	10e-20.0 	2.2 	23.7 	02/2003
	Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From
	Clostridium Perfringens With Nadh


1G24 A 		205 	21.5 	10e-19.9 	0.9 	100.0 	03/2001
	The Crystal Structure Of Exoenzyme C3 From Clostridium
	Botulinum


1PWV A 3		178 	21.1 	10e-19.8 	2.0 	18.5 	02/2004
	Crystal Structure Of Anthrax Lethal Factor Wild-Type Protein
	Complexed With An Optimized Peptide Substrate


1QS1 D 2		183 	20.9 	10e-19.6 	2.3 	29.5 	03/2001
	Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)


1J7N B 3		176 	20.8 	10e-19.5 	1.9 	18.2 	11/2001
	Anthrax Toxin Lethal Factor


1GIQ B 2		187 	20.8 	10e-19.5 	2.3 	23.5 	02/2003
	Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From
	Clostridium Perfringens With Nadh


1GIR A 2		190 	20.8 	10e-19.5 	2.3 	23.2 	02/2003
	Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From
	Clostridium Perfringens With Nadph


1PWP B 3		180 	20.8 	10e-19.5 	2.0 	17.8 	02/2004
	Crystal Structure Of The Anthrax Lethal Factor Complexed With
	Small Molecule Inhibitor Nsc 12155


1QS2 A 2		183 	20.7 	10e-19.4 	2.4 	30.1 	03/2001
	Crystal Structure Of Vip2 With Nad


1ZXV B 3		178 	20.7 	10e-19.3 	1.9 	18.0 	07/2005
	X-Ray Crystal Structure Of The Anthrax Lethal Factor Bound To
	A Small Molecule Inhibitor, Bi-Mfm3, 3-{5-[5-(4-Chloro-
	Phenyl)-Furan-2-Ylmethylene]-4-Oxo-2-Thioxo-Thiazolidin-3-
	Yl}-Propionic Acid


1QS1 C 2		182 	20.7 	10e-19.3 	2.3 	30.2 	03/2001
	Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)


1QS1 B 2		185 	20.7 	10e-19.3 	2.4 	29.7 	03/2001
	Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)


1PWQ A 3		179 	21.0 	10e-19.1 	2.0 	17.9 	02/2004
	Crystal Structure Of Anthrax Lethal Factor Complexed With
	Thioacetyl-Tyr-Pro-Met-Amide, A Metal-Chelating Peptidyl Small
	Molecule Inhibitor


1PWU B 3		175 	20.6 	10e-18.9 	1.9 	18.3 	02/2004
	Crystal Structure Of Anthrax Lethal Factor Complexed With
	(3-(N-Hydroxycarboxamido)-2-Isobutylpropanoyl-Trp-
	Methylamide), A Known Small Molecule Inhibitor Of Matrix
	Metalloproteases


1PWW A 3		178 	20.7 	10e-18.9 	2.0 	17.4 	02/2004
	Crystal Structure Of Anthrax Lethal Factor Active Site Mutant
	Protein Complexed With An Optimized Peptide Substrate In The
	Presence Of Zinc


1PWU A 3		176 	20.5 	10e-18.7 	2.0 	18.2 	02/2004
	Crystal Structure Of Anthrax Lethal Factor Complexed With
	(3-(N-Hydroxycarboxamido)-2-Isobutylpropanoyl-Trp-
	Methylamide), A Known Small Molecule Inhibitor Of Matrix
	Metalloproteases


1R45 C 		200 	20.8 	10e-18.5 	1.0 	65.0 	01/2005
	Adp-Ribosyltransferase C3bot2 From Clostridium Botulinum,
	Triclinic Form


1J7N A 3		179 	20.3 	10e-18.2 	2.0 	17.9 	11/2001
	Anthrax Toxin Lethal Factor


1QS1 A 2		182 	20.1 	10e-17.9 	2.3 	29.7 	03/2001
	Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)


1YQY A 1		181 	20.1 	10e-17.8 	2.1 	17.1 	06/2005
	Structure Of B. Anthrax Lethal Factor In Complex With A
	Hydroxamate Inhibitor


1PWV B 3		178 	20.0 	10e-17.7 	2.0 	17.4 	02/2004
	Crystal Structure Of Anthrax Lethal Factor Wild-Type Protein
	Complexed With An Optimized Peptide Substrate


1GZF A 		203 	20.6 	10e-17.5 	0.6 	100.0 	09/2002
	Structure Of The Clostridium Botulinum C3 Exoenzyme
	(Wild-Type) In Complex With Nad


1QS2 A 1		180 	20.2 	10e-17.5 	2.2 	24.4 	03/2001
	Crystal Structure Of Vip2 With Nad


1QS1 D 1		183 	19.9 	10e-16.9 	2.4 	24.0 	03/2001
	Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)


1QS1 A 1		181 	19.9 	10e-16.9 	2.2 	23.2 	03/2001
	Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)


1QS1 B 1		180 	19.8 	10e-16.6 	2.3 	24.4 	03/2001
	Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)


1PWQ B 3		181 	19.4 	10e-16.6 	2.4 	16.0 	02/2004
	Crystal Structure Of Anthrax Lethal Factor Complexed With
	Thioacetyl-Tyr-Pro-Met-Amide, A Metal-Chelating Peptidyl Small
	Molecule Inhibitor


1QS1 C 1		183 	19.8 	10e-16.6 	2.5 	24.0 	03/2001
	Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)


1R45 B 		198 	18.4 	10e-15.2 	1.0 	65.2 	01/2005
	Adp-Ribosyltransferase C3bot2 From Clostridium Botulinum,
	Triclinic Form


1R45 A 		200 	18.4 	10e-15.2 	1.1 	65.0 	01/2005
	Adp-Ribosyltransferase C3bot2 From Clostridium Botulinum,
	Triclinic Form


1GIR A 1		186 	19.1 	10e-15.0 	3.0 	16.7 	02/2003
	Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From
	Clostridium Perfringens With Nadph


1GIQ A 1		187 	19.1 	10e-15.0 	3.1 	16.6 	02/2003
	Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From
	Clostridium Perfringens With Nadh


1GIQ B 1		187 	19.1 	10e-15.0 	3.0 	16.6 	02/2003
	Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From
	Clostridium Perfringens With Nadh


1GIQ A 		192 	21.0 	10e-14.9 	2.4 	24.0 	02/2003
	Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From
	Clostridium Perfringens With Nadh


1QS1 D 		186 	20.9 	10e-14.8 	2.5 	29.6 	03/2001
	Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)


1QS2 A 		185 	20.7 	10e-14.6 	2.4 	29.7 	03/2001
	Crystal Structure Of Vip2 With Nad


1QS1 C 		184 	20.7 	10e-14.6 	2.3 	29.9 	03/2001
	Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)


1QS1 B 		184 	20.7 	10e-14.5 	2.4 	29.9 	03/2001
	Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)


1GIQ B 		193 	20.8 	10e-14.5 	2.4 	23.8 	02/2003
	Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From
	Clostridium Perfringens With Nadh


1GIR A 		192 	20.8 	10e-14.5 	2.4 	22.9 	02/2003
	Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From
	Clostridium Perfringens With Nadph


1PWP A 3	127 	16.2 	10e-14.1 	2.0 	18.1 	02/2004
	Crystal Structure Of The Anthrax Lethal Factor Complexed With
	Small Molecule Inhibitor Nsc 12155


Thu May 18 23:30:21 PDT 2006 George Shackelford

I am going to do a try6 that eliminates the above from the list of templates.
I'm eliminating 1sz9A and 1g5aA as well; they were used for templates.

Starting try6 on peep. (I note that try5 appeared to get interrupted somehow.)

Fri May 19 11:26:30 PDT 2006 Kevin Karplus

Try6 finally got a different template, but it still doesn't look that
great either for predicted burial or secondary structure.  The score
functions keep liking try2, probably because the sidechain weights are
too high and the sidechains are very free to move when they are
improperly exposed.

The robetta models look like they were generated from a terrible
template and look pretty bad.

Fri May 19 11:46:30 PDT 2006 George Shackelford

Using the scorings for try6 as a guide, I've lowered the sidechain weight to 1,
the pred_alpha's to 0.3, the constraints to 1 and have only the 449a
constraints included. Break weight is still 20. This is simply another shot in
the dark.

starting try7 on peep

Sat May 20 23:31:55 PDT 2006 George Shackelford

Try7 had a failure during the processing similar to try5. I have kept the original log file as "try7.log.bk". I hope these were flukes.

Try7 looks nicely different. I don't believe in the parallel sheet even if it
is in the middle. After talking to Jenny Draper, I gather that since this comes
from a pathogenicity island, the protein operates outside the cell. It doesn't
have cysteines for disulfide bonds, but they are not necessary. Still I would
not expect to see any parallel sheets in such a protein.

We have fixed a bad bug in rr predictions and have the new 449a_45 predictor
on-line. There are fresh predictions and I am going to include them while
boosting constraints to 5; the rr predictions are all bonuses. Also I have put
a new file that is longer than the currently constructed rr.constraints file.
Otherwise try8 is the same as try7. I still would like to see the buried stuff
buried.

----------------------------------------------------------------------

Date: Sun, 21 May 2006 05:46:50 -0700
From: Kevin Karplus 
To: learithe
CC: karplus, ggshack
Subject: CASP7 T0287


Jenny,
Did George ever ask you about target T0287?  It is a Helicobacter
pylori protein that turns out to be an ORFan.

It is in /projects/compbio/experiments/protein-predict/casp7/T0287
(my home directory has softlink for casp7, if that is easier)

The protein is in swissprot as CAGS_HELPJ and CAGS_HELPY
which calls it CAG pathogenicity island protein 13
but has essentially no useful information beyond the sequence and the
location on the genome.

Can you find out anything more about this protein?
Can you explain to us what the "pathogenicity island" means? (I'm
guessing that it is a region of the genome that is associated with the
bacteria causing disease rather than being a benign parasite.)

I don't think we're going to get much on this protein, but any
thoughts would be useful.

----------------------------------------------------------------------

Sun May 21 08:59:59 PDT 2006 Kevin Karplus

try8 seems to be picking up the 1josA alignment, like try6 and try7.

I'm not sure why George excluded 1sz9A and 1g5aA if they were not
similar to  2a9kB and 1g24A.  They may have been the next best hits,
so ignoring them seems a bit strange.

--------------------------------------------------------------------------------

Sun May 21 13:20:27 PDT 2006 George Shackelford

The pathogenicity island contains genes for proteins for attack and/or defense
of the H. Pylori. Of interest is to note that the VAST hits contained proteins
that are "Enzymatic Componet Of Iota-Toxin", "Anthrax Lethal Factor",
"Vegetative Insecticidal", or similar. These characteristics are consistent
with what we would expect from a pathogenicity island. Nevertheless, the decoy
used for the VAST search has bad exposure of hydrophobics. Unless such exposure
is characteristic of pathogens.

Kevin, do toxins sometimes have hydrophobics exposed? Maybe for injecting into
foreign cells?

Per Kevin's comments, I am restoring 1sz9A and 1g5aA for try9. I'm
also restoring the constraint weights to 10, because I find that the
rr.constraints should pull towards a different structure, more like
try2...  I'm also resetting breaks back to 50. I get tired of the
breaks.

starting try9 on peep.

Sun May 21 23:40:30 PDT 2006 Kevin Karplus

Warning: peep is now the machine on Martin's desktop and should not be
heavily loaded at times when Martin is around.  Remember to nice
anything run on peep.  Orcas and lopez may be better default machines,
as they each have 2 processors.

Try9 seems to have picked up 1josA again.

Perhaps a run without 1josA as a possible template might be useful to
get another selection.

--------------------------------------------------------------------------------

Mon May 22 16:20:07 PDT 2006 George Shackelford

Kevin's right. Now is a time to try without 1josA. I'm also going to look at
server predictions for T0287 to see what others are offering. I suspect we're
all guessing.

I checked the score-all and try1,try3,try4 are at the top. Weighting seems to
be OK. I am going to push constraints (rr) up to 20 just to see if we can get
something else going. I'm also dropping 1josA.

try10 running on orcas (not peep!)

Make started Tue May 23 09:05:39 PDT 2006
Running on lopez.cse.ucsc.edu

Tue May 23 09:13:41 PDT 2006 Kevin Karplus

I accidentally started a new make in this directory (typo: I meant to
start it in T0297).  It should do no harm, so I'll let it finish so
that the summary.html file is up to date.  It is, in fact, creating a
few new alignments, since there are a few more t06 alignments finished
than last time it ran.

Tue May 23 09:25:04 PDT 2006 Kevin Karplus

I was wrong---the makefile wants to re-run try1.  I moved the old
decoys/*try1* files to decoys/first-... (then had to move all the
try10 files back---oops).

Tue May 23 09:58:57 PDT 2006 George Shackelford

Try10 is pretty ugly, but not as ugly as many of the server inputs. Frankly I
like the server results of SAM-t06 best of all. At least all the atoms are in
place.

At this point I'm wondering if I should just try a run without templates. It
will likely be a mess but I'm not sure what to do next. We may have exhausted
the possibilities. I'm going to go back and look at the actual 2a9kB to see if
I can get any ideas.

OK, I think I'll repeat try10 and turn off all constraints, push wet6.5 up to
20, and phobic_fit to 5. Bury Bury Bury.

try11 started on orcas

Tue May 23 14:57:09 PDT 2006 Kevin Karplus

I scored all the server results with the try1 costfcn, and one server
came out ahead of all of ours: Bilab-ENABLE
This is probably an illusion, because the actual costs in the log file
are given as NaN.  Looking in the score-all+servers.try1.rdb file, I
see that these models have "NaN" for "bad_peptide".  I wonder how
that happened.

Tue May 23 15:14:33 PDT 2006 Kevin Karplus

By the way George, increasing wet6.5 will try to EXPOSE as much as
possible, not bury things.  Increase the dry weights or phobic_fit if
you want to increase burial.

Tue May 23 17:02:46 PDT 2006 Kevin Karplus

OK, I think I found the problem with the bad_peptide computation.
It was a bug in the coplanar_trans operation in Transform.cc, which
was only tickled in the case where the two sets of points were already
coplanar (perhaps only when they were coplanar but with opposite
orientations).

Tue May 23 17:43:43 PDT 2006 Kevin Karplus

I looked at try1 and try11 scoring of the server outputs, and nothing
scores particularly well.

I think we'll end up submitting 5 of our own predictions, picking ones
that are as different as possible from each other.

Wed May 24 11:36:47 PDT 2006 George Shackelford

I didn't know that about wet6.5.

I agree that nothing really looks all that good. I'm looking over which 5 to
use and I am going to try and close gaps, drop constraints, using the model as
the include. Otherwise there appears not much else we can do.

Try1(3,4,5) is obvious.
Try2 should get in as well. (it's different for sure)
Try11 does have a set of helices that I like, and a parallel sheet I don't.
Try6 has one large parallel sheet. I don't buy that for one moment but it's
different. bad breaks
Try7 is similar to try6 but different enough to see which is better. bad breaks
here too.
Try8 is another variation on Try6
Try9 ditto. try9 is the best scoring of try(6,7,8,9) according to
score-all+servers.try11. Needs its breaks closed if possible.
Try10 is actually different. It has two parallel sheets for me not to believe.
break problems here.

I suppose we could put these all into includes, and see if we can get some
combination that is yet different. Otherwise the five I suggest are:
Try1
Try2
Try11
Try10
Try9

I'm going to try to repair breaks and do some polishing.

No. I've changed my mind. I like the helices of try11, and the sheets of try4.
Can we combine them and make something better?

I'm taking the helices of try11 set to wt. .5 and the last four sheet
constraints of try4 with weights of .5. Included rr.constraints. We'll see what
happens.

try12 running on lopez

Wed May 24 15:54:49 PDT 2006 George Shackelford

Try12 does pretty decently in scoring but I don't like the shape. I'm taking
out the rr.constraints, setting near-backbone to 5, wet6.5 to 15, break to 20,
constraints to 10, and phobic_fit to 2. Let's see what this does. At least
try12 comes up with a different shape.

try13 running on lopez


Try12 started on lopez.


Thu May 25 14:54:10 PDT 2006 Kevin Karplus

I don't like either try12 or try13---neither does the try13 costfcn,
which still favors try1,try3, and try4.

For some reason, there seems to be no grep-best-rosetta file, though
the repack files are there.  Perhaps we need to remake
grep-best-rosetta.

OK, fixed that, and Rosetta likes try3 best, though it really  hates
them all.

Thu May 25 21:59:40 PDT 2006 George Shackelford

I STILL like the helices at the start of try11. I'm going to copy and modify
the try11 pdb slicing out the residues after the helices. Time to use
ReadConformPDB.

I've built a T0287.try11-short.pdb using residues 1-122. This section has good
burial characteristics and decent structure. I'll leave it to undertaker to do
the rest.

starting try14 on orcas.

Fri May 26 12:28:15 PDT 2006 George Shackelford

Try14 aborted during the night due to some improper changes I made to
try14.under. I fixed those, but I am still getting crashes. I'm commenting out my
ReadConformPDB line and seeing if it at least can run.

I am able to get try14 to run only by splicing part of try4 into "short.pdb"
which now contains 1-106 of try11. This works but it is such a distortion that
the final result is not particularly good.

Fri May 26 18:32:55 PDT 2006 Kevin Karplus

try14-opt1 looks ok up to about R66, but after that it's a mess.

This target is due June 1, so we have to make our decisions about it
by Tuesday afternoon, so that there is time for some polishing Tuesday
night and submission on Wed.

Sat May 27 23:14:39 PDT 2006 George Shackelford

I still like the starting part of try11. I'm going to replace
"short.pdb" with try14 which seems to get decent results. Maybe this
will get something reasonable.


Sun May 28 09:22:39 PDT 2006 Kevin Karplus

George, I was unable to send you e-mail, as pacbell.net (which you
forward your SoE mail to) insists that you don't exist.  Read the
recent messages that I sent to the group in casp7/group-mail.gz


In try15-opt2, I like some of the pieces, though E92-P123 looks a bit awkward.
Perhaps you could add some sheet constraints for the three strands
that look like they are trying to form an antiparallel sheet:
K127-K132,  L182-Y190, L151-T158

125>	 fskpimfkmsi
192<	kmykeldfkmlld
150>	pllklfvmtdeevn

Perhaps
SheetConstraint	K127	K132	K189	K184	hbond	F185
SheetConstraint K153	T158	K189	K184	hbond	K184
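
A quick consistency check on the residue labels above can be sketched as
follows.  This is not part of undertaker; the helper names are made up, and
it assumes that "125>" means the fragment starts at residue 125 and counts
upward, while "192<" means it starts at residue 192 and counts downward.

```python
def fragment_residues(start, seq, reverse=False):
    """Map residue number -> one-letter code for one fragment.

    With reverse=True the first letter is residue `start` and the
    numbering decreases along the string.
    """
    step = -1 if reverse else 1
    return {start + step * i: aa.upper() for i, aa in enumerate(seq)}


def check_labels(residues, labels):
    """Return the labels (e.g. 'K127') that disagree with the residue map."""
    bad = []
    for label in labels:
        aa, number = label[0], int(label[1:])
        if residues.get(number) != aa:
            bad.append(label)
    return bad


# The three fragments as written in the note above.
residues = {}
residues.update(fragment_residues(125, "fskpimfkmsi"))
residues.update(fragment_residues(192, "kmykeldfkmlld", reverse=True))
residues.update(fragment_residues(150, "pllklfvmtdeevn"))

# Strand ends and constraint residues mentioned in the note.
labels = ["K127", "K132", "L182", "Y190", "L151", "T158",
          "K189", "K184", "F185", "K153"]
mismatches = check_labels(residues, labels)
```

An empty mismatches list means every label names the amino acid that the
fragments place at that position under the assumed numbering.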

Sun May 28 18:32:38 PDT 2006 George Shackelford

Perhaps is good enough for me. I'm going to use those SheetConstraints, removing
any other conflicting constraints.

I've also taken out other TryAllAligns.

Try16 started on orcas.

Sun May 28 22:55:50 PDT 2006 Kevin Karplus

Taking out the tryallaligns may have sped up try16,  but the search
space was too small---try16 just diddled around in the neighborhood of
try15 without moving anything very much.

Tue May 30 09:45:31 PDT 2006 George Shackelford
Yes, one final run with TryAllAligns back in.

I've been doing some research into seeing how well we can do contact predictions
by using only the target sequence for training and predicting. Using a simple
predictor based on "hotspot" distributions (1.0 for the AA, 0.0 otherwise) and a
distribution window of 11 around each of the i,j residues, I can get results of
L/2: .13 to L/10 .17. This is not quite as good as the results for MI e-values
using alignments but possibly good enough for use as "bonus" constraints.
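
For reference, a minimal sketch of how numbers like "L/2: .13" could be
computed, assuming they mean the fraction of true contacts among the
top-scoring L/2 (or L/10) predicted pairs, where L is the sequence length.
The function name and the data layout are illustrative only and are not
taken from the predictor code.

```python
def top_fraction_precision(predictions, true_contacts, length, divisor):
    """Fraction of correct pairs among the top length // divisor predictions.

    predictions   -- list of (i, j, score) tuples
    true_contacts -- set of (i, j) pairs with i < j
    length        -- sequence length L
    divisor       -- 2 for L/2, 10 for L/10, and so on
    """
    n = max(1, length // divisor)
    # Highest-scoring predictions first, then keep the top n.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)[:n]
    if not ranked:
        return 0.0
    hits = sum(1 for i, j, _ in ranked
               if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(ranked)
```

Usage would be along the lines of
top_fraction_precision(preds, contacts, L, 2) for the L/2 figure and
top_fraction_precision(preds, contacts, L, 10) for the L/10 figure.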

try17 running on orcas (with TryAllAligns back on).

Tue May 30 15:21:21 PDT 2006 George Shackelford

Try17 has failed and there are DEBUG lines at the end of the try17.log
file. I'm not sure what has happened but tempus fugit; we need to
start polishing what we want for submission. I am still not satisfied
with what we have but I can't seem to get what I suspect is the
structure - a helical bundle followed by a beta-sheet functional area
followed by some helices as a "cap." The six helix bundle of try11 and
the 4 helix bundle of try 15(?) are the likely starts. Both show good
burial though the six helix bundle is a bit foamy. The helical
structure in try4 looks possible but matching these two up has not
worked.

I would say it is time to polish some submissions. I am going to try
and polish try11 for a start - I think.

Tue May 30 19:34:46 PDT 2006 Kevin Karplus

undertaker was buggy today, because of faulty bug fixes I introduced
over the weekend.  It should be working again now.

George, I need a list of 5 distinctly different predictions for this
target (not minor variations on a theme).  Can you have a list before
noon tomorrow?

Tue May 30 22:54:22 PDT 2006 George Shackelford

Looking at the scoring for try18 (which had zero constraints, 80 for breaks, 40 for soft clashes) we have:

The top four:
try12 - This needs polishing. Interesting collection of helices on both ends. Best scoring.
try3 - One of the try1 variations. Actually in a dead heat with try1 in scoring
try18 - A polished version of try11, this has a six helix bundle at the start, but poorly developed from there on.
try16 - starts with a four helix bundle. Ends up a mess, but it's different

Then two trailing models:
try10 and try9. They are different from each other and from above, but both score down the list. Your call. The one you pick will need some polishing.

Finally there is always try2. Really different. Really really different. Three helices...

I'm going to start try19 as a polish run of try12. May as well restore 'sidechains' as weight 5. Otherwise like try18 in terms of polishing.

try19 running on orcas

I'm getting a polishing run of try16 as well.

try20 running on peep

Wed May 31 09:16:24 PDT 2006 Kevin Karplus

Although they are top scoring with the "unconstrained" costfcn, I don't
care that much for try2 or try19 (based on try12).  I think that try3 and
try18 are more promising.  try18 actually looks quite good up to about
F113.  try20 (based on try16) looks plausible, but not great.

Currently I favor
	try18
	try3		(because it came up automatically)
	try20
	try19
	try9

I'm going to increase the phobic_fit and dry weights of the
unconstrained costfcns, to see how that reorders things.
The order is now try19, try12, try18, try11, try12, try4, try1, ...
Wait: there are two different try12-opt2 files (one gzipped one not),
and they score differently.  I renamed the older set to be
try12-opt1-old, ...

It is the try12 and not the try12-old run that scores well.

Rosetta likes repacking try18, try3, try2, try19, first-try1, try11,
try10, try5, ... (actually, it hates them all, but these are the least hateful).

I superimposed a bunch of these models.
Try11 is just an earlier draft of try18 and try4 is just a poorer
version of try3.  The try19 run, though it scores well, is much less
compact than the other runs.

Wed May 31 09:50:50 PDT 2006 Kevin Karplus

I tried superimposing
ReadConformPDB T0287.try18-opt2.pdb
ReadConformPDB T0287.try3-opt2.pdb
ReadConformPDB T0287.try20-opt2.pdb
ReadConformPDB T0287.try12-opt2.pdb
ReadConformPDB T0287.try9-opt2.pdb
ReadConformPDB T0287.try10-opt2.pdb
ReadConformPDB T0287.try19-opt2.pdb

I definitely prefer try12 to try19.

try9 has some very bad breaks, but looks more promising than try10.
I'll start a polishing run for try9.

Wed May 31 10:40:34 PDT 2006 George Shackelford

I suspect that try19 used the "old" try12.pdb rather than the newer compressed version. It may do well if we rerun try19 with the newer try12.

When I first looked at your 9:15 comments, I wondered what had happened to try10. Try9 is somewhat similar and it does have bad breaks (I had reviewed the tries last Wed.). In either case I don't like those parallel sheets.

Ok, I'm repolishing try12 as try22(?)

try22 started on peep.

Wed May 31 12:12:05 PDT 2006 Kevin Karplus

try22-opt1 is now scoring best, and try21 is beating try9 by a lot, though
still with a very bad break.

Wed May 31 12:28:45 PDT 2006 Kevin Karplus
try22-opt2 now surpasses try22-opt1.

There is something wrong with the sort-by-rosetta script when George
runs it (he gets 
	rm: Command not found.
	grep: Command not found.
error messages).  It causes no problems when I run it, so this must be
some sort of initialization problem with the path.  I can't read
George's .cshrc file to see if there are any problems in it.
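
A minimal sketch of one way to check the path hypothesis, assuming
Python is available on the machine.  It only reports whether the two
commands from the error messages can be found on the PATH of the process
that runs it, which may differ from what the csh script sees after
.cshrc is read.  The function name missing_commands is hypothetical.

```python
import shutil


def missing_commands(commands, search_path=None):
    """Return the commands that cannot be found on search_path.

    search_path=None means "use the PATH environment variable of this process".
    """
    return [c for c in commands if shutil.which(c, path=search_path) is None]


if __name__ == "__main__":
    # rm and grep are the two commands named in the error messages above.
    missing = missing_commands(["rm", "grep"])
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("rm and grep are both on PATH")
```

Running it from the same login environment in which the script fails
would be the informative case.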

Rosetta's order for repacking is try18, try3, try2, try19, first-try1, try11,
try22, try10, ...

Wed May 31 12:43:38 PDT 2006 Kevin Karplus

I'm going to do one polishing run starting with all the decoys and
with CrossOver turned up high, to see if we can do any mixing and
matching to get better decoys.  I suspect that this will just polish
up the top-scoring try22-opt2 a little bit, but I'm hoping that a
crossover with try18-opt2 might pick up something good.

Wed May 31 12:47:21 PDT 2006 George Shackelford

I will look at the problem with my .cshrc later. First things first.

I noticed that both try22-opt2.pdb.gz and try22-opt2.gromacs0.pdb.gz
have negative phobic-fit. The big difference between them is the
pred-alpha cost factors. Given the problem with predictions on ORFans,
I wonder if these should not be set to zero (maybe along with
bystroff) when doing final scoring. If that is done, the differences
would certainly change a lot. I'm trying to learn how Kevin does those
quick re-scorings with specific constraints. Some make target...

Wed May 31 12:59:12 PDT 2006 Kevin Karplus

Create a new costfcn (say foo.costfcn) then
	make decoys/score-all.foo.pretty

I'll make up a try24.costfcn with the predicted alphas removed.
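
A minimal sketch of the re-scoring recipe above wrapped in a small
helper, assuming that the target pattern decoys/score-all.NAME.pretty
holds for any costfcn name the way it does for foo.  The helper names
score_target and rescore are hypothetical; only the make target itself
comes from the note above.

```python
import subprocess


def score_target(costfcn_name):
    """Build the make target that scores all decoys with NAME.costfcn."""
    return f"decoys/score-all.{costfcn_name}.pretty"


def rescore(costfcn_name, dry_run=True):
    """Return the make command for a re-scoring; run it unless dry_run is set."""
    cmd = ["make", score_target(costfcn_name)]
    if not dry_run:
        # check=True raises CalledProcessError if make exits with an error.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run only: print the command without invoking make.
    print(" ".join(rescore("foo")))
```

The dry-run default is there so that the command can be inspected before
a scoring run is started.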

Wed May 31 13:02:17 PDT 2006 Kevin Karplus

try22-opt2 scores best with the try24 costfcn (no surprise there).
Then comes try19, try18, first-try1, try4, try12, try11, try3, ...

If I want crossovers with try18, I should probably do an optimization
run with all the models that score better than try18 removed.
I'll do that for try24. (started on orcas)
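
A minimal sketch of the selection step described above: drop every model
that outranks try18 and keep the rest as the starting pool.  The ranking
is the (truncated) order reported for the try24 costfcn; the function
name models_from is hypothetical, and how the pool is actually handed to
the optimizer is not shown.

```python
def models_from(ranking, keep_from):
    """Drop every model ranked above keep_from; keep it and everything below.

    ranking is ordered best first.  Raises ValueError if keep_from is absent.
    """
    return ranking[ranking.index(keep_from):]


# Order reported with the try24 costfcn (best first), truncated as in the log.
ranking = ["try22-opt2", "try19", "try18", "first-try1",
           "try4", "try12", "try11", "try3"]

pool = models_from(ranking, "try18")

if __name__ == "__main__":
    print(pool)
```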

Wed May 31 13:20:23 PDT 2006 Kevin Karplus

try24 seems to be doing just minor polishing on try18-opt2.  It runs
with no improvement for several generations (probably enough to flush
out all the non-try18-opt2 models from the pool) then starts making
tiny improvements.

Wed May 31 13:37:20 PDT 2006 George Shackelford

(Sigh) Nice try. Short of polishing first-try1, I don't see that we
should spend too much more on questionable models. Is it about time
for a wrap-up? Something you'd like for me to look at?

13:58: GGS

I do wish we could redo some of try22. Like between:
112-122 or 127
146-180

When I look at that final section 112-198, what I could see is an
anti-parallel sheet like:
2-3-1-4
where 1 is ~ 112-122, 2 is ~ 146-156, 3 is ~ 158-168,
4 is ~ 190-198 (possibly) with the helices connecting on the
outside.

HelixConstraint (T0287)S134	(T0287)Y145
HelixConstraint (T0287)D172	(T0287)K188
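
A minimal sketch that spells out what the proposed 2-3-1-4 topology
implies, assuming the approximate residue ranges given above.  It lists
which strands would be neighbors in the sheet and checks that the two
helix constraints do not overlap any proposed strand.  It does not
generate sheet or strand constraints in the constraint-file syntax; the
names below are hypothetical.

```python
# Approximate strand ranges (residue numbers) from the note above.
strands = {1: (112, 122), 2: (146, 156), 3: (158, 168), 4: (190, 198)}

# Left-to-right order of the strands in the proposed sheet.
sheet_order = [2, 3, 1, 4]

# The two helix constraints: S134-Y145 and D172-K188.
helices = [(134, 145), (172, 188)]


def neighbor_pairs(order):
    """Return the pairs of strands that sit next to each other in the sheet."""
    return list(zip(order, order[1:]))


def overlaps(a, b):
    """True if the inclusive residue ranges a and b share any residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def clashes(strands, helices):
    """Return (strand id, helix range) for every strand that overlaps a helix."""
    return [(sid, h) for sid, rng in strands.items()
            for h in helices if overlaps(rng, h)]


if __name__ == "__main__":
    print("neighboring strands:", neighbor_pairs(sheet_order))
    print("strand/helix overlaps:", clashes(strands, helices))
```

With these ranges each helix falls in a gap between consecutive strands
in sequence (1 and 2, then 3 and 4), which is consistent with the
helices acting as connections on the outside of the sheet.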


Wed May 31 15:37:47 PDT 2006 Kevin Karplus

I'm going to submit 5 models now, but if you can improve try22, we can
replace the submission later tonight or early tomorrow morning.
Here are the 5 I'll submit.
ReadConformPDB T0287.try24-opt2.pdb
ReadConformPDB T0287.try3-opt2.pdb
ReadConformPDB T0287.try20-opt2.pdb
ReadConformPDB T0287.try23-opt2.pdb
ReadConformPDB T0287.try21-opt2.pdb

Wed May 31 16:37:33 PDT 2006 George Shackelford

I've estimated it would take at least two more days of work to do what
I have in mind. I've got bigger fish to fry (and submissions of my
own).

From: Karen Ottemann 
Subject: Re: H. pylori protein as CASP target
Date: Tue, 30 May 2006 15:39:54 -0700
To: Kevin Karplus 

Hi Kevin--
I have to look into this protein, as i don't know anything about it 
in particular, other than the fact that other proteins of the CAG 
island make up a secretion system and at least one secreted protein 
(CagA). This secretion system secretes proteins directly from the 
bacterium to adjacent mammalian cells. It's of the Type IV type.  
This secretion system makes the mammalian hosts respond with more 
severe inflammation.  It sounds like Cag7/HP0534 doesn't participate 
in this aspect of the CAG island though, based on mutational studies, 
so that means they probably have no idea what it does...

I assume you saw:
Molecular Microbiology
Volume 42 Page 1337  - December 2001
doi:10.1046/j.1365-2958.2001.02714.x
Volume 42 Issue 5
  Systematic mutagenesis of the Helicobacter pylori cag pathogenicity 
island: essential genes for CagA translocation in host cells and 
induction of interleukin-8
Wolfgang Fischer, Jürgen Püls, Renate Buhrdorf, Bettina Gebert, Stefan
Odenbreit, Rainer Haas*


On May 11, 2006, at 9:15 AM, Kevin Karplus wrote:

>
> CASP7 target T0287 is CagS (HP0534), an ORFan protein found only in
> Helicobacter pylori (CAG pathogenicity island protein).
>
> If you know anything about this protein, it could be useful to us in
> trying to predict its structure.  (ORFan proteins are the worst for
> structure-prediction methods.)
>
> Kevin Karplus
>

Date: Thu, 1 Jun 2006 20:19:39 -0700
From: Kevin Karplus 
To: ottemann
CC: karplus
Subject: H.pylori CASP target


Thanks for looking into the H. pylori CASP target for us.  The
deadline for the target was at noon today, and I did not get your
e-mail until this evening, so I'm afraid we were not able to use any
information you had.  

Since the deadline has passed, I've put up our predictions at 
http://www.soe.ucsc.edu/~karplus/T0287/summary.html
(the predictions we submitted are
	model1.ts-submitted
	model2.ts-submitted
	model3.ts-submitted
	model4.ts-submitted
	model5.ts-submitted
)


Kevin Karplus