Thu May 11 08:40:44 PDT 2006 T0287
Make started Thu May 11 08:47:40 PDT 2006
Running on lopez.cse.ucsc.edu

Thu May 11 09:08:23 PDT 2006 Kevin Karplus
This target is not easy comparative modeling, since none of the iterated searches found PDB files directly in the search. It is an ORFan protein, found only in Helicobacter pylori. I sent email to Karen Ottemann, asking if she knows anything about this protein.

I'm getting weak hits to 1hzgA (d.194.1.1) in the initial searches.

Date: Thu, 11 May 2006 09:15:46 -0700
From: Kevin Karplus
To: ottemann
Subject: H. pylori protein as CASP target

CASP7 target T0287 is CagS (HP0534), an ORFan protein found only in Helicobacter pylori (CAG pathogenicity island protein). If you know anything about this protein, it could be useful to us in trying to predict its structure. (ORFan proteins are the worst for structure-prediction methods.)

Kevin Karplus
------------------------------------------------------------

Make started Thu May 11 11:49:03 PDT 2006
Running on lopez.cse.ucsc.edu

The best e-value is a terrible 11.9 (for 1b0b). There are predicted to be a bunch of helices, a 3-strand anti-parallel sheet, and more helices.

The try1-opt2 prediction is awful, somehow conjuring up a six-strand anti-parallel barrel. This target is going to take some work!

Thu May 11 22:55:07 PDT 2006 Kevin Karplus
None of the sheet constraints from the initial alignments are worth anything. The predicted strands are
    s1 I129-M133
    s2 I142-L144
    s3 L152-M157
The pairings from 1josA put the strands in the order s1 ^v s3 ^v s2, but only by putting s1 and s2 together into a single strand on one side of s3! (1josA has a mixed sheet with a hairpin and a later parallel strand.) The ones from 1dc1A are more plausible, but include a somewhat dubious strand s0 (P123-F124) and do not include strand s3.

For try1 we were not using the str2 constraints, but rather the weaker dssp-ehl2 constraints, which show only a single strand.
For try2, I'll stick in more of the helix and strand constraints from the neural nets, and leave out the sheet constraints and rr constraints. The rr constraints are based only on propensity and separation, so they are not worth much.

Fri May 12 06:58:24 PDT 2006 Kevin Karplus
try2-opt2 is even worse than try1-opt2. Instead of too many strands there are now none, and the helices created don't pack at all. I think we're going to have to come up with some sheet constraints by hand.

Sat May 13 12:27:37 PDT 2006 Kevin Karplus
I downloaded the robetta models and scored them. I think there is something wrong, since both the phobic_fit and sidechain scores are enormous. Ah---sidechains aren't reported, only backbone and CB. I wonder if Baker's group knows this. Perhaps I should add a PatchConform command to undertaker to put sidechains on? In any case, I should fix undertaker so that it does not compute sidechain costs for missing sidechains.

Wed May 17 14:04:11 PDT 2006 George Shackelford
I have made a new rr prediction using the new 449a_45 contact predictor. It predicts considerably more contacts than 352_28 did. The new predictions are also unexpectedly strong; a number of them have probability >0.60. I worry that the predictor has focused on a family that has contaminated the t04/t06 alignments. I may retry using just the t2k alignment.

However, when I plot the predictions using the str2 logo, they indicate that the helices are broken into smaller helices that form a bundle. I am looking for existing helical bundles that it could resemble; helices this short seem unlikely, and I don't see how they would stay stable. Well, there are some examples of stable tight helices: DNA-binding proteins, globins, and others (1aow).

From: George Shackelford
To: Kevin Karplus
Subject: How does T0287 try3 look?
Date: Wed, 17 May 2006 20:02:12 -0700

I've generated a new try for T0287 (the ORFan) and it looks nice to me, but I would like your input/reaction.
- George

P.S. I'd really like to get 449a_45 in place of 352_28 as the new predictor. It is apparently a lot better.
--------------------------------------------------------------------------------

Wed May 17 20:06:59 PDT 2006 Kevin Karplus
I assume George is talking about decoys/T0287.try3-opt2.pdb. It is clear that he did not use the T0287.do3 target, as this pdb file has not been gzipped, and the rosetta and gromacs optimizations of it were not done. I'll gzip the pdb files and run the T0287.do3 target to finish the job.

Wed May 17 20:13:08 PDT 2006 Kevin Karplus
Rosetta likes try3 better than try2, but the try3 costfcn still prefers try2. I can see why George prefers try3 to try2, but try3 is very similar to try1. I'm not sure what improvement George is seeing.

Wed May 17 20:58:53 PDT 2006 Kevin Karplus
I superimposed the models, and try1 and try3 are almost identical. The only significant difference is which way the N-terminal helix points.
--------------------------------------------------------------------------------

Thu May 18 00:43:01 PDT 2006 George Shackelford
I thought I would run a try using constraints generated by the 449a_45 predictor just to see what results I would get. The 449a_45 predictor is like 352_28, but it uses a window size of 5 rather than 3 for the local-structure predictions; it includes 'ent', the joint entropy rank, which I find adds about 1-2% to the results; and it uses the z-value instead of the actual value for MI e-values, which seems to improve the results by about 1%. Most of the improvement comes from the wider local-structure window.

added:
    include T0287.449a_45.rr.constraints
to try3.costfcn

The results do look a lot better than try2 (which I thought would be an improvement over try1). I had not yet looked at try1, so I didn't realize that the new rr constraints might produce results similar to try1. I was bothered by the fact that the results look so "good"/"protein-like".
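George's description of 449a_45 says that it uses the z-value of the MI e-values instead of the raw values. The notebook does not say which reference distribution the z-value is taken over. The minimal Python sketch below assumes the z-value is computed over all scored residue pairs of a single target. The function name `mi_to_z` and the example scores are hypothetical.

```python
import statistics


def mi_to_z(values):
    """Convert raw per-pair scores to z-values: (x - mean) / stdev.

    Assumption: the reference distribution is every scored pair of one
    target, using the population standard deviation.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # All scores identical: no pair stands out from the rest.
        return [0.0 for _ in values]
    return [(x - mean) / sd for x in values]


# Hypothetical raw scores keyed by residue pair (i, j).
scores = {(10, 40): 0.9, (12, 55): 0.3, (20, 80): 0.3, (31, 90): 0.1}
z = dict(zip(scores, mi_to_z(list(scores.values()))))
```

Z-values are centered and scaled per target, so a single threshold or ranking rule behaves similarly on targets whose raw MI scores sit on different scales. That is one plausible reason for the small gain George reports. It is not a claim about how 449a_45 itself is implemented.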
I checked the structures for the list of possible templates to see if we had duplicated one of them, but we had not. The newer 449a_45.rr.constraints have been calibrated, but the values are still too high and are likely to overwhelm other constraints. Specifically, the burial (phobic_fit) is clearly wrong. I need to retry with higher weights for phobic_fit and near-backbone. I'm going to do a new try4 with:
    near-backbone 15
    phobic_fit 5
    constraints 7

Thu May 18 11:23:30 PDT 2006 George Shackelford
try4 does succeed in burying some of the hydrophobics, but it is foamier than ever. I think it is really trying to pick up on the earlier models. I want a really fresh start, so I'm going to comment out the TryAllAligns and see what I get. First I'll check the READMEs to make sure that is what I need to do.

Thu May 18 14:56:50 PDT 2006 George Shackelford
I talked to Kevin, and I am using T0290 try2.under as a model of how to exclude a template that appears to be taking over. I noted that the rr.449a_45.constraints were the only difference between try2 and try4. Given the score in try4.log.gz, I am excluding 2a9kB from the considered templates. I will also reduce the break cost to 20 to allow for a more 'creative' decoy from undertaker.

Thu May 18 22:21:30 PDT 2006 Kevin Karplus
Looking at which alignments were used by TryAllAlign, we see
    try1 2a9kB
    try2 1josA
    try3 2a9kB
    try4 2a9kB
    try5 1g24A
Unfortunately, try1, 2, 4, and 5 are almost identical. Probably 1g24A and 2a9kB are similar structures. 1g24A is SCOP class d.166.1.1, but 2a9kB is not in SCOP. According to VAST's precomputed neighbors for 2a9kB, there are lots of neighbors.
http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml
Here are the top 60 (which all need to be excluded if you want a chance of a different structure):

PDB  C  D  Ali.Len.  SCORE  P-VAL     RMSD  %Id    MMDB Date  Description
1UZI B     205       23.0   10e-26.3  0.7   100.0  08/2004    C3 Exoenzyme From Clostridium Botulinum, Tetragonal Form
1UZI A     204       23.0   10e-25.9  0.7   100.0  08/2004    C3 Exoenzyme From Clostridium Botulinum, Tetragonal Form
1G24 B     205       23.0   10e-25.8  0.8   100.0  03/2001    The Crystal Structure Of Exoenzyme C3 From Clostridium Botulinum
1G24 D     205       22.8   10e-25.2  0.8   100.0  03/2001    The Crystal Structure Of Exoenzyme C3 From Clostridium Botulinum
2A78 B     205       22.8   10e-25.0  0.6   100.0  11/2005    Crystal Structure Of The C3bot-Rala Complex Reveals A Novel Type Of Action Of A Bacterial Exoenzymey
1GZE C     205       22.4   10e-23.0  0.9   99.5   09/2002    Structure Of The Clostridium Botulinum C3 Exoenzyme (L177c Mutant)
2BOV B     205       22.2   10e-22.1  0.7   100.0  05/2005    Molecular Recognition Of An Adp-Ribosylating Clostridium Botulinum C3 Exoenzyme By Rala Gtpase
1GZE A     201       22.0   10e-21.9  0.8   99.5   09/2002    Structure Of The Clostridium Botulinum C3 Exoenzyme (L177c Mutant)
1GZF C     205       22.0   10e-21.8  0.7   100.0  09/2002    Structure Of The Clostridium Botulinum C3 Exoenzyme (Wild-Type) In Complex With Nad
1R45 D     199       21.8   10e-21.7  0.9   65.3   01/2005    Adp-Ribosyltransferase C3bot2 From Clostridium Botulinum, Triclinic Form
1GZF D     200       21.7   10e-21.5  0.8   100.0  09/2002    Structure Of The Clostridium Botulinum C3 Exoenzyme (Wild-Type) In Complex With Nad
1GZF B     201       21.6   10e-21.4  0.7   100.0  09/2002    Structure Of The Clostridium Botulinum C3 Exoenzyme (Wild-Type) In Complex With Nad
1GZE D     200       21.7   10e-21.0  0.9   99.5   09/2002    Structure Of The Clostridium Botulinum C3 Exoenzyme (L177c Mutant)
1GZE B     200       21.6   10e-20.9  0.8   99.5   09/2002    Structure Of The Clostridium Botulinum C3 Exoenzyme (L177c Mutant)
1G24 C     204       21.9   10e-20.8  0.8   100.0  03/2001    The Crystal Structure Of Exoenzyme C3 From Clostridium Botulinum
1R4B B     200       21.3   10e-20.6  1.0   65.0   01/2005    Adp-Ribosyltransferase C3bot2 From Clostridium Botulinum, Monoclinic Form
1OJZ A     197       21.4   10e-20.6  1.8   36.0   09/2003    The Crystal Structure Of C3stau2 From S. Aureus In With Nad
1OJQ A     195       21.4   10e-20.6  2.0   36.4   09/2003    The Crystal Structure Of C3stau2 From S. Aureus
1R4B A     200       21.4   10e-20.4  1.0   65.0   01/2005    Adp-Ribosyltransferase C3bot2 From Clostridium Botulinum, Monoclinic Form
1GIQ A  2  186       21.0   10e-20.0  2.2   23.7   02/2003    Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From Clostridium Perfringens With Nadh
1G24 A     205       21.5   10e-19.9  0.9   100.0  03/2001    The Crystal Structure Of Exoenzyme C3 From Clostridium Botulinum
1PWV A  3  178       21.1   10e-19.8  2.0   18.5   02/2004    Crystal Structure Of Anthrax Lethal Factor Wild-Type Protein Complexed With An Optimized Peptide Substrate
1QS1 D  2  183       20.9   10e-19.6  2.3   29.5   03/2001    Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)
1J7N B  3  176       20.8   10e-19.5  1.9   18.2   11/2001    Anthrax Toxin Lethal Factor
1GIQ B  2  187       20.8   10e-19.5  2.3   23.5   02/2003    Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From Clostridium Perfringens With Nadh
1GIR A  2  190       20.8   10e-19.5  2.3   23.2   02/2003    Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From Clostridium Perfringens With Nadph
1PWP B  3  180       20.8   10e-19.5  2.0   17.8   02/2004    Crystal Structure Of The Anthrax Lethal Factor Complexed With Small Molecule Inhibitor Nsc 12155
1QS2 A  2  183       20.7   10e-19.4  2.4   30.1   03/2001    Crystal Structure Of Vip2 With Nad
1ZXV B  3  178       20.7   10e-19.3  1.9   18.0   07/2005    X-Ray Crystal Structure Of The Anthrax Lethal Factor Bound To A Small Molecule Inhibitor, Bi-Mfm3, 3-{5-[5-(4-Chloro-Phenyl)-Furan-2-Ylmethylene]-4-Oxo-2-Thioxo-Thiazolidin-3-Yl}-Propionic Acid
1QS1 C  2  182       20.7   10e-19.3  2.3   30.2   03/2001    Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)
1QS1 B  2  185       20.7   10e-19.3  2.4   29.7   03/2001    Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)
1PWQ A  3  179       21.0   10e-19.1  2.0   17.9   02/2004    Crystal Structure Of Anthrax Lethal Factor Complexed With Thioacetyl-Tyr-Pro-Met-Amide, A Metal-Chelating Peptidyl Small Molecule Inhibitor
1PWU B  3  175       20.6   10e-18.9  1.9   18.3   02/2004    Crystal Structure Of Anthrax Lethal Factor Complexed With (3-(N-Hydroxycarboxamido)-2-Isobutylpropanoyl-Trp-Methylamide), A Known Small Molecule Inhibitor Of Matrix Metalloproteases
1PWW A  3  178       20.7   10e-18.9  2.0   17.4   02/2004    Crystal Structure Of Anthrax Lethal Factor Active Site Mutant Protein Complexed With An Optimized Peptide Substrate In The Presence Of Zinc
1PWU A  3  176       20.5   10e-18.7  2.0   18.2   02/2004    Crystal Structure Of Anthrax Lethal Factor Complexed With (3-(N-Hydroxycarboxamido)-2-Isobutylpropanoyl-Trp-Methylamide), A Known Small Molecule Inhibitor Of Matrix Metalloproteases
1R45 C     200       20.8   10e-18.5  1.0   65.0   01/2005    Adp-Ribosyltransferase C3bot2 From Clostridium Botulinum, Triclinic Form
1J7N A  3  179       20.3   10e-18.2  2.0   17.9   11/2001    Anthrax Toxin Lethal Factor
1QS1 A  2  182       20.1   10e-17.9  2.3   29.7   03/2001    Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)
1YQY A  1  181       20.1   10e-17.8  2.1   17.1   06/2005    Structure Of B. Anthrax Lethal Factor In Complex With A Hydroxamate Inhibitor
1PWV B  3  178       20.0   10e-17.7  2.0   17.4   02/2004    Crystal Structure Of Anthrax Lethal Factor Wild-Type Protein Complexed With An Optimized Peptide Substrate
1GZF A     203       20.6   10e-17.5  0.6   100.0  09/2002    Structure Of The Clostridium Botulinum C3 Exoenzyme (Wild-Type) In Complex With Nad
1QS2 A  1  180       20.2   10e-17.5  2.2   24.4   03/2001    Crystal Structure Of Vip2 With Nad
1QS1 D  1  183       19.9   10e-16.9  2.4   24.0   03/2001    Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)
1QS1 A  1  181       19.9   10e-16.9  2.2   23.2   03/2001    Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)
1QS1 B  1  180       19.8   10e-16.6  2.3   24.4   03/2001    Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)
1PWQ B  3  181       19.4   10e-16.6  2.4   16.0   02/2004    Crystal Structure Of Anthrax Lethal Factor Complexed With Thioacetyl-Tyr-Pro-Met-Amide, A Metal-Chelating Peptidyl Small Molecule Inhibitor
1QS1 C  1  183       19.8   10e-16.6  2.5   24.0   03/2001    Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)
1R45 B     198       18.4   10e-15.2  1.0   65.2   01/2005    Adp-Ribosyltransferase C3bot2 From Clostridium Botulinum, Triclinic Form
1R45 A     200       18.4   10e-15.2  1.1   65.0   01/2005    Adp-Ribosyltransferase C3bot2 From Clostridium Botulinum, Triclinic Form
1GIR A  1  186       19.1   10e-15.0  3.0   16.7   02/2003    Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From Clostridium Perfringens With Nadph
1GIQ A  1  187       19.1   10e-15.0  3.1   16.6   02/2003    Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From Clostridium Perfringens With Nadh
1GIQ B  1  187       19.1   10e-15.0  3.0   16.6   02/2003    Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From Clostridium Perfringens With Nadh
1GIQ A     192       21.0   10e-14.9  2.4   24.0   02/2003    Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From Clostridium Perfringens With Nadh
1QS1 D     186       20.9   10e-14.8  2.5   29.6   03/2001    Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)
1QS2 A     185       20.7   10e-14.6  2.4   29.7   03/2001    Crystal Structure Of Vip2 With Nad
1QS1 C     184       20.7   10e-14.6  2.3   29.9   03/2001    Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)
1QS1 B     184       20.7   10e-14.5  2.4   29.9   03/2001    Crystal Structure Of Vegetative Insecticidal Protein2 (Vip2)
1GIQ B     193       20.8   10e-14.5  2.4   23.8   02/2003    Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From Clostridium Perfringens With Nadh
1GIR A     192       20.8   10e-14.5  2.4   22.9   02/2003    Crystal Structure Of The Enzymatic Componet Of Iota-Toxin From Clostridium Perfringens With Nadph
1PWP A  3  127       16.2   10e-14.1  2.0   18.1   02/2004    Crystal Structure Of The Anthrax Lethal Factor Complexed With Small Molecule Inhibitor Nsc 12155

Thu May 18 23:30:21 PDT 2006 George Shackelford
I am going to do a try6 that eliminates the above from the list of templates. I'm eliminating 1sz9A and 1g5aA as well; they were used for templates.
Starting try6 on peep. (I note that try5 appeared to get interrupted somehow.)
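Excluding 60 VAST neighbors by hand is error-prone. The small Python sketch below pulls template identifiers out of rows like the ones in the table. It assumes template names are the lowercase PDB code followed by the chain letter, matching names such as 2a9kB and 1g24A in this notebook. `template_ids` is a hypothetical helper. How the resulting list is passed to undertaker (George followed T0290's try2.under) is not shown here.

```python
def template_ids(vast_rows):
    """Return unique template ids (e.g. '1g24B') in first-seen order.

    Assumes each row starts with the PDB code and chain letter,
    e.g. '1G24 B 205 23.0 ...'. The optional domain number and the
    remaining columns are ignored.
    """
    ids = []
    for row in vast_rows:
        fields = row.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        tid = fields[0].lower() + fields[1]
        if tid not in ids:
            ids.append(tid)
    return ids


rows = [
    "1UZI B     205  23.0  10e-26.3  0.7  100.0  08/2004  C3 Exoenzyme ...",
    "1G24 B     205  23.0  10e-25.8  0.8  100.0  03/2001  The Crystal Structure ...",
    "1GIQ A  2  186  21.0  10e-20.0  2.2   23.7  02/2003  Crystal Structure ...",
    "1GIQ A     192  21.0  10e-14.9  2.4   24.0  02/2003  Crystal Structure ...",
]
# The template that was taking over (2a9kB) is excluded along with its neighbors.
excluded = ["2a9kB"] + template_ids(rows)
```

Several domain hits on the same chain (1GIQ A appears more than once) collapse to a single template id, which is all that exclusion needs.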
Fri May 19 11:26:30 PDT 2006 Kevin Karplus
try6 finally got a different template, but it still doesn't look that great, either for predicted burial or for secondary structure. The score functions keep liking try2, probably because the sidechain weights are too high and the sidechains are very free to move when they are improperly exposed.

The robetta models look like they were generated from a terrible template, and they look pretty bad.

Fri May 19 11:46:30 PDT 2006 George Shackelford
Using the scorings for try6 as a guide, I've lowered the sidechain weight to 1, the pred_alpha weights to 0.3, and the constraints to 1, and I have included only the 449a constraints. The break weight is still 20. This is simply another shot in the dark.
starting try7 on peep

Sat May 20 23:31:55 PDT 2006 George Shackelford
try7 had a failure during processing similar to try5's. I have kept the original log file as "try7.log.bk". I hope these were flukes.

try7 looks nicely different. I don't believe in the parallel sheet, even if it is in the middle. After talking to Jenny Draper, I gather that since this protein comes from a pathogenicity island, it operates outside the cell. It doesn't have cysteines for disulfide bonds, but they are not necessary. Still, I would not expect to see any parallel sheets in such a protein.

We have fixed a bad bug in the rr predictions and have the new 449a_45 predictor online. There are fresh predictions, and I am going to include them while boosting the constraints weight to 5; the rr predictions are all bonuses. I have also added a new file that is longer than the currently constructed rr.constraints file. Otherwise try8 is the same as try7. I would still like to see the buried stuff buried.
----------------------------------------------------------------------

Date: Sun, 21 May 2006 05:46:50 -0700
From: Kevin Karplus
To: learithe
CC: karplus, ggshack
Subject: CASP7 T0287

Jenny,

Did George ever ask you about target T0287? It is a Helicobacter pylori protein that turns out to be an ORFan.
It is in /projects/compbio/experiments/protein-predict/casp7/T0287 (my home directory has a softlink for casp7, if that is easier).

The protein is in swissprot as CAGS_HELPJ and CAGS_HELPY, which call it CAG pathogenicity island protein 13 but have essentially no useful information beyond the sequence and the location on the genome.

Can you find out anything more about this protein? Can you explain to us what the "pathogenicity island" means? (I'm guessing that it is a region of the genome that is associated with the bacteria causing disease rather than being a benign parasite.)

I don't think we're going to get much on this protein, but any thoughts would be useful.
----------------------------------------------------------------------

Sun May 21 08:59:59 PDT 2006 Kevin Karplus
try8 seems to be picking up the 1josA alignment, like try6 and try7. I'm not sure why George excluded 1sz9A and 1g5aA if they were not similar to 2a9kB and 1g24A. They may have been the next best hits, so ignoring them seems a bit strange.
--------------------------------------------------------------------------------

Sun May 21 13:20:27 PDT 2006 George Shackelford
The pathogenicity island contains genes for proteins used by H. pylori for attack and/or defense. It is interesting to note that the VAST hits contained proteins described as "Enzymatic Componet Of Iota-Toxin", "Anthrax Lethal Factor", "Vegetative Insecticidal", or similar. These are consistent with what we would expect from a pathogenicity island. Nevertheless, the decoy used for the VAST search has bad exposure of hydrophobics, unless such exposure is characteristic of pathogens. Kevin, do toxins sometimes have hydrophobics exposed? Maybe for injecting into foreign cells?

Per Kevin's comments, I am restoring 1sz9A and 1g5aA for try9. I'm also restoring the constraint weights to 10, because I find that the rr.constraints should pull toward a different structure, more like try2... I'm also resetting breaks back to 50.
I get tired of the breaks.
starting try9 on peep.

Sun May 21 23:40:30 PDT 2006 Kevin Karplus
Warning: peep is now the machine on Martin's desktop and should not be heavily loaded at times when Martin is around. Remember to nice anything run on peep. Orcas and lopez may be better default machines, as they each have 2 processors.

try9 seems to have picked up 1josA again. Perhaps a run without 1josA as a possible template would be useful to get another selection.
--------------------------------------------------------------------------------

Mon May 22 16:20:07 PDT 2006 George Shackelford
Kevin's right. Now is the time to try without 1josA. I'm also going to look at the server predictions for T0287 to see what others are offering. I suspect we're all guessing.

I checked the score-all, and try1, try3, and try4 are at the top. The weighting seems to be OK. I am going to push the constraints (rr) up to 20 just to see if we can get something else going. I'm also dropping 1josA.
try10 running on orcas (not peep!)

Make started Tue May 23 09:05:39 PDT 2006
Running on lopez.cse.ucsc.edu

Tue May 23 09:13:41 PDT 2006 Kevin Karplus
I accidentally started a new make in this directory (typo: I meant to start it in T0297). It should do no harm, so I'll let it finish so that the summary.html file is up to date. It is, in fact, creating a few new alignments, since a few more t06 alignments have finished since it last ran.

Tue May 23 09:25:04 PDT 2006 Kevin Karplus
I was wrong---the makefile wants to re-run try1. I moved the old decoys/*try1* files to decoys/first-... (then had to move all the try10 files back---oops).

Tue May 23 09:58:57 PDT 2006 George Shackelford
try10 is pretty ugly, but not as ugly as many of the server inputs. Frankly, I like the SAM-t06 server results best of all. At least all the atoms are in place. At this point I'm wondering if I should just try a run without templates. It will likely be a mess, but I'm not sure what to do next.
We may have exhausted the possibilities. I'm going to go back and look at the actual 2a9kB to see if I can get any ideas.

OK, I think I'll repeat try10 and turn off all constraints, push wet6.5 up to 20, and phobic_fit to 5. Bury Bury Bury.
try11 started on orcas

Tue May 23 14:57:09 PDT 2006 Kevin Karplus
I scored all the server results with the try1 costfcn, and one server came out ahead of all of ours: Bilab-ENABLE. This is probably an illusion, because the actual costs in the log file are given as NaN. Looking in the score-all+servers.try1.rdb file, I see that these models have "NaN" for "bad_peptide". I wonder how that happened.

Tue May 23 15:14:33 PDT 2006 Kevin Karplus
By the way, George, increasing wet6.5 will try to EXPOSE as much as possible, not bury things. Increase the dry weights or phobic_fit if you want to increase burial.

Tue May 23 17:02:46 PDT 2006 Kevin Karplus
OK, I think I found the problem with the bad_peptide computation. It was a bug in the coplanar_trans operation in Transform.cc, which was only tickled when the two sets of points were already coplanar (perhaps only when they were coplanar but with opposite orientations).

Tue May 23 17:43:43 PDT 2006 Kevin Karplus
I looked at try1 and try11 scoring of the server outputs, and nothing scores particularly well. I think we'll end up submitting 5 of our own predictions, picking ones that are as different as possible from each other.

Wed May 24 11:36:47 PDT 2006 George Shackelford
I didn't know that about wet6.5. I agree that nothing really looks all that good. I'm looking over which 5 to use, and I am going to try to close gaps and drop constraints, using the model as the include. Otherwise there appears to be not much else we can do.
    Try1 (3, 4, 5) is obvious.
    Try2 should get in as well (it's different for sure).
    Try11 does have a set of helices that I like, and a parallel sheet I don't.
    Try6 has one large parallel sheet. I don't buy that for one moment, but it's different. Bad breaks.
    Try7 is similar to try6 but different enough to see which is better. Bad breaks here too.
    Try8 is another variation on try6.
    Try9: ditto. It is the best scoring of try6-try9 according to score-all+servers.try11. It needs its breaks closed if possible.
    Try10 is actually different. It has two parallel sheets for me not to believe. Break problems here.
I suppose we could put these all into includes and see if we can get some combination that is different yet again. Otherwise, the five I suggest are:
    Try1
    Try2
    Try11
    Try10
    Try9
I'm going to try to repair breaks and do some polishing.

No. I've changed my mind. I like the helices of try11 and the sheets of try4. Can we combine them and make something better? I'm taking the helices of try11 with weight 0.5 and the last four sheet constraints of try4 with weight 0.5, and I included the rr.constraints. We'll see what happens.
try12 running on lopez

Wed May 24 15:54:49 PDT 2006 George Shackelford
try12 does pretty decently in scoring, but I don't like the shape. I'm taking out the rr.constraints and setting near-backbone to 5, wet6.5 to 15, break to 20, constraints to 10, and phobic_fit to 2. Let's see what this does. At least try12 comes up with a different shape.
try13 running on lopez
Try12 started on lopez.

Thu May 25 14:54:10 PDT 2006 Kevin Karplus
I don't like either try12 or try13---neither does the try13 costfcn, which still favors try1, try3, and try4. For some reason, there seems to be no grep-best-rosetta file, though the repack files are there. Perhaps we need to remake grep-best-rosetta.

OK, fixed that, and Rosetta likes try3 best, though it really hates them all.

Thu May 25 21:59:40 PDT 2006 George Shackelford
I STILL like the helices at the start of try11. I'm going to copy and modify the try11 pdb, slicing out the residues after the helices. Time to use ReadConformPDB. I've built a T0287.try11-short.pdb using residues 1-122. This section has good burial characteristics and decent structure.
I'll leave it to undertaker to do the rest.
starting try14 on orcas.

Fri May 26 12:28:15 PDT 2006 George Shackelford
try14 aborted during the night due to some improper changes I made to try14.under. I fixed those, but I am still getting crashes. I'm commenting out my ReadConformPDB line to see if it can at least run.

I am able to get try14 to run only by splicing part of try4 into "short.pdb", which now contains residues 1-106 of try11. This works, but it is such a distortion that the final result is not particularly good.

Fri May 26 18:32:55 PDT 2006 Kevin Karplus
try14-opt1 looks OK up to about R66, but after that it's a mess.

This target is due June 1, so we have to make our decisions about it by Tuesday afternoon, so that there is time for some polishing Tuesday night and submission on Wednesday.

Sat May 27 23:14:39 PDT 2006 George Shackelford
I still like the starting part of try11. I'm going to replace "short.pdb" with try14, which seems to get decent results. Maybe this will get something reasonable.

Sun May 28 09:22:39 PDT 2006 Kevin Karplus
George, I was unable to send you e-mail, as pacbell.net (which you forward your SoE mail to) insists that you don't exist. Read the recent messages that I sent to the group in casp7/group-mail.gz.

In try15-opt2, I like some of the pieces, though E92-P123 looks a bit awkward. Perhaps you could add some sheet constraints for the three strands that look like they are trying to form an antiparallel sheet: K127-K132, L182-Y190, L151-T158
    125> fskpimfkmsi
    192< kmykeldfkmlld
    150> pllklfvmtdeevn
Perhaps
    SheetConstraint K127 K132 K189 K184 hbond F185
    SheetConstraint K153 T158 K189 K184 hbond K184

Sun May 28 18:32:38 PDT 2006 George Shackelford
Perhaps is good enough for me. I'm going to use those SheetConstraints, removing any other conflicting constraints. I've also taken out the other TryAllAligns.
try16 started on orcas.
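Each of Kevin's SheetConstraint lines pairs one strand (K127-K132 or K153-T158) with the L182-Y190 strand, which runs in the opposite direction. The sketch below assumes the four residues give the start and end of one strand followed by the residues they pair with, so the partner numbering counts down in an antiparallel pairing. Under that assumption, the hypothetical helper `sheet_pairs` lists the implied residue pairs. It is a reading aid only: it is not undertaker's interpretation, and it ignores the hbond argument. In an antiparallel pairing every pair has the same residue-number sum (127+189 = 132+184 = 316), which gives a quick consistency check.

```python
def sheet_pairs(start1, end1, start2, end2):
    """List (i, j) residue pairs for segment start1..end1 paired with start2..end2.

    start1 pairs with start2 and end1 with end2. Each side steps by +1 or -1,
    so antiparallel pairings (one side counting down) are handled.
    """
    n1, n2 = end1 - start1, end2 - start2
    if abs(n1) != abs(n2):
        raise ValueError("paired segments must have the same length")
    step1 = 1 if n1 >= 0 else -1
    step2 = 1 if n2 >= 0 else -1
    return [(start1 + k * step1, start2 + k * step2) for k in range(abs(n1) + 1)]


first = sheet_pairs(127, 132, 189, 184)   # K127-K132 with K189-K184
second = sheet_pairs(153, 158, 189, 184)  # K153-T158 with K189-K184

for name, pairs in (("first", first), ("second", second)):
    # A constant i + j over all pairs is the antiparallel signature.
    sums = {i + j for i, j in pairs}
    print(name, pairs, "antiparallel" if len(sums) == 1 else "not antiparallel")
```

Read this way, residues 184-189 pair with both other strands, presumably one on each edge, so the L182-Y190 strand would sit in the middle of a three-strand sheet.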
Sun May 28 22:55:50 PDT 2006 Kevin Karplus
Taking out the TryAllAligns may have sped up try16, but the search space was too small---try16 just diddled around in the neighborhood of try15 without moving anything very much.

Tue May 30 09:45:31 PDT 2006 George Shackelford
Yes, one final run with TryAllAligns back in.

I've been doing some research into how well we can do contact prediction using only the target sequence for training and predicting. Using a simple predictor based on "hotspot" distributions (1.0 for the observed amino acid, 0.0 otherwise) and a distribution window of 11 around each of the residues i and j, I get accuracies ranging from 0.13 for the top L/2 predictions to 0.17 for the top L/10. This is not quite as good as the results for MI e-values using alignments, but possibly good enough for "bonus" constraints.
try17 running on orcas (with TryAllAligns back on).

Tue May 30 15:21:21 PDT 2006 George Shackelford
try17 has failed, and there are DEBUG lines at the end of the try17.log file. I'm not sure what has happened, but tempus fugit; we need to start polishing what we want for submission.

I am still not satisfied with what we have, but I can't seem to get what I suspect is the structure: a helical bundle followed by a beta-sheet functional area followed by some helices as a "cap." The six-helix bundle of try11 and the four-helix bundle of try15(?) are the likely starts. Both show good burial, though the six-helix bundle is a bit foamy. The helical structure in try4 looks possible, but matching these two up has not worked.

I would say it is time to polish some submissions. I am going to try to polish try11 for a start, I think.

Tue May 30 19:34:46 PDT 2006 Kevin Karplus
undertaker was buggy today, because of faulty bug fixes I introduced over the weekend. It should be working again now.

George, I need a list of 5 distinctly different predictions for this target (not minor variations on a theme). Can you have a list before noon tomorrow?
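George reports his sequence-only predictor's results as 0.13 at L/2 and 0.17 at L/10. If those are the usual top-L/k accuracies, they are the fraction of the L/k highest-scoring predicted pairs that are true contacts, where L is the target length. The measure can be sketched in Python as follows. The minimum sequence separation of 6 and the toy data are assumptions, not the settings behind George's numbers. CASP contact evaluations commonly count a pair as a contact when the two residues' CB atoms are within 8 Å.

```python
def top_lk_accuracy(predictions, true_contacts, length, k, min_sep=6):
    """Fraction of the top length//k predicted pairs that are true contacts.

    predictions: iterable of (i, j, score) with residue numbers i and j.
    true_contacts: set of (i, j) pairs with i < j.
    min_sep: pairs with |i - j| < min_sep are ignored (assumed cutoff).
    """
    ranked = sorted(
        ((score, min(i, j), max(i, j))
         for i, j, score in predictions
         if abs(i - j) >= min_sep),
        reverse=True,
    )
    top = ranked[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (i, j) in true_contacts)
    return hits / len(top)


# Toy data; (5, 6) is dropped by the separation cutoff despite its high score.
preds = [(1, 10, 0.9), (2, 12, 0.8), (3, 15, 0.7), (4, 20, 0.6), (5, 6, 0.95)]
true = {(1, 10), (3, 15)}
for k in (2, 10, 20):
    print(f"L/{k}:", top_lk_accuracy(preds, true, length=20, k=k))
```

Accuracy usually rises as k grows, because only the most confident predictions are kept. That matches the direction of George's L/2 and L/10 numbers.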
Tue May 30 22:54:22 PDT 2006 George Shackelford
Looking at the scoring for try18 (which had zero constraints, 80 for breaks, and 40 for soft clashes), we have the top four:
    try12 - This needs polishing. Interesting collection of helices on both ends. Best scoring.
    try3 - One of the try1 variations. Actually in a dead heat with try1 in scoring.
    try18 - A polished version of try11; this has a six-helix bundle at the start, but is poorly developed from there on.
    try16 - Starts with a four-helix bundle. Ends up a mess, but it's different.
Then two trailing models: try10 and try9. They are different from each other and from the above, but both score down the list. Your call. The one you pick will need some polishing. Finally, there is always try2. Really different. Really, really different. Three helices...

I'm going to start try19 as a polish run of try12. May as well restore 'sidechains' at weight 5. Otherwise like try18 in terms of polishing.
try19 running on orcas

I'm starting a polishing run of try16 as well.
try20 running on peep

Wed May 31 09:16:24 PDT 2006 Kevin Karplus
Although they are top scoring with the "unconstrained" costfcn, I don't care that much for try2 or try19 (based on try12). I think that try3 and try18 are more promising. try18 actually looks quite good up to about F113. try20 (based on try16) looks plausible, but not great. Currently I favor
    try18
    try3 (because it came up automatically)
    try20
    try19
    try9

I'm going to increase the phobic_fit and dry weights of the unconstrained costfcns, to see how that reorders things. The order is now try19, try12, try18, try11, try12, try4, try1, ...

Wait: there are two different try12-opt2 files (one gzipped, one not), and they score differently. I renamed the older set to be try12-opt1-old, ... It is the try12 and not the try12-old run that scores well.

Rosetta likes repacking try18, try3, try2, try19, first-try1, try11, try10, try5, ... (actually, it hates them all, but these are the least hateful).
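Kevin's reordering experiment (raising the phobic_fit and dry weights) and George's later idea of zeroing the pred-alpha terms both depend on the total cost being a weighted sum of component scores. Under that model, changing one weight can reorder the decoys. The minimal Python sketch below assumes that model, with a lower total cost being better. The component values are invented for illustration and are not scores from this target. `total_cost` and `rank` are hypothetical helpers, not undertaker functions.

```python
def total_cost(terms, weights):
    """Weighted sum of cost terms; a term with no weight contributes nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in terms.items())


def rank(models, weights):
    """Model names sorted best (lowest total cost) first."""
    return sorted(models, key=lambda name: total_cost(models[name], weights))


# Invented component costs for two decoys.
models = {
    "decoyA": {"phobic_fit": -2.0, "pred_alpha": 3.0},
    "decoyB": {"phobic_fit": 1.0, "pred_alpha": -4.0},
}
with_alpha = {"phobic_fit": 1.0, "pred_alpha": 1.0}
without_alpha = {"phobic_fit": 1.0, "pred_alpha": 0.0}

print(rank(models, with_alpha))     # ['decoyB', 'decoyA']
print(rank(models, without_alpha))  # ['decoyA', 'decoyB']
```

A term that varies a lot between decoys but carries little real information can dominate such a sum. George raises exactly this concern about secondary-structure predictions for an ORFan, and it is why dropping such a term can reshuffle the ranking.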
I superimposed a bunch of these models. try11 is just an earlier draft of try18, and try4 is just a poorer version of try3. The try19 run, though it scores well, is much less compact than the other runs.

Wed May 31 09:50:50 PDT 2006 Kevin Karplus
I tried superimposing
    ReadConformPDB T0287.try18-opt2.pdb
    ReadConformPDB T0287.try3-opt2.pdb
    ReadConformPDB T0287.try20-opt2.pdb
    ReadConformPDB T0287.try12-opt2.pdb
    ReadConformPDB T0287.try9-opt2.pdb
    ReadConformPDB T0287.try10-opt2.pdb
    ReadConformPDB T0287.try19-opt2.pdb
I definitely prefer try12 to try19. try9 has some very bad breaks, but looks more promising than try10. I'll start a polishing run for try9.

Wed May 31 10:40:34 PDT 2006 George Shackelford
I suspect that try19 used the "old" try12.pdb rather than the newer compressed version. It may do well if we rerun try19 with the newer try12.

When I first looked at your 9:15 comments, I wondered what had happened to try10. try9 is somewhat similar, and it does have bad breaks (I had reviewed the tries last Wednesday). In either case, I don't like those parallel sheets.

OK, I'm repolishing try12 as try22(?)
try22 started on peep.

Wed May 31 12:12:05 PDT 2006 Kevin Karplus
try22-opt1 is now scoring best, and try21 is beating try9 by a lot, though still with a very bad break.

Wed May 31 12:28:45 PDT 2006 Kevin Karplus
try22-opt2 now surpasses try22-opt1.

There is something wrong with the sort-by-rosetta script when George runs it (he gets "rm: Command not found." and "grep: Command not found." error messages). It causes no problems when I run it, so this must be some sort of initialization problem with the path. I can't read George's .cshrc file to see if there are any problems in it.

Rosetta's order for repacking is try18, try3, try2, try19, first-try1, try11, try22, try10, ...

Wed May 31 12:43:38 PDT 2006 Kevin Karplus
I'm going to do one polishing run starting with all the decoys and with CrossOver turned up high, to see if we can do any mixing and matching to get better decoys.
I suspect that this crossover run will just polish up the top-scoring try22-opt2 a little bit, but I'm hoping that a crossover with try18-opt2 might pick up something good.

Wed May 31 12:47:21 PDT George Shackelford
I will look at the problem with my .cshrc later. First things first. I noticed that both try22-opt2.pdb.gz and try22-opt2.gromacs0.pdb.gz have negative phobic-fit. The big difference between them is the pred-alpha cost factors. Given the problem with predictions on ORFans, I wonder whether these should be set to zero (maybe along with bystroff) when doing final scoring. If that is done, the differences would certainly change a lot. I'm trying to learn how Kevin does those quick rescorings with specific constraints. Some make target...

Wed May 31 12:59:12 PDT 2006 Kevin Karplus
Create a new costfcn (say foo.costfcn), then
make decoys/score-all.foo.pretty
I'll make up a try24.costfcn with the predicted alphas removed.

Wed May 31 13:02:17 PDT 2006 Kevin Karplus
try22-opt2 scores best with the try24 costfcn (no surprise there). Then come try19, try18, first-try1, try4, try12, try11, try3, ... If I want crossovers with try18, I should probably do an optimization run with all the models that score better than try18 removed. I'll do that for try24 (started on orcas).

Wed May 31 13:20:23 PDT 2006 Kevin Karplus
try24 seems to be doing just minor polishing on try18-opt2. It runs with no improvement for several generations (probably enough to flush out all the non-try18-opt2 models from the pool), then starts making tiny improvements.

Wed May 31 13:37:20 PDT 2006 George Shackelford
(Sigh) Nice try. Short of polishing first-try1, I don't see that we should spend too much more on questionable models. Is it about time for a wrap-up? Is there something you'd like me to look at?
13:58: GGS I do wish we could redo some of try22.
For example, the regions 112-122 (or 112-127) and 146-180. When I look at that final section, 112-198, what I can see is an antiparallel sheet like 2-3-1-4, where 1 is ~112-122, 2 is ~146-156, 3 is ~158-168, and 4 is ~190-198 (possibly), with the helices connecting on the outside.
HelixConstraint (T0287)S134 (T0287)Y145
HelixConstraint (T0287)D172 (T0287)K188

Wed May 31 15:37:47 PDT 2006 Kevin Karplus
I'm going to submit 5 models now, but if you can improve try22, we can replace the submission later tonight or early tomorrow morning. Here are the 5 I'll submit:
ReadConformPDB T0287.try24-opt2.pdb
ReadConformPDB T0287.try3-opt2.pdb
ReadConformPDB T0287.try20-opt2.pdb
ReadConformPDB T0287.try23-opt2.pdb
ReadConformPDB T0287.try21-opt2.pdb

Wed May 31 16:37:33 PDT 2006 George Shackelford
I've estimated it would take at least two more days of work to do what I have in mind. I've got bigger fish to fry (and submissions of my own).

From: Karen Ottemann
Subject: Re: H. pylori protein as CASP target
Date: Tue, 30 May 2006 15:39:54 -0700
To: Kevin Karplus

Hi Kevin--
I have to look into this protein, as I don't know anything about it in particular, other than the fact that other proteins of the CAG island make up a secretion system and at least one secreted protein (CagA). This secretion system secretes proteins directly from the bacterium to adjacent mammalian cells. It's a Type IV secretion system. This secretion system makes the mammalian hosts respond with more severe inflammation. It sounds like Cag7/HP0534 doesn't participate in this aspect of the CAG island, though, based on mutational studies, so that means they probably have no idea what it does...
I assume you saw:
Wolfgang Fischer, Jürgen Püls, Renate Buhrdorf, Bettina Gebert, Stefan Odenbreit, Rainer Haas. Systematic mutagenesis of the Helicobacter pylori cag pathogenicity island: essential genes for CagA translocation in host cells and induction of interleukin-8. Molecular Microbiology 42(5):1337, December 2001. doi:10.1046/j.1365-2958.2001.02714.x

On May 11, 2006, at 9:15 AM, Kevin Karplus wrote:
>
> CASP7 target T0287 is CagS (HP0534), an ORFan protein found only in
> Helicobacter pylori (CAG pathogenicity island protein).
>
> If you know anything about this protein, it could be useful to us in
> trying to predict its structure. (ORFan proteins are the worst for
> structure-prediction methods.)
>
> Kevin Karplus
>

Date: Thu, 1 Jun 2006 20:19:39 -0700
From: Kevin Karplus
To: ottemann
CC: karplus
Subject: H.pylori CASP target

Thanks for looking into the H. pylori CASP target for us. The deadline for the target was at noon today, and I did not get your e-mail until this evening, so I'm afraid we were not able to use any information you had.

Since the deadline has passed, I've put up our predictions at
http://www.soe.ucsc.edu/~karplus/T0287/summary.html
(the predictions we submitted are model1.ts-submitted, model2.ts-submitted, model3.ts-submitted, model4.ts-submitted, and model5.ts-submitted).

Kevin Karplus
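George's 13:58 proposal for the final section (an antiparallel sheet with strand order 2-3-1-4 over residues 112-198) implies three neighboring strand pairs: 2-3, 3-1, and 1-4. A minimal Python sketch that enumerates candidate residue pairings for those neighbors, assuming an idealized antiparallel register in which the first residue of one strand faces the last residue of its neighbor. The register, the helper name, and the exact strand ends (taken from his approximate ~ ranges) are assumptions for illustration, not constraints the group actually used.

```python
# Hypothetical sketch: candidate residue pairings for an antiparallel
# sheet, given approximate strand ranges and the spatial strand order.


def antiparallel_pairs(a: range, b: range) -> list[tuple[int, int]]:
    """Pair strand a N->C with strand b C->N, over the shorter strand.

    Assumes an idealized register: a's first residue faces b's last.
    """
    return list(zip(a, reversed(b)))


# George's approximate strands (range end is exclusive, so 123 means 122).
strands = {
    1: range(112, 123),
    2: range(146, 157),
    3: range(158, 169),
    4: range(190, 199),
}
order = [2, 3, 1, 4]  # proposed left-to-right order across the sheet

for left, right in zip(order, order[1:]):
    pairs = antiparallel_pairs(strands[left], strands[right])
    print(f"s{left}-s{right}: {pairs[0]} .. {pairs[-1]}, {len(pairs)} pairs")
```

This prints 11 pairs for s2-s3 and s3-s1 and 9 for s1-s4 (from (112, 198) to (120, 190)). For the 2-3 hairpin the naive register pairs 156 with 158 across a one-residue turn, which is too tight for a real hairpin, so an actual sheet constraint would need a shifted register or trimmed strand ends.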