Mon Jul 26 10:37:48 PDT 2004 T0249 DUE 26 Aug

Mon Jul 26 16:14:21 PDT 2004 Kevin Karplus
This looks like a fold-recognition target with a.4.5 as the superfamily. Only models 5 and 7 in the T0249.t2k.undertaker-align.pdb file have the hairpin centered at G66, and try1-opt2 throws it away. I added a lot more hits from the top of the t04 str2+CB_burial_14_7 score file (all for a.4.5) and made extra_alignments.

Mon Jul 26 22:15:22 PDT 2004 Kevin Karplus
The rr constraints may not be too useful for this one---at least the ones that are visible in the alignment to the templates do not seem to be very good. I'll make try2 have the helix and strand constraints from the t04.dssp-ehl2 predictions, and the sheet constraints from align5.sheets. There is probably another sheet constraint needed, to pair I135-V137 with a somewhat later antiparallel strand, but the prediction is weak, so I'd like to see what comes up first.

Thu Jul 29 09:50 PDT 2004 Kevin Karplus
Observation: The reason (at least a reason) try1 was so messed up was that the alignment all-align.a2m.gz was not getting properly created before try1 was run. I moved try1 to no_align1, and am rerunning try1 after creating all-align.a2m.gz properly.

Fri Jul 30 00:12:17 PDT 2004 Kevin Karplus
try2 makes some progress--the N-terminal end looks fairly decent, but after that is pretty junky. Two of the strands are flipped over though, since we definitely want F70 and I61 on the same side of the sheet, quite likely lined up (maybe I61 pairs with T72 though), probably keeping the hbonds on H62. N24 should probably be paired with R69, with F70 hbonding. The fold recognition is not getting us much in the initial alignments, but we are getting 2 1/2 sheets in try2. We might want to add a sheet constraint for Q107-T110 with R138-R140 with R138 hbonding, probably antiparallel. Take a quick look at try1-opt2 first (now that it is done properly), to see if there is anything else not found in try2.

Fri Jul 30 00:52:42 PDT 2004 Kevin Karplus
The new try1 shows some hope; there is a little supersecondary structure---a couple of small sheets. There are a lot of charges on this protein, but they don't seem to indicate a DNA-binding role, since there are more D and E than R and K.

Fri Jul 30 05:21:08 PDT 2004 Kevin Karplus
Took another quick look at rerun try1, which has two 3-strand sheets, one of which may have a flipped strand. The volumes around the beta sheets are not too badly packed, but there are large holes.

Fri Aug 13 14:28:43 PDT 2004 George Shackelford
I've started to look at this one. I may do a try3 based on the new try1.

Sat Aug 14 14:50:35 PDT 2004 George Shackelford
I've started redoing try2 as try3 using the new try1. Now running on cluck.

Sun Aug 15 14:31:26 2004 George Shackelford
Try3 is out and it is more compact than try2. However, when I looked at the structures of 'close' hits (actually ~10^-4), all I see are spread-out structures. The problem is that all alpha (or almost) means a sturdy frame, which can imply spread-out. Also, those structures are not close; the str2 shows breaks where breaks should not be. What to do? If I could get the breaks to take place, perhaps I could do a VAST?? For the time being perhaps I should just 'roll the dice.'

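The charge observation in the Fri Jul 30 00:52:42 entry (more D and E than R and K) is the kind of thing a quick residue count can confirm. The following is a minimal sketch, not the tool used for this target: the function name `charge_summary` and the example fragment are hypothetical, the fragment is NOT the T0249 sequence, and histidine is ignored for simplicity.

```python
from collections import Counter


def charge_summary(sequence):
    """Count acidic (D, E) and basic (R, K) residues in a one-letter sequence.

    Histidine is deliberately left out; this is only a rough tally.
    """
    counts = Counter(sequence.upper())
    acidic = counts["D"] + counts["E"]
    basic = counts["R"] + counts["K"]
    return {"acidic": acidic, "basic": basic, "net": basic - acidic}


# Hypothetical fragment for illustration only (not the T0249 sequence).
example = "MDEEKRLDELEKA"
print(charge_summary(example))
```

A negative "net" value means acidic residues outnumber basic ones, which is the pattern that argued against a DNA-binding role in the entry above.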
From karplus@soe.ucsc.edu Tue Aug 17 18:15:24 2004
Date: Tue, 17 Aug 2004 18:15:21 -0700
From: Kevin Karplus
To: karplus@soe.ucsc.edu, sol@soe.ucsc.edu, ggshack@soe.ucsc.edu, learithe@soe.ucsc.edu, martina@soe.ucsc.edu, bbarnes@ucsc.edu, marcias@ucsc.edu, rph@soe.ucsc.edu
Subject: [casp6@predictioncenter.llnl.gov: T0249 additional information]

GEORGE, IMPORTANT INFORMATION FOR T0249: DO A SUBDOMAIN FOR 1-162

------- Start of forwarded message -------
X-Authentication-Warning: rysy.llnl.gov: andriy set sender to casp6@predictioncenter.llnl.gov using -f
Subject: T0249 additional information
From: casp 6
Date: Tue, 17 Aug 2004 17:38:51 -0700
X-Spam-Checker-Version: SpamAssassin 2.64 (2004-01-11) on coyote.cse.ucsc.edu
X-Spam-Level:
X-Spam-Status: No, hits=-99.4 required=3.0 tests=MISSING_HEADERS, USER_IN_WHITELIST autolearn=no version=2.64

Dear predictors,

Believe it or not, but we have released all the targets intended for the CASP6 experiment. We ended up with 87 targets released (79 valid for predictions at the moment). Server predictors may have a break now but human expert groups still have a lot of work in front of them.

Recently we received the information that can slightly reduce amount of work for those of you, who haven't finished their modeling of T0249 yet. Experimentalists warn us that T0249 protein was not solved in full length as it was proteolysed during crystallization for unknown reason. So, please be informed that residues 163-209 will be missing in the final structure of T0249.

- --
Andriy Kryshtafovych
CASP team
------- End of forwarded message -------

Tue Aug 17 23:35:43 2004 George Shackelford
Wow! That sure changes things. So we only have part of the protein. Starting a domain 1-162 on cluck.

Wed Aug 18 12:56:59 2004 George Shackelford
Try1 looks semi-decent. Helices broke where I like. I'm taking out the t2k (since I like the t04 ehl2 better) and putting in a lot of the RR constraints. I'll see what it can do with those.
I'm NOT taking out SCWRL runs this time. Try2 running on peep.

Wed Aug 18 17:22:40 2004 George Shackelford
Try2 does not score as well as try1 using try2's cost function. That's not too good. I need to study the two and find out what makes one better than the other and what adjustments I need to make.

Fri Aug 20 10:10:09 2004 George Shackelford
I finally have a try3 to run. I've respecified the two sheets to see what I can get. I need to pick up speed on this one. At least it has a nice structure to it. Try3 running on peep.

Fri Aug 20 12:10:29 2004 George Shackelford
Try3 doesn't look too bad, but I don't like the 138-148 sheet. It needs to be tighter at the loop (even though there are a couple of prolines in the works) and I'm extending the other sheet at ~64-67. I'm taking out SCWRL runs. I'm trying:
SheetConstraint I135 S143 E154 R146 hbond V137 10.0
SheetConstraint T58 I64 E73 G67 hbond I64 10.0

Fri Aug 20 14:13:14 2004 George Shackelford
Try4 doesn't do well with unconstrained scoring. It's not very good looking. Strand ~140-148 is bad. I'm going to back off the prolines in the middle of the loop and try again. Whatever I get, I may just do some polishing. It may be that try1 got a really good hit in predicting. Changed sheet:
SheetConstraint I135 A141 E154 L148 hbond V137 10.0
I also increased the break cost and dry6.5. try5 running on peep.

Fri Aug 20 16:57:53 2004 George Shackelford
Try5 is getting further and further away from the best scoring tries such as try1 and try3. I'm going to do a polishing run on try1 to get improvement there and one on try3. try6 running on peep.

Fri Aug 20 21:04:31 2004 George Shackelford
Try6 is a disaster. It shows up as a bunch of helices and no sheets. Apparently in closing two breaks (and there are two breaks) it undoes what we have. I am going to put the sheet constraints back in, adjust the break amount carefully and see what we can get. I am also going to include try3 (which scores well) as another alignment. Oops.
I found I misspelled 'ReadConformPDB.' Never mind. Try7 running.

Sat Aug 21 11:27:15 2004 George Shackelford
Try7 is better; one break still and a bit foamy. I am going to try and build a couple more sheets that Kevin noticed were starting to form. This may result in more foam, but they make sense. I am still a bit concerned about the region 15-18, which 'near' suggests is moderately buried. The next stretch past that (19-22) is shown as exposed.

I'm beginning to wonder if I could possibly use GROMACS to see where this structure is going. 2600 atoms is still a bit to analyze. So I went and tried it out and it handled the 2611 atoms well. It took the try7 pdb and healed the break in it. That's good enough for me. I tried to get a score for the pdb's including the GROMACS and the GROMACS scored badly. Turns out it did VERY poorly on sidechains. Is GROMACS that bad on rotamers?? Anyway, I've tabled it for now. I'm going to try and close the break in try8 and also give those other sheets a chance to form.

Sun Aug 22 10:44:23 2004 George Shackelford
Try8 scores the best yet. It never quite formed the short sheets; I'm not going to push for them. I'm going to keep polishing from try8 and get the gap closed. I'm turning off the constraints again. I examined the gromacs decoy and the try7 decoy. GROMACS bent the arginines over towards the surface; a good move implying it is working. Try7 and try8 have the Arg's pointing outwards. I still can't believe it scored so badly on the sidechains. Try9 running on peep.

Sun Aug 22 13:58:46 2004 George Shackelford
Try9 scored the best ever, but it still has a break. I am going to try and use a gromacs pdb to help heal that break and get a final polishing. I wrote a simple shell script, 'dogromacs.sh', for running the various programs that are a part of GROMACS. If my results work I should offer it to others as a means of polishing.

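The breaks that try7 through try9 keep carrying can be located independently of the scoring function by measuring the C(i)-N(i+1) distance between consecutive residues. What follows is a rough sketch under stated assumptions, not how undertaker defines or costs a break: the function name `backbone_breaks` and the 1.5 Angstrom cutoff are hypothetical choices (an ideal peptide bond is about 1.33 Angstroms), and the parser assumes a single chain in fixed-column PDB format with no insertion codes.

```python
import math


def backbone_breaks(pdb_lines, max_cn=1.5):
    """Return (res_i, res_j, distance) for consecutive residues whose
    C(i)-N(i+1) distance exceeds max_cn Angstroms.

    Assumes one chain, fixed-column PDB ATOM records, no insertion codes.
    The 1.5 A cutoff is an illustrative choice, not undertaker's definition.
    """
    c_atoms, n_atoms = {}, {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        if name not in ("C", "N"):
            continue
        resnum = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if name == "C":
            c_atoms[resnum] = xyz
        else:
            n_atoms[resnum] = xyz

    breaks = []
    for i in sorted(c_atoms):
        if i + 1 in n_atoms:
            d = math.dist(c_atoms[i], n_atoms[i + 1])
            if d > max_cn:
                breaks.append((i, i + 1, d))
    return breaks
```

Called on the lines of a decoy file (for example, `backbone_breaks(open(path).read().splitlines())` with a path of your choosing), it lists where the chain is open and by how much, which makes it easy to compare a model before and after a break-closing run.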
Sun Aug 22 16:40:45 2004 George Shackelford
Try10 does just a bit better than try9 because it does a bit better on soft clashes; otherwise it's no different. It did not pick up on the solution to the break that gromacs offered. I don't have anywhere to go except maybe do a superimpose of gromacs and cut and paste that section in. Even that might just cause problems. Need to check with Kevin.

Mon Aug 23 22:24:40 2004 George Shackelford
I have done a new run to get extra alignments. I am going to try and see what I can get by including them as part of an AllAlign. I'm using the try2 as a basis. Try11 running.

Tue Aug 24 09:05:50 2004 George Shackelford
In try11 the results are different, and it scores reasonably well. Now to start polishing and see what happens. Try12 running on peep.

Tue Aug 24 14:25:17 PDT 2004 Kevin Karplus
I was curious about the gromacs results. If the problem is just the sidechains, then repacking by rosetta should fix the problem. I had some trouble doing the repacking, so I tried editing a copy of the gromacs file. First I removed the hydrogens (since it uses a different subset than Rosetta uses), but that didn't help. Then I noticed that they had an O1 and O2 atom at the end of the chain, instead of O and OXT. I renamed them and Rosetta ran fine, producing T0249.try7-opt2.gromacs-noH.repack-nonPC.pdb. For the try9-opt2 gromacs file, I just did the O1,O2->O,OXT fix, and that seems to be all Rosetta needed. The repacking by rosetta does recover a lot from the damage done by gromacs. gromacs seems to really want to get rid of clashes, at almost all costs.

Tue Aug 24 15:24:02 PDT 2004 Kevin Karplus
I did a superposition of the try7-opt2 models to see what gromacs was doing. It seems that it is putting ILE CD1 atoms in random positions far from the main chain. This is clearly a serious bug. Other than that, the differences between the gromacs-optimized chain and the others seem fairly minor.
Although the try7-opt2.gromacs file has fewer backbone breaks (only 2 instead of 16), the second one is worse than the second break in the try7-opt2 input. Similarly, in try9-opt2, gromacs replaced one bad break and 12 tiny ones with 3 bad breaks.

Tue Aug 24 17:45:39 2004 George Shackelford
The misplacing of ILE CD1 atoms is clearly a bug in gromacs. I wonder if it does that with the force field that is designed for proteins? The huge penalty for violating the van der Waals radius in a molecular dynamics simulation is not surprising. As for handling the breaks, I was hoping that gromacs might do an adequate job of repairing breaks for us to use; however, it appears we need to 'heat' the system for a while and then see where it might settle down. In any case, Kevin's observations are appreciated.

Try12 scores much better than try11. It needs to fix some breaks and clashes and it might challenge try10 as a leader. I like the way it buries the hydrophobics, but the overall shape needs to compress more. Time for another round. Try13 running on peep.

Wed Aug 25 09:11:26 2004 George Shackelford
Try13 was only a little better on unconstrained scoring, so I went ahead and ran a try14 based on it overnight. The result is again slightly better in scoring. This is about as good as it's going to get in the current configuration. I like the results a lot better because of the superior burial of hydrophobics. Also, the shape is beginning to take on an intriguing symmetry based on the positions of the beta sheets and the general topology. The actual structure may well be close to this; however, I don't know how to get it to shift some more properly. Perhaps by reducing the hbond and break until shifts occur, then retightening? I think it's time to turn this over to Kevin.
So here is my group:
try1-opt2.pdb.gz # the automatic one
Try10-opt2.pdb.gz # the scoring leader and the first configuration
Try13-opt2.pdb.gz # the best of the second configuration

Wed Aug 25 13:20:44 PDT 2004 Kevin Karplus
Both try10 and try13 look pretty good as monomers. I lean toward try13 also, since this fold family is used for transcriptional regulators (usually as dimers) I think I'd like to match the tandem repeats that occur in 1lvaA. Tandem repeats also happen in 1p4xA, 1repC, 2fokA, 2fokB, 1fokA. We have hits to 1p4xA and 1repC, so let's make sure that we have alignments for them also. Let's try a run from alignments with just those alignments.

Wed Aug 25 14:44:07 PDT 2004 Kevin Karplus
try15-opt2 seems to be getting the second copy fairly nicely, but is messing up the first copy. It is quite similar to try10 for the second copy, and there is not much point to trying a crossover. I think I like try10-opt2 better than try14-opt2, but it needs a polishing run, either with the try15 constraints or unconstrained, since it is not scoring as well as try10-opt1. I think the break weights were set much too high (relative to the state of the inputs) on try10, which is why it doesn't do quite as well on the unconstrained models---it tried too hard to close breaks. The try10 conformation comes closer to the nearly parallel sheets of the other tandem winged helix proteins.

Wed Aug 25 15:30:07 PDT 2004 Kevin Karplus
I'll do a polishing run on all models using a copy of the try15 costfcn. That will take about an hour---after that I'll have George do a quick gromacs minimization and I'll do Rosetta repack to fix up the gromacs bugs.

Wed Aug 25 16:34:41 PDT 2004 Kevin Karplus
try16 finished. It is ok, and I'll probably submit it as our first choice, but the two helices that come just before the hairpins are not arranged so that they can fall into the major groove of a DNA molecule. They aren't so arranged in any of our models.

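Since another gromacs-then-Rosetta pass is planned, the hand edit from the Tue Aug 24 14:25:17 entry (renaming the chain-terminal O1 and O2 atoms to O and OXT, and optionally dropping hydrogens) could be scripted. This is only a sketch of that edit, not the procedure actually used: the function names `fix_gromacs_pdb` and `is_hydrogen` are hypothetical, it assumes fixed-column PDB ATOM records for standard amino acids, and it renames every O1/O2 it sees rather than checking that the atom is at the C-terminus.

```python
def is_hydrogen(atom_name):
    """Heuristic: a name is a hydrogen if it starts with H once any
    leading digits are removed (e.g. 'HA', '1HB'). Only meant for ATOM
    records of standard amino acids."""
    return atom_name.lstrip("0123456789")[:1] == "H"


def fix_gromacs_pdb(pdb_lines, strip_hydrogens=False):
    """Rename O1/O2 to O/OXT in ATOM records; optionally drop hydrogens.

    Assumption: O1/O2 occur only as the C-terminal oxygens, so no
    position check is made. Non-ATOM lines pass through unchanged.
    """
    rename = {"O1": "O", "O2": "OXT"}
    fixed = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            name = line[12:16].strip()
            if strip_hydrogens and is_hydrogen(name):
                continue
            if name in rename:
                # Atom names of up to three characters start in column 14.
                field = (" " + rename[name]).ljust(4)
                line = line[:12] + field + line[16:]
        fixed.append(line)
    return fixed
```

Leaving `strip_hydrogens` off by default mirrors the try9-opt2 case in that entry, where the O1,O2->O,OXT rename alone was enough for Rosetta to run.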
Wed Aug 25 16:47:43 PDT 2004 Kevin Karplus
George made a gromacs run and I ran Rosetta to repack the model (though it didn't tear apart the isoleucines this time). Still, the sidechain score for the gromacs models was really terrible. Repacking with rosetta helped, but Rosetta did not like this as well as T0249.try9-opt2.gromacs.OXT-fix.repack-nonPC.pdb, which is still its favorite. I suppose at some point we should superimpose all the try16-opt2 models, and find out what gromacs was doing wrong, but it is not really relevant at the moment.

I'll submit
try16-opt2 best unconstrained
try14-opt2 alternative model
try9-opt2.gromacs.OXT-fix.repack-nonPC best Rosetta energy
try1-opt2 full-auto
T0249-1jgsA-t2k-local-str2+CB_burial_14_7-0.4+0.4-adpstyle5 best sam-t02 template

Thu Nov 18 23:43:28 PST 2004 Martina Koeva
Based on the smooth gdt scores:
best sam-t04    32.7880 (decoys/1-162/try4-opt1)
best submit     ?
model1          ?
auto            31.1727
align           ?
robetta best    22.4961 (robetta model2)
robetta1        22.4672
Note that this has been extracted from two different evaluate.rdb files: one in the decoys/ directory and the other in the decoys/1-162/ directory. The pdb file is the same. However, which one is the full auto model? I used the one from decoys/1-162.

Fri Nov 19 16:49:59 PST 2004 Kevin Karplus
WARNING: the real work was all in 1-162.