Mon May 15 09:07:37 PDT 2006 T0285
Make started Mon May 15 09:09:17 PDT 2006
Running on orcas.cse.ucsc.edu

Mon May 15 10:45:59 PDT 2006 Kevin Karplus
This looks like a new-fold (or remote homology) target.

Mon May 15 12:00:12 PDT 2006 Kevin Karplus
The evalue for the best hit is around 20 and the top hits are all to different folds, so there really doesn't seem to be much chance of getting a decent alignment.

Mon May 15 17:01:46 PDT 2006 Kevin Karplus
Due to a typo in the Make.main file, the rr constraints were not computed before try1 was done, so the constraints were not used in try1.

Mon May 15 17:26:33 PDT 2006 Kevin Karplus
Although there were no strong hits (not surprising, since T0285 is an ORFan), the try1-opt2 structure does not look bad. Both secondary structure and burial look pretty good.

There is really no point to using the residue-residue predictions, since there is no mutual information signal, and separation plus propensity is not very interesting.

None of the top 5 alignments seemed to have anything interesting, so it is pretty amazing that undertaker managed to come up with anything reasonable.

I should probably do another run with a slightly tweaked cost function, to see whether this fold is reliably generated. I'll start this try2 run on orcas.

Mon May 15 20:11:44 PDT 2006 Kevin Karplus
I ran VAST on the try1-opt2 structure to see where the structure came from. There are matches to
    1skoB d.110.7.1
    1f5mB d.110.2.1
    1vetA d.110.7.1
    1mc0A d.110.2.1
    1stzA d.110.2.3
    1ojGA d.110.6.1
    1j3wB d.110.7.1
    1vhmA d.110.2.1
The 1f5mA hit is 20 on T0285.best-scores.rdb---none of the higher hits are annotated as being fold d.110. I wonder if I should manually select some of the d.110 folds and do a run using just them.
(Note: try2-opt2 is coming out fairly similar to try1-opt2, so I am encouraged that this is the fold undertaker wants to make.)
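A quick tally makes the point about the VAST matches explicit: every hit is in fold d.110, spread over three superfamilies. The sketch below is only an illustration in Python (not part of the original workflow); the hit list is copied from the VAST entry above, and the helper name scop_prefix is made up for this example.

```python
from collections import Counter

# VAST hits for try1-opt2, copied from the notebook entry
# (chain ID, SCOP classification string).
VAST_HITS = [
    ("1skoB", "d.110.7.1"),
    ("1f5mB", "d.110.2.1"),
    ("1vetA", "d.110.7.1"),
    ("1mc0A", "d.110.2.1"),
    ("1stzA", "d.110.2.3"),
    ("1ojGA", "d.110.6.1"),
    ("1j3wB", "d.110.7.1"),
    ("1vhmA", "d.110.2.1"),
]


def scop_prefix(sccs, levels):
    """Return the first `levels` dot-separated fields of a SCOP string.

    levels=2 gives the fold (e.g. d.110), levels=3 the superfamily.
    (Hypothetical helper, named for this sketch only.)
    """
    return ".".join(sccs.split(".")[:levels])


# Count hits per fold and per superfamily.
fold_counts = Counter(scop_prefix(sccs, 2) for _, sccs in VAST_HITS)
superfamily_counts = Counter(scop_prefix(sccs, 3) for _, sccs in VAST_HITS)

for name, count in sorted(superfamily_counts.items()):
    print(name, count)
```

With all eight hits in one fold, picking d.110 templates by hand is a reasonable next step even though none of them scored near the top of the search.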
Mon May 15 20:35:16 PDT 2006 Kevin Karplus
I looked for the d.110 folds found in any of the rdb files with
    grep 'd[.]110[.]' *.rdb | sort -g +2
I then defined MANUAL_TOP_HITS in the Makefile to be the list
    MANUAL_TOP_HITS := 1f5mA 1p0zA 1skoA 1j3wA 1vetA 1acf 1stzA 1mc0A
and ran
    make extra_alignments
    make read_alignments
(This has to be done in two separate makes, since some of the directories needed for read_alignments don't exist until extra_alignments has created them.)

I then created try3.under to read these alignments first (but I still included all-align.a2m). The try3.costfcn includes sheet and helix constraints from the try1-opt2 and try2-opt2 runs---this may be a mistake, since it won't allow other alignments to be tried. I may do another run without the constraints.

Tue May 16 07:28:25 PDT 2006 Kevin Karplus
The try3 run scores slightly better with the try3 costfcn, gaining on hbonds and sidechains over try1, but losing on packing terms.

For try4, I'll use just the favored alignments and no constraints. After that, I'll probably try polishing all existing models (unconstrained).

It might be a good idea to reduce the weight on sidechain cost when doing the initial packing--this term is currently the biggest contributor to the differences in cost between models. I did not reduce sidechain cost for try4.

Tue May 16 09:36:50 PDT 2006 Kevin Karplus
The try4 run did not do quite as well as the try3 run (which scores best with the try1, try3, and try4 cost functions). I'll do try5 with a reduced sidechain cost. Then I'll do a polishing run with sidechain back up, and soft_clashes and break costs increased.

Tue May 16 11:28:34 PDT 2006 Kevin Karplus
The try5 run found a different solution, one that changes the C-terminal helix into a strand and makes two N-terminal helices. This one scores well on hbonds and packing terms, but loses a bit on predicted alpha, on bys, and on sidechains.
try5-opt2 scores best on the try1, try4, and try5 cost functions, but not on try2 and try3, mainly because of the constraints.

Thu May 25 14:53:02 PDT 2006 Kevin Karplus
I'm worried a bit about the strand for the predicted helix R112-E122. Perhaps we need another run with a strong helix constraint.

Sun May 28 17:17:51 PDT 2006 Kevin Karplus
I scored the server models as well as ours with try5 (with the recently fixed undertaker that handles pdb files with the ^M characters at the end).

The best-scoring model is SAM_T06_server-TS1. The best-scoring model that isn't ours is ROBETTA_TS5. We do better on hbonds, they do better on sidechains, n_ca_c, and bad_peptide.

SAM_T06_server-TS1 is a simple beta sheet with an alpha helix packed on either side. There is a bit of alpha helix near the N-terminus that looks bogus (extend the first strand back to P8). Otherwise the model looks pretty good (if we ignore secondary structure predictions). It probably scores so well because of how much sheet it makes, even though the packing is a bit loose and the secondary structure is not a great fit.

Our predictions (from the server and from the tries in decoys/) are not in agreement, and they don't agree with Robetta either.

Should we polish up some of the better models? Which models will be worth submitting later on?

Wed May 31 14:00:03 PDT 2006 Kevin Karplus
I'm starting a "polishing" run (try6 on camano) that will start with all the different models (including server models) and try improving them without constraints. This will probably mainly polish the SAM_T06_server model, but I've turned CrossOver up in the hopes of picking up good things from elsewhere.

I think we will have 3 or 4 distinct sheets to submit for this target.

Wed May 31 14:32:15 PDT 2006 Kevin Karplus
The try6 polishing run is going very slowly, because of the number of models in the conformation pool.
It seems like this run is just polishing the SAM_T06_server model (which is ok, since we can then submit it without duplicating a model from a server).

We might want to do a polishing run from ROBETTA_TS5 (without any of the other server models or the better-scoring undertaker models), so that we can reasonably submit that also. I'm starting that as try7 on camano.

Wed May 31 17:24:48 PDT 2006 Kevin Karplus
try7 finished before try6 (smaller conformation pools) and is our new best-scoring model (though try6 may beat it in the end).

Foo---try7 is *not* a polishing of the Robetta models, since they failed to be read. It is just recreated from alignment insertion (of 1p0zA) into a random conformation. It looks very similar to try5-opt2.

Let me try again for try8, getting the file names right this time.

Wed May 31 18:32:37 PDT 2006 Kevin Karplus
try8 scores almost as well as try7 (better on the try1 costfcn).

I'll submit 5 models for now, with the option of replacing them later:
    ReadConformPDB T0285.try7-opt2.pdb
    ReadConformPDB T0285.try8-opt2.pdb
    ReadConformPDB servers/SAM_T06_server_TS1.pdb
    ReadConformPDB T0285.try3-opt2.pdb
    ReadConformPDB T0285.try2-opt1.pdb

Thu Jun 1 07:15:20 PDT 2006 Kevin Karplus
try6-opt2 (based on the SAM_T06_server model) is the new best scorer, so I'll rearrange the models to
    ReadConformPDB T0285.try6-opt2.pdb
    ReadConformPDB T0285.try7-opt2.pdb
    ReadConformPDB T0285.try8-opt2.pdb
    ReadConformPDB T0285.try3-opt2.pdb
    ReadConformPDB T0285.try1-opt2.pdb

Thu Jun 1 07:22:10 PDT 2006 Kevin Karplus
I've sent this improved list, but try3-opt2 and try1-opt2 are too similar to each other. It would be good to get yet another fold. It would also be good to do a polishing run with breaks and soft-clashes turned up, as we still have a number of conflicts even in the top-scoring models.

Thu Jun 22 14:44:53 PDT 2006 Kevin Karplus
try9 polishing run started on the farm cluster. It will probably just polish up try8-opt2.
We may have to do other runs to polish the other models.

Thu Jun 22 16:59:25 PDT 2006 George Shackelford
As I take a look at this protein and examine the ehl2, I am convinced we haven't got this right. There is an interesting symmetry in the ehl2 around the midpoint (ok, more like 78, but close). I am going to look about and see if I can find a better match to the secondary structure.

Fri Jun 23 13:36:36 PDT 2006 Kevin Karplus
try9-opt2 is just a polishing of try8-opt2 (from ROBETTA_TS5), as expected.

Fri Jun 23 22:37:51 PDT 2006 George Shackelford
I did a search for chains that match well to the best composite ehl2.rdb file. I checked what came up and selected the following:
    1z54A 1e7kA 1s28A 1dbfA 1dytA 1xbiA
I included these in the manual top hits and got alignments. I have put them into try10 by themselves, scaled rr.constraints by .2, and started the new try. Just an effort to get something that agrees with ehl2.
try10 running on lopez

Sat Jun 24 01:53:30 PDT 2006
1z54A was the first and best choice for alignment. The results scored even better than the polished try9. This one is worth polishing some more and getting a better scoring; the 'breaks' in it could be improved. I'm commenting 1z54A out and trying to see what I can get next.
try11 running on peep.

I ended up stopping try11 before it got to opt2. The results of opt1 were so poor (it was basically a mangled bunch of helices) that they were scoring badly. These results were based on 1e7kA. So I commented it out and went on to see what would happen next.
try12 running on peep.

Try12 actually looks decent but it has some bad breaks. It may be useful to do a run that tries to close those gaps and see if we get usable results. I think I'll start a "polishing" run...
Sat Jun 24 08:15:02 PDT 2006 Kevin Karplus
With the unconstrained costfcn, the order is now
    try6, try7, try9, try8, try10, try3, try5, try1
With the try1 costfcn, the order is now
    try9, try8, try6, try1, try3, try2, try7
With the try12 costfcn, the order is now
    try10, try9, try1, try3, try8, try2, try4
I'll superimpose some of these and see what looks promising.

Sat Jun 24 08:26:29 PDT 2006 Kevin Karplus
I looked at
    try6, try7, try10, try1, try3, try9
I don't care much for try6 and try7---too poor a match to secondary structure prediction. The others look ok, but try1 and try3 are very similar. That leaves us with try1, try10, try9 as our plausible submissions. try6 and try7 are our next best, with try5 too similar to try7 to be a separate submission.

Sat Jun 24 14:42:57 PDT 2006 Kevin Karplus
George has still got something wrong with his .cshrc file so that his attempts to create grep-best-rosetta don't work. I created it and see that try4 and try10 are the models rosetta likes best, though it really hates all the backbones.

Sat Jun 24 18:39:25 PDT 2006 Kevin Karplus
I'm starting try14 on cheep with all the alignments (including the ones George added) and a cost function that has the secondary structure constraints but no others. I've also turned up the weight of the hbond_geom_beta and hbond_geom_beta_pair, to favor alignments that form sheets.

Sat Jun 24 20:26:52 PDT 2006 Kevin Karplus
try14-opt2 got the best score with the try14 cost function, apparently based on 1z54A. It does OK with the unconstrained costfcn, coming after try6, try7, try9. The resulting model is quite similar to the try10 model, but scores a bit better with our cost functions.

Sun Jun 25 08:22:25 PDT 2006 Kevin Karplus
The other two alignments that try14 seriously considered were to 1v8fA and 2cyeA, so I will try a run with just those two.
On second thought, let me toss in some other templates that were considered but not used in other runs:
    1tu1A, 1jyhA, 1fjrA from try1 (which ended up with 1f5mA)
    1v8fA, 1iq3A from try2 (which ended up with 1f5mA)
    try3, try4 only used 1f5mA
    1stzA from try5 (which ended up with 1p0zA)
    try6 worked from SAM_T06_server_TS1 model
    try7 ended up with 1p0zA
    try8 worked from ROBETTA_TS5
    try9 worked from existing models (mainly try8)
    try10 worked from 1z54A
    try11 worked from 1e7kA
    try12,try13 worked from 1dytA
    1v8fA, 2cyeA from try14 (ended up with 1z54A)
So the "extras" to try are
    1v8fA, 2cyeA, 1tu1A, 1jyhA, 1fjrA, 1iq3A, 1stzA

Sun Jun 25 08:36:22 PDT 2006 Kevin Karplus
try15 started on cheep.

Sun Jun 25 08:42:15 PDT 2006 Kevin Karplus
Oops, forgot to update MANUAL_TOP_HITS, make extra_alignments, and make read_alignments. try15 restarted on cheep.

Sun Jun 25 09:53:07 PDT 2006 Kevin Karplus
RATS, I let in the all-align for try15, so it ended up with 1z54A again, but didn't even score as well as try14-opt2. I'll start try16 without the all-align.

Sun Jun 25 09:58:45 PDT 2006 Kevin Karplus
try16 started on cheep.

Sun Jun 25 10:11:14 PDT 2006 Kevin Karplus
OK, try16 is using 2cyeA

Sun Jun 25 11:05:05 PDT 2006 Kevin Karplus
Indeed, try16 comes out second to try14 with the try16 costfcn. Although they are based on different templates, they are clearly from the same superfamily, and the alignments of the sheets are similar, but not identical.

Since this target has a hard deadline tomorrow, and George does not seem to be working on it this weekend, I'll have to make a decision about what to submit.

I think I'll drop the one we polished from Robetta---it may be great, but we didn't really create it. That leaves me with
    try14-opt2 1z54A
    try16-opt2 2cyeA
    try1-opt2 1f5mA
    try6-opt2 SAM_T06_server_TS1
    try7-opt2 1p0zA

Sun Jun 25 11:23:48 PDT 2006 Kevin Karplus
OK, I'm giving up. I've submitted those models. If George (or anyone) does more work on this target, e-mail me so I can resubmit.
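The "extras" list in the Sun Jun 25 08:22:25 entry is just the templates that earlier runs considered but did not end up using, with duplicates dropped. A minimal Python sketch of that bookkeeping follows; it is an illustration only, the per-run lists are copied from that entry, and the names CONSIDERED_NOT_USED and unique_in_order are invented here. Printing the result as a MANUAL_TOP_HITS line assumes the extras were what went into that Makefile variable for try15, which the notes imply but do not state outright.

```python
# Templates considered but not used, per run, copied from the notebook.
# try14 is listed first so the output order matches the "extras" list.
CONSIDERED_NOT_USED = {
    "try14": ["1v8fA", "2cyeA"],
    "try1": ["1tu1A", "1jyhA", "1fjrA"],
    "try2": ["1v8fA", "1iq3A"],
    "try5": ["1stzA"],
}


def unique_in_order(groups):
    """Flatten groups of template IDs, keeping the first occurrence of each."""
    seen = set()
    result = []
    for group in groups:
        for template in group:
            if template not in seen:
                seen.add(template)
                result.append(template)
    return result


# dicts keep insertion order (Python 3.7+), so try14's templates come first.
extras = unique_in_order(CONSIDERED_NOT_USED.values())

# Assumed use: paste this line into the Makefile before rerunning
# make extra_alignments and make read_alignments.
print("MANUAL_TOP_HITS := " + " ".join(extras))
```

1v8fA shows up under both try2 and try14, which is why the deduplication step matters.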
Sat Jun 24 21:36:52 PDT 2006 George Shackelford
try14 looks great but it's wrong when you look at burial. I'm going to force a run using 1s28A. Unlikely unless it forms a dimer. Actually I'm doing a run with 1s28A and 1xbiA. At the moment, 1s28A is coming out on top.
Try17 running on peep.

I did another search for possible ehl2 matches. These are possible so I'm doing a manual top hits and a run consisting of these only.
    1h1hA 1quqB 2fm8A 1xs0A
I've cranked up constraints to 20 to ensure that we get something that we can like. I consider these to be a further hit than I got with 1z54A, but I wish I could factor in the near-backbone and/or burial prediction as well.
Try18 running on shaw.

Sun Jun 25 16:22:03 PDT 2006 George Shackelford
Try17 didn't do so hot, but try18 looks like a winner. Scores up near try16 and I haven't done a polishing run. Think I'll do that and revisit 1xbiA. Might just do a run alone with that.
try19 running on shaw

Actually I am going to revisit 1xbiA. I'm going to do a run with it alone. I have no idea if it will get anything new, just trying...
try20 running on peep

Sun Jun 25 18:34:31 PDT 2006
Damm. I didn't change the tries in the try20.under from try18. try20 has generated try18 files. That not only throws out the try18 results, but may have badly affected the try19 polishing of try18!! Now I'll need to redo try18, try19, and try20. At least I can run try18 and try20 simultaneously. So be it!
Try20 restarted on peep
try18 restarted on shaw

Sun Jun 25 22:09:18 PDT 2006 George Shackelford
Well, restarting did not seem to work correctly. The restart of try18 resulted in trash. I decided to redo try18 as try21 and try20 as try22. I'll have to inform others that I had this problem. Kevin may know what I should have done and others need to know as well.

Because of the restarts, I have not made any progress on finding other ehl2 matches. I also need to change the transition value to log odds.
I also should get a second approach based on the actual sequence by reducing the amino acid alphabet. But which reduction?? The eight letter one? (I need a reference here).

Mon Jun 26 07:42:31 PDT 2006 Kevin Karplus
George sent me e-mail to look at try19. He has never explained (in README files or elsewhere) *how* he is doing searches based just on ehl information. I can think of a few ways, but I suspect he is doing something completely different. George, write up a couple of paragraphs on it for the main CASP7/README.

I'm not sure which reduction of the amino-acid alphabet George is thinking of, nor why he believes it will help---the Dirichlet-mixture regularizer already does a good job of generalizing to other amino acids, better than any reduced alphabet would.

With the try16 costfcn (with constraints from the 2ry prediction) try19-opt2 scores adequately, after try14, try16, try15, try10. It forms a nice sheet, but the helix packing against the sheet is terrible. Still, we do have a near duplicate in our current set of models, so I can bump one of them to make room for try19.

George has still not fixed his .cshrc file so that he can run the sort-by-rosetta script from a makefile, so I'll have to remake grep-best-rosetta.

Naturally, rosetta hates all the models, but try16-opt2 is the one it hates least to repack.

I'll resubmit
    ReadConformPDB T0285.try16-opt2.pdb
    ReadConformPDB T0285.try1-opt2.pdb
    ReadConformPDB T0285.try6-opt2.pdb # from SAM_T06_server
    ReadConformPDB T0285.try7-opt2.pdb
    ReadConformPDB T0285.try19-opt2.pdb

Make started Tue Jul 18 09:50:03 PDT 2006
Running on lopez.cse.ucsc.edu
Because of a typo, I accidentally redid the make for this target. This created a number of local structure predictions that were not available at the time we did the submission. Because we had no strong hits before, this may change which templates come out on top.

Tue Jul 18 22:41:56 PDT 2006 Kevin Karplus
NOTE: superimpose-best.under was overwritten by the accidental make.
It will have to be re-created if best-models.pdb.gz ever needs to reflect the submitted models.