Thu Jul 6 09:10:35 PDT 2006 T0361
Make started Thu Jul 6 09:11:43 PDT 2006
Running on lopez.cse.ucsc.edu

Fri Jul 7 23:52:01 PDT 2006 Kevin Karplus
No strong hits with BLAST in PDB (best is 1z9hA, E-value 0.6).
No good hits with HMMs either.
Probably a new fold (169 residues).

Make started Tue Jul 18 16:30:19 PDT 2006
Running on shaw.cse.ucsc.edu

Thu Jul 20 14:13:57 PDT 2006 Kevin Karplus
This is a new fold and no one has started work on it yet!
Top scoring server model is SAM_T06_server_TS1, but ROBETTA_TS4 and
ROBETTA_TS2 score highly.

Thu Jul 20 16:21:11 PDT 2006 Firas Khatib
This is the next target on my list... I'll do some runs after I submit
T354 and T356 this evening.

Thu Jul 20 18:20:57 PDT 2006 Firas Khatib
I think that SAM_T06_server_TS1.pdb and try1 have the best burial,
compared to both Robetta4 and Robetta2, so I will start runs from those
models.
try2 and try3 are both running on shaw with str2 constraints and no rr
cons.

Fri Jul 21 14:42:41 PDT 2006 Firas Khatib
try3 scores best (using both try2 & try3's costfcns).
try2 scores best using the unconstrained costfcn.
Neither try2 nor try3 is packing as well as I would like, so I will
increase the dry weights.
try4 is running on whidbey. I increased the dry weights to:
    dry5 20
    dry6.5 25
    dry8 20
    dry12 6
and lowered the constraints from 20 to 10.
try5 will be the same, but with try2 as input.
try5 is also running on whidbey.

Sun Jul 23 22:04:37 PDT 2006 George Shackelford
I reviewed what we have available to us. Looking over t2k.str2 and
t06.str2 I find strong agreement on most of the helices. This is an
all-alpha protein and the question is: how is it packed? Do we have a
sandwich or a bundle? So far, the indications are for a bundle, but the
models we have have bad phobic scores. The key is not trying to push the
wet and dry weights but to determine the actual bundle. We need to
examine the burial and near predictions to determine which helices are
on the outside and which way each helix faces.
The fact that the helices appear to be of different lengths can be
useful; there should be patterns of solvent exposure. We can also
examine the list of best predictions to see if we can determine a set of
good templates.

Sun Jul 23 23:11:35 PDT 2006 Firas Khatib
Great... let's work on this tomorrow (Monday) in lab. We should look at
the different servers as well to see if we like any of them as starting
points. Robetta has 5 models based off 1p2yA and 10 de novo models that
we can look at.

Mon Jul 24 13:15:36 PDT 2006 Firas Khatib
I downloaded all 15 Robetta models for T361 (since they pick 5 of those
15 to submit and therefore we don't know which one is which in the
decoys/servers). I renamed them all .repack.pdb.gz to see how they score
with best-rosetta.
The best scoring (of our models and Robetta de_novo models) goes:
    Robetta_deNovo6
    try1-opt2.gromacs0.repack-nonPC
    Robetta_deNovo4
    Robetta_deNovo2
    Robetta_deNovo5
    Robetta_deNovo9
    Robetta_deNovo8
    Robetta_deNovo1
whereas our score-all.unconstrained has this order:
    all our models
then:
    Robetta_deNovo5
    Robetta_deNovo9
    Robetta_deNovo10
    Robetta_deNovo4
    Robetta_deNovo2
with all the Robetta_parent models scoring as the 5 worst.

Mon Jul 24 13:07:07 PDT 2006 George Shackelford
I reviewed the rr.constraints and found they suffered from an
interesting phenomenon: the pairs were not close together but at
different parts of a helix. This suggests that we get a correlation
signal between two residues within a helix that might really be a
correlation but one not due to being close together. This phenomenon may
be the reason that contact prediction does poorly on all-alpha proteins.
If I could eliminate those false contacts, the results may be much
better. This means getting the str2 or ehl2 predictions for helices and
seeing if a predicted pair is on the same predicted helix.
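The filtering step described above (dropping predicted contact pairs
whose two residues lie on the same predicted helix) could be sketched
roughly as follows. This is only an illustration, not the script that
was actually used: the function names, the one-letter-per-residue
secondary structure string, and the (i, j, weight) tuples are all
assumptions made for the example, and the real str2/ehl2 and
rr.constraints file formats are not parsed here.

```python
def helix_segments(ss, helix_char="H", min_len=4):
    """Return 1-based inclusive (start, end) ranges of helix runs.

    ss is a hypothetical one-letter-per-residue prediction string,
    e.g. "CCHHHHHHCC"; runs shorter than min_len are ignored.
    """
    segments = []
    start = None
    for pos, state in enumerate(ss, start=1):
        if state == helix_char:
            if start is None:
                start = pos
        elif start is not None:
            if pos - start >= min_len:
                segments.append((start, pos - 1))
            start = None
    # Close a helix that runs to the end of the sequence.
    if start is not None and len(ss) + 1 - start >= min_len:
        segments.append((start, len(ss)))
    return segments


def same_helix(i, j, segments):
    """True if residues i and j fall inside one predicted helix."""
    return any(s <= i <= e and s <= j <= e for s, e in segments)


def filter_contacts(pairs, ss):
    """Keep only (i, j, weight) pairs that span different helices or loops."""
    segments = helix_segments(ss)
    return [(i, j, w) for (i, j, w) in pairs if not same_helix(i, j, segments)]


if __name__ == "__main__":
    # Made-up toy data, not T0361 predictions.
    toy_ss = "CCHHHHHHHHCCCHHHHHHCC"
    toy_pairs = [(4, 9, 0.40), (5, 16, 0.03), (1, 15, 0.20)]
    for pair in filter_contacts(toy_pairs, toy_ss):
        print(pair)
```

A pair that survives the filter is not necessarily a true contact; the
filter only removes the within-helix correlations discussed above.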
Looking for the templates for our tries:
    try1: al7+all-align.a2m:1po5A
    try2: polishing SAM_T06_server_TS1
    try3: polishing try1
    try4: polishing try3
    try5: polishing try2
Polishing is not going to get the job done.
I am hand curating the rr.constraints to remove suspect pairs and use
the others.

Mon Jul 24 14:25:48 PDT 2006 Firas Khatib
I started looking at the Robetta de novo models to get some ideas. Just
trying to find models with decent burial has been tough, so just by
looking at both the 'near' script and 'burial':
    Robetta_deNovo4 has good burial (both with near and burial)
    Robetta_deNovo2 is pretty decent as well.
    denovo5 and denovo9 aren't as good using 'burial'
    denovo1 is good using burial, but not using near.
    denovo10 might be ok
    denovo8 is horrible.

Mon Jul 24 14:34:15 PDT 2006 George Shackelford
I took the curated rr.constraints, scaled by 0.1 and put them in. Here
are the constraints I used:

include T0361.t06.dssp-ehl2.constraints
# include T0361.undertaker-align.sheets
# include rr.constraints
Constraint L71.CB L98.CB -10. 7.0 14.0 0.0572100036292
# Constraint M23.CB I34.CB -10. 7.0 14.0 0.402522770318 bonus
# Constraint M23.CB F37.CB -10. 7.0 14.0 0.358971475763 bonus
# Constraint I88.CB L98.CB -10. 7.0 14.0 0.323147978284 bonus
Constraint I34.CB L73.CB -10. 7.0 14.0 0.0317241561247 bonus
Constraint L98.CB V119.CB -10. 7.0 14.0 0.0312091148191 bonus
# Constraint H27.CB F37.CB -10. 7.0 14.0 0.31188292182 bonus
Constraint L73.CB Y94.CB -10. 7.0 14.0 0.0311364616568 bonus

try6.under is simply try1.under (oops. forgot to comment out the
PrintTemplateAtoms!)
try6 running on peep (it seems to be based on 1v4eA rather than 1p05A.)
I am going to build an *.under that focuses on the best-scoring
templates only. We need a bit of different structures across the
possible templates.

Mon Jul 24 15:18:55 PDT 2006 Firas Khatib
George suggested taking the model with the best phobic_fit and going
from there.
That is ROBETTA_deNovo5.pdb, so I will start a run from that one, trying
to optimize the burial and phobic_fit.
This will be try8, running on shaw.

Mon Jul 24 16:30:50 PDT 2006 George Shackelford
Using phobic fit is preferable to wet and dry when dealing with all
helix. Wet and dry will tend to form a globe and helices tend to form
bundles or more complex structures. Nevertheless phobic_fit is still
rather primitive compared to using 'burial' and/or 'near.' We need cost
functions based on them.
I'm trying to explore the base of templates we have since I am not
convinced about try1 even if it scores well in best-rosetta.
It appears try7 is based on 2gyqA.

Mon Jul 24 19:34:10 PDT 2006 Kevin Karplus
George, you have phobic_fit and wet/dry effects reversed. It is
phobic_fit that tries to make things egg shaped. wet tries to solvate
things, and dry tries to bury hydrophobics. If you have long helices and
expect an unusual aspect ratio to the protein (as in a bundle) then you
turn *down* phobic_fit and turn up the dry6.5 and dry8 terms.

Tue Jul 25 01:19:31 PDT 2006 George Shackelford
KEVIN HAS INTERNET! And I had phobic_fit and wet/dry reversed. arrrgh.
Try7 does well on best-rosetta but only average on try7.costfcn.
There are some interesting topologies out there, but the near and burial
predictions would preclude those where helices are pointing out into
water (assuming we don't have one of those dimers that use long pointy
helices...). The description of the putative protein doesn't provide
anything to indicate possible dimerization; we're on our own.
Well, I have commented out 2gyqA so as to try yet another possible
template via try9.
try9 running on peep.
I can already see it is focusing on 1ll2A and seems to have a poor
starting score.
I need to see if I can increase the number of usable predictions from
rr.constraints. And I wonder if Firas could try a VAST search based on
his ROBETTA choice.

Tue Jul 25 13:46:47 PDT 2006 Firas Khatib
So our try8-opt2, based on Robetta's deNovo5, is the best scoring model
using the unconstrained cost function (as well as scoring best with the
try8 costfcn). It scores much better than Rosetta's de novo model, due
to less soft_clashes and better dry scores.
I did a VAST search on try8-opt2 as George suggested.
VAST results can be viewed here:
www.ncbi.nlm.nih.gov/Structure/vast/VSNbr.cgi?reqid=865092301353863567&subsetstr=Non&sdid=29898
or here is easier:
http://tinyurl.com/oyso8
In this case, I don't think that VAST is very useful, as you can see.
Since it is a new fold, and we are only aligning 20 residue segments, I
don't think we can deduce much from it!
Basically, every section where VAST found something is where we had a
strong secondary structure prediction for helix. The exception is
I49-A76, but that alignment is not continuous in any of the three
matches.

Tue Jul 25 15:57:43 PDT 2006 George Shackelford
The idea of the VAST hits is to provide us with new templates. We take
the hits we found and put them in as MANUAL_TOP_HITS (assuming they
aren't already among our set of templates!). We use make
extra_alignments to generate some new alignments that undertaker may
find useful. If all we are doing is polishing a ROBETTA model, then this
is not really necessary; however, the new templates may be useful if we
are trying for our own ab initio prediction.
Three hits is not a lot but at least it indicates that the ROBETTA model
may not be total trash. No hits is a real indication of total trash. Not
that our models are any better...

Wed Jul 26 08:19:37 PDT 2006 Kevin Karplus
Actually, no hits doesn't mean total trash, just that there are no
fold-recognition matches. A genuine new fold might get no hits with
VAST, even if we got it right.
It looks to me like the lowest P-value VAST hit (to 1z7mB) is a fairly
large alignment. We have 1z7mA as the chain id for this sequence.
I stored the alignment as 1z7mA/try8-1z7mA-vast.a2m
I added 1z7mA to the MANUAL_TOP_HITS and am running make
extra_alignments.

Wed Jul 26 08:41:25 PDT 2006 Kevin Karplus
I think several of our models that Firas put into superimpose-best.under
    ReadConformPDB T0361.try8-opt2.pdb.gz
    ReadConformPDB T0361.try5-opt2.pdb.gz
    ReadConformPDB T0361.try2-opt2.pdb.gz
    ReadConformPDB T0361.try6-opt2.pdb.gz
    ReadConformPDB T0361.try4-opt2.pdb.gz
are reasonable, but I don't like Y75-D81 in try6-opt2.
I'll try an optimization run from the 1z7mA alignments, but I don't
expect much to come of it.
I see that George started several runs last night that he has not
commented on yet. I wonder what they are doing.

Wed Jul 26 09:00:45 PDT 2006 Kevin Karplus
I found a typo in the try8-1z7mA-vast.a2m file and so am restarting
try12.

Wed Jul 26 09:31:06 PDT 2006 Kevin Karplus
I'm singularly unimpressed with try12-opt1, so I don't think that the
1z7mA alignments by themselves are going anywhere. Quite frankly, I'm
not sure that chasing down more and more remote fold-recognition targets
has much value (particularly for all-helical proteins)---we should be
looking more at the burial and secondary structure and trying to pack
things cleanly.

Wed Jul 26 10:08:41 PDT 2006 Kevin Karplus
Sure enough, try12-opt2 looks terrible.

Wed Jul 26 10:44:22 PDT 2006 George Shackelford
I apologize for the missing notes. I thought they had been saved from
last night but they apparently got lost. I am reconstructing them now.
Kevin, please look at T0304.try34-opt2.pdb, its parent 1ew0A, and the
solution 2h28A. The parent is the top scoring using "alphabetmatch." I
believe that there is merit in examining a range of possible templates.
Doing so takes some extra computer time but not much human time.

[Wed Jul 26 20:21:17 PDT 2006 Kevin Karplus
George, why are you asking me to look at T0304 in the T0361 README file?
This is very confusing.
I'm not even sure what George's point is, because T0304.try34-opt2 was
rather terrible (not even the best we generated and far worse than the
server models we looked at).
]

Wed Jul 26 16:22:44 PDT 2006 Firas Khatib
I agree that try6 looks very bad at Y75-D81. In fact the burial for
residues M80, L13, and I74 is very bad, and L10-Q16 is not packing very
well anyway, so that region should probably lie on top of Y75-D81 rather
than underneath it!
I don't know if it's worth proteinshopping, or selecting a different
model instead (which is probably the best answer at this point).

Wed Jul 26 18:05:47 PDT 2006 George Shackelford
The current superimpose appears to do the job. I can't really add to it.

Wed Jul 26 20:34:09 PDT 2006 Kevin Karplus
George left the following in superimpose-best.under
    ReadConformPDB T0361.try8-opt2.pdb.gz
    ReadConformPDB T0361.try5-opt2.pdb.gz
    ReadConformPDB T0361.try2-opt2.pdb.gz
    ReadConformPDB T0361.try6-opt2.pdb.gz
    ReadConformPDB T0361.try4-opt2.pdb.gz
I modified the superimpose to start with the CB atoms of M80-D87 to try
to get a clearer view of what the different models are, but those
residues were poorly chosen.

Wed Jul 26 20:44:57 PDT 2006 Kevin Karplus
Trying again with N118-T123 CB atoms.
try5-opt2 and try2-opt2 seem almost identical. Surely we have more
variety available than that.
Rosetta likes best (other than Robetta models)
    decoys/T0361.try1-opt2.gromacs0.repack-nonPC.pdb
Unconstrained likes best:
    try8-opt2, try5-opt1, try5-opt2, try2-opt2, try6-opt2, try4-opt2
I'll drop try2-opt2 and put in T0361.try1-opt2.gromacs0.repack-nonPC in
its place.

Wed Jul 26 21:09:51 PDT 2006 Kevin Karplus
I have done a submission, but the 5th model is rather similar to the 2nd
model, and I'd be glad of a replacement for it.

Comment with submission:

We got strong helix predictions, but no good templates were found. We
made some attempt to get the helices to bundle or pack in a reasonable
way, but none of our results were very convincing.
We selected a few different ones

Model 1 is try8-opt2, polished by undertaker starting with
ROBETTA_deNovo5 from the robetta server. It was the best scoring with
the unconstrained cost function, though we are sad that we did not
generate the fold ourselves.

Model 2 is try1-opt2.gromacs0.repack-nonPC, which is the automatically
generated model, reoptimized by gromacs to close small gaps, then with
sidechains repacked by rosetta. It is rosetta's favorite of the
backbones it repacked (though some of the robetta server models score
better).

Model 3 is try5-opt2, polished from try2-opt2, polished from our
SAM_T06_server_TS1.

Model 4 is try6-opt2, optimized by undertaker from alignments (last
alignment added from 1v4eA).

Model 5 is try4-opt2, polished from try3-opt2, polished from try1-opt2.
So this model is rather similar to Model 2.

------------------------------------------------------------

Wed Jul 26 21:16:47 PDT 2006 Kevin Karplus
Firas sent me e-mail:

I didn't want to step over you on the README.
I think we should get rid of try6, and replace it with a try based off
2gyqA. That would be try7 or try9.
I think try9 is better, but a few things with it are odd. Looking at it
with "near" there is one residue that sticks out to me: I74 seems like
it should be buried, so maybe rotating the helix would be good. But if
you look at it with the 'burial' script, I74 is purple!
Now I know that the scales of 'burial' and 'near' are very different,
but purple is one extreme regardless of which script it is and I expect
an Ile to be buried.
Anyway, maybe this is too much nit-picking (especially for a new fold
target) but basically I think that replacing try6 with try9-opt2 would
be better (and I can try to rerun try9 to make it better, if that would
be useful).

Wed Jul 26 21:17:30 PDT 2006 Kevin Karplus
Firas, I turn this over to you. I'd drop try4-opt2 (which is too similar
to try1 anyway) before dropping try6-opt2.

Wed Jul 26 21:37:57 PDT 2006 Firas Khatib
Ok Kevin, I'll see if I can improve on try9.
try13 is running on shaw using try9 as input.
try14 is also running on shaw using try9 as input as well as its helix
constraints.

Thu Jul 27 00:54:54 PDT 2006 George Shackelford
Those missing notes on try10 and try11.
For try10 I am trying remote fold-recognition using 'alphabetmatch'

# program: alphabetmatch
# George Shackelford
#
# Target: T0361
# length: 169
# length range: 157 to 185
# alphabets used:
# ehl2 near
# gap start .1, extend .4
# scoring method: logo
# id score per residue 5S 10N 10N
1fewA 160.281 0.94841 1.20.58.70-173
1dovA 154.379 0.913486 ,1.10.287.360-61,1.20.120.230-120
1ad6 151.909 0.898869 1.10.472.10-185
1g73A 149.743 0.886055 1.20.58.70-157
1fpoA 149.665 0.885593 ,1.10.287.110-79,1.20.1280.20-92
1huw 148.985 0.881566 1.20.1250.10-166
2sas 148.542 0.878946 1.10.238.10-185
1aep 148.019 0.875854 1.20.120.20-153
1kfuS 147.988 0.875667 1.10.238.10-184
1gu9A 147.711 0.874028 1.20.1290.10-170

try10 ended up using 1gu9A as its parent. It scores decently in
unconstrained and best-rosetta, but it is weak in phobic_fit and breaks.
Try10 is a five helix bundle with some foaminess and some burial residue
exposure. It's half decent.

I ran a try11 that had 1gu9A commented out so to find the next best of
the remotes. It finished with 1fpoA as the parent template. In
unconstrained it does better in breaks, soft clashes, and phobic_fit
than try10, but falters because of dry scores. It is 10 points behind
the leader.
Try11 is also a five helix bundle although there is one bad break and
one oddly formed helix. It is more foamy than try10 and has some more
exposure problems.

It is not clear how much either try10 or try11 would benefit from any
reruns that try to compact them. Their initial scores are not as good as
either try1 or try2.
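
A note on reading the alphabetmatch table above: the "per residue"
column is numerically consistent with the raw score divided by the
target length (169). That relationship is inferred from the numbers in
the table, not taken from alphabetmatch itself, so treat the sketch
below as an assumption. The names TARGET_LENGTH and per_residue are
made up for the example.

```python
TARGET_LENGTH = 169  # target length reported in the alphabetmatch header


def per_residue(score, length=TARGET_LENGTH):
    """Raw alignment score divided by target length (assumed definition)."""
    return score / length


if __name__ == "__main__":
    # (template id, raw score, per-residue value as printed in the table)
    rows = [
        ("1fewA", 160.281, 0.94841),
        ("1fpoA", 149.665, 0.885593),
        ("1gu9A", 147.711, 0.874028),
    ]
    for template, score, reported in rows:
        computed = per_residue(score)
        print(template, round(computed, 6), reported)
```

If the assumption holds, per-residue scores are only comparable between
runs against the same target, since the divisor is the target length.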

Thu Jul 27 04:07:28 PDT 2006 Kevin Karplus
George also wrote (in e-mail but not in the README file)
> Frankly the whole I74-L77 stretch bothers me. Makes me think of
> criss-crossing helices.
or --
>
> As for try6, I just wish we could move I74-W83 beneath the S21-S28
> helical section. Now THAT would bury the I74-L77 sequence.

Thu Jul 27 10:37:41 PDT 2006 Firas Khatib
Well, try13 and try14 are certainly scoring better than try9 (which is
where they started from).
One of these will probably be good to replace try4.

Thu Jul 27 16:52:11 PDT 2006 Firas Khatib
I like try14's terminal helix better than try13's because the
predictions seem to have this as one long helix with no kink or break.

Thu Jul 27 22:09:44 PDT 2006 George Shackelford
I am going to see if these ferritin templates could make a better fit
for the sequence. The following four are in the top ten remote hits:

2fha 442.47 2.61817 1.20.1260.10-172
1aew 439.711 2.60184 1.20.1260.10-170
1rcd 439.098 2.59822 1.20.1260.10-171
1dpsA 421.854 2.49618 1.20.1260.10-159

try15 running on peep.

Fri Jul 28 12:10:11 PDT 2006 George Shackelford
Try15 refused to look like ferritin. I'm going to force a ferritin
alignment and see if we can get that. (I wish I could save the
auto-fill-mode from session to session.) Granted it is a bit far-fetched
since the target is a putative transcriptional regulator in the
bacterium Shigella flexneri.
Try16 running on vashon.
Damm. After running a while, I found that I had not changed the try15's
to try16's in the try16.under file. I've stopped the process, and I'm
going to clear the two log files and restart. It has already written
over the try15-opt1.pdb file, but the try15-opt2 is ok.
try16 restarted on vashon.

Fri Jul 28 15:11:41 PDT 2006 George Shackelford
Try16 scores reasonably well. It matches the consensus ehl2 perfectly.
It has some burial problems, which suggests that part of it is not
folded as well as it could be (a few exposed deep burials do have some
chance of being close to each other.) I am going to try the ferritins
that are longer than the one used as a template and see if we can't get
that resolved.
Try17 running on cheep.

Fri Jul 28 17:28:37 PDT 2006 Kevin Karplus
Firas replied to George:
> Date: Thu, 27 Jul 2006 08:50:56 -0700
> From: "Firas Khatib"
> To: "George Shackelford"
> Subject: Re: T361 readme
>
> If you want to try that with Proteinshop, George, that would be great.
>
> Proteinshop is the one thing that I cannot do from Seattle!
>
> If you need help, you can ask Grant to show you, but it shouldn't be
> too tough to move I74-W83 under S21-S28 in try6.

It would be nice to have these discussions in the README file, without
my having to pull them out of my e-mail stream.

Fri Jul 28 18:21:21 PDT 2006 George Shackelford
Have been struggling with ProteinShop - wish you (Firas) were here.
Try17 was awful.
Try16 does score quite well in grep-best-rosetta. I'd like to see it as
an alternative in place of the current Model 5 (which is already
polished as Model 3). The deadline is tomorrow - who is taking care of
it?
I have made a "T0361.try6-opt2.ps.pdb.gz" file in the 'George'
directory. I have no idea what will come out of this. I am simply using
try6's *.costfcn to "polish" this file. If this works, we'll have a
better version of try6.
Try18 running on peep.

Fri Jul 28 20:21:07 PDT 2006 George Shackelford
Well, try18 scores better than try6, but it managed to 'ghost' back
through the section I74-W83!!! Undertaker is so neat. I didn't know it
could do that.

[Fri Jul 28 21:01:32 PDT 2006 Kevin Karplus
What does "'ghost' through the section" mean?
]

So we can replace try6 with try18. I just wish I could get it to do what
I want. I think I could if I put a constraint along I74-W83 to hold it
down to the helices. But it's after 8pm.
I assume that is past the deadline - or is it? Firas?

Fri Jul 28 21:00:02 PDT 2006 Kevin Karplus
I will try to get up early in the morning to do the submission, if you
can get the superimpose-best.under and T0361.method files edited
appropriately.
All complete models should be left in the decoys directory (even ugly
proteinshopped ones), so that we can evaluate them when the real
structure becomes available. Models hidden away in other directories
will not be evaluated.

Fri Jul 28 22:25:12 PDT 2006 George Shackelford
Below are updated comments based on the early submission

Comment with submission:

We got strong helix predictions, but no good templates were found. We
made some attempt to get the helices to bundle or pack in a reasonable
way, but none of our results were very convincing.
We selected a few different ones

Model 1 is try8-opt2, polished by undertaker starting with
ROBETTA_deNovo5 from the robetta server. It was the best scoring with
the unconstrained cost function, though we are sad that we did not
generate the fold ourselves.

Model 2 is try1-opt2.gromacs0.repack-nonPC, which is the automatically
generated model, reoptimized by gromacs to close small gaps, then with
sidechains repacked by rosetta. It is rosetta's favorite of the
backbones it repacked (though some of the robetta server models score
better).

Model 3 is try5-opt2, polished from try2-opt2, polished from our
SAM_T06_server_TS1.

Model 4 is try18-opt2, an effort to modify try6 by using ProteinShop.
Try6 was optimized by undertaker from alignments (last alignment added
from 1v4eA).

Model 5 is try14-opt2, polished from try9-opt2, etc, from try1-opt2. So
this model is rather similar to Model 2.

-------------

I moved the pdb in George to the decoys, and removed the George
directory.
I tried to force some conformance to ferritins in try19 and try20 by
eliminating optional fragments. Both ended up as crap. Just ignore them.
I still wish we didn't have a polished ROBETTA model as an entry...

Sat Jul 29 05:54:16 PDT 2006 Kevin Karplus
I submitted this morning, with the following corrected comment in
T0361.method:

We got strong helix predictions, but no good templates were found. We
made some attempt to get the helices to bundle or pack in a reasonable
way, but none of our results were very convincing.
We selected a few different ones

Model 1 is try8-opt2, polished by undertaker starting with
ROBETTA_deNovo5 from the robetta server. It was the best scoring with
the unconstrained cost function.

Model 2 is try1-opt2.gromacs0.repack-nonPC, which is the automatically
generated model, reoptimized by gromacs to close small gaps, then with
sidechains repacked by rosetta. It is rosetta's favorite of the
backbones it repacked (though some of the robetta server models score
better).

Model 3 is try5-opt2, polished from try2-opt2, polished from our
SAM_T06_server_TS1.

Model 4 is try18-opt2, which was optimized by undertaker from a model
modified from try6 using ProteinShop, in an attempt to move a badly
placed loop (I74-W83) which appears to be on the wrong side of the
N-terminal helix. The N-terminal helix was rather badly mangled in the
attempt to use Proteinshop, so undertaker fixed things by restoring the
helix and loop pretty much to where they were in try6. Try6 was
optimized by undertaker from alignments (last alignment added from
1v4eA).

Model 5 is try14-opt2, polished from try9-opt2, which was created by
undertaker from alignments (last alignment was to 1ll2A).

We are not real fond of models 2-4, as the strongly predicted turn near
G62 is in mid-helix. Model 5 avoids that problem but has other problems,
with bent helices and a non-compact structure.

We regret that the best we have to offer on this target is a slight
polishing of a Robetta model.