Thu Jun 8 08:36:23 PDT 2006 T0324 Make started
Thu Jun 8 08:37:40 PDT 2006 Running on lopez.cse.ucsc.edu

Tue Jun 27 18:05:39 PDT 2006 Martin Madera

There's almost perfect agreement on the main domain, but quite a lot of disagreement on the insertion (16-82). Try1-opt2 is really foamy in the insertion, unlike the two top BLAST hits, 2fi1A and 2ah5A (which, incidentally, are very similar). It also changes helix boundaries with respect to the two PDB structures.

Looking at best models, I can see what happened: it chose one of the other alignments and tried to complete a gap caused by a deletion with respect to the template, but in doing so it managed to block a conserved crevice. Looking at the PDB structures, this may be a problem, because undertaker will try to fill up the crevice.

Here are the HMM alignments (t06-local-adpstyle1):

>T0324 FG7348A, Lactobacillus plantarum, 208 res
MTYQALMFDIDGTLTNSQPAYTTVMREVLATYGKPFSPAQAQKTFPMAAE
QAMtelGIAASEFDHFQAQYEDVMASHYDQIELYPGITSLFEQLPSE-LR
LGIVTSQRRNELEsGMRSYPFMMRMAVTISADDTPKRKPDPLPLLTALEK
VNVAPqnALFIGDSVSDEQTAQAANVDFGLAVWGMDPNADHQ-kvahrfq
kpldilelfk
>2fi1A
mkgMKYHDYIWDLGGTLLDNYETSTAAFVETLALYGITQDHDSVYQALKV
STPFAIETFAPNLENFLEKYKENEARELEHPILFEGVSDLLEDISNQGGR
HFLVSHRNDQVLEILEKTSIAAYFTEVVTSSSGFKRKPNPESMLYLREKY
QISSGLVIGDRPIDIEAGQAAGLDTHLFTSIVNLRQVLDI

>T0324 FG7348A, Lactobacillus plantarum, 208 res
MTYQALMFDIDGTLTNSQPAYTTVMREVLATYGKPFSPAQAQKTFPMAAE
QAMTELGIAASEFDHFQAQYEDVMASHYDQIELYPGITSLFEQLPSELRL
GIVTSQRRNELESGMRSYPFMMRMAVTISADDTPKRKPDPLPLLTALEKV
NVAPQNALFIGDSVSDEQTAQAANVDFGLAVWGMDPNADHQKVAHRFQKP
LDILELFK
>2ah5A
MTsITAIFFDLDGTLVDSSIGIHNAFTYTFKELGVPSPDAKTIRGFMGPP
LESSFATCLSKDQISEAVQIYRSYYKAKGIyEAQLFPQIIDLLEELSSSY
PLYITTTKDTSTAQDMAKNLEIHHFFDGIYGSSPEAPHK--ADVIHQALQ
THQLAPEQAIIIGDTKFDMLGARETGIQKLAITWGFGEQADLlnYQPDYI
AHKPLEVLAYFQ

... as far as I can see, there's possibly a small insertion (64-66 in T0324) w.r.t. 2fi1A, and a perfect match for 2ah5A. I'd like to see the alignment of the two PDB structures, 2fi1A and 2ah5A.
I will do this using undertaker. Tempted to run the insertion on its own and see what happens. Will also try to force undertaker to stick to the alignments to 2fi1A and 2ah5A.

Tue Jun 27 22:30:36 PDT 2006 Martin Madera

Did the 2fi1A-2ah5A superposition using the CE webserver at:

    http://cl.sdsc.edu/ce/ce_align.html

and the result is in

    T0324/align_2fi1A-2ah5A/align.pdb

[P.S. Changed to: T0324/align/align_2fi1A-2ah5A.pdb]

I only aligned the insertions, so one can see quite clearly the hinge motion between the insertion and the main domain that changes the size of the crevice. For the record, the CE alignment is:

    2ah5 18:SSIGIHNAFTYTFKELGVPSPDAKTIRGFXGPPLESSFATCLSKDQISEAVQIYRSYYKA
    2fi1 20:NYETSTAAFVETLALYGIT-QDHDSVYQALKVSTPFAIETFAPNLE--NFLEKYKENEAR

    2ah5 78:KGIYEA
    2fi1 77:ELEHPI

The RMSD is 2.1A, sequence identity is 11.1%.

Tue Jun 27 22:47:30 PDT 2006 Martin Madera

Let's have a look at more BLAST structures, and be more systematic about it.

    PDB      BLAST E    region
    2fi1A    1.8e-08    A:18-83
    2ah5A    1.3e-06    A:17-84
    1te2A    2.1e-04    A:19-93
    1x42A    4.7e-04    A:14-101
    2go7A    0.002      A:15-85
    1o08A    0.005      A:1015-1087

I've done all the alignments to 2fi1A and saved them in T0324/align/.

Notes:

- 1x42A is the baddie that try1-opt2 seems to be based on. But it's unrealistic, because try1-opt2 cuts out one whole helix-loop-helix region that makes the result look very foamy, and the shortcut blocks the crevice. But, interestingly, 1x42A itself does a pretty good job of blocking the crevice.

- All others should be OK as templates. Interestingly, 2fi1A seems to be the odd one out: all others show hinge motion with respect to it, although the extent seems to vary.

So, try2: increased the weight of hbond to:

    hbond_geom 20 \
    hbond_geom_backbone 50 \
    hbond_geom_beta 100 \
    hbond_geom_beta_pair 200 \

but so far left breaks the same. Removed dssp-ehl2 and rr constraints.
Most importantly, did:

    cp T0324.t04.undertaker-align.under T0324.edit.t04.undertaker-align.under
    cp T0324.t06.undertaker-align.under T0324.edit.t06.undertaker-align.under
    cp T0324.t2k.undertaker-align.under T0324.edit.t2k.undertaker-align.under
    cp T0324.undertaker-align.under T0324.edit.undertaker-align.under

edited the lists to only include: 2fi1A, 2ah5A, 1te2A, 2go7A, 1o08A and edited try2.under to point to these rather than the original files. Try2 started at 23:46.

Wed Jun 28 04:24:52 PDT 2006 Martin Madera

Damn, try2 looks almost the same as try1. Read the CASP7 README file, realized that the bad structure may be coming from all-align.a2m which I didn't edit. So copied it to edit.all-align.a2m and ran

    perl -i'.bak' -ne 'if(/^>(T0324|2fi1A|2ah5A|1te2A|2go7A|1o08A)/){print; $_=<>; print}' edit.all-align.a2m

to remove anything that I wasn't interested in. Try3 started at 4:43am.

Wed Jun 28 17:42:09 PDT 2006

Try3-opt2 looks much better than try1 and try2, it's the best model by far. It's more foamy than the templates and there are chain breaks, so some polishing is in order. I will do that in try4.

Try4: commented out all TryAllAlign, reading in all the try3 structures, increased break from 50 to 200. Try4 started at 6pm.

Wed Jun 28 21:02:07 PDT 2006 Kevin Karplus

Rather than editing included files for new tries, it is easier just to copy the contents of the file into the try2.under file. This reduces the proliferation of new files. The main reason for doing includes is to make the initial try1.under file easier to create (and, later on, for read-pdb.under to be different on different runs as more models are added to the decoys/ directory).

Wed Jun 28 21:08:36 PDT 2006

Try4 is basically the same as try3: foamy and none of the chain breaks got fixed. I'm more worried about the chain breaks, so try5 with break set to 1000. For the soft deadline, try3 is the best so far; try5 should finish around midnight.
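
The perl one-liner used before try3 keeps a header plus exactly one following line, so it relies on every sequence in edit.all-align.a2m sitting on a single line. A minimal Python sketch of the same filter that also copes with wrapped sequences is given here for reference. The function name `filter_a2m` and the sample records are hypothetical; only the list of kept IDs comes from the notes above, and this was not part of the actual run.

```python
def filter_a2m(lines, keep_ids):
    """Yield only the records whose header starts with one of keep_ids.

    Assumes FASTA/a2m-style input: a header line beginning with '>'
    followed by one or more sequence lines.
    """
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # Same prefix test as the perl regex /^>(T0324|2fi1A|...)/.
            header = line[1:]
            keeping = any(header.startswith(k) for k in keep_ids)
        if keeping:
            yield line


# IDs kept for try3, as listed in the notes.
KEEP = ("T0324", "2fi1A", "2ah5A", "1te2A", "2go7A", "1o08A")

if __name__ == "__main__":
    # Made-up sample records, only to show the behaviour.
    sample = [">T0324 target\n", "MTYQ\n", ">1x42A\n", "AAAA\n", ">2fi1A\n", "mkgM\n"]
    print("".join(filter_a2m(sample, KEEP)), end="")
```

To use it on a real file one would read the lines from a copy of all-align.a2m and write the filtered lines to a new file, keeping the original as a backup in the same spirit as perl's -i'.bak'.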
Wed Jun 28 21:17:56 PDT 2006 Kevin Karplus

I will submit

    try4-opt2
    try2-opt2
    try1-opt2
    align 1 (from 2ah5A)
    align 5 (from 1x42A)

We can do another preliminary submission in the morning, or we can wait and do the final submission on Friday.

Wed Jun 28 21:19:58 PDT 2006 Martin Madera

OK. Just noticed -- somehow I forgot to increase breaks to 200 for try4. DAMN, no wonder nothing happened. Will set them to 300 for try5 and will see.

Thu Jun 29 00:09:37 PDT 2006 Martin Madera

Try5 managed to close 2 of the 3 breaks, which I suppose is a success. It also did a good job of destroying the crevice, and hence, probably shifted the structure away from the correct one. Sigh.

Thu Jun 29 01:02:32 PDT 2006 Martin Madera

Try6: break set to 1000, to see if it can close all three gaps. Like try4,5 this is an attempt to polish try3. Running on peep.

Try7: ... this is a wild experiment! The problem with try1 wasn't that there weren't any good alignments (because there were lots), but that the scoring function made undertaker choose the bad ones -- the ones that fill up the crevice. Well, unless I change the scoring function, the polishing runs will *always* try to fill up the crevice (see try5), which is bad. So try7 is a copy of try1, except I've modified the scoring function from:

    SetCost wet6.5 15 \
        way_back 5 \
        dry5 15 \
        dry6.5 20 \
        dry8 15 \
        dry12 5 \
        phobic_fit 2 \

to:

    SetCost wet6.5 5.0 \
        way_back 1.6 \
        dry5 5.0 \
        dry6.5 6.3 \
        dry8 5.0 \
        dry12 1.6 \
        phobic_fit 0.6 \

(i.e. /3). We'll see what effect (if any) it will have.

Thu Jun 29 13:50:32 PDT 2006 Kevin Karplus

In try7-opt2, the middle domain is very foamy (an expected consequence of the low weight for packing terms). Even with the try7 costfcn it scores badly, with try5 scoring better.

Another way to keep a crevice open is to put in some "props", distance constraints that keep atoms on opposite sides of the crevice apart. For example, one might choose

    Constraint Y21.OH T104.OG1 7.0 8.3 20.0 1.
    Constraint Y70.CB A130.CB 14. 15.1 30 1.
    Constraint F54.CA T104.C 10. 12.5 20 1.

I would not be too concerned about getting the hinge angle right, but more about getting the domain on either side of the hinge as clean as possible.

Fri Jun 30 11:11:19 PDT 2006 Martin Madera

Try6 managed to close the last chain break and looks good, so that's the best model so far.

Try7 was a failure: as Kevin noticed, the middle domain is foamy. However, the important point is not that it's foamy but that it's very similar to try1, so my interpretation is that the scoring function still failed to make undertaker pick the "right" structure automatically (it's there in the alignments). The try7 main domain looks pretty much the same as in try1 (possibly a tiny bit more foamy, but hard to tell).

Why am I trying to keep the crevice open? I'm not worried about the hinge angle but about the structure of the insertion. If undertaker stopped worrying about the hole and pulling the helices towards it, it could pack them a bit tighter.

I like the constraints Kevin suggested. So try8 = try6 + Kevin's constraints. I set the individual constraint weights to 20, and the overall weight to 50. Running on squawk.

Ha, just noticed:

    # Constraint Error: residue specified as F54.CA doesn't match (T0324)T54
    # Error: can't parse first atom name

The rest of the residues are correct, so I assume that this is just a typo. Fixed, restarted.

Another idea: maybe in this case, polishing = bad. We want to keep close to the original structures. So try9 = try3 but with breaks set to 500, to try and get a good structure in one go. Running on peep.

Fri Jun 30 13:29:14 PDT 2006 Kevin Karplus

The F54 was a typo, but I don't know whether it was the F or the 54 that was right.

Setting breaks very high initially is usually a bad idea---everything gets sacrificed to make the chain contiguous. Generally it works better to increase break costs gradually, so that other good stuff is not discarded along the way.
One *can* do this in a single run, by changing the costfcn between calls to OptConform, but I have not done that yet.

Looking at try6-opt2, neither T54 nor F45 look like great choices for constraints. A constraint between M47 or A49 and S105 might be more to the point.

Fri Jun 30 13:49:54 PDT 2006 Kevin Karplus

try9 just finished. The break weight was set too high relative to the clashes and other terms, so it doesn't look that great.

Martin, you should comment out the PrintTemplateAtoms from runs that don't read in new PDB files for templates---just use the existing Templates.atoms.gz file without creating a new one.

Fri Jun 30 17:27:18 PDT 2006 Kevin Karplus

With the unconstrained costfcn, the best models are try4-opt2 and try5-opt2. More recent runs have reduced breaks at the cost of almost everything else. I'll do a submission tonight, but I'll also set up one more run with a more reasonable cost fcn.

Fri Jun 30 17:54:50 PDT 2006 Kevin Karplus

try10 started on cheep. Also submitted try5-opt2, try4-opt2, try2-opt2, try1-opt2, align1 (2ah5A)

Fri Jun 30 19:31:53 PDT 2006 Kevin Karplus

try10-opt2 is new best score (polishing try5), but it still has a bad break:

    T0324.try10-opt2.pdb.gz breaks before (T0324)E82 with cost 2.86465

I'll try another polishing run, increasing the break penalty and adding

    SheetConstraint T15 S17 L83 I81 hbond E82

Fri Jun 30 19:39:44 PDT 2006 Kevin Karplus

I just noticed a typo in try10.costfcn, I had hbond_geom_beta set at 500, rather than 50, rather overdoing the optimization of hbonds. I'll do another polishing run on cheep (try11) with the extended sheet constraint

    SheetConstraint L14 S17 Y84 I81 hbond N16 10

I'll also take out the bogus

    # SheetConstraint L14 T15 I10 D11 hbond T15 1

Fri Jun 30 19:45:14 PDT 2006 Kevin Karplus

try11 started on cheep.
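
The break reports quoted in these entries have a regular shape, so comparing the worst break across tries can be scripted. What follows is a minimal Python sketch, assuming every report line looks like the try10-opt2 line quoted above. The function name `parse_break` and the returned field names are hypothetical and not part of undertaker.

```python
import re

# Pattern modelled on the report line quoted in the notes, e.g.
# "T0324.try10-opt2.pdb.gz breaks before (T0324)E82 with cost 2.86465"
BREAK_RE = re.compile(
    r"(?P<model>\S+) breaks before \((?P<target>\w+)\)"
    r"(?P<aa>[A-Z])(?P<pos>\d+) with cost (?P<cost>\d+(?:\.\d+)?)"
)


def parse_break(line):
    """Return a dict describing one break report, or None if the line does not match."""
    m = BREAK_RE.search(line)
    if m is None:
        return None
    return {
        "model": m.group("model"),
        "target": m.group("target"),
        "residue": m.group("aa") + m.group("pos"),
        "position": int(m.group("pos")),
        "cost": float(m.group("cost")),
    }


if __name__ == "__main__":
    report = "T0324.try10-opt2.pdb.gz breaks before (T0324)E82 with cost 2.86465"
    print(parse_break(report))
```

Sorting the parsed records by cost would give a quick view of which tries still have a bad break and where.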
Fri Jun 30 21:22:57 PDT 2006 Kevin Karplus

try11-opt2 reduced breaks and clashes considerably from try10-opt2, but still has a bad break in the same place as try10-opt2:

    T0324.try11-opt2.pdb.gz breaks before (T0324)E82 with cost 2.40591

The problem here is not a distance one, but an omega angle that is too far from 180.

I could do more polishing, but I'm not sure it's worth it. Martin, you should take over, and let me know what models I should resubmit in the morning.

------------------------------------------------------------
Date: Fri, 30 Jun 2006 20:00:41 -0700
From: "Martin Madera"
To: "Kevin Karplus"
Subject: Re: T0324

Hmm. Apart from try10, not much else to do.

The fundamental problem is that compared to real structures (see e.g. T0324/align/2ah5.pdb, in spacefill and e.g. color by group), all of our models always look more foamy. And that includes even the main domain, where all the alignments agree 100%. It's clear that we have the right structure, but we seem unable to make the final step and pack it real tight.

To quantify this accurately: take a sphere half the size of the carbon atom, roll it around the outside and calculate the volume enclosed. I bet our models have a 10-20% larger volume than the real structures. Now I remember Cyrus mentioning that there's a *very* clever algorithm that someone bright came up with back in the 70s that allows one to do this very efficiently. (And if Cyrus says it's "very clever", I think a lot of people were very impressed at the time). Do you know the work? If not, we absolutely must look it up.

Well, the good thing is that we can *see* the difference between what real proteins look like and what we produce. I assume that the scoring function picks this up. But does it? (If not, the first step is to accurately calculate the volumes, as above.) Otherwise it's just an optimization problem, right?
Moreover one with a big test set: pick two close structures, exclude anything in the PDB that's closer to the true structure than the target, and see how close you can get to the target. And you *know* that you won't be more than a few angstroms away from the starting structure, so the search space is limited.

Martin

On 6/30/06, Martin Madera wrote:
> It's basically done, I'm pretty happy with try6. That said, I've
> started two more runs and will probably do another two later on today.
>
> M.
>
> On 6/30/06, Kevin Karplus wrote:
> >
> > We have to send off T0324 this evening, and I haven't seen any notes
> > from you on it for about 33 hours.
------------------------------------------------------------

Sat Jul 1 03:41:34 PDT 2006 Kevin Karplus

I guess Martin gave up on polishing this target even before I did. He's right, of course, that one of the hardest things is to get a densely packed model. We don't have a volume measurement per se. We have the hydrophobic radius of gyration (which hurts us on T0324, because it wants to close the crevice). We also have the dry5, dry6.5, dry8, and dry12 burial measures, which attempt to get dense packing on a local basis. We *don't* have attractive Van der Waals terms, which we certainly could use to drive denser packing. I'm not sure that a global volume measure would add much to the mix of packing terms, and computing it is not very fast.

I started try12 with a few more "prop" constraints to keep the crevice open and with larger break and bad_peptide terms to try to fix the break before E82. I also took out the sheet constraints and helix constraints---the hbond functions should be strong enough now to hold sheets together.

Sat Jul 1 04:40:06 PDT 2006 Kevin Karplus

I made a "dry" costfcn, which increased the weights of the "dry" packing terms a lot, to see which models were least foamy. I think it would be a terrible function to optimize, but it does let us know where we started adding foaminess to avoid clashes.
It turns out that the recent, polished models, are the *least* foamy by that measure.

It looks like try12 is going to close the last gap, so I expect to submit

    try12-opt2  our latest and greatest.
    try6-opt2   (Martin's previous favorite, reoptimized from try3-opt2.gromacs0)
    try5-opt2   an early leader, optimized from try3-opt2
    try1-opt2   the automatic one
    align1      from 2ah5A

Sat Jul 1 05:08:59 PDT 2006 Kevin Karplus

try12-opt2 has removed essentially all the breaks. The worst one left is

    T0324.try12-opt2.pdb.gz breaks before (T0324)P134 with cost 0.284141

try12-opt2 is now Rosetta's favorite to repack, though it hates them all, because of some bad clashes it can't resolve:

    other-bump: 1.28587 Ang (T0324)F91.O and (T0324)P95.CD threshold= 2.82818 cost= 0.991166
    other-bump: 1.85549 Ang (T0324)L14.CD1 and (T0324)I87.CG2 threshold= 3.02582 cost= 0.946826
    other-bump: 1.70681 Ang (T0324)F91.O and (T0324)P95.CG threshold= 2.69748 cost= 0.935826
    other-bump: 2.07052 Ang (T0324)V24.CG2 and (T0324)V73.CG1 threshold= 3.08864 cost= 0.909245
    other-bump: 1.70343 Ang (T0324)M115.O and (T0324)Y118.O threshold= 2.48656 cost= 0.896642
    other-bump: 2.64217 Ang (T0324)F45.CD1 and (T0324)M53.SD threshold= 3.33788 cost= 0.753996
    other-bump: 1.96997 Ang (T0324)T2.O and (T0324)Q155.O threshold= 2.48656 cost= 0.752734
    other-bump: 2.27791 Ang (T0324)Q42.O and (T0324)P46.CD threshold= 2.82818 cost= 0.726988
    other-bump: 2.45542 Ang (T0324)F91.C and (T0324)P95.CD threshold= 3.04681 cost= 0.726045
    other-bump: 2.73139 Ang (T0324)R108.CA and (T0324)S129.CB threshold= 3.17496 cost= 0.594611
    other-bump: 2.66748 Ang (T0324)L143.CD2 and (T0324)A173.CB threshold= 3.07686 cost= 0.575423
    other-bump: 2.7433 Ang (T0324)L179.CD2 and (T0324)V193.CG1 threshold= 3.13291 cost= 0.549232
    other-bump: 2.79903 Ang (T0324)F91.C and (T0324)P95.CG threshold= 3.1728 cost= 0.528601

gromacs eliminates most of the clashes, but at rather high cost in other terms, like putting in bad breaks and messing up sidechains.
Since undertaker seems to be better at fixing sidechains and closing gaps than relieving clashes, I wonder if it would be worth optimizing from the gromacs models, to get tighter packing with fewer clashes.

Sat Jul 1 05:30:27 PDT 2006 Kevin Karplus

I submitted

    try12-opt2  our latest and greatest.
    try6-opt2   (Martin's previous favorite, reoptimized from try3-opt2.gromacs0)
    try5-opt2   an early leader, optimized from try3-opt2
    try1-opt2   the automatic one
    align1      from 2ah5A

but I am doing a try13 run from all the gromacs0 models, to see if undertaker can fix them up.

Sat Jul 1 07:10:45 PDT 2006 Kevin Karplus

try13 has a new best score, though there were clashes reintroduced in undertaker's attempts to fix gromacs damage.

    For conformation number 23 T0324.try13-opt2.pdb.gz:
    T0324 has 36 clashes, 0 disulphide bonds, and 168 hydrogen bonds in 208 residues
    other-bump: 1.73756 Ang (T0324)F91.O and (T0324)P95.CD threshold= 2.82818 cost= 0.946224
    other-bump: 1.86443 Ang (T0324)M115.O and (T0324)Y118.O threshold= 2.48656 cost= 0.822299
    other-bump: 2.26051 Ang (T0324)F91.O and (T0324)P95.CG threshold= 2.69748 cost= 0.653674
    other-bump: 2.12683 Ang (T0324)T2.O and (T0324)Q155.O threshold= 2.48656 cost= 0.608435
    other-bump: 2.40238 Ang (T0324)L179.CD2 and (T0324)H190.O threshold= 2.79602 cost= 0.597647

Sat Jul 1 07:25:07 PDT 2006 Kevin Karplus

I resubmitted, replacing try12-opt2 by try13-opt2. I don't really know whether it is any better---they look quite similar except for a couple of sidechains.

Date: Sun, 2 Jul 2006 03:46:39 -0700
From: "Martin Madera"
To: "Kevin Karplus"
Subject: Re: packing density
Cc: rph, ggshack, bort, thiltgen, jsanborn

I've gone back to check what they mean by 'well packed': it's the proportion of atoms in the PDB file that are buried according to their criteria. Not what I wanted!

So I looked at their raw outputs. And it turns out that the T0324/try6 core (i.e.
the buried residues) is smaller (5500 A^3) but tighter (90% of reference volume) than the 2ah5A core (8200 A^3, 96% of reference volume). Both proteins have the same length (210 a.a.) and the alignment contains only two insertions and two deletions.

The reference volume is calculated by summing the reference volumes for all atoms in the core, and the reference volume for an atom is the average Voronoi volume for that particular atom type (e.g. THR/CG2) in their set of high-resolution protein structures.

However, their assignment of which atoms are buried is a bit strange, because no atoms have a zero accessible surface area (which is clearly wrong). So they're using some approximation.

So I went to a random web server that calculates ASA to check their numbers:

    http://www.scsb.utmb.edu/cgi-bin/get_a.cgi

    2ah5: 723 out of 1620 atoms buried = 44.6%
    try6: 650 out of 1631 atoms buried = 39.9%

where buried = zero ASA with a 1.4A spherical probe. Hmmmm, nothing like what they claim. But let's try a smaller probe, 1.0A:

    2ah5: 449 ~ 27.7%
    try6: 302 ~ 18.5%

which is the other extreme. So whatever they're doing, it's roughly equivalent to a probe of about 1.2A.

M.

> Mark Gerstein's group at Yale has a server that does the Voronoi calculations:
>
> http://www.molmovdb.org/cgi-bin/voronoi.cgi
>
> Just out of interest, I looked at our predictions for T0324 and some
> actual PDB structures.
>
> The four BLAST hits with E-value better than 0.001 are:
>
> 2fi1A: 32.9%
> 2ah5A: 35.4%
> 1te2A: 30.8%
> 1x42A: 31.7%
>
> ... the numbers indicate how "well packed" the structures are using
> standard settings on the server. I have no idea what the definition of
> "well packed" is. To do this I had to strip the PDB files of all
> HETATMs (mostly water molecules), which significantly affect the
> result (the numbers go up to 50%), and also restrict them to one
> chain.
>
> Now let's compare to the models you submitted:
>
> try13-opt2: 26.5%
> try12-opt2: 26.3%
> try6-opt2: 27.7%
> try5-opt2: 27.0%
> try1-opt2: 27.8%
>
> I think the numbers speak for themselves!
>
> (I'm surprised try1 is so high. I guess if you look at the structure
> it's actually pretty well packed, it's just that the connecting helix
> in the insertion is basically exposed on all sides. I suspect that
> means they ignore it in the calculation.)
>

Fri Mar 23 17:31:07 PDT 2007 Kevin Karplus

Domain 1: We actually did pretty well on this structure:

    our best model try12-opt2.repack-nonPC (beaten by FOLDpro_TS1, 3Dpro_TS1, Zhang-Server_TS2)
    our best submitted: model1=try13-opt2.

I should have stuck with try12-opt2, though the difference is small. SAM_T06_server did not do well.

Domain 2: S17-I81. Not so good here.

    our best model try6-opt2.repack-nonPC was beaten by lots of servers (not even in the top 1/3).
    our best submitted was model2=try6-opt2

The align1 alignment was much better than our best complete model---we should have kept it! Perhaps it was not very compatible with our best model for the main domain??
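
For reference, the buried-atom percentages in the 2 Jul 2006 packing-density email can be reproduced from the counts quoted there. A minimal Python sketch follows. The counts are copied from the email; the totals for the 1.0A probe are an assumption (the email gives only the buried counts, so the same atom totals as for the 1.4A probe are used), and the helper name `buried_percent` is hypothetical.

```python
# (buried, total) atom counts per probe radius, copied from the email.
# ASSUMPTION: the 1.0A totals are taken to be the same as the 1.4A totals.
COUNTS = {
    "1.4A": {"2ah5": (723, 1620), "try6": (650, 1631)},
    "1.0A": {"2ah5": (449, 1620), "try6": (302, 1631)},
}


def buried_percent(buried, total):
    """Percentage of atoms with zero accessible surface area, to one decimal place."""
    return round(100.0 * buried / total, 1)


if __name__ == "__main__":
    for probe, models in COUNTS.items():
        for name, (buried, total) in models.items():
            print(f"probe {probe}  {name}: {buried_percent(buried, total)}%")
```

With these inputs the helper gives 44.6 and 39.9 for the 1.4A probe and 27.7 and 18.5 for the 1.0A probe, matching the figures in the email.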