Tue Jun 6 09:11:51 PDT 2006 T0318 Make started
Tue Jun 6 09:12:15 PDT 2006 Running on orcas.cse.ucsc.edu

Tue Jun 6 09:14:33 PDT 2006 Kevin Karplus
There is a good blast hit for at least one domain:

# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
T0318  2ewbA  33.33  291  180  7  175  459  169  451  1.8e-30  129.0
T0318  1lam   33.33  291  180  7  175  459  169  451  1.8e-30  129.0
T0318  1bllE  33.33  291  180  7  175  459  170  452  1.8e-30  129.0
T0318  1gytA  32.23  301  192  8  166  462  180  472  2.2e-28  122.1

but the protein is long (491 residues) so is most likely a multi-domain protein. The other domain may require new-fold methods.

Tue Jun 6 11:15:08 PDT 2006 Kevin Karplus
With the t06 HMMs, 1lam is coming up as an excellent hit (2ewbA and 1gytA are not in any of the template libraries, and 1bllE is only in the t2k library).

Wed Jun 7 07:32:43 PDT 2006 Kevin Karplus
The SAM_T06_server model was penalized for too many gaps and clashes. Let's make sure that we clean it up for the hand submission!

Sun Jun 25 08:44:41 PDT 2006 Kevin Karplus
Nothing done on this since the automatic run!
Best scoring server models with the unconstrained costfcn are SAM_T06_server_TS1, Pmodeller6_TS2, ROBETTA_TS5, Pmodeller6_TS4, ROBETTA_TS3, ...
Let's put some of them into the "best-models" file and see if there is any disagreement among the best models. (The other servers have fewer clashes and breaks than us.)

Sun Jun 25 08:51:44 PDT 2006 Kevin Karplus
There is excellent agreement over the C-terminal domain, but the N-terminal domain has varying alignments. I should probably do a subdomain prediction for M1-I177, then use the constraints from that prediction to guide the alignment of the whole protein.
M1-I177 make started on lopez.

Sun Jun 25 09:44:29 PDT 2006 Kevin Karplus
M1-I177 does not get any great BLAST hits (best is 2b3yA at e-value 0.84). The HMMs are all over the map also.
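The BLAST table from the first entry above can be read programmatically to see which part of the target the hits cover. This is only a sketch: the field order is taken from the "# Fields:" comment in the notes, and the names BlastHit, parse_hits, and TARGET_LENGTH are made up for illustration.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One row of tabular BLAST output, in the order given by '# Fields:'."""
    query: str
    subject: str
    pct_identity: float
    length: int
    mismatches: int
    gap_openings: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_hits(text):
    """Parse whitespace-separated BLAST rows, skipping blanks and comments."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        hits.append(BlastHit(f[0], f[1], float(f[2]), *map(int, f[3:10]),
                             float(f[10]), float(f[11])))
    return hits


# Rows copied from the notes above.
BLAST_TABLE = """
T0318 2ewbA 33.33 291 180 7 175 459 169 451 1.8e-30 129.0
T0318 1lam 33.33 291 180 7 175 459 169 451 1.8e-30 129.0
T0318 1bllE 33.33 291 180 7 175 459 170 452 1.8e-30 129.0
T0318 1gytA 32.23 301 192 8 166 462 180 472 2.2e-28 122.1
"""

TARGET_LENGTH = 491  # residues, from the notes

hits = parse_hits(BLAST_TABLE)
first_covered = min(h.q_start for h in hits)
last_covered = max(h.q_end for h in hits)
n_term_uncovered = first_covered - 1
c_term_uncovered = TARGET_LENGTH - last_covered
print(f"hits cover {first_covered}-{last_covered}; "
      f"{n_term_uncovered} N-terminal residues have no hit")
```

The uncovered N-terminal stretch is what motivates the separate M1-I177 subdomain run.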
Sun Jun 25 09:49:50 PDT 2006 Kevin Karplus
The doc.html tells us this is dimeric in solution but hexameric in crystal (we should build dimers first, then try to make a hexamer out of three of them). We also know there are no SS bonds and that the metal-binding site is occupied by water.

Sun Jun 25 17:15:06 PDT 2006 Kevin Karplus
M1-I177 does not have any good fold-recognition hits. The highest scoring is 1kmqA (E-value 12.4) from fold c.37.1.8, but the next best is 1tqyB (E-value 21) from fold c.95.1.1.
M1-I177 try1-opt2 is built from 1j5sA, which is c.1.9.8, but it doesn't look like a TIM barrel, so perhaps it really only got a tiny bit from that model, and more came from 1b1cA (c.23.5.2) or 1v7pC (c.62.1.1).
The strongest RR constraint in M1-I177 is between I75 and V122, which are paired on adjacent strands.
We should probably make a chimera of M1-I177 try1-opt2 and the whole-chain try1-opt2, to see if this looks any better than try1-opt2 alone.

Mon Jun 26 17:11:06 PDT 2006 Martin Madera
Making the M1-I177 try1-opt2 / whole try1-opt2 chimera according to the CASP7 README file. Will try superposing the two PDB files on the final N157-I177 helix. The undertaker script is superimpose-martin.under in decoys/; I added the following atom entries:
    atom N157.CA
    atom E158.CA
    atom D159.CA
    atom A160.CA
    atom A174.CA
    atom R175.CA
    atom L176.CA
    atom I177.CA
The superposition is in superpose.pdb, the emacs-edited chimera in chimera.pdb. There's a small chain break.
Running try2 to polish the chimera... Keep getting error messages like:
    try2.log:# Constraint Error: residue specified as A234.CB doesn't match (T0318)V234
    try2.log:# Constraint Error: residue specified as A234.CB doesn't match (T0318)V234
    try2.log:# Constraint Error: residue specified as A234.CB doesn't match (T0318)V234
Grepping for 'A234' gives rr.constraints files as the only ones that ever mention A234 (my chimera.pdb certainly doesn't!). Removed rr.constraints from the try2.costfcn, now fine.
Last attempt at try2 started at 18:55.
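The "Constraint Error" above is a mismatch between the residue letter written in a constraint and the residue actually at that position in the target. A minimal sketch of that kind of check follows. It is not undertaker's code; spec_matches and the one-entry known_residues map (taken from the error message itself) are hypothetical.

```python
import re

# Residue specs in these notes look like "A234.CB" or "V234":
# one-letter residue code, sequence position, optional atom name.
SPEC = re.compile(r"^([A-Z])(\d+)(?:\.(\w+))?$")


def spec_matches(spec, residue_at):
    """Return True if the residue letter in `spec` agrees with `residue_at`.

    `residue_at` maps 1-based sequence positions to one-letter codes.
    """
    m = SPEC.match(spec)
    if m is None:
        raise ValueError(f"unrecognized residue spec: {spec!r}")
    letter, position = m.group(1), int(m.group(2))
    return residue_at.get(position) == letter


# Only position 234 is known from the notes (the log says it is V).
known_residues = {234: "V"}

bad = spec_matches("A234.CB", known_residues)   # the spec from try2.log
good = spec_matches("V234.CB", known_residues)  # what the target has
print(bad, good)
```

Running a check like this over a constraints file before starting a long optimization would flag stale files early, rather than after the run has begun.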
8pm: still running, sorriieee...

Mon Jun 26 20:42:58 PDT 2006 Kevin Karplus
I moved the bad residue-residue files to bad-constraints/ and remade the constraints with
    make contact_prediction
as this error was due to a bug that George had already fixed.
I also moved the superpose* files out of decoys, since they are not single-model files like the other pdb files. I also removed the backup chimera.pdb~ since it confuses the script that creates read-pdb.under.
I remade decoys/score-all.try1.pretty and decoys/score-all.try2.pretty, and the rr constraints no longer cause problems.

Mon Jun 26 21:15:31 PDT 2006 Kevin Karplus
I sent in
    try1-opt2
    chimera
    align1 (1lam)
    align2 (2ewbA)
    align3 (1tkjA)
If try2 finishes, I'll probably move it to the number 1 position, and try1 to the number 2 position.
I think we may need to optimize the first domain a bit more before building the chimera.
I don't think that we can get a hexamer built by tomorrow morning, so we'll miss the soft deadline on that one.

Tue Jun 27 00:26:39 PDT 2006 Martin Madera
Try2 still running. Hmmmmm. Will do some work on the first domain now (see comments in the M1-I177/README).

Tue Jun 27 18:13:27 PDT 2006 Martin Madera
Try2-opt2 looks much better than try1-opt2 -- it's the best model so far -- but lots of chain breaks in the first domain. I'll do some more work on the first domain on its own before worrying about the chimera.

------------------- old comments from M1-I177/README -------------------

Sun Jun 25 08:55:22 2006 split-into-domains created subdirectory for M1-I177 of T0318
Make started Sun Jun 25 08:56:07 PDT 2006
Running on lopez.cse.ucsc.edu

Tue Jun 27 00:28:39 PDT 2006 Martin Madera
The T06 and T04 alignments contain only three sequences. The T2k alignment has more sequences, with the right functional annotation (aminopeptidases), but they look quite distant, so the alignment may not be very good.
The str2 predictions for T2k agree with try1-opt2 reasonably well, but there are some differences: 3-6 should be a strand, not a helix, and 47-57 should be a helix, not a strand. So I'll run try2 with secondary structure constraints from T0318.t2k.str2.constraints. I figure that the way to do this is to edit try2.costfcn and change the default
    include T0318.dssp-ehl2.constraints
into
    include T0318.t2k.str2.constraints

Tue Jun 27 17:04:03 PDT 2006
Try2 didn't work. The topology is different from try1 (though the spatial arrangement of secondary structure elements is quite similar), but not what I wanted. It did a fairly good job of making sure that the strands were really strands, but it didn't assemble them into a sheet (3-6 and 72-76). Overall try1 looks better than try2.
I'll try doing the sheet constraints by hand. I think I'll start with the sheet in try1. The three central strands look pretty good, so I'll keep that. Then the last strand (150-154) needs to be shifted, but that shouldn't be a problem. Not sure what to do about the first strand; there isn't enough space to make it parallel like the rest, so I'll have to make it anti-parallel. So:
    SheetConstraint (T0318)L20 (T0318)V23 (T0318)V6 (T0318)Q3 hbond (T0318)L20 5
    SheetConstraint (T0318)S19 (T0318)K25 (T0318)A72 (T0318)P78 hbond (T0318)I21 5
    SheetConstraint (T0318)H73 (T0318)D79 (T0318)H110 (T0318)E116 hbond (T0318)L74 5
    SheetConstraint (T0318)V109 (T0318)F111 (T0318)G140 (T0318)R142 hbond (T0318)F111 5
    SheetConstraint (T0318)L113 (T0318)Y117 (T0318)V150 (T0318)V154 hbond (T0318)A115 5
Note the weight of 5 -- I really want to force these.
Try3 started 17:55.

Wed Jun 28 00:06:02 PDT 2006 Martin Madera
Try3 looks identical to try1. Sigh.
Noticed a typo: V154 should be K154 in the last line.
So try4: increased the weight of the sheet constraints to 30 (from 5), and bumped up the overall constraint weight from 10 to 30.
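A quick sanity check on hand-written sheet constraints is to confirm that the two paired segments have the same length and that the strand orientation is what was intended. The sketch below assumes the four residues after the keyword are start and end of the first strand followed by start and end of its partner, which is how the lines above read but is not confirmed from undertaker documentation. The function name describe is made up.

```python
import re

RESIDUE = re.compile(r"\(T0318\)[A-Z](\d+)")

# Constraint lines as written in the notes (including the V154 typo,
# which does not affect the numbering).
SHEET_CONSTRAINTS = [
    "SheetConstraint (T0318)L20 (T0318)V23 (T0318)V6 (T0318)Q3 hbond (T0318)L20 5",
    "SheetConstraint (T0318)S19 (T0318)K25 (T0318)A72 (T0318)P78 hbond (T0318)I21 5",
    "SheetConstraint (T0318)H73 (T0318)D79 (T0318)H110 (T0318)E116 hbond (T0318)L74 5",
    "SheetConstraint (T0318)V109 (T0318)F111 (T0318)G140 (T0318)R142 hbond (T0318)F111 5",
    "SheetConstraint (T0318)L113 (T0318)Y117 (T0318)V150 (T0318)V154 hbond (T0318)A115 5",
]


def describe(line):
    """Return (orientation, length of strand 1, length of strand 2)."""
    fields = line.split()
    if fields[0] != "SheetConstraint" or len(fields) < 5:
        raise ValueError(f"not a sheet constraint: {line!r}")
    numbers = []
    for field in fields[1:5]:
        m = RESIDUE.fullmatch(field)
        if m is None:
            raise ValueError(f"bad residue field: {field!r}")
        numbers.append(int(m.group(1)))
    a_start, a_end, b_start, b_end = numbers
    # Same direction of numbering on both strands means parallel pairing.
    same_direction = (a_end - a_start) * (b_end - b_start) > 0
    orientation = "parallel" if same_direction else "antiparallel"
    return orientation, abs(a_end - a_start) + 1, abs(b_end - b_start) + 1


results = [describe(line) for line in SHEET_CONSTRAINTS]
for line, (orientation, len_a, len_b) in zip(SHEET_CONSTRAINTS, results):
    flag = "" if len_a == len_b else "  <-- length mismatch"
    print(f"{orientation:12s} {len_a} vs {len_b}{flag}")
```

Under that reading, only the first constraint comes out antiparallel, which agrees with the plan described for the first strand.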
------------------------------------------------------------------------ Mon Jul 10 18:02:36 PDT 2006 Martin Madera All new comments now go here (i.e. the main README). Looking at the ab initio domain in M1-I177/. The t06 and t04 HMMs are really poor, the t2k one is OK but not outstanding. Set PREFERRED_AL_METHOD:=t2k and re-ran the make. Re-read what I did last time, and it actually sounds sensible. Tue Jul 11 14:03:40 PDT 2006 Martin Madera M1-I177/Try4 didn't work: it didn't manage to assemble the correct sheet. There are two things I'd like to do for M1-I177/try5: 1) Generate more t2k alignments and switch to t2k, following what Kevin said in T0329/README: ---------------------------------------------------------------------- One way to do this would be to put all the reasonable templates (basically the top 10 or 20 hits in T0329.t06.best-scores.rdb) into MANUAL_TOP_HITS in the Makefile, do make extra_alignments make read_alignments foreach x (*/read-alignments-scwrl.under) grep -h t06 $x > $x:s/scwrl/t06-scwrl/ end Then include each of the read-alignments-t06-scwrl.under files to read in the alignments in the try.under script. I'll do this as try10 for T0329. ---------------------------------------------------------------------- The structures in T0318.t2k.best-scores.rdb are: 1zq1A 1jb9A 1yyaA 2bijA 1aoxA 2bv5A 1o5hA 2b5oA 1yb5A 1v7pC 1a8p \ 1ve3A 1aqjA 1jztA 1ogiA 1j3bA 1w34A 1lxjA 2fh7A 1j5sA Made the extra alignments, added them to M1-I177/try5.under. 2) Bump up both helical and sheet constraints, and include them directly in the cost function. M1-I177/Try5 running on peep. Tue Jul 11 18:36:41 PDT 2006 Martin Madera M1-I177/Try5 split up the sheet. Hmmm, that's bad. Wed Jul 12 13:22:42 PDT 2006 Martin Madera M1-I177/Try6: same as M1-I177/try5, but got rid of the sheet constraints (& rr.constraints and all alpha predictions apart from t2k -- which I increased from 2 to 6), bumped up the secondary structure constraints (from 2 to 5). Running on peep. 
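The csh loop quoted from T0329/README above can also be sketched in Python, which may be easier to adapt when switching the method from t06 to t2k. This is a rough equivalent, not part of the existing scripts. filter_alignments is a made-up name, and unlike the csh `:s` modifier (which substitutes in the whole path) it renames only the file name.

```python
from pathlib import Path


def filter_alignments(root, method="t06"):
    """For each */read-alignments-scwrl.under below `root`, write the lines
    that mention `method` to read-alignments-<method>-scwrl.under alongside it.

    Returns the list of files written.
    """
    written = []
    for src in sorted(Path(root).glob("*/read-alignments-scwrl.under")):
        # Substitute only the first "scwrl" in the file name.
        dest = src.with_name(src.name.replace("scwrl", f"{method}-scwrl", 1))
        kept = [line for line in src.read_text().splitlines(keepends=True)
                if method in line]  # same substring test as `grep -h`
        dest.write_text("".join(kept))
        written.append(dest)
    return written
```

For the M1-I177 runs described here, the call would presumably be filter_alignments(".", method="t2k"), with the resulting files included from the try script.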
M1-I177/Try7: similar to M1-I177/try6, but kept the sheet constraints from M1-I177/try5 (with weight increased from 3 to 10), increased constraints to 50 (from 30), and increased the secondary structure constraints from 5 to 10. Running on lopez. Wed Jul 12 16:17:33 PDT 2006 M1-I177/Try6 looks OK, but I don't like it... two str2 helices turned into a hairpin, etc. M1-I177/Try7 blew up. For M1-I177/try8 I'm trying to get something out of the top server models, which I got from the main directory and manually restricted to 1-177 using rasmol. I actually quite like some of the server models, so I'll try to do something more with them. The nice models are: RAPTORESS_TS1 RAPTORESS_TS4 SP4_TS2 SPARKS2_TS2 ... ooops, except these four are nearly identical! Oh never mind. Wed Jul 12 17:42:51 PDT 2006 What do the str2 predictions actually mean? A ..... two anti-parallel partners M ..... mixed, parallel and anti-parallel P ..... two parallel partners Q ..... one parallel partner Y,Z ... one anti-parallel partner (Y=H-bonded, Z=no H-bond) The strands we've got are: A/YZ/M ....... at least one anti-parallel P/M .......... at least one parallel A/YZ/M ....... at least one anti-parallel A ............ TWO ANTI-PARALLEL M/A/Q/P/YZ ... a mess ... and notor predictors think that the strongly anti-parallel strand (according to str2) is actually parallel!!! This is a mess. M1-I177/Try8 worked, but scores much worse than M1-I177/try1. Thu Jul 13 14:58:38 PDT 2006 Martin Madera According to the unconstrained cost function, M1-I177/try1 is still the best! So clearly it did something right. Trying to enforce constraints clearly doesn't work (at least for this particular target). I need to start working on the dimers, so the last two runs on this domain: M1-I177/try9: like M1-I177/try1, but with t2k alignments ... 
peep M1-I177/try10: like M1-I177/try8 (based on server models) but with M1-I177/try9 cost function Thu Jul 13 17:45:29 PDT 2006 Kevin Karplus In the above discussion, I'm not sure when Martin is referring to the whole-chain, and when he is referring to M1-I177. I don't see any whole-chain models newer than try2, so I assume he has been working entirely in M1-I177. Not only do we need the dimer/hexamer, but we also need to have the best M1-I177 domain tacked onto the C-terminal domain and reoptimized. That should have been done almost as soon as M1-I177/try1 was finished. The hard deadline for this is noon tomorrow, and I don't see anything I can submit that is newer than June 27. Thu Jul 13 18:03:26 PDT 2006 Martin Madera I've been trying to get something better for the N-terminal domain, but have failed horribly. M1-I177/Try9 and M1-I177/try10 still score worse than M1-I177/try1, though I do like M1-I177/try10. OK, enough on the domain, I will work with M1-I177/try1 and M1-I177/try10. --- The chimera for M1-I177/try1 is in decoys/chimera.pdb.gz. Making chimera2, which will superpose M1-I177/try10-opt2 with the main domain from try1-opt2. The script is superimpose-martin2.under, the superposition is in superpose2.pdb, the edited chimera is in decoys/chimera2.pdb.gz. Chimera got optimized in try2, but it didn't do a very good job as breaks and clashes are still very high. So I'll try again as try3 with higher breaks and clashes (x2 for both). Try4 will do the same for chimera2. [Aaaaah, I see why the polishing didn't work for try2 -- I commented out all includes for alignments! Well no wonder!!!] Thu Jul 13 18:33:30 PDT 2006 Kevin Karplus You may also want to include the alignments from the subdomain. Thu Jul 13 18:34:57 PDT 2006 Martin Madera Ah, good point! Added. Try3 running on peep, try4 running on orcas. Thu Jul 13 21:18:53 PDT 2006 Kevin Karplus Nothing ready to submit. I'm not going to be able to stay up late enough tonight to do the submission tonight. 
I'll try to do it first thing in the morning. Please have it ready. Thu Jul 13 23:55:01 PDT 2006 Martin Madera The -opt1 runs for try3 and try4 have finally finished -- almost 5h after I started them! And that's the *monomer* -- we need to make a dimer first, and then a hexamer. Ooops. The good news is that they score better than try2-opt2, and significantly better than the comparable try2-opt1. Fri Jul 14 00:34:41 PDT 2006 Martin Madera The PDB file 1gyt (one of the BLAST hits) is a hexamer! The structure is quite complicated. However, at first sight I would describe it as a dimer of trimers rather than a trimer of dimers. Any one subunit makes contacts with *four* of the remaining five, two within its own trimer and two from the other trimer (one big interface and one small one). The interfaces are large and well-packed (I had a good look at them in slab mode), so my guess is that this is what's present in solution, at least for 1gyt. The intra-trimeric interfaces have about the same area as the bigger one of the inter-trimeric ones. The N-terminal domain in chimera2 seems to be based on 1gyt (and the other top BLAST hits), so this should work nicely. Hmmm, so maybe I should go for the hexamer straight away. Except I can't, because undertaker doesn't understand non-cyclic symmetry! So I'll do the dimer first, and then attempt to trimerize it. Should be fun!!! Fri Jul 14 01:51:22 PDT 2006 Martin Madera I see that try3 and try4 have finished at last -- after SEVEN HOURS! Try4 is better than try3 -- ha, this is the other way around from the individual domains! Interesting, because try4 (based on chimera2) is essentially based on 1gyt. Maybe that helped. 
So, the dimer:
- made dimer/ and dimer/decoys/
- copied and edited the .a2m file & Makefile
- 1gyt isn't in T0318.undertaker-align.under, but in general the best alignments are:
      t06-local-str2+near-backbone-11-0.8+0.6+0.8-adpstyle5.a2m
  however there is no such alignment for 1gytA, so added 1gytA to the Makefile and made extra_alignments. Then copied it to 1gytA.dimer-a2m and edited.
- copied T0284/make-dimer.under to the main dir and edited
- ran
      $ undertaker < make-dimer.under &> dimer.log
- ... this was for try4; repeated the same for try3.
- checked both dimers and they look fine.

So I have two dimers, but the interface looks... ahem, bad. This will need some optimization. So editing dimer/try1.under based on try4.under but following the suggestions in casp7/README. Seems straight-forward... hope I did everything.

Now the monomer took 7h, the dimer is likely to take 20h (3-4x as slow?!), but I want it to finish in 1h max ... I still need to do the hexamer!!! So I'm using
    num_gen 30 gen_size 50 \
    super_iter 2 super_num_gen 50 \
instead of
    num_gen 60 gen_size 100 \
    super_iter 5 super_num_gen 100 \
... a factor of 20 speed-up for opt1, and then for opt2:
    num_gen 20 gen_size 50 \
    super_iter 2 super_num_gen 100
instead of
    num_gen 60 gen_size 100 \
    super_iter 2 super_num_gen 300
... a factor of 18 speed-up.

dimer/try1 (based on try3) running on cheep. Will wait for a bit to see if it goes OK, and then will start dimer/try2 (based on try4) on peep.

Fri Jul 14 02:56:35 PDT 2006 Martin Madera
Ooops, forgot to change try4->try1! Fixed, restarted.

Fri Jul 14 03:06:05 PDT 2006 Martin Madera
dimer/try1 reading in lots of alignments, time to start dimer/try2 (based on try4). Running on peep.

Fri Jul 14 05:43:27 PDT 2006 Martin Madera
This is taking a really long time! But at least opt1 finally finished. And, surprise surprise, in both cases undertaker pushed the two domains apart. I think it's too late to correct this, because I still need to make the final hexamers.
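The speed-up factors of 20 and 18 quoted above come out if the work is taken to be proportional to the product of the four optimization parameters. That proportionality is an assumption made here to reproduce the arithmetic; actual undertaker run time need not scale this simply. The function name work is made up.

```python
def work(num_gen, gen_size, super_iter, super_num_gen):
    """Rough work estimate: product of the four settings (an assumption)."""
    return num_gen * gen_size * super_iter * super_num_gen


# opt1: default settings versus the reduced settings used for the dimer.
opt1_speedup = work(60, 100, 5, 100) / work(30, 50, 2, 50)

# opt2: default settings versus the reduced settings used for the dimer.
opt2_speedup = work(60, 100, 2, 300) / work(20, 50, 2, 100)

print(f"opt1 speed-up: {opt1_speedup:g}x, opt2 speed-up: {opt2_speedup:g}x")
```

By the same estimate, a 20h dimer run would come down to roughly an hour, which matches the target stated in the notes, though the later entries show the real runs took considerably longer.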
Fri Jul 14 07:16:29 PDT 2006 Martin Madera
dimer/try2 just finished, no idea about try1.
dimer/try2 is a disaster. I mean it looks fine, except the packing is completely different. When you look down the sheets in the superposition in cartoon mode, you can see an "eye" going through both subunits... the two sheets basically stack up. In dimer/try2 the interface is shifted somewhere else.
Also, realistically... the *monomer* is 491 residues. The dimer, at close to 1000 residues, took forever -- and the hexamer is close to 3000 residues. The amount of optimization that can be done in 2h isn't worth it, and it will only blow the subunits apart anyway.
So... I will build two hexamers based on try3 and try4, but won't optimize them.

Fri Jul 14 07:51:46 PDT 2006 Martin Madera
Building the hexamer using make-6mer.under failed... oh, actually, it didn't! It worked correctly, the N-terminal domain just moved so far "into space" that it completely disrupted the minor inter-trimer contact. The stacking of the C-terminal domains is correct, even though there are many clashes.

Fri Jul 14 08:15:15 PDT 2006 Kevin Karplus
Yes, building dimers and hexamers is slow---that's why it is a really bad idea to start on them with only 11 hours to go. We need to start work on targets when they first come in and revisit them from time to time, not the day before they are due.
I still don't see a clear list of what monomers and hexamers to submit, neither in the README file nor in superimpose-best.under, so I'm going to have to guess. I hate having to come in at the last minute and try to guess what someone else has been doing. DOCUMENT! DOCUMENT! DOCUMENT!
For monomers, I see that Martin says that try3 and try4 score better than try2, but which does he prefer? and why? Since they aren't in superimpose-best.under, I can only assume that he never compared them.
Since we are down to the wire, I'll probably have to just go based on scores:
    unconstrained: try4-opt2, try3-opt2, try2-opt2, try1-opt2, chimera2
    try4: try4-opt2, try1-opt2.repack-nonPC, try3-opt1,

Fri Jul 14 08:15:15 PDT Martin Madera
best-models updated -- sorry was trying to say so in the README but you overwrote me. (My fault -- I stole the file from you.)

Fri Jul 14 09:13:33 PDT 2006 Kevin Karplus
One problem---Martin has put the hexamers and the monomers in the same file---we have to do separate submissions: (up to) 5 monomers, (up to) 5 hexamers.
So, what I gather from Martin is:
2 hexamers:
    ReadConformPDB 6mer-1gytA-from-try4-opt2.pdb.gz
    ReadConformPDB 6mer-1gytA-from-try3-opt2.pdb.gz
3 monomers:
    ReadConformPDB T0318.try4-opt2.pdb.gz    # the top monomer, based on server models
    ReadConformPDB chimera2.pdb.gz           # chimera that the top model was optimized from
    ReadConformPDB T0318.try3-opt2.pdb.gz    # the best monomer we came up with independently
I'll add 2 more monomers, but there are no other hexamers.

Fri Jul 14 09:31:06 PDT 2006 Kevin Karplus
I had to copy the 6mer directory, since Martin had left it unwriteable, and I had to add a Makefile and a methods file, and to modify the 6mer models (the dimer submission script requires a MODEL record, and the models didn't have one----I need to modify undertaker (or the unpack-multimer script) to add one if there is none).
hexamer submissions done.

Fri Jul 14 09:44:30 PDT 2006 Kevin Karplus
I think I fixed the unpack-multimer script, for future submissions.
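The missing MODEL record described above is the sort of thing a small filter can patch. The sketch below is not the actual change made to unpack-multimer. It assumes a PDB-style MODEL/ENDMDL pair wrapped around the coordinate records is what the submission script wants, which may differ from the CASP submission format, and ensure_model_record is a made-up name.

```python
COORD_RECORDS = ("ATOM", "HETATM", "TER")


def ensure_model_record(pdb_text, serial=1):
    """Wrap the coordinate section in MODEL/ENDMDL if no MODEL record exists.

    Text that already has a MODEL record, or has no coordinate records,
    is returned unchanged.
    """
    lines = pdb_text.splitlines()
    if any(line.startswith("MODEL") for line in lines):
        return pdb_text
    coord_idx = [i for i, line in enumerate(lines)
                 if line.startswith(COORD_RECORDS)]
    if not coord_idx:
        return pdb_text
    first, last = coord_idx[0], coord_idx[-1]
    wrapped = (lines[:first]
               + [f"MODEL     {serial:>4}"]   # serial number in columns 11-14
               + lines[first:last + 1]
               + ["ENDMDL"]
               + lines[last + 1:])
    return "\n".join(wrapped) + "\n"


# Tiny made-up input, just to show the effect.
sample = (
    "REMARK made-up example\n"
    "ATOM      1  CA  MET A   1       0.000   0.000   0.000\n"
    "TER\n"
    "END\n"
)
patched = ensure_model_record(sample)
print(patched, end="")
```

Applying it a second time leaves the text unchanged, so it is safe to run on files that may or may not already carry a MODEL record.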
Rosetta likes best gromacs0.repack-nonPC models: try4-opt2, try1-opt2, try3-opt2, try2-opt2
try1 likes best try3-opt2, try2-opt2, try1-opt2, try4-opt2
    unconstrained: try4-opt2, try3-opt2, try2-opt2, try1-opt2, chimera2
    try4: try4-opt2, try3-opt2, try1-opt2, try2-opt2, chimera2
    try3: try4-opt2, try3-opt2, try1-opt2, try2-opt2, chimera2
    try2: try2-opt2, try2-opt2, try1-opt2, try4-opt2, chimera2

    try4 < chimera2 < try1-opt2 + M1-I177/try10-opt2
    try3 < try2-opt2 < chimera < try1-opt2 + M1-I177/try1-opt2
    try1 < alignments (1lam)
    M1-I177/try10 < RAPTORESS_TS1 (not try8 as Martin says above)
    M1-I177/try1 < alignments (1j5sA)

The M1-I177 try1 model scores better than the try10 model with many of the score functions (though rosetta likes repacking try10 better). It's too bad that Martin didn't do a chimera with M1-I177/try3-opt2, since it usually scores better than M1-I177/try1-opt2, but there isn't time now to optimize another chimera.
I'll order the models
    try3-opt2
    chimera2
    try1-opt2
    chimera
    align1 (from 1lam)
Dropping try4-opt2.gromacs0.repack-nonPC (from RAPTORESS_TS1 for N-terminal, rosetta's favorite) because I don't like the way try4 has moved the N-terminal domain.
It looks like try3 and try4 were optimized without the sheet constraints from the subdomain models they were copied from, resulting in some unnecessary damage to the subdomains.
It also looks like doing a little cleaning up---merging try1-opt2 and the first alignment---could have produced a decent first domain. It doesn't look like anyone looked at this---too late now.

Fri Jul 14 10:42:56 PDT 2006 Kevin Karplus
Submitted with comments
We're reasonably happy with the automatic prediction of the domain from R175 on (though there are some bad breaks still to be fixed), but the first domain seems to have been copied poorly from templates that had the second domain, even though the first domain did not match well.
We tried predicting the first domain independently and making a chimera of the two domains, but optimizing the chimeras was done poorly, so the first domain was damaged in the process. Model 1 is try3-opt2, the result of optimizing chimera.pdb, but without using sheet constraints from the subdomain prediction. The C-terminus is probably improved, but the N-terminal domain may have been damaged. Model 2 is chimera2, a chimera of M1-I177/try10-opt2 and the whole-chain try1-opt2. The N-terminal domain is polished from the RAPTORESS_TS1 model. The attempt to polish this chimera resulted in moving the N-terminal domain in an unlikely way, so the unoptimized chimera is being submitted. Model 3 is try1-opt2, the automatic prediction. Model 4 is chimera.pdb, our first attempt at patching a different domain 1 on the protein. It consists of automatic prediction for M1-I177 pasted onto the try1-opt2 model. Model 5 is just sidechain replacement by SCWRL on an alignment to 1lam. Had we had more time and attention put on this model, we would probably have tried to fix up the N-terminal domain---we're really only on the first day or two of working on this model, so have not gotten to the point where anything sensible has emerged yet. Fri Jul 14 10:44:51 PDT 2006 Kevin Karplus CASP rejected our hexameric submission, saying that they weren't taking oligomeric submissions for T0318, even though the doc.html file clearly says "arrangement in the crystal is hexameric". Perhaps they were put off by "likely homodimer in solution, but arrangement in the crystal is hexameric". We would have been better off spending more time making a better monomer.