Mon May 22 09:36:51 PDT 2006 T0295 Make started

Mon May 22 09:37:39 PDT 2006 Running on lopez.cse.ucsc.edu

Mon May 22 09:42:45 PDT 2006 Kevin Karplus
The submitters mention that T0295 has a dimer in the unit cell. BLAST finds very high similarity to 1zq9A (47% identity over 276 residues), with an E-value of 4.7e-69.

Mon May 22 10:29:24 PDT 2006 Kevin Karplus
The t06 multiple alignment finds 27 PDB sequences, and the t04 alignment finds 21.

Mon May 22 11:21:44 PDT 2006 Kevin Karplus
1zq9A is coming out on top in the t04 scorings (it isn't in the t06 template library, at least not yet). It looks like there are at least 74 templates in the c.66.1 SCOP superfamily, so there should be plenty of variety available for modeling any variable loops.

Mon May 22 14:17:52 PDT 2006 Kevin Karplus
As expected, 1zq9A scores best in T0295.best-scores.rdb. Not far behind is 1qyrA, then there is a big jump to 1qamA and others. The conservation in the multiple alignments is focused mainly on the first 100 residues, particularly in the t06 alignment. The t2k alignment also shows considerable conservation for F158, P161, and P163, but I don't know whether the difference comes from which sequences are aligned or from the quality of the alignment. Most likely it is the number of sequences, as t06 has 4174 sequences and t2k has only 3372. The target protein is "dimethyladenosine transferase, putative [Plasmodium falciparum 3D7]", 1zq9A is "probable dimethyladenosine transferase" from Homo sapiens, and 1qyrA is E. coli's "High level kasugamycin resistance protein KsgA", which is also in the SCOP family "rRNA adenine dimethylase-like". This appears to be a fairly ancient fold, appearing in both bacteria and eukaryotes (which is not surprising for something associated with the ribosome).

Mon May 22 15:53:38 PDT 2006 Kevin Karplus
The alignments to templates are in excellent agreement for the N-terminal sheet, but the C-terminal helices seem to be a bit scattered.
I hope that the top template really indicates where they go!

Mon May 22 20:13:16 PDT 2006 Kevin Karplus
This is clearly a 2-domain protein, with a domain break somewhere around S177-T181. I will do a subdomain prediction for S177-F275. It looks like T0295.try1-opt1 has a bad misalignment of one strand: S136 to C146 should probably be antiparallel to S167-P174 in some alignment rather than being wound into a helix, though none of the alignments in undertaker-align.sheets has such a pairing. Looking at the models from alignments, only model 1 (from 1zq9A) has the strand wound into a helix. Maybe I should not believe so strongly in 1zq9A, and see what happens if I take out the sheet constraints from that alignment. (Or maybe I should believe 1zq9A and ignore the secondary-structure prediction.)

Mon May 22 20:48:35 PDT 2006 Kevin Karplus
The S177-F275 region is clearly based on 1zq9A---nothing else comes close. This region is 43.6% identical over 94 residues (BLAST E-value 1.2e-16). Perhaps I should chop off the C-terminal domain and see whether that causes the first domain to find a different template. I'll start a prediction for H1-T181. It looks like C113, C88, and C146 may be coordinating a metal ion, if C146 is in the right place. But C88 and C113 are on opposite sides of the sheet, suggesting a misalignment by 1 of some strand (either C113-Q119 or V87-N91). Of course, none of these CYS residues are conserved, so there may be nothing here of interest. There is a strand misalignment in try1-opt2 (based on burial patterns). I think that L172 should be antiparallel to V115, not A114. When looking at the space-fill structure, I'm much less convinced of this, as V115 is covered by a helix, but K171, its antiparallel partner, is very exposed. Perhaps the current alignment is the best one. According to BLAST, 1zq9A is the best match for the first domain also, with 50% identity over 176 residues. The next best is 1qyrA at only 29% identity.
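The identity figures quoted in these entries (47% over 276 residues, 43.6% over 94, 50% over 176) are ratios of identical aligned positions to aligned columns. A minimal Python sketch of that ratio, assuming the BLAST convention that the denominator is the alignment length including gap columns; the function name `percent_identity` is hypothetical and not part of BLAST or undertaker.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of one pairwise alignment.

    Assumption: both strings have equal length and use '-' for gaps; the
    denominator is every alignment column, gaps included (BLAST-style).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    # A column counts as identical only if both rows carry the same residue.
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


# Toy alignment: 3 identical columns out of 4 (one gap column) -> 75.0
print(percent_identity("AC-E", "ACDE"))
# Arithmetic check for the S177-F275 figure: 41 identities over 94 columns
print(round(100 * 41 / 94, 1))
```

For the C-terminal domain, 43.6% over 94 columns corresponds to 41 identical positions, since 41/94 is about 0.436.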
Thu May 25 15:08:39 PDT 2006 Kevin Karplus
The S177-F275 try1-opt2 prediction looks good, as does the H1-T181 one. We should do a superposition with the main try1-opt2 and see if we can make a chimera and optimize it as a whole.

Thu May 25 16:41:36 PDT 2006 Kevin Karplus
Other than at the very edges of the domains, the subdomain predictions superpose nicely with the full try1-opt2. If we want to make a chimera, we can take
H1-L172 from domain1
I173-D184 from whole chain
E185-F275 from domain2
and then reoptimize with break weights turned up to clean up the joins. It's not clear that the effort would be worth it.

Sat May 27 18:51:00 PDT 2006 Kevin Karplus
I picked up the server predictions, and SAM_T06_server_TS1 scores best (even better than try1-opt2) with the try1 costfcn. Several other servers do fairly well (BayesHH_TS1-scwrl, GeneSilicoMetaServer_TS2, RAPTOR-ACE_TS4-scwrl, ...). Maybe I should do a polishing run that takes in all the server models (and current models) and tries polishing them up.

Sat May 27 19:07:48 PDT 2006 Kevin Karplus
I'm trying a polishing run as try2, but I'm not sure what undertaker will do with incomplete conformations---it may cause a crash in the optimization.

Sat May 27 19:41:29 PDT 2006 Kevin Karplus
undertaker seems to be coping ok with the missing residues, but some PDB files were not read, probably because of Windows trash (^M at the ends of lines) that the crude PDB parser we borrowed from UCSF could not handle. In any case, the try2 run is progressing, very occasionally making a tiny improvement to SAM_T06_server_TS1.

Sun May 28 09:43:20 PDT 2006 Kevin Karplus
Both undertaker and rosetta like try2-opt2 better than try1-opt2.

Sun May 28 09:50:55 PDT 2006 Kevin Karplus
I just noticed that try2-opt2 puts the second domain in a *very* different place than try1-opt2 and the alignments.
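The chimera proposed in the Thu May 25 16:41 entry amounts to taking residue ranges from three models and concatenating them. A minimal sketch, assuming each model is reduced to a dict from residue number to that residue's record; the `splice_chimera` helper and this data layout are hypothetical, not undertaker's. Splicing only chooses residues and does nothing about the geometry at the joins, which is why the entry calls for reoptimizing with break weights turned up.

```python
from typing import Dict, List, Tuple

# Hypothetical model representation: residue number -> residue record
# (for instance, that residue's ATOM lines). Not undertaker's data structure.
Model = Dict[int, str]


def splice_chimera(pieces: List[Tuple[Model, int, int]]) -> Model:
    """Copy residues first..last (inclusive) from each source model in turn."""
    chimera: Model = {}
    for model, first, last in pieces:
        for resnum in range(first, last + 1):
            if resnum in chimera:
                raise ValueError(f"residue {resnum} taken from two models")
            if resnum not in model:
                raise KeyError(f"residue {resnum} missing from source model")
            chimera[resnum] = model[resnum]
    return chimera


# Toy stand-ins for the three predictions (ranges as in the log entries).
domain1 = {i: f"domain1:{i}" for i in range(1, 182)}    # H1-T181
whole = {i: f"whole:{i}" for i in range(1, 276)}        # full chain
domain2 = {i: f"domain2:{i}" for i in range(177, 276)}  # S177-F275

chimera = splice_chimera([(domain1, 1, 172), (whole, 173, 184), (domain2, 185, 275)])
print(len(chimera), chimera[172], chimera[173], chimera[185])
```

Raising on overlapping or missing residues makes a mistyped range boundary fail loudly instead of silently producing a chain with a gap or a doubled residue.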
While I'm willing to have one structure that moved the domain like this, I think we need to do an optimization with some constraints to maintain the packing of the domains. Possible constraints (taken from try1-opt2):
Constraint V130.CA N254.CA 5 5.8 7
Constraint L144.CA E257.CA 7 7.8 9
Constraint L144.CA F183.CA 9 9.9 12
Constraint I140.CA E257.CA 7 7.8 9
Constraint I140.CA L190.CA 8 8.7 11

Sun May 28 10:56:49 PDT 2006 Kevin Karplus
Even with these constraints added, SAM_T06_server_TS1 scores best, so the problem arose in the try2 optimization, not in copying from the server model. Perhaps I should pick up constraints from that model instead of try1-opt2.
Constraint Y135.CA I253.CA 6 6.8 8
Constraint R137.CA R191.CA 6 6.5 8
Constraint I140.CA E257.CA 6 7.1 8
Constraint I140.CA L190.CA 8 8.6 10
Constraint N141.CA D187.CA 7 7.5 9
Constraint L144.CA E257.CA 7 7.7 9
Constraint L144.CA F183.CA 8 9.5 11
Constraint F145.CA S178.CA 5 5.6 7
I put these into try4.costfcn and started a try4 run. For the try4 run, I selected the models we generated, plus the top 200 server models (according to the try4 scoring). Actually, since the top models were a mix of plain models and scwrled models, I ended up requesting 112 distinct server models and 112 scwrled models from them, in addition to the 10 models from decoys. Some of the requests failed. For example, all the ones that claim to be from RAPTORESS are not, probably because of ^M at the end of the lines for RAPTORESS, which breaks the PDB reading in undertaker.

Sun May 28 12:03:37 PDT 2006 Kevin Karplus
I tried fixing libpdb (used by undertaker) to remove the extraneous ^M characters and will rerun the scoring of the servers with try4.costfcn. This will probably show that I picked a few of the wrong servers for the long try4 run, but I doubt that it will make any difference, since the top server results will be the same.
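The ^M problem comes from DOS/Windows line endings: each record ends in "\r\n", so a reader that strips only "\n" leaves a stray carriage return at the end of every record. libpdb itself is C and its actual fix is not shown in this log; the Python sketch below only illustrates the idea, and the names `pdb_records` and `open_pdb` are hypothetical. Routing plain and gzipped files through one stripping function means the fix cannot end up in only one of the two readers, which is the trap the 12:29 entry describes for pdb_read_record and pdb_gzread_record.

```python
import gzip
import io
from typing import Iterator, TextIO


def pdb_records(stream: TextIO) -> Iterator[str]:
    """Yield records with any trailing CR/LF removed.

    rstrip("\r\n") drops both a Unix "\n" and a DOS "\r\n" ending, but keeps
    trailing spaces, which can matter in fixed-column PDB records.
    """
    for line in stream:
        yield line.rstrip("\r\n")


def open_pdb(path: str) -> TextIO:
    """Open a plain or gzipped PDB file so both go through pdb_records."""
    # newline="" keeps the raw line endings, so the stripping above is what
    # removes the ^M rather than Python's universal-newline translation.
    if path.endswith(".gz"):
        return io.TextIOWrapper(gzip.open(path, "rb"), encoding="latin-1", newline="")
    return open(path, encoding="latin-1", newline="")


raw = io.StringIO("ATOM      1  N   HIS A   1\r\nEND\r\n", newline="")
print(list(pdb_records(raw)))
```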
The RAPTORESS read failures were hiding the real RAPTOR-ACE_TS5-scwrl scores (because of misnaming), but RAPTOR-ACE_TS5 was included in the try4 run anyway.

Sun May 28 12:18:18 PDT 2006 Kevin Karplus
Foo! The fix to libpdb did not work. Either I patched libpdb wrong or I misdiagnosed the problem. I'll have to try again.

Sun May 28 12:29:35 PDT 2006 Kevin Karplus
The fix had been made only to pdb_read_record, but it also needed to be made to pdb_gzread_record. It is now ok.

Sun May 28 13:12:27 PDT 2006 Kevin Karplus
The RAPTORESS models (which were not read successfully for the try4 run) are not bad, but there are several better server models, so not much is lost by omitting them.

Sun May 28 13:59:07 PDT 2006 Kevin Karplus
The best-scoring server model (other than ours) with the try4 costfcn is BayesHH_TS1-scwrl. Interestingly, the unscwrled model scores much worse. Clashes go up slightly with the scwrling and one or two non-backbone H-bonds are lost, but the sidechain cost improves greatly with scwrl.

Sun May 28 15:14:37 PDT 2006 Kevin Karplus
Looking at T0295.try4-opt1, it occurs to me that we might want to pack I192.CG2 and CD1 closer to V210.CG1 and CG2. Currently we have
Distance VAL210A.CG1-ILE192A.CG2: 4.298
Distance VAL210A.CG1-ILE192A.CD1: 4.316
Distance VAL210A.CG1-ILE192A.CG1: 4.985
Distance VAL210A.CG2-ILE192A.CG2: 3.957
Distance VAL210A.CG2-ILE192A.CG1: 4.575
Distance VAL210A.CG2-ILE192A.CD1: 4.651
and we could reduce them to around 3.3 (except the largest one, which we should probably not try to constrain).

Sun May 28 23:04:12 PDT 2006 Kevin Karplus
try4-opt2 is the best-scoring so far, with both try4.costfcn and try5.costfcn. grep-best-rosetta likes it best also. Note: there was no try3 run, as that costfcn was rejected in favor of try4.costfcn.

Mon May 29 08:41:28 PDT 2006 Kevin Karplus
try5 died with a segmentation fault (not even an assertion failure!)
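The Sun May 28 15:14 proposal (pull the I192/V210 side-chain carbons toward about 3.3, leaving out the largest distance) can be made mechanical. The sketch below parses the Distance lines exactly as printed in that entry, drops the single largest, and reports how far each remaining pair sits above 3.3. The helper names are hypothetical, and turning these pairs into undertaker constraints is deliberately left out, since the log does not say what bounds or weights would be used.

```python
import re
from typing import List, Tuple

# Matches lines such as "Distance VAL210A.CG1-ILE192A.CG2: 4.298"
DISTANCE_RE = re.compile(r"Distance\s+([^\s-]+)-([^\s:]+):\s+([0-9.]+)")


def parse_distances(text: str) -> List[Tuple[str, str, float]]:
    """Return (atom1, atom2, distance) for every Distance line in text."""
    return [(a, b, float(d)) for a, b, d in DISTANCE_RE.findall(text)]


def pairs_to_tighten(
    distances: List[Tuple[str, str, float]], target: float = 3.3
) -> List[Tuple[str, str, float, float]]:
    """Drop the single largest distance; return (atom1, atom2, dist, excess)."""
    kept = sorted(distances, key=lambda t: t[2])[:-1]
    return [(a, b, d, round(d - target, 3)) for a, b, d in kept]


log_text = """
Distance VAL210A.CG1-ILE192A.CG2: 4.298
Distance VAL210A.CG1-ILE192A.CD1: 4.316
Distance VAL210A.CG1-ILE192A.CG1: 4.985
Distance VAL210A.CG2-ILE192A.CG2: 3.957
Distance VAL210A.CG2-ILE192A.CG1: 4.575
Distance VAL210A.CG2-ILE192A.CD1: 4.651
"""

for a, b, d, excess in pairs_to_tighten(parse_distances(log_text)):
    print(f"{a}-{b}: {d:.3f} ({excess:+.3f} above target)")
```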
I suspect that the problem is the number of conformations that are missing a lot of atoms---the optimization routines are not set up to handle incomplete conformations. Hmm---that doesn't seem to be right, as the OptConform command only adds conformations that are complete (at least if use_all is set).

Mon May 29 11:35:45 PDT 2006 Kevin Karplus
The bug seems to have been one I introduced last night---I deleted a Segment that was being replaced in find_breaks, but forgot to include a check to make sure that there was something to replace it with before doing the deletion.

Mon May 29 12:32:30 PDT 2006 Kevin Karplus
try5 seems to be running ok, and it looks like it will succeed in closing all the gaps, which would be nice.

Mon May 29 14:11:27 PDT 2006 Kevin Karplus
Looking at try5-opt1, I see that there *is* still a break between S136 and R137, even though undertaker has lost sight of it. I wonder how that happened, as the CA-CA distance of 4.473 is much larger than the ideal CA-CA distance of 3.8024.

Mon May 29 15:57:22 PDT 2006 Kevin Karplus
At least the clashes seem to be getting reduced in try5, so the results may be ok even if there is an error in the reporting of breaks.

Mon May 29 17:04:25 PDT 2006 Kevin Karplus
The breaks in try5-opt2 seem to be identical to those in try4-opt2. Why did undertaker lose track of them?? The number of clashes was reduced and the number of H-bonds went up, but the worst clash is the same:
other-bump: 1.68792 Ang (T0295)K84.CD and (T0295)R108.O threshold= 2.72575 cost= 0.943612
With the unconstrained.costfcn, try2 and try5 score almost the same. Rosetta likes repacking try5-opt2 better.

Tue May 30 16:59:18 PDT 2006 Kevin Karplus
I think I've fixed undertaker, so I'll do another polishing run (including the same server models as in try5, plus all the decoy models) to see if I can reduce the breaks for real. I've decreased the constraint weight and increased the soft_clashes and breaks weights for try6.
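The S136/R137 break was spotted by recomputing a CA-CA distance from the coordinates rather than trusting undertaker's own bookkeeping. A minimal sketch of that independent check, assuming CA coordinates keyed by residue number; the 0.3 tolerance and the `find_chain_breaks` name are illustrative assumptions, not undertaker's break cost.

```python
import math
from typing import Dict, List, Tuple

IDEAL_CA_CA = 3.8024  # ideal CA-CA distance quoted in the log
Coord = Tuple[float, float, float]


def find_chain_breaks(
    ca: Dict[int, Coord], tolerance: float = 0.3
) -> List[Tuple[int, int, float]]:
    """Return (i, i+1, distance) wherever consecutive CAs deviate from ideal.

    `tolerance` (same units as the coordinates) is an assumed value.
    Residue pairs where i+1 has no CA are skipped rather than reported.
    """
    breaks = []
    for i in sorted(ca):
        j = i + 1
        if j not in ca:
            continue
        d = math.dist(ca[i], ca[j])
        if abs(d - IDEAL_CA_CA) > tolerance:
            breaks.append((i, j, round(d, 3)))
    return breaks


# Toy coordinates: 135-136 at 3.8, then the 4.473 gap seen between S136 and R137.
ca = {135: (0.0, 0.0, 0.0), 136: (3.8, 0.0, 0.0), 137: (8.273, 0.0, 0.0)}
print(find_chain_breaks(ca))
```

Because it looks only at coordinates, a check like this cannot "lose sight of" a break the way an optimizer's internal segment list can.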
Tue May 30 19:56:36 PDT 2006 Kevin Karplus
Although try6-opt1 scores slightly better than try5-opt2, it really hasn't reduced the breaks and clashes---the tiny changes are from sidechain repacking. We seem to be trapped in a local minimum (albeit a pretty good one).

Mon Jun 5 14:07:31 PDT 2006 Kevin Karplus
try6-opt2 slightly increased breaks and slightly decreased clashes relative to try6-opt1. Rosetta likes repacking try5-opt2 better than try6-opt2.

Mon Jun 5 14:26:41 PDT 2006 Kevin Karplus
The biggest difference between try1-opt2, try6-opt2, SAM_T06_server_TS1, and the first undertaker alignment (to 1zq9A) is in the hinging between the two domains. try6-opt2 has the most "closed" of the four hinge positions, though they are all quite similar. I think that we can do one polishing run and submit. For try7 I increased soft_clashes and breaks (and, slightly, pred_alpha2k, pred_alpha04, pred_alpha06). I also decreased phobic_fit, since it favors the incorrect domain orientation of try2-opt2. Starting polishing run as try7 on orcas.

Mon Jun 5 16:13:04 PDT 2006 Kevin Karplus
try7-opt2 made tiny improvements in breaks and clashes, but rosetta still prefers repacking try5-opt2. I think I'll submit
ReadConformPDB T0295.try7-opt2.pdb
ReadConformPDB T0295.try5-opt2.repack-nonPC.pdb
ReadConformPDB T0295.try1-opt2.pdb
ReadConformPDB T0295.undertaker-align.pdb model 1
ReadConformPDB T0295.undertaker-align.pdb model 2

Mon Jun 5 16:21:51 PDT 2006 Kevin Karplus
Models submitted.

Wed Jun 14 10:12:10 PDT 2006 Kevin Karplus
Solution released as 2h1rA.

Wed Jun 14 16:27:30 PDT 2006 Kevin Karplus
Our best submitted model is model3 with a GDT of 70.6%, and our best created model was try4-opt1-scwrl with 72.5%, but *lots* of servers did better. 3Dpro_TS2 had a GDT of 82.15%! Our server is right in the middle of the servers. The best TS1 model was HHpred2_TS1, with a GDT of 77.9% (FUNCTION_TS1 had slightly higher GDT, but was worse on other measures, like RMSD).
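The GDT percentages above are GDT_TS-style scores: the fraction of target residues whose CA lies within 1, 2, 4 and 8 Angstroms of the experimental position, averaged over the four cutoffs. In CASP's standard definition, computed with the LGA program, the best superposition is searched separately for each cutoff. The sketch below instead assumes one fixed superposition has already been applied, so it can only under-estimate the real score; the function name is hypothetical.

```python
from typing import Sequence

GDT_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # Angstroms


def gdt_ts_fixed(ca_deviations: Sequence[float], n_target: int) -> float:
    """GDT_TS-style percentage for ONE fixed superposition.

    ca_deviations: per-residue CA distances (Angstroms) between model and
    experimental structure after superposition; unmodeled residues are simply
    absent. n_target: number of residues in the experimental structure.
    No superposition search is done, so this is a lower bound on GDT_TS.
    """
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    fractions = [
        sum(1 for d in ca_deviations if d <= cutoff) / n_target
        for cutoff in GDT_CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(GDT_CUTOFFS)


# Toy example: 5 residues at various deviations from the experimental CAs.
print(gdt_ts_fixed([0.5, 1.5, 3.0, 6.0, 10.0], n_target=5))
```

Dividing by the target length rather than the number of modeled residues means missing residues count against the model, which matches how incomplete models are penalized in GDT.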
I made a "best-post.pdb" file that has our models, then the best server models. The differences seem to be almost entirely in the hinging between the domains, which I suspect is luck as much as anything else.

Fri Jul 14 11:47:06 PDT 2006 Kevin Karplus
With the improved evaluation in evaluate.unconstrained.pretty, our best submitted model is still model3 (-0.90), but our best model is try4-opt1-scwrl (-0.93). SAM_T06 is 33rd of 54 TS1 models---pretty feeble (-0.84). The best server model was 3Dpro_TS2 (-1.22), though it could have been improved slightly by scwrling.

Thu Sep 14 12:26:32 PDT 2006 Kevin Karplus
With the latest revisions to the real_cost evaluation, model3 is our best submitted (-161.79), try4-opt1-scwrl is our best generated (-168.16), and SAM_T06_server is -148.36 (19th of 54).