Wed May 17 09:19:55 PDT 2006 T0290 Make started
Wed May 17 09:20:29 PDT 2006 Running on camano.cse.ucsc.edu

Wed May 17 17:00:39 PDT 2006 Kevin Karplus
Despite all the problems with the servers today, I've managed to get a fold-recognition result for T0290. It appears to be SCOP domain b.62.1.1, with at least 37 templates available. Picking out the best template may be a little tricky, as 35 of the PDB sequences already appear in the multiple alignments, so the HMM scores may reflect the consensus of the family more than any specific sequence. The simplesw scores (or BLAST scores) may be a better test of which templates are closest. There are thousands of sequences in the thin90 multiple alignments, so there should be enough diversity for the rr (residue-residue contact) predictions to be fairly good.

Wed May 17 19:13:15 PDT 2006 Kevin Karplus
The SCOP family found is Cyclophilin (peptidylprolyl isomerase). RPS-BLAST finds several hits to cyclophilins, the best two being
cd01926, Cyclophilin A, B and H-like cyclophilin-type peptidylprolyl cis-trans isomerase (PPIase) domain. This family represents the archetypal cytosolic cyclophilin similar to human cyclophilins A, B and H. PPIase is an enzyme that accelerates protein folding by catalyzing the cis-trans isomerization of the peptide bonds preceding proline residues. These enzymes have been implicated in protein folding processes that depend on catalytic/chaperone-like activities. As cyclophilins, human hCyP-A, human cyclophilin-B (hCyP-19), S. cerevisiae Cpr1 and C. elegans Cyp-3 are inhibited by the immunosuppressive drug cyclosporin A (CsA). CsA binds to the PPIase active site. S. cerevisiae Cpr1 interacts with the Rpd3-Sin3 complex and is also a component of the Set3 complex. S. cerevisiae Cpr1 has also been shown to have a role in Zpr1p nuclear transport. Human cyclophilin H associates with the [U4/U6.U5] tri-snRNP particles of the spliceosome.
pfam00160, Pro_isomerase, Cyclophilin type peptidyl-prolyl cis-trans isomerase.

The best hits found by NCBI BLAST in PDB are 1c5fO =1a58 1iipA =1ihgA 1qngA 1qnhB =1qnhA 1e3bA =1dywA 1cynA 1vdnA 2cfeA 1mzwA =1qoiA 1xq7C =1xo7A 1zcxA 1zmfA ... The 1c5fO hit is 60% identical (79% positives) over a gapless alignment of 172 residues (all but the final P). That's going to be a hard template to beat! We have 1a58 as the identical sequence in the dunbrack-pdbaa set, and it comes out sixth on the T0290.best-scores.rdb list. A 1a58 alignment does appear to be the one chosen by TryAllAlign for try1. Since this is such a close homolog, we'll probably want to raise the weights of soft-clashes, breaks, and sidechain, to do fancy polishing on a basically good backbone.

Wed May 17 20:24:40 PDT 2006 Kevin Karplus
The try1-opt2 model and all the models from alignments are very close in backbone. There are a couple of loops that vary, but we seem to have picked up decent templates for them. We may have messed up the sidechains, since we are relying on scwrl to clean them up from the alignment, and it may have changed some critical residues. Perhaps I should do try2 from just a subset of the top hits and *not* SCWRL the initial alignments (we'll still run scwrl later on to let it clean up anything we mess up). I put
MANUAL_TOP_HITS:= 1a58 1ihgA 1qngA 1qnhA 1dywA 1cynA 1vdnA 2cfeA 1qoiA 1xo7A 1zcxA 1zmfA
into the Makefile and made 'extra_alignments' and 'read_alignments' (with separate makes). Some of these templates are not in the template library, so they have fewer alignments. (No matter, since alignment to this template is fairly trivial.)

Wed May 17 20:55:26 PDT 2006 Kevin Karplus
try2 has started on camano. It looks like the 1a58 alignments are still favored. We're getting a few more clashes than with the SCWRL'd alignments, but not too many, so I think there is a chance that this will produce a better final result than try1-opt2.
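As a sanity check on the identity figures: percent identity over a gapless alignment is just the fraction of aligned positions carrying the same residue, so 60% of 172 residues is roughly 103 identical positions. The Python sketch below shows that calculation. The function name is hypothetical and the sequences are toy strings, not the real target or template. It does not compute BLAST's "positives", which would additionally need a substitution matrix such as BLOSUM62.

```python
def percent_identity(query: str, template: str) -> float:
    """Percent identity over a gapless alignment of two equal-length sequences."""
    if len(query) != len(template):
        raise ValueError("a gapless alignment needs sequences of equal length")
    if not query:
        raise ValueError("cannot score an empty alignment")
    # Count aligned positions where the residues are identical.
    identical = sum(1 for q, t in zip(query, template) if q == t)
    return 100.0 * identical / len(query)


# Toy sequences (not the real target or template): 5 of 7 positions match.
print(f"{percent_identity('MKVLAGH', 'MKILSGH'):.1f}% identical")

# Going the other way: 60% of a 172-residue gapless alignment.
print(round(0.60 * 172), "identical positions, approximately")
```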
Thu May 18 18:11:53 PDT 2006 Kevin Karplus
The try2-opt2 run certainly scored better than try1-opt2, and Rosetta likes it better after repacking. For try3 I will increase soft_clashes and breaks and eliminate constraints, polishing up existing models. There are a lot of CYS and HIS residues in this protein, suggesting metal-binding sites. It might be worthwhile to look at the templates to see whether they contain metal ions, then add constraints to the residues that coordinate the ions. I won't do this for try3, but it seems like the right next step for try4.

Thu May 25 15:01:55 PDT 2006 Kevin Karplus
try3-opt2 looks pretty good. We should do one more polishing run and call this done. Scoring the server results would also be interesting. The polishing run should include constraints on the CYS and HIS residues, if appropriate.

Thu May 25 15:10:37 PDT 2006 Kevin Karplus
I scored the server models with try3.costfcn, and other than our own models, the top ones are ROBETTA_TS2 and 3Dpro_TS2-scwrl. Oops: I forgot to add "missing_atoms" to the cost function.

Thu May 25 15:31:34 PDT 2006 Kevin Karplus
Putting in missing_atoms did not change things much, as both of the server models mentioned above were complete.

Sat Jun 3 07:08:22 PDT 2006 Kevin Karplus
There are 4 HIS residues in a cluster (H98, H127, H131, H132). In 1a58 (61% identical) they do not coordinate a ligand, nor do they in 1ihgA (62% identical). There is a paper discussing histidines in cyclophilins, which indicates that the chemistry of histidines in cyclophilins may be a bit unusual:
Yu L, Fesik SW. pH titration of the histidine residues of cyclophilin and FK506 binding protein in the absence and presence of immunosuppressant ligands. Biochim Biophys Acta. 1994 Nov 16;1209(1):24-32.
In any case, I don't see that I'm going to get any constraints on these residues from the PDB files, so I might as well submit.

Wed Jun 14 10:06:57 PDT 2006 Kevin Karplus
Now released as 2gw2; running the evaluation to see how we did.
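The try4 idea from the May 18 entry (check the templates for metal ions, then constrain the coordinating CYS and HIS residues) comes down to a distance search in the template PDB files. Below is a minimal Python sketch, assuming the standard fixed-column PDB ATOM/HETATM layout. The metal list, the ligand-atom set, and the 3.0 Å cutoff are illustrative choices, not values taken from undertaker or from these runs, and the function names are hypothetical.

```python
import math
from typing import Iterable, List, NamedTuple, Tuple

# Illustrative set of metal residue names (not exhaustive).
METALS = {"ZN", "FE", "CU", "MN", "NI", "CO", "CA", "MG"}
# Side-chain atoms through which CYS and HIS commonly coordinate metals.
LIGAND_ATOMS = {("CYS", "SG"), ("HIS", "ND1"), ("HIS", "NE2")}


class Atom(NamedTuple):
    record: str
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines: Iterable[str]) -> List[Atom]:
    """Read ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, TER, END, etc.
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


def metal_contacts(atoms: List[Atom], cutoff: float = 3.0) -> List[Tuple[str, str, str, float]]:
    """List (metal, residue, ligand atom, distance) for CYS/HIS atoms within cutoff Å of a metal."""
    # Only HETATM records count as metals, so alpha carbons named "CA" are not mistaken for calcium.
    metals = [a for a in atoms if a.record == "HETATM" and a.res_name in METALS]
    ligands = [a for a in atoms if (a.res_name, a.name) in LIGAND_ATOMS]
    hits = []
    for m in metals:
        for a in ligands:
            d = math.dist((m.x, m.y, m.z), (a.x, a.y, a.z))
            if d <= cutoff:
                hits.append((f"{m.res_name}{m.res_seq}", f"{a.res_name}{a.res_seq}", a.name, round(d, 2)))
    return hits
```

For each template file one might call something like `metal_contacts(parse_atoms(open(path)))`; an empty list near the histidine cluster would correspond to the Jun 3 observation that 1a58 and 1ihgA have no ligand there.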
Wed Jun 14 10:45:42 PDT 2006 Kevin Karplus
Oops: the evaluation script did not include our submitted models under their model numbers (though it did evaluate all the tries, so we can work out which is which). I'll rerun on the farm cluster.
For T0290, a lot of SERVERS beat us, the best being ROBETTA_TS2. Our server came out about 53rd, and that was after running scwrl (the scwrl'd server model did better than the one without SCWRL). Even our server did better than we did by hand (we submitted try3-opt2):

                          knot clens rmsd log_rmsd rmsd_ca log_rmsd_ca    GDT smooth_GDT missing_atoms real_cost
WEIGHTS-->                 0.1   0.5  0.0      0.1     0.0         0.1    0.0        0.0           0.1
2gw2A                      0.0   0.0  0.0     -0.9     0.0        -0.9 -100.0     -100.0           0.6     -1.22
ROBETTA_TS2                0.0   0.0  1.2      0.0     0.5        -0.1  -99.0      -97.7           0.0     -0.03
...
3Dpro_TS2                  0.0   0.1  1.2      0.0     0.7        -0.1  -98.8      -97.6           0.0      0.02
...
SAM_T06_server_TS1-scwrl   0.0   0.1  1.5      0.1     0.7        -0.0  -98.0      -95.2           0.0      0.07
SAM_T06_server_TS1         0.0   0.1  1.6      0.1     0.7        -0.0  -98.0      -95.2           0.0      0.07
T0290.try1-opt2.repack-n   0.0   0.1  1.6      0.1     0.7        -0.0  -97.4      -94.3           0.0      0.08
T0290.try1-opt2.gromacs0   0.0   0.1  1.6      0.1     0.7        -0.0  -97.4      -94.3           0.0      0.08
T0290.try1-opt2.pdb        0.0   0.1  1.6      0.1     0.7        -0.0  -97.4      -94.3           0.0      0.08
T0290.try3-opt2.pdb.gz     0.0   0.1  1.8      0.1     0.9        -0.0  -96.7      -94.2           0.0      0.12

Both of the server models that we identified as best before (ROBETTA_TS2 and 3Dpro_TS2) do better than us, though scwrling 3Dpro_TS2 makes it worse. In terms of model 1 results (all that the assessors will look at), our server is 14th, just behind ROBETTA and beating 3Dpro. It looks like we might do better by selecting the best server model for comparative modeling, since we would then have done much better, even beating the best of the servers. The server with the best TS1 model is FUGMOD.

Wed Jun 14 12:53:44 PDT 2006 Kevin Karplus
It is hard to see how we can fix undertaker's cost function to favor ROBETTA_TS2 over try3-opt2. In every cost function component we scored except n_ca_c and hbond_geom_beta_pair, try3-opt2 scored as well as or better than ROBETTA_TS2.
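The claim that try3-opt2 scores as well or better on every component except n_ca_c and hbond_geom_beta_pair can be turned into a small calculation, assuming (for this sketch only) that the total cost is a weighted linear sum of nonnegatively weighted components. Under that assumption a model that is no worse on every component can never lose, so the only lever is the weight on the components where the other model wins. The Python sketch below computes the weight at which one such component would flip the ranking. The component values are hypothetical, chosen for illustration; they are not the actual undertaker scores for either model.

```python
from typing import Dict, List

Costs = Dict[str, float]


def weighted_cost(costs: Costs, weights: Costs) -> float:
    """Total cost as a weighted linear sum of components (lower is better)."""
    return sum(w * costs[name] for name, w in weights.items())


def components_favoring(b: Costs, a: Costs) -> List[str]:
    """Components on which model b scores strictly better (lower) than model a."""
    return [name for name in a if b[name] < a[name]]


def break_even_weight(a: Costs, b: Costs, weights: Costs, k: str) -> float:
    """Weight on component k at which b's total cost equals a's, other weights fixed."""
    gain = a[k] - b[k]  # how much better b is on component k
    if gain <= 0:
        raise ValueError(f"model b is not better than model a on {k}")
    # b's net disadvantage on every other component under the current weights
    deficit = sum(w * (b[j] - a[j]) for j, w in weights.items() if j != k)
    return deficit / gain


# Hypothetical component costs, NOT the real undertaker scores.
try3 = {"soft_clashes": 1.0, "sidechain": 2.0, "n_ca_c": 0.5, "hbond_geom_beta_pair": 0.3}
robetta = {"soft_clashes": 1.5, "sidechain": 3.0, "n_ca_c": 0.4, "hbond_geom_beta_pair": 0.2}
weights = {name: 1.0 for name in try3}

print(weighted_cost(try3, weights), weighted_cost(robetta, weights))
for k in components_favoring(robetta, try3):
    print(k, "needs weight >", round(break_even_weight(try3, robetta, weights, k), 2))
```

In this made-up example either winning component would need a weight of about 14 (against 1 for everything else) before the ranking flips, which illustrates why reweighting alone is an unattractive fix.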
Making an improvement would require improving the components of the cost function, not just reweighting them. The conserved residues are placed almost identically in all the good models, since they are in the core. My arginines and lysines tend not to stick as far into the solvent as ROBETTA's, which I suspect indicates some poor values in the sidechain cost function. It may be a result of the Gaussian mixture I use, as a wide peak will score worse than a narrow peak, even if it has higher total probability mass.

Fri Jul 14 11:14:14 PDT 2006 Kevin Karplus
In evaluate.unconstrained.pretty, ROBETTA_TS2 is still the best model. SAM_T06_server is the 13th of the server TS1 models (beating ROBETTA_TS1). Our best hand model is try1-opt2.repack-nonPC, which wasn't quite as good as our server model. SCWRLing our server model improved it :-(.
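The Gaussian-mixture effect mentioned in the Jun 14 entry is easy to reproduce: if the cost is the negative log of the mixture density, the best score a component can give is set by its height at the peak, roughly weight/(σ·√(2π)), not by its total mass. The Python sketch below uses a hypothetical two-peak mixture for a single side-chain angle. The weights, means, and widths are made up, angular wrap-around is ignored for simplicity, and this is a generic mixture, not undertaker's actual sidechain term.

```python
import math
from typing import Sequence, Tuple

# (weight, mean, sigma) triples; hypothetical values for one side-chain angle, in degrees.
Component = Tuple[float, float, float]
MIXTURE: Sequence[Component] = [
    (0.3, 60.0, 8.0),    # narrow peak, 30% of the probability mass
    (0.7, 180.0, 30.0),  # wide peak, 70% of the probability mass
]


def gaussian_pdf(x: float, mean: float, sigma: float) -> float:
    """Normal density with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_cost(x: float, mixture: Sequence[Component]) -> float:
    """Negative log density of x under a 1-D Gaussian mixture (no angle wrap-around)."""
    density = sum(w * gaussian_pdf(x, m, s) for w, m, s in mixture)
    return -math.log(density)


for angle in (60.0, 180.0):
    print(f"angle {angle:5.1f}: cost {mixture_cost(angle, MIXTURE):.2f}")
```

Here the narrow peak holds only 30% of the mass but has a peak height of about 0.3/(8√(2π)) ≈ 0.015, versus 0.7/(30√(2π)) ≈ 0.0093 for the wide peak, so a side chain sitting exactly on the narrow peak gets the lower cost.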