Wed May 17 09:19:55 PDT 2006
T0290
Make started Wed May 17 09:20:29 PDT 2006
Running on camano.cse.ucsc.edu

Wed May 17 17:00:39 PDT 2006 Kevin Karplus

Despite all the problems with the servers today, I've managed to get a
fold-recognition result for T0290.  It appears to be SCOP domain
b.62.1.1, with at least 37 templates available.  Picking out the best
template may be a little tricky, as 35 of the PDB sequences appear in
the multiple alignments already, so the HMM scoring may be more for the
consensus of the family than for the specific sequence.  The simplesw
scores may be a better test of which templates are closest (as would
BLAST scores).

There are thousands of sequences in the thin90 multiple alignments, so
there should be enough diversity for the rr predictions to be fairly good.

 
Wed May 17 19:13:15 PDT 2006 Kevin Karplus

The SCOP family found is Cyclophilin (peptidylprolyl isomerase).

RPS Blast finds several hits to cyclophilins, the best two being

cd01926

    Cyclophilin A, B and H-like cyclophilin-type peptidylprolyl cis-
    trans isomerase (PPIase) domain. This family represents the
    archetypal cytosolic cyclophilin similar to human cyclophilins A, B
    and H. PPIase is an enzyme which accelerates protein folding by
    catalyzing the cis-trans isomerization of the peptide bonds
    preceding proline residues. These enzymes have been implicated in
    protein folding processes which depend on catalytic
    /chaperone-like activities. As cyclophilins, Human hCyP-A, human
    cyclophilin-B (hCyP-19), S. cerevisiae Cpr1 and C. elegans Cyp-3,
    are inhibited by the immunosuppressive drug cyclosporin A
    (CsA). CsA binds to the PPIase active site. Cyp-3. S. cerevisiae
    Cpr1 interacts with the Rpd3 - Sin3 complex and in addition is a
    component of the Set3 complex. S. cerevisiae Cpr1 has also been
    shown to have a role in Zpr1p nuclear transport. Human cyclophilin
    H associates with the [U4/U6.U5] tri-snRNP particles of the
    spliceosome.

pfam00160, Pro_isomerase, Cyclophilin type peptidyl-prolyl cis-trans isomerase.

The best hits found by NCBI blast in PDB are 
	1c5fO	=1a58
	1iipA	=1ihgA
	1qngA	
	1qnhB	=1qnhA
	1e3bA	=1dywA
	1cynA
	1vdnA
	2cfeA
	1mzwA	=1qoiA
	1xq7C	=1xo7A
	1zcxA
	1zmfA
	...

The 1c5fO hit is 60% identical (79% positives) over a gapless
alignment of 172 residues (all but the final P).  That's going to be a
hard template to do better than!  We have 1a58 as the identical
sequence in the dunbrack-pdbaa set, which comes out sixth on the
T0290.best-scores.rdb list.  A 1a58 alignment does appear to be the
chosen one in TryAllAlign for try1.
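
For a gapless alignment, the identity figure is just the number of
matching positions divided by the aligned length (60% of 172 residues
is roughly 103 identical positions).  A minimal sketch of that
arithmetic in Python; the function name and the toy sequences are
made up for illustration and are not taken from the T0290 or 1c5f data:

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions in a gapless alignment.

    Both arguments are the aligned segments, so they must have equal length.
    """
    if len(a) != len(b):
        raise ValueError("gapless alignment requires equal-length segments")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Toy 10-residue segments (hypothetical): 6 of the 10 positions match.
target_segment = "MKVFFDITIG"
template_segment = "MKVYFDLSIN"
print(percent_identity(target_segment, template_segment))
```

The "positives" figure reported by BLAST would additionally need a
substitution matrix, which this sketch leaves out.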

Since this is such a close homolog, we'll probably want to raise the
weights of soft-clashes, breaks, and sidechain, to do fancy polishing
on a basically good backbone.

Wed May 17 20:24:40 PDT 2006 Kevin Karplus

The try1-opt2 model and all the models from alignment are very close
in backbone.  There are a couple of loops that vary, but we seem to
have picked up decent templates for them.  We may have messed up the
sidechains, since we are relying on scwrl to clean them up from the
alignment, and it may have changed some critical residues.

Perhaps I should do try2 from just a subset of the top hits and *not*
SCWRL the initial alignments (we'll still run scwrl later on to allow
it to clean up stuff we mess up).

I put 
MANUAL_TOP_HITS:= 1a58 1ihgA 1qngA 1qnhA 1dywA 1cynA 1vdnA 2cfeA 1qoiA 1xo7A 1zcxA 1zmfA
into the Makefile and made 'extra_alignments' and 'read_alignments'
(with separate makes)

Some of them are not in the template library, so they have a more
limited number of alignments.  (No matter, since alignment is fairly
trivial on this template.)

Wed May 17 20:55:26 PDT 2006 Kevin Karplus

try2 has started on camano.
It looks like the 1a58 alignments are still favored.
We're getting a few more clashes than with the SCWRL'd alignments, but
not too many, so I think that there is a chance that this will produce
a better final result than try1-opt2.


Thu May 18 18:11:53 PDT 2006 Kevin Karplus

The try2-opt2 run certainly scored better than try1-opt2, and Rosetta
likes it better after repacking.

I will increase soft_clashes and breaks for try3 and eliminate
constraints, polishing up existing models.

There are a lot of CYS and HIS residues in this protein, suggesting
metal-binding sites.  It might be worthwhile to look at the templates
and see if they have metal ions in them, then add constraints to the
residues that coordinate the ions.  I won't do this for try3, but it
seems like the right next step for try4.
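
One way to check the templates for metal ions is sketched below: scan
a PDB file for non-water HETATM atoms that lie close to CYS SG or HIS
ND1/NE2 atoms.  This is only a sketch under stated assumptions: the
3.0 Angstrom cutoff, the function names, and the tiny demo structure
(residue numbers and coordinates) are all hypothetical, and it is not
part of the undertaker pipeline.

```python
import math

# Side-chain atoms assumed (for this sketch) to be the likely metal ligands.
LIGAND_ATOMS = {("CYS", "SG"), ("HIS", "ND1"), ("HIS", "NE2")}


def parse_atoms(pdb_text):
    """Yield (record, atom, residue, chain, resseq, xyz) from fixed-column PDB lines."""
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        yield (record, line[12:16].strip(), line[17:20].strip(),
               line[21], int(line[22:26]), xyz)


def metal_contacts(pdb_text, cutoff=3.0):
    """List (residue, resseq, atom, het_residue, distance) pairs within cutoff."""
    atoms = list(parse_atoms(pdb_text))
    # Treat every non-water HETATM as a candidate ion or ligand.
    hets = [a for a in atoms if a[0] == "HETATM" and a[2] != "HOH"]
    contacts = []
    for record, name, res, chain, seq, xyz in atoms:
        if record == "ATOM" and (res, name) in LIGAND_ATOMS:
            for het in hets:
                d = math.dist(xyz, het[5])
                if d <= cutoff:
                    contacts.append((res, seq, name, het[2], round(d, 2)))
    return contacts


def pdb_line(record, serial, name, res, chain, seq, x, y, z):
    """Format one ATOM/HETATM line in PDB fixed columns (demo helper only)."""
    return (f"{record:<6}{serial:5d} {name:<4} {res:>3} {chain}{seq:4d}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up three-atom structure: one HIS near a zinc, one CYS far from it.
demo = "\n".join([
    pdb_line("ATOM", 1, " NE2", "HIS", "A", 10, 0.0, 0.0, 0.0),
    pdb_line("ATOM", 2, " SG ", "CYS", "A", 20, 10.0, 0.0, 0.0),
    pdb_line("HETATM", 3, "ZN  ", "ZN", "A", 201, 2.1, 0.0, 0.0),
])
print(metal_contacts(demo))
```

Residues reported by such a scan in a template would be the
candidates for coordination constraints on the aligned target residues.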


Thu May 25 15:01:55 PDT 2006 Kevin Karplus

try3-opt2 looks pretty good.  We should do one more polishing run and
claim this is done.  Scoring the server results would be interesting also.

The polishing run should include constraints on the cys and his
residues, if appropriate.

Thu May 25 15:10:37 PDT 2006 Kevin Karplus

I scored the server runs with try3.costfcn, and other than our hits,
the top models are ROBETTA_TS2 and 3Dpro_TS2-scwrl.  Oops---I forgot
to add "missing_atoms" to the cost function.


Thu May 25 15:31:34 PDT 2006 Kevin Karplus

Putting in missing_atoms did not change things much, as both the
server models mentioned above were complete.

Sat Jun  3 07:08:22 PDT 2006 Kevin Karplus

There are 4 HIS residues in a cluster (H98, H127, H131, H132).
In 1a58 (61% identical) they do not coordinate a ligand.
Nor in 1ihgA (62%id).  There is a paper discussing histidines in
cyclophilins, which indicates that the chemistry of histidines in
cyclophilins may be a bit unusual:
 Yu L, Fesik SW.
 
 pH titration of the histidine residues of cyclophilin and
 FK506 binding protein in the absence and presence of
 immunosuppressant ligands.
 
 Biochim Biophys Acta. 1994 Nov 16;1209(1):24-32. 	 

In any case, I don't see that I'm going to get any constraints on
these residues from the PDB files, so I might as well submit.
     
Wed Jun 14 10:06:57 PDT 2006 Kevin Karplus

Now released as 2gw2, running evaluation to see how we did.

Wed Jun 14 10:45:42 PDT 2006 Kevin Karplus

Oops---the evaluation script did not include the models actually
submitted by number (though it did evaluate all the tries, so we can
figure it out).  I'll rerun on the farm cluster.

For T0290, there are a lot of SERVERS that beat us, with the best
being ROBETTA_TS2.  Our server came out about 53rd, and that was after
running scwrl (which did better than the server without SCWRL).  Even
our server did better than we did by hand (we submitted try3-opt2):

                          kno cle rmsd log_ rmsd log_    GDT smooth missi real_co
                            t  ns      rmsd  _ca rmsd          _GDT ng_at      st
                                                  _ca                 oms        
                                                                                 
               WEIGHTS--> 0.1 0.5  0.0  0.1  0.0  0.1    0.0    0.0   0.1        

 2gw2A                    0.0 0.0  0.0 -0.9  0.0 -0.9 -100.0 -100.0   0.6   -1.22
 ROBETTA_TS2              0.0 0.0  1.2  0.0  0.5 -0.1  -99.0  -97.7   0.0   -0.03
 ...
 3Dpro_TS2                0.0 0.1  1.2  0.0  0.7 -0.1  -98.8  -97.6   0.0    0.02
 ...
 SAM_T06_server_TS1-scwrl 0.0 0.1  1.5  0.1  0.7 -0.0  -98.0  -95.2   0.0    0.07
 SAM_T06_server_TS1       0.0 0.1  1.6  0.1  0.7 -0.0  -98.0  -95.2   0.0    0.07
 T0290.try1-opt2.repack-n 0.0 0.1  1.6  0.1  0.7 -0.0  -97.4  -94.3   0.0    0.08
 T0290.try1-opt2.gromacs0 0.0 0.1  1.6  0.1  0.7 -0.0  -97.4  -94.3   0.0    0.08
 T0290.try1-opt2.pdb      0.0 0.1  1.6  0.1  0.7 -0.0  -97.4  -94.3   0.0    0.08
 T0290.try3-opt2.pdb.gz   0.0 0.1  1.8  0.1  0.9 -0.0  -96.7  -94.2   0.0    0.12 

Both the server models that we identified as best before (ROBETTA_TS2
and 3Dpro_TS2) do better than us, though scwrling 3Dpro_TS2 makes it worse.

In terms of model1 results (all they will look at), our server is
14th, coming just behind ROBETTA and beating 3Dpro.  It looks like we
might do better by selecting the best server model for comparative
modeling, since we would then have done much better, even than the
best of the servers.  The server with the best TS1 model is FUGMOD.

Wed Jun 14 12:53:44 PDT 2006 Kevin Karplus

It is hard to see how we can fix undertaker's cost fcn to favor
ROBETTA_TS2 over try3-opt2.  In every cost function component we
scored except n_ca_c and hbond_geom_beta_pair, try3-opt2 scored as
well or better.  Making an improvement would require improving the
components of the cost function, and not just reweighting them.
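
The reweighting argument can be sketched as follows, assuming the
cost function is a weighted sum with non-negative weights and that
lower cost is better.  Only the component names come from these
notes; every numeric value below is invented for illustration, and
the helper names are hypothetical.

```python
def weighted_cost(components, weights):
    """Weighted sum of cost components (lower is assumed to be better)."""
    return sum(weights[name] * value for name, value in components.items())


def components_favoring(rival, ours):
    """Names of components on which the rival model scores strictly lower."""
    return sorted(name for name in ours if rival[name] < ours[name])


# Hypothetical component values; NOT the real try3-opt2 or ROBETTA_TS2 scores.
ours = {"soft_clashes": 1.0, "breaks": 0.5, "sidechain": 2.0,
        "n_ca_c": 1.5, "hbond_geom_beta_pair": 1.2}
rival = {"soft_clashes": 1.4, "breaks": 0.9, "sidechain": 2.6,
         "n_ca_c": 1.3, "hbond_geom_beta_pair": 1.0}

# With equal weights our model has the lower total cost.
equal = {name: 1.0 for name in ours}
print(weighted_cost(ours, equal) < weighted_cost(rival, equal))

# Only components where the rival is better can ever tip the balance.
helpful = components_favoring(rival, ours)
print(helpful)

# Favoring the rival needs lopsided weights on just those components.
boosted = {name: (10.0 if name in helpful else 1.0) for name in ours}
print(weighted_cost(rival, boosted) < weighted_cost(ours, boosted))
```

In this toy case the rival wins only when the two components it is
better on are weighted ten times more heavily than everything else,
which is why reweighting alone looks unattractive.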

The conserved residues are almost all identical in all the good
models, since they are in the core.  My arginines and lysines tend not
to stick as far into the solvent as ROBETTA's, which I suspect
indicates some poor values in the sidechain cost function---it may be
a result of the Gaussian mixture I use, as a wide peak will score
worse than a narrow peak, even if it has higher total probability mass.
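
The wide-versus-narrow effect can be seen in a small numeric sketch.
It assumes the cost is the negative log of the mixture density at the
observed value, which may not be exactly how undertaker's sidechain
term is computed; the two-component mixture and its parameters are
hypothetical.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a one-dimensional normal distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_cost(x, components):
    """Negative log density of a Gaussian mixture given (weight, mean, sigma) triples."""
    density = sum(w * gaussian_pdf(x, m, s) for w, m, s in components)
    return -math.log(density)


# Hypothetical mixture: a narrow peak holding 30% of the probability mass
# and a wide peak holding 70%, placed far enough apart not to overlap.
components = [(0.3, 0.0, 0.2), (0.7, 10.0, 2.0)]

narrow_cost = mixture_cost(0.0, components)   # at the center of the narrow peak
wide_cost = mixture_cost(10.0, components)    # at the center of the wide peak

# The wide peak has more total mass but a lower peak density, so it costs more.
print(narrow_cost < wide_cost)
```

Peak density scales as weight divided by sigma, so a conformation
sitting in a broad, heavily populated region can still be penalized
relative to one in a sharp, sparsely populated region.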


Fri Jul 14 11:14:14 PDT 2006 Kevin Karplus

In evaluate.unconstrained.pretty, ROBETTA_TS2 is still the best model.
SAM_T06_server is the 13th of the server TS1 models (beating ROBETTA_TS1).
Our best hand model is try1-opt2.repack-nonPC, which wasn't quite as
good as our server model.  SCWRLing our server model improved it :-(.