SAM-T06: Full 3D predictions from UCSC

Kevin Karplus (group leader), George Shackelford, Firas Khatib, Martin Madera, Grant Thiltgen, Zack Sanborn, Chris Wong, Pinal Kanabar, Cynthia Hsu, Crissan Harris, Sylvia Do, NavyaSwetha Davuluri
Biomolecular Engineering Department, University of California, Santa Cruz

The SAM-T06 hand predictions use methods similar to SAM-T04 in CASP6 and the SAM-T02 method in CASP5. We start with a fully automated method, implemented as the SAM_T06 server. The server runs the SAM-T2K and SAM-T04 iterative methods for finding homologs of the target and aligning them. The hand method also uses the experimental new SAM-T06 alignment method, which we hope is both more sensitive and less prone to contamination by unrelated sequences.

We use the alignments to make local-structure predictions with our neural nets. Currently we use 10 local-structure alphabets:

- DSSP
- STRIDE
- STR2 (an extended version of DSSP that splits the beta strands into multiple classes: parallel/antiparallel/mixed, edge/center)
- ALPHA (a discretization of the alpha torsion angle between CA(i-1), CA(i), CA(i+1), and CA(i+2))
- BYS (a discretization of Ramachandran plots due to Bystroff)
- CB_burial_14_7 (a 7-state discretization of the number of C_beta atoms in a 14 A radius sphere around the C_beta)
- near-backbone-11 (an 11-state discretization of the number of residues in a 9.65 A radius sphere around a residue)
- DSSP_EHL2 (CASP's collapse of the DSSP alphabet, computed as a weighted average of the other backbone-alphabet predictions)
- O_NOTOR2 (an alphabet for predicting characteristics of hydrogen bonds from the carbonyl oxygen)
- N_NOTOR2 (an alphabet for predicting characteristics of hydrogen bonds from the amide nitrogen)

The target sequence is scored against a library of about 8000 (t06), 10000 (t04), or 15000 (t2k) single-track template HMMs. The template libraries are expanded weekly, but old template HMMs are not rebuilt.
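The ALPHA alphabet is built from a pseudo-torsion angle over four consecutive CA positions, which is then discretized. A minimal sketch of that computation, assuming standard dihedral geometry; the `alpha_state` binning here is a uniform placeholder, not the group's actual bin boundaries:

```python
import math

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points,
    e.g. CA(i-1), CA(i), CA(i+1), CA(i+2) for the alpha angle."""
    def sub(a, b):
        return [a[k] - b[k] for k in range(3)]
    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)  # normals of the two half-planes
    b1_len = math.sqrt(dot(b1, b1))
    m1 = cross(n1, [x / b1_len for x in b1])
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))

def alpha_state(angle_deg, n_states=12):
    """Hypothetical uniform binning into n_states classes; the real
    ALPHA alphabet's bin boundaries differ."""
    return int((angle_deg % 360.0) // (360.0 / n_states))
```

The same dihedral routine applies to any four atoms, which is why a single discretization scheme can serve several of the backbone alphabets.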
The server also builds two-track HMMs for the target using each alphabet (with a weight of 1.0 for the amino-acid track and 0.3 for the local-structure track) and scores them against the structures in the template libraries. Finally, a single-track target HMM is used to score a non-redundant copy of the entire PDB. Logarithms of all the E-values are combined in a weighted average (so far with rather arbitrary weights), and the templates are ranked by this combined score.

Alignments of the target to the top templates are made using several alignment methods (mainly with the SAM hmmscore program, but also Bob Edgar's MUSCLE). We use the alignments to generate fragments (short 9-residue alignments for each position) using SAM's "fragfinder" program and the 3-track HMM that tested best for alignment. Residue-residue contact predictions are made using mutual information, pairwise contact potentials, joint entropy, and other signals, combined by a neural net.

Our program Undertaker then tries to combine the alignments and fragments into a consistent 3D model. No single alignment or parent template is used as a frozen core, though in many cases one has much more influence than the others. The alignment scores are not passed to Undertaker; they only determine the set of alignments and fragments that Undertaker will see. Helix and strand constraints generated from the secondary-structure predictions are passed to Undertaker for use in its cost function, as are the residue-residue contact predictions. One important change in this server over previous methods is that sheet constraints are extracted from the top few alignments and passed to Undertaker.

All the computations described so far are carried out by the server. After the automatic predictions are done, we examine them by hand and try to fix any flaws we see. This generally involves rerunning Undertaker with new cost functions, increasing the weights for features we want to see and decreasing the weights where we think the optimization has gone overboard.
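Among the contact signals listed above, mutual information between two alignment columns is the simplest to illustrate. A toy sketch, not the group's implementation (real pipelines add sequence weighting, pseudocounts, and background corrections before feeding such signals to the neural net):

```python
import math
from collections import Counter

def column_mi(col_i, col_j):
    """Mutual information (in nats) between two alignment columns.

    col_i, col_j: equal-length strings of residue characters, one per
    aligned sequence. High MI means the two positions co-vary, which is
    one (noisy) hint that they may be in contact in the 3D structure.
    """
    n = len(col_i)
    p_i = Counter(col_i)               # marginal counts at position i
    p_j = Counter(col_j)               # marginal counts at position j
    p_ij = Counter(zip(col_i, col_j))  # joint counts over sequence pairs
    mi = 0.0
    for (a, b), c in p_ij.items():
        # (c/n) * log( p(a,b) / (p(a) * p(b)) )
        mi += (c / n) * math.log(c * n / (p_i[a] * p_j[b]))
    return mi
```

For example, two perfectly co-varying two-letter columns give MI = ln 2, while independent columns give 0, so raw MI must still be calibrated against column entropies before it is useful as a contact score.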
Sometimes we add new templates or remove ones that we think are misleading the optimization process. New this year, we occasionally used ProteinShop to manipulate proteins by hand to produce starting points for Undertaker optimization. Another new trick is to optimize models with Gromacs to knock them out of a local minimum. The Gromacs optimization is good at removing clashes, but distorts sidechains and peptide planes. The resulting models are only a small distance from the pre-optimization models, but score much worse with the Undertaker cost functions, so Undertaker can move them more freely than models it has optimized itself.