SAM-T06: Full 3D predictions from UCSC

Kevin Karplus (group leader), George Shackelford, Firas Khatib, Martin Madera, Grant Thiltgen, Zack Sanborn, Chris Wong, Pinal Kanabar, Cynthia Hsu, Crissan Harris, Sylvia Do, NavyaSwetha Davuluri
Biomolecular Engineering Department, University of California, Santa Cruz

The SAM-T06 hand predictions use methods similar to SAM-T04 in CASP6 and the SAM-T02 method in CASP5. We start with a fully automated method, implemented as the SAM_T06 server. The server runs the SAM-T2K and SAM-T04 iterative methods for finding homologs of the target and aligning them. The hand method also uses the experimental new SAM-T06 alignment method, which we hope is both more sensitive and less prone to contamination by unrelated sequences.

We use the alignments to make local-structure predictions with our neural nets. Currently we use 10 local-structure alphabets:

- DSSP
- STRIDE
- STR2 (an extended version of DSSP that splits the beta strands into multiple classes: parallel/antiparallel/mixed, edge/center)
- ALPHA (a discretization of the alpha torsion angle between CA(i-1), CA(i), CA(i+1), and CA(i+2))
- BYS (a discretization of Ramachandran plots due to Bystroff)
- CB_burial_14_7 (a 7-state discretization of the number of C_beta atoms in a 14 A radius sphere around the C_beta)
- near-backbone-11 (an 11-state discretization of the number of residues in a 9.65 A radius sphere around a residue)
- DSSP_EHL2 (CASP's collapse of the DSSP alphabet, computed as a weighted average of the other backbone-alphabet predictions)
- O_NOTOR2 (an alphabet for predicting characteristics of hydrogen bonds from the carbonyl oxygen)
- N_NOTOR2 (an alphabet for predicting characteristics of hydrogen bonds from the amide nitrogen)

The target sequence is scored against a library of about 8000 (t06), 10000 (t04), or 15000 (t2k) single-track template HMMs. The template libraries are expanded weekly, but old template HMMs are not rebuilt.
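The ALPHA alphabet is built from a pseudo-torsion angle over four consecutive CA positions, which is then discretized. A minimal sketch of that computation, assuming standard dihedral geometry; the `alpha_state` binning here is a uniform placeholder, not the group's actual bin boundaries:

```python
import math

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points,
    e.g. CA(i-1), CA(i), CA(i+1), CA(i+2) for the alpha angle."""
    def sub(a, b):
        return [a[k] - b[k] for k in range(3)]
    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)  # normals of the two half-planes
    b1_len = math.sqrt(dot(b1, b1))
    m1 = cross(n1, [x / b1_len for x in b1])
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))

def alpha_state(angle_deg, n_states=12):
    """Hypothetical uniform binning into n_states classes; the real
    ALPHA alphabet's bin boundaries differ."""
    return int((angle_deg % 360.0) // (360.0 / n_states))
```

The same dihedral routine applies to any four atoms, which is why a single discretization scheme can serve several of the backbone alphabets.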
The server also builds two-track HMMs for the target using each alphabet (with a weight of 1.0 for the amino-acid track and 0.3 for the local-structure track) and scores them against the structures in the template libraries. Finally, a single-track target HMM is used to score a non-redundant copy of the entire PDB. Logarithms of all the E-values are combined in a weighted average (so far with rather arbitrary weights), and the templates are ranked by this combined score.

Alignments of the target to the top templates are made using several alignment methods (mainly with the SAM hmmscore program, but also Bob Edgar's MUSCLE). We use the alignments to generate fragments (short 9-residue alignments for each position) using SAM's "fragfinder" program and the 3-track HMM that tested best for alignment. Residue-residue contact predictions are made using mutual information, pairwise contact potentials, joint entropy, and other signals, combined by a neural net.

Our program Undertaker then tries to combine the alignments and fragments into a consistent 3D model. No single alignment or parent template is used as a frozen core, though in many cases one has much more influence than the others. The alignment scores are not passed to Undertaker; they only determine the set of alignments and fragments that Undertaker will see. Helix and strand constraints generated from the secondary-structure predictions are passed to Undertaker for use in its cost function, as are the residue-residue contact predictions. One important change in this server over previous methods is that sheet constraints are extracted from the top few alignments and passed to Undertaker.

All the computations described so far are carried out by the server. After the automatic predictions are done, we examine them by hand and try to fix any flaws we see. This generally involves rerunning Undertaker with new cost functions, increasing the weights for features we want to see and decreasing the weights where we think the optimization has gone overboard.
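Among the contact signals listed above, mutual information between two alignment columns is the simplest to illustrate. A toy sketch, not the group's implementation (real pipelines add sequence weighting, pseudocounts, and background corrections before feeding such signals to the neural net):

```python
import math
from collections import Counter

def column_mi(col_i, col_j):
    """Mutual information (in nats) between two alignment columns.

    col_i, col_j: equal-length strings of residue characters, one per
    aligned sequence. High MI means the two positions co-vary, which is
    one (noisy) hint that they may be in contact in the 3D structure.
    """
    n = len(col_i)
    p_i = Counter(col_i)               # marginal counts at position i
    p_j = Counter(col_j)               # marginal counts at position j
    p_ij = Counter(zip(col_i, col_j))  # joint counts over sequence pairs
    mi = 0.0
    for (a, b), c in p_ij.items():
        # (c/n) * log( p(a,b) / (p(a) * p(b)) )
        mi += (c / n) * math.log(c * n / (p_i[a] * p_j[b]))
    return mi
```

For example, two perfectly co-varying two-letter columns give MI = ln 2, while independent columns give 0, so raw MI must still be calibrated against column entropies before it is useful as a contact score.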
Sometimes we add new templates or remove ones that we think are misleading the optimization process. New this year, we occasionally used ProteinShop to manipulate proteins by hand to produce starting points for Undertaker optimization. Another new trick is to optimize models with Gromacs to knock them out of a local minimum. The Gromacs optimization is good at removing clashes, but distorts sidechains and peptide planes. The resulting models are only a small distance from the pre-optimization models, but score much worse with the Undertaker cost functions, so Undertaker can move them more freely than models it has optimized itself.