The SAM-T06 hand predictions use methods similar to those of SAM_T04 in CASP6 and the SAM-T02 method in CASP5. We start with a fully automated method (implemented as the SAM_T06 server):

Use the SAM-T2K and SAM-T04 methods to find homologs of the target and align them. The hand method also uses the experimental new SAM-T06 alignment method, which we hope is both more sensitive and less prone to contamination by unrelated sequences.

Make local-structure predictions using neural nets and the multiple alignments. We currently use 10 local-structure alphabets:
- DSSP
- STRIDE
- STR2: an extended version of DSSP that splits the beta strands into multiple classes (parallel/antiparallel/mixed, edge/center)
- ALPHA: a discretization of the alpha torsion angle defined by CA(i-1), CA(i), CA(i+1), CA(i+2)
- BYS: a discretization of Ramachandran plots, due to Bystroff
- CB_burial_14_7: a 7-state discretization of the number of C_beta atoms in a 14 Angstrom radius sphere around the C_beta
- near-backbone-11: an 11-state discretization of the number of residues (represented by near-backbone points) in a 9.65 Angstrom radius sphere around the sidechain proxy spot for the residue
- DSSP_EHL2: CASP's collapse of the DSSP alphabet. DSSP_EHL2 is not predicted directly by a neural net, but is computed as a weighted average of the other backbone alphabet predictions.
- O_NOTOR2: an alphabet for predicting characteristics of hydrogen bonds from the carbonyl oxygen
- N_NOTOR2: an alphabet for predicting characteristics of hydrogen bonds from the amide nitrogen

We hope to add more networks for other alphabets over the summer.

Make 2-track HMMs with each alphabet (track weights 1.0 for amino acids + 0.3 for local structure) and use them to score a template library of about 8000 (t06), 10000 (t04), or 15000 (t2k) templates. The template libraries are expanded weekly, but old template HMMs are not rebuilt. We also used a single-track HMM to score not just the template library but a non-redundant copy of the entire PDB.
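Two of the alphabets above are defined by simple geometry, so their raw values can be computed directly from coordinates. The NumPy sketch below computes the CA pseudo-torsion underlying ALPHA and the C_beta neighbor count underlying CB_burial_14_7. It is an illustration under stated assumptions, not SAM code. It assumes every residue has a C_beta coordinate (glycine would need a virtual C_beta) and that a residue's own C_beta is excluded from its count. The bin edges that map counts onto 7 states are hypothetical placeholders, because the actual thresholds are not given here.

```python
import numpy as np


def ca_pseudo_torsion(ca: np.ndarray) -> np.ndarray:
    """Dihedral (degrees) through CA(i-1), CA(i), CA(i+1), CA(i+2), stored at index i.

    ca: (N, 3) array of C-alpha coordinates for one chain.
    Positions lacking a full set of four C-alphas are left as NaN.
    """
    n = len(ca)
    alpha = np.full(n, np.nan)
    for i in range(1, n - 2):
        p0, p1, p2, p3 = ca[i - 1], ca[i], ca[i + 1], ca[i + 2]
        b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
        b1 = b1 / np.linalg.norm(b1)
        # Project the outer bonds onto the plane perpendicular to the central bond.
        v = b0 - np.dot(b0, b1) * b1
        w = b2 - np.dot(b2, b1) * b1
        x = np.dot(v, w)
        y = np.dot(np.cross(b1, v), w)
        alpha[i] = np.degrees(np.arctan2(y, x))
    return alpha


def cb_burial_counts(cb: np.ndarray, radius: float = 14.0) -> np.ndarray:
    """Count the other C-beta atoms within `radius` Angstroms of each C-beta."""
    diff = cb[:, None, :] - cb[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Subtract 1 so that a residue's own C-beta (distance 0) is not counted.
    return (dist <= radius).sum(axis=1) - 1


# HYPOTHETICAL bin edges: 6 edges give 7 states (0..6). These are placeholders,
# not the thresholds actually used for CB_burial_14_7.
HYPOTHETICAL_BURIAL_EDGES = [10, 18, 26, 34, 42, 50]


def burial_state(counts: np.ndarray) -> np.ndarray:
    """Map raw neighbor counts to a 7-state label using the placeholder edges."""
    return np.digitize(counts, HYPOTHETICAL_BURIAL_EDGES)
```

In a real pipeline these per-residue labels would serve as training targets for the neural nets. The nets themselves predict the labels from the multiple alignment, not from coordinates.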
One-track HMMs built from the template-library multiple alignments were used to score the target sequence. All the logs of e-values were combined in a weighted average (with rather arbitrary weights, since we have not yet taken the time to optimize them), and the best templates were ranked. Alignments of the target to the top templates were made using several different alignment methods (mainly the SAM hmmscore program, though a few alignments were made with Bob Edgar's MUSCLE profile-profile aligner).

Generate fragments (short 9-residue alignments for each position) using SAM's "fragfinder" program and the 3-track HMM that tested best for alignment.

Residue-residue contact predictions are made using mutual information, pairwise contact potentials, joint entropy, and other signals combined by a neural net. The contact prediction method is expected to evolve over the summer, as new features are selected and new networks trained.

Then the "undertaker" program (named because it optimizes burial) is used to try to combine the alignments and the fragments into a consistent 3D model. No single alignment or parent template was used as a frozen core, though in many cases one had much more influence than the others. The alignment scores were not passed to undertaker, but were used only to pick the set of alignments and fragments that undertaker would see. Helix and strand constraints generated from the secondary-structure predictions are passed to undertaker for use in the cost function, as are the residue-residue contact predictions. One important change in this server from previous methods is that sheet constraints are extracted from the top few alignments and passed to undertaker.

After the automatic prediction is done, we examine it by hand and try to fix any flaws that we see.
This generally involves rerunning undertaker with new cost functions, increasing the weights for features we want to see and decreasing the weights where we think the optimization has gone overboard. Sometimes we add new templates or remove ones that we think are misleading the optimization process. New this year, we are also occasionally using ProteinShop to manipulate proteins by hand, producing starting points for undertaker optimization. We expect this to be most useful for new-fold all-alpha proteins, where undertaker often gets trapped in poor local minima by extending helices too far. Another new trick is to optimize models with gromacs to knock them out of a local minimum. The gromacs optimization does terrible things to the model (messing up sidechains and peptide planes), but it is good at removing clashes. The resulting models are only a small distance from the pre-optimization models but score much worse with the undertaker cost functions, so undertaker can move them more freely than models it has optimized itself.

We did not have any good fold-recognition hits for T0356, either for the full chain or for putative subdomains. The protein is a bit too large for ab initio techniques, but we thrashed around with it for a while anyway. We got some sheet fragments that look pretty good, but no overall fold.

Model 1, T0356.try6-opt2.pdb, was created from a chimera of three subdomains: M1-R180, T160-F340, and T320-S496. We removed the HIS-tag when building the subdomain models because it was getting in the way of some of our models. The chimera for this model used try6-opt2 of the first subdomain, try8-opt2 of the second subdomain, and try3-opt2 of the third subdomain. Try6-opt2 was based on try2-opt2, which used some hbond constraints and ehl2 constraints to create sheets and helices by hand. Grant then used ProteinShop to add a third strand to the first sheet, which had been trailing off by itself.
Try8-opt2 was based on a ProteinShop-adjusted model of try2, which needed some sheets fixed. Try3-opt2 is based on try1-opt2 of the third subdomain, the original undertaker run, which had been adjusted in ProteinShop to get all the strands to form a sheet. Grant took residues up to 171 from the first subdomain model, up to 331 from the second, and the rest from the third, with a HIS-tag tacked onto the end. We used ProteinShop to move some of the clashing secondary-structure and backbone elements apart from each other. Grant then sent the chimera to undertaker for a full run (which took close to 15 hours) to get this model.

Model 2, T0356.try11-opt2.repack-nonPC.pdb, is our second-highest-scoring undertaker model. It uses a different chimera: try1-opt2 of the first subdomain, try5-opt2 of the second subdomain, and try5-opt2 of the third subdomain. The first-subdomain try1-opt2 was the first model from the undertaker run of that subdomain. The second-subdomain try5-opt2 is based on try2 of the second subdomain, with some strands moved in ProteinShop to form a sheet. The third-subdomain try5-opt2 is a polished version of try3-opt2, which has the ProteinShop-placed strands. Grant cut the first subdomain at residue 160 and the second at residue 330, taking the rest from the third model. I then added a HIS-tag. We used ProteinShop to move some of the clashing secondary-structure and backbone elements apart from each other. Grant sent the chimera to undertaker with the constraints from each of the subdomains to keep the secondary-structure elements, and this is the result.

Model 3, T0356.try13-opt2.pdb, is based on the first chimera again, using the constraints from the three individual subdomains plus hbond constraints for the hbonds predicted from the sep and notor alphabets.

Model 4, T0356.try5-opt2.gromacs0.repack-nonPC.pdb, is the best-scoring model under the Rosetta scoring function. It was derived from server model RAPTORESS_TS4 by reoptimizing with undertaker, then optimizing with gromacs, then repacking sidechains with Rosetta.
RAPTORESS_TS4 and RAPTORESS_TS5 were the high-scoring server models under the unconstrained cost function. Model 5, T0356.try1-opt2.pdb, is the automatic undertaker run starting from alignments.
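The splicing used for Models 1 and 2 amounts to choosing, for each residue number, which overlapping subdomain model supplies that residue's atoms. The sketch below is a hypothetical illustration, not the tool actually used. It assumes each subdomain model is a list of PDB ATOM/HETATM lines numbered in full-chain residue numbering, and the function names are invented for this example. It relies only on the fixed-column PDB layout, where the residue sequence number occupies columns 23-26.

```python
from __future__ import annotations


def res_seq(line: str) -> int:
    """Residue sequence number from a PDB ATOM/HETATM record (columns 23-26)."""
    return int(line[22:26])


def splice_chimera(models: list[list[str]], cuts: list[int]) -> list[str]:
    """Splice overlapping subdomain models into one chain (hypothetical helper).

    models: one list of PDB lines per subdomain model, in chain order.
    cuts:   last residue taken from each model except the final one, so
            residue r comes from model k when cuts[k-1] < r <= cuts[k].
    """
    if len(cuts) != len(models) - 1:
        raise ValueError("need exactly one cut point between consecutive models")
    bounds = [float("-inf"), *cuts, float("inf")]
    chimera = []
    for k, lines in enumerate(models):
        lo, hi = bounds[k], bounds[k + 1]
        for line in lines:
            # Keep only coordinate records whose residue falls in this model's window.
            if line.startswith(("ATOM", "HETATM")) and lo < res_seq(line) <= hi:
                chimera.append(line)
    return chimera
```

For Model 1 the cut points would be `[171, 331]`, and for Model 2 they would be `[160, 330]`. The sketch only selects atoms. It does not renumber atom serials, add the HIS-tag, or resolve clashes at the junctions, which is why the chimeras still needed ProteinShop adjustments and a full undertaker run.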