PFRMAT TS
TARGET T0495
AUTHOR 4008-1775-0004
METHOD The SAM-T08 hand predictions use methods similar to SAM_T06 in CASP7.
METHOD
METHOD We start with a fully automated method (implemented as the SAM-T08-server):
METHOD
METHOD Use the SAM-T2K, SAM-T04, and SAM-T06 methods for finding homologs
METHOD of the target and aligning them.
METHOD
METHOD Make local structure predictions using neural nets and the
METHOD multiple alignments. These neural nets have been newly trained
METHOD for CASP8 with an improved training protocol. The neural nets for
METHOD the 3 different multiple sequence alignments are independently
METHOD trained, so combining them should offer improved performance.
METHOD
METHOD We currently use 15 local-structure alphabets:
METHOD     STR2              an extended version of DSSP that splits the beta strands
METHOD                       into multiple classes (parallel/antiparallel/mixed,
METHOD                       edge/center)
METHOD     STR4              an attempt at an alphabet like STR2, but not requiring DSSP.
METHOD                       This alphabet may be trying to make some irrelevant
METHOD                       distinctions as well.
METHOD     ALPHA             a discretization of the alpha torsion angle:
METHOD                       CA(i-1), CA(i), CA(i+1), CA(i+2)
METHOD     BYS               a discretization of Ramachandran plots, due to Bystroff
METHOD     PB                de Brevern's protein blocks
METHOD
METHOD     N_NOTOR
METHOD     N_NOTOR2
METHOD     O_NOTOR
METHOD     O_NOTOR2          alphabets based on the torsion angle of
METHOD                       backbone hydrogen bonds
METHOD
METHOD     N_SEP
METHOD     O_SEP             alphabets based on the separation of donor and
METHOD                       acceptor for backbone hydrogen bonds
METHOD
METHOD     CB_burial_14_7    a 7-state discretization of the number of C_beta
METHOD                       atoms in a 14 Angstrom radius sphere around the C_beta.
METHOD     near-backbone-11  an 11-state discretization of the number of
METHOD                       residues (represented by near-backbone points) in a
METHOD                       9.65 Angstrom radius sphere around the sidechain proxy
METHOD                       spot for the residue.
METHOD
METHOD     DSSP_EHL2         CASP's collapse of the DSSP alphabet
METHOD                       DSSP_EHL2 is not predicted directly by a
METHOD                       neural net, but is computed as a weighted
METHOD                       average of the other backbone alphabet predictions.
METHOD
METHOD We make 2-track HMMs with each alphabet, with the amino-acid track
METHOD having a weight of 1 and the local structure track having a weight
METHOD of 0.1 (for backbone alphabets) or 0.3 (for burial alphabets).
METHOD We use these HMMs to score a template library of about
METHOD 14000 (t06), 16000 (t04), or 18000 (t2k) templates.
METHOD The template libraries are expanded weekly, but old template HMMs
METHOD are not rebuilt. The target HMMs are used to score consensus
METHOD sequences for the templates, to get a cheap approximation of
METHOD profile-profile scoring, which does not yet work in the SAM package.
METHOD
METHOD We also used single-track HMMs to score not just the template
METHOD library, but a non-redundant copy of the entire PDB. This scoring
METHOD is done with real sequences, not consensus sequences.
METHOD
METHOD All the target HMMs use a new calibration method that provides more
METHOD accurate E-values than before, and can be used even with
METHOD local-structure alphabets that used to give us trouble (such as
METHOD protein blocks).
METHOD
METHOD One-track HMMs built from the template library multiple alignments
METHOD were used to score the target sequence. Later this summer, we
METHOD hope to be able to use multi-track template HMMs, but we have not
METHOD had time to calibrate such models while keeping the code
METHOD compatible with the old libraries, so the template libraries
METHOD currently use old calibrations, with somewhat optimistic E-values.
METHOD
METHOD All the logs of E-values were combined in a weighted average (with
METHOD rather arbitrary weights, since we still have not taken the time
METHOD to optimize them), and the best templates ranked.
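The ranking step just described (a weighted average of log E-values, lowest first) can be sketched roughly as follows. This is only an illustration, not the SAM-T08 code: the weights, template names, and E-values are made up, and averaging over only the methods that scored a template is an assumption about how missing scores might be handled.

```python
import math

def combined_log_evalue(evalues, weights):
    """Weighted average of natural-log E-values for one template.

    Assumption: the average is taken only over the methods that
    actually produced an E-value for this template.
    """
    total_weight = sum(weights[m] for m in evalues)
    weighted_sum = sum(weights[m] * math.log(e) for m, e in evalues.items())
    return weighted_sum / total_weight

def rank_templates(scores, weights):
    """Return template IDs sorted by combined log E-value, most significant first."""
    return sorted(scores, key=lambda t: combined_log_evalue(scores[t], weights))

# Hypothetical per-method weights and E-values (not taken from the prediction).
weights = {"t06": 1.0, "t04": 0.5, "t2k": 0.5}
scores = {
    "tmplA": {"t06": 1e-5, "t04": 1e-3, "t2k": 1e-4},
    "tmplB": {"t06": 1e-2, "t04": 1e-6, "t2k": 1e-1},
    "tmplC": {"t06": 5.0, "t04": 2.0, "t2k": 8.0},
}
ranking = rank_templates(scores, weights)
print(ranking)
```

Averaging logs rather than raw E-values keeps one very large E-value from swamping several small ones, which is presumably why the logs are combined.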
METHOD
METHOD Alignments of the target to the top templates were made using
METHOD several different alignment settings on the SAM alignment software.
METHOD
METHOD Generate fragments (short 9-residue alignments for each position)
METHOD using SAM's "fragfinder" program and the 3-track HMM which tested
METHOD best for alignment.
METHOD
METHOD Residue-residue contact predictions are made using mutual
METHOD information, pairwise contact potentials, joint entropy, and other
METHOD signals combined by a neural net. Two different neural net
METHOD methods were used, and the results submitted separately.
METHOD
METHOD CB-CB constraints were extracted from the alignments and a
METHOD combinatorial optimization done to choose a most-believable
METHOD subset.
METHOD
METHOD Then the "undertaker" program (named because it originally
METHOD optimized burial) is used to try to combine the alignments and the
METHOD fragments into a consistent 3D model. No single alignment or
METHOD parent template was used as a frozen core, though in many cases
METHOD one had much more influence than the others. The alignment scores
METHOD were not used by undertaker, but were used only to pick the set
METHOD of alignments and fragments that undertaker would see.
METHOD
METHOD The cost functions used by undertaker rely heavily on the
METHOD alignment constraints, on helix and strand constraints generated
METHOD from the secondary-structure predictions, and on the neural-net
METHOD predictions of local properties that undertaker can measure.
METHOD The residue-residue contact predictions are also given to
METHOD undertaker, but have less weight. There are also a number of
METHOD built-in cost functions (breaks, clashes, burial, ...) that are
METHOD included in the cost function.
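As a rough illustration of a cost function assembled from weighted terms, the sketch below sums a break term, a clash term, and a distance-constraint term over a CA-only trace. It is not undertaker's implementation: the term definitions, the 3.8 and 3.0 Angstrom thresholds, the weights, and all function names are assumptions chosen for the example.

```python
import math

def break_cost(ca, ideal=3.8):
    """Penalize consecutive CA-CA distances that deviate from an assumed ideal."""
    return sum((math.dist(ca[i], ca[i + 1]) - ideal) ** 2 for i in range(len(ca) - 1))

def clash_cost(ca, min_dist=3.0):
    """Penalize non-adjacent CA pairs closer than an assumed minimum distance."""
    cost = 0.0
    for i in range(len(ca)):
        for j in range(i + 2, len(ca)):
            d = math.dist(ca[i], ca[j])
            if d < min_dist:
                cost += (min_dist - d) ** 2
    return cost

def constraint_cost(ca, constraints):
    """Penalize deviation from target distances given as (i, j, distance) triples."""
    return sum((math.dist(ca[i], ca[j]) - target) ** 2 for i, j, target in constraints)

def total_cost(ca, weighted_terms):
    """Weighted sum of cost terms; raising a weight makes that feature matter more."""
    return sum(weight * term(ca) for weight, term in weighted_terms)

# Hypothetical 3-residue trace and a single made-up distance constraint.
ca_trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
constraints = [(0, 2, 6.0)]
weighted_terms = [
    (10.0, break_cost),
    (5.0, clash_cost),
    (1.0, lambda ca: constraint_cost(ca, constraints)),
]
cost = total_cost(ca_trace, weighted_terms)
print(round(cost, 3))
```

Keeping each term as a separate function with its own weight makes it easy to give one signal (here, hypothetically, the constraint term) less influence than the others.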
METHOD
METHOD The automatic script runs the undertaker-optimized model through
METHOD gromacs (to fix small clashes and breaks) and repacks the
METHOD sidechains using Rosetta, but these post-undertaker optimizations
METHOD are not included in the server predictions. They can be used in
METHOD subsequent re-optimization.
METHOD
METHOD After the automatic prediction is done, we examine it by hand and try
METHOD to fix any flaws that we see. This generally involves rerunning
METHOD undertaker with new cost functions, increasing the weights for
METHOD features we want to see and decreasing the weights where we think the
METHOD optimization has gone overboard. Sometimes we will add new templates
METHOD or remove ones that we think are misleading the optimization process.
METHOD We often do "polishing" runs, where all the current models are read in
METHOD and optimization with undertaker's genetic algorithm is done with high
METHOD crossover.
METHOD
METHOD Some improvements in undertaker include better communication with
METHOD SCWRL for initial model building from alignments (now using the
METHOD standard protocol that identical residues have fixed rotamers, rather
METHOD than being reoptimized by SCWRL), more cost functions based on the
METHOD neural net predictions, multiple constraint sets (for easier
METHOD weighting of the importance of different constraints), and some new
METHOD conformation-change operators (Backrub and BigBackrub).
METHOD
METHOD We also created model-quality-assessment methods for CASP8, which we
METHOD are applying to the server predictions. We do two optimizations from the
METHOD top 10 models with two of the MQA methods, and consider these models
METHOD as possible alternatives to our natively-generated models.
METHOD
METHOD T0495 has weak but consistent hits to SCOP superfamily c.52.1
METHOD (restriction endonuclease-like). Homology was not strong enough to
METHOD decide the orientation of strand 5, which varies in this superfamily.
METHOD There is often a minor sheet in the superfamily as well, which none of
METHOD the models managed to get.
METHOD
METHOD
METHOD 1 T0495.try12-opt3.pdb #