The SAM-T06 hand predictions use methods similar to the SAM-T04 method in CASP6 and the SAM-T02 method in CASP5. We start with a fully automated method (implemented as the SAM_T06 server):

1. Use the SAM-T2K and SAM-T04 methods for finding homologs of the target and aligning them. The hand method also uses the experimental new SAM-T06 alignment method, which we hope is both more sensitive and less prone to contamination by unrelated sequences.

2. Make local structure predictions using neural nets and the multiple alignments. We currently use 8 local-structure alphabets:

   DSSP
   STRIDE
   STR2: an extended version of DSSP that splits the beta strands into multiple classes (parallel/antiparallel/mixed, edge/center)
   ALPHA: a discretization of the alpha torsion angle: CA(i-1), CA(i), CA(i+1), CA(i+2)
   BYS: a discretization of Ramachandran plots, due to Bystroff
   CB_burial_14_7: a 7-state discretization of the number of C_beta atoms in a 14 Angstrom radius sphere around the C_beta (sketched in code below)
   near-backbone-11: an 11-state discretization of the number of residues (represented by near-backbone points) in a 9.65 Angstrom radius sphere around the sidechain proxy spot for the residue
   DSSP_EHL2: CASP's collapse of the DSSP alphabet

   DSSP_EHL2 is not predicted directly by a neural net, but is computed as a weighted average of the other backbone alphabet predictions. We hope to add more networks for other alphabets over the summer.

3. Make 2-track HMMs with each alphabet (1.0 amino acid + 0.3 local structure) and use them to score a template library of about 8000 (t06), 10000 (t04), or 15000 (t2k) templates. The template libraries are expanded weekly, but old template HMMs are not rebuilt. We also used a single-track HMM to score not just the template library, but a non-redundant copy of the entire PDB. One-track HMMs built from the template library multiple alignments were used to score the target sequence. All the logs of e-values were combined in a weighted average (with rather arbitrary weights, since we still have not taken the time to optimize them), and the best templates ranked (also sketched below).

4. Make alignments of the target to the top templates using several different alignment methods (mainly the SAM hmmscore program, but a few alignments were made with Bob Edgar's MUSCLE profile-profile aligner).

5. Generate fragments (short 9-residue alignments for each position) using SAM's "fragfinder" program and the 3-track HMM which tested best for alignment.

6. Make residue-residue contact predictions using mutual information, pairwise contact potentials, joint entropy, and other signals combined by a neural net. The contact prediction method is expected to evolve over the summer, as new features are selected and new networks trained.

7. Use the "undertaker" program (named because it optimizes burial) to try to combine the alignments and the fragments into a consistent 3D model. No single alignment or parent template was used as a frozen core, though in many cases one had much more influence than the others. The alignment scores were not passed to undertaker, but were used only to pick the set of alignments and fragments that undertaker would see. Helix and strand constraints generated from the secondary-structure predictions are passed to undertaker to use in the cost function, as are the residue-residue contact predictions. One important change in this server over previous methods is that sheet constraints are extracted from the top few alignments and passed to undertaker.
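As a concrete illustration of the CB_burial_14_7 alphabet, here is a minimal Python sketch that counts C_beta atoms within a 14 Angstrom radius of each residue's C_beta and thresholds the count into 7 states. The function name and, in particular, the bin boundaries are illustrative assumptions, not the ones actually used to train our nets.

```python
import math

def cb_burial_states(cb_coords, radius=14.0, bounds=(5, 10, 15, 20, 25, 30)):
    """cb_coords: one (x, y, z) C_beta position per residue.

    The bounds tuple is a hypothetical set of count thresholds; six
    thresholds split the neighbor counts into 7 burial states.
    """
    states = []
    for i, a in enumerate(cb_coords):
        # Count other C_beta atoms inside the sphere around residue i.
        neighbors = sum(
            1
            for j, b in enumerate(cb_coords)
            if i != j and math.dist(a, b) <= radius
        )
        # Threshold the raw neighbor count into one of 7 states (0-6).
        states.append(sum(neighbors > t for t in bounds))
    return states
```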
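The template-ranking step combines the log e-values from the different HMM scoring methods in a weighted average. The sketch below shows that combination, assuming each method has scored every template; the method names, weight values, and e-values are placeholders standing in for the "rather arbitrary" weights mentioned above, and a weighted sum is used since dividing by the total weight would not change the ranking.

```python
import math

def rank_templates(scores, weights):
    """scores[method][template] -> e-value; weights[method] -> weight."""
    combined = {}
    for method, evalues in scores.items():
        for template, e in evalues.items():
            combined[template] = (
                combined.get(template, 0.0) + weights[method] * math.log(e)
            )
    # More significant hits have smaller (more negative) log e-values,
    # so ascending order puts the best templates first.
    return sorted(combined, key=combined.get)

# Placeholder method names, weights, templates, and e-values:
ranking = rank_templates(
    scores={
        "two_track_dssp": {"1abcA": 1e-12, "2xyzB": 1e-3},
        "one_track_pdb":  {"1abcA": 1e-8,  "2xyzB": 1e-5},
    },
    weights={"two_track_dssp": 1.0, "one_track_pdb": 0.5},
)
print(ranking)  # ['1abcA', '2xyzB']
```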
After the automatic prediction is done, we examine it by hand and try to fix any flaws that we see. This generally involves rerunning undertaker with new cost functions, increasing the weights for features we want to see and decreasing the weights where we think the optimization has gone overboard (a generic sketch of this sort of tuning appears at the end of this section). Sometimes we will add new templates or remove ones that we think are misleading the optimization process.

We were really flailing on this target, as our ab initio techniques were not making much progress in the short time available.

Model 1 is T0314.try48-opt2, the closest match to the secondary structure predictions. Grant made a hairpin of residues 20-29 because of strand predictions and separation predictions that looked good from the sep alphabets. The helix constraints are straight out of the neural net predictions. Grant made the sheet constraints from the str2 secondary structure predictions. He made a small sheet constraint that he ended up removing later. The first run of these constraints (try20.costfcn) gave us a very good base to start from, but there was an extra strand that wasn't pairing up with the rest of the sheet. Grant finally ended up moving the strand manually with ProteinShop, adding breaks to the models in order to get the strands to form correctly in undertaker.

Model 2 is try30-opt2, which came from try13-opt2. Try13 was a polishing run on the five Robetta server models. We ended up getting something vaguely protein-like.

Model 3, try21-opt2, was an attempt to get the strand in place from try20-opt2, but it didn't work well. It ended up looking decent on its own, so it's a valid model.

Model 4, try35-opt2, is a polished model based on alignments from a lipoprotein Grant found in the PDB. An E. coli lipoprotein (1oapA) was of similar length, and he got a few other structures from VAST (2aizP, 1r1mA) to make alignments from. We ended up with a structure that looked somewhat like the E. coli lipoprotein.

Model 5, try14-opt2, is Grant's undertaker polishing run on the best-scoring Pcons6 model (TS4), which is really Robetta model 7.
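For readers unfamiliar with this style of hand intervention, the sketch below shows, in generic Python rather than undertaker's actual cost-function syntax, the kind of weight tuning described above. All term names and weight values are hypothetical.

```python
def total_cost(terms, weights):
    """terms: cost-term name -> value computed for a candidate model."""
    return sum(weights[name] * value for name, value in terms.items())

# Hypothetical starting weights for a rerun:
weights = {"burial": 1.0, "helix_constraints": 2.0,
           "sheet_constraints": 2.0, "contact_pred": 0.5}

# After inspecting a model, emphasize strand pairing and relax a term
# the optimizer has been overdoing:
weights["sheet_constraints"] *= 2.0
weights["burial"] *= 0.5
```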