The SAM-T06 hand predictions use methods similar to the SAM_T04 method in CASP6 and the SAM_T02 method in CASP5. We start with a fully automated method (implemented as the SAM_T06 server):

We use the SAM-T2K and SAM-T04 methods for finding homologs of the target and aligning them. The hand method also uses the experimental new SAM-T06 alignment method, which we hope is both more sensitive and less prone to contamination by unrelated sequences.

We make local structure predictions using neural nets and the multiple alignments. We currently use 10 local-structure alphabets:

  DSSP
  STRIDE
  STR2: an extended version of DSSP that splits the beta strands into multiple classes (parallel/antiparallel/mixed, edge/center)
  ALPHA: a discretization of the alpha torsion angle: CA(i-1), CA(i), CA(i+1), CA(i+2)
  BYS: a discretization of Ramachandran plots, due to Bystroff
  CB_burial_14_7: a 7-state discretization of the number of C_beta atoms in a 14 Angstrom radius sphere around the C_beta (a toy sketch of this computation appears below)
  near-backbone-11: an 11-state discretization of the number of residues (represented by near-backbone points) in a 9.65 Angstrom radius sphere around the sidechain proxy spot for the residue
  DSSP_EHL2: CASP's collapse of the DSSP alphabet; DSSP_EHL2 is not predicted directly by a neural net, but is computed as a weighted average of the other backbone alphabet predictions
  O_NOTOR2: an alphabet for predicting characteristics of hydrogen bonds from the carbonyl oxygen
  N_NOTOR2: an alphabet for predicting characteristics of hydrogen bonds from the amide nitrogen

We hope to add more networks for other alphabets over the summer.

We make 2-track HMMs with each alphabet (1.0 amino acid + 0.3 local structure) and use them to score a template library of about 8000 (t06), 10000 (t04), or 15000 (t2k) templates. The template libraries are expanded weekly, but old template HMMs are not rebuilt. We also used a single-track HMM to score not just the template library, but a non-redundant copy of the entire PDB. One-track HMMs built from the template-library multiple alignments were used to score the target sequence. All the logs of e-values were combined in a weighted average (with rather arbitrary weights, since we still have not taken the time to optimize them), and the best templates were ranked; a toy sketch of this combination also appears below.

Alignments of the target to the top templates were made using several different alignment methods (mainly the SAM hmmscore program, but a few alignments were made with Bob Edgar's MUSCLE profile-profile aligner).

We generate fragments (short 9-residue alignments for each position) using SAM's "fragfinder" program and the 3-track HMM that tested best for alignment.

Residue-residue contact predictions are made using mutual information, pairwise contact potentials, joint entropy, and other signals combined by a neural net. The contact-prediction method is expected to evolve over the summer, as new features are selected and new networks trained.

Then the "undertaker" program (named because it optimizes burial) is used to try to combine the alignments and the fragments into a consistent 3D model. No single alignment or parent template was used as a frozen core, though in many cases one had much more influence than the others. The alignment scores were not passed to undertaker, but were used only to pick the set of alignments and fragments that undertaker would see. Helix and strand constraints generated from the secondary-structure predictions are passed to undertaker for use in its cost function, as are the residue-residue contact predictions.
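The CB_burial_14_7 alphabet above is simple enough to sketch. The following is a minimal illustration, not the actual SAM code: the cut points between states are made-up placeholders, and the input is assumed to be a plain list of C_beta coordinates.

    import math

    # Toy sketch of a CB_burial_14_7-style computation: for each residue,
    # count the other C_beta atoms within a 14 Angstrom sphere of its own
    # C_beta, then discretize the count into 7 states.  The cut points are
    # placeholders, NOT the boundaries used by the real alphabet.
    CUT_POINTS = [6, 12, 18, 24, 30, 38]  # 6 boundaries -> 7 states (0..6)

    def cb_burial_states(cb_coords, radius=14.0):
        """cb_coords: list of (x, y, z) C_beta positions, one per residue."""
        states = []
        for i, a in enumerate(cb_coords):
            count = sum(1 for j, b in enumerate(cb_coords)
                        if j != i and math.dist(a, b) <= radius)
            states.append(sum(count > cut for cut in CUT_POINTS))
        return states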
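The weighted combination of log e-values can likewise be sketched. The weights and method names below are placeholders (as noted above, the real weights are rather arbitrary and not yet optimized):

    import math

    # Toy sketch of the template-ranking step: each scoring method (the
    # 2-track HMMs for each alphabet, the one-track HMMs, etc.) reports an
    # e-value per template; the logs of the e-values are combined in a
    # weighted average and templates are ranked by the result (lower is
    # better).  Weights and method names are placeholders.
    WEIGHTS = {"t06_2track": 1.0, "t04_2track": 0.8, "t2k_2track": 0.8,
               "one_track": 0.5}

    def combined_score(evalues):
        """evalues: {method_name: e-value} for one template."""
        total = sum(WEIGHTS.get(m, 0.0) * math.log(e)
                    for m, e in evalues.items())
        weight = sum(WEIGHTS.get(m, 0.0) for m in evalues)
        return total / weight if weight else float("inf")

    def rank_templates(scores):
        """scores: {template_id: {method_name: e-value}} -> ids, best first."""
        return sorted(scores, key=lambda t: combined_score(scores[t]))

    # e.g. rank_templates({"2g03A": {"t06_2track": 1e-30, "one_track": 1e-10},
    #                      "2f6sA": {"t06_2track": 1e-20, "one_track": 1e-8}})
    # returns ["2g03A", "2f6sA"]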
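Of the contact-prediction signals, mutual information is the easiest to illustrate. The sketch below shows only the MI term for one pair of alignment columns; the real predictor feeds this and the other signals into a neural net:

    import math
    from collections import Counter

    # Toy sketch of mutual information between two columns of a multiple
    # alignment (one character per sequence in each column).  High MI
    # suggests the columns co-vary, one weak hint of a residue-residue
    # contact.  Gap handling and small-sample corrections are omitted.
    def mutual_information(col_i, col_j):
        n = len(col_i)
        count_i = Counter(col_i)
        count_j = Counter(col_j)
        count_ij = Counter(zip(col_i, col_j))
        mi = 0.0
        for (a, b), c in count_ij.items():
            # p(a,b) * log( p(a,b) / (p(a) p(b)) ), with counts over n seqs
            mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
        return mi

    # e.g. perfectly covarying columns score high:
    # mutual_information("AAAACCCC", "DDDDEEEE") == log(2),
    # while independent columns give roughly 0.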
One important change in this server over previous methods is that sheet constraints are extracted from the top few alignments and passed to undertaker.

After the automatic prediction is done, we examine it by hand and try to fix any flaws that we see. This generally involves rerunning undertaker with new cost functions, increasing the weights for features we want to see and decreasing the weights where we think the optimization has gone overboard. Sometimes we add new templates or remove ones that we think are misleading the optimization process.

New this year, we are also occasionally using ProteinShop to manipulate proteins by hand, to produce starting points for undertaker optimization. We expect this to be most useful in new-fold all-alpha proteins, where undertaker often gets trapped in poor local minima by extending helices too far.

Another new trick is to optimize models with gromacs to knock them out of a local minimum. The gromacs optimization does terrible things to the model (messing up sidechains and peptide planes), but is good at removing clashes. The resulting models are only a small distance from the pre-optimization models, but score much worse with the undertaker cost functions, so undertaker can move them more freely than models it has optimized itself.

For this target, we had two strong hits to the PDB and to our HMMs: 2g03A and 2f6sA. The best PDB hit, 2g03A, matched residues 65-223 of the target. However, since the target is 299 residues long, we needed to split the model into domains for the first region (residues 1-65) and the last region (residues 223-299). These two regions had no close PDB hits and poor fold-recognition hits, so we had to model them using ab initio techniques. For almost all of the models, we made chimeras of the three domains in order to get decent models, and we had to include constraints to keep the seven-helix bundle of the comparative-modelling portion of the protein together; the splicing is sketched below.
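The chimera construction itself is mechanically simple. Below is a minimal sketch, assuming single-chain PDB files whose residues are numbered in target coordinates; the file names are hypothetical:

    # Toy sketch of splicing a chimera from subdomain models by residue
    # range.  Assumes single-chain PDB files numbered in target coordinates.
    def atoms_in_range(path, first, last):
        """Return ATOM/HETATM records with residue number in [first, last]."""
        kept = []
        with open(path) as f:
            for line in f:
                if line.startswith(("ATOM", "HETATM")):
                    resnum = int(line[22:26])  # PDB residue sequence number
                    if first <= resnum <= last:
                        kept.append(line)
        return kept

    def write_chimera(pieces, out_path):
        """pieces: list of (pdb_file, first_residue, last_residue)."""
        with open(out_path, "w") as out:
            for path, first, last in pieces:
                out.writelines(atoms_in_range(path, first, last))
            out.write("END\n")

    if __name__ == "__main__":
        # e.g. the model 1 split described below: residues 1-229 from the
        # polished server model, 230-299 from the C-terminal subdomain run
        # (hypothetical file names):
        write_chimera([("try4-opt2.pdb", 1, 229),
                       ("subdomain-L225-T299-try1-opt2.pdb", 230, 299)],
                      "chimera.pdb")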
Model 1 is try10-opt2. This is a chimera of try4, the first polish of the best SAM_T06 server model, and the first run on our L225-T299 subdomain: we included residues 1-229 of the polished SAM_T06 model and residues 230-299 of the subdomain run. This model scores best with the rosetta scoring function. It also scores best on the try10 costfcn, a fairly neutral cost function that we decided to use for comparing models.

Model 2 is try12-opt2. This model is also built from a chimera, this one assembled from three separate subdomain predictions. The first subdomain, M1-D64, used a model based on secondary-structure and hydrogen-bond predictions for that region. Residues T65-R224 used try3-opt2, a polished model built from alignments to 2g03A, with distance constraints (also from 2g03A) to keep the seven-helix bundle together. The third subdomain, L225-T299, used try1-opt2 for that region, the original result of our first run on the subdomain. This model scored well on the two costfcns we used to analyze the models: best on the unconstrained costfcn and second best on the try10 costfcn.

Model 3 is try13-opt2. This model is a polish of the best-scoring complete model from the SAM_T06 server. It keeps the seven-helix bundle in the center of the protein and models the rest of the protein fairly nicely.

Model 4 is try20-opt2. This model is a polish of another chimera, again built from three subdomains. M1-T64 used try4-opt2, the model based on secondary-structure predictions. A65-L224 used try4-opt2 of that subdomain, another model built with helix constraints and alignments, this time using constraints from the other strong hit, 2f6sA. L225-T299 used try6-opt2, which attempted to use predicted residue-residue contacts, predicted hydrogen bonds, and predicted secondary structure. The chimera was then optimized and polished in undertaker. This model scored second best on the unconstrained costfcn and decently on the try10 costfcn.

Model 5 is try17-opt2.gromacs0.repack-nonPC. This model scores very well with rosetta; we included it for a bit of variety among the submissions. It too was built from a chimera: M1-T64 used try2-opt2 of that subdomain, a polish of the initial undertaker run for the region; A65-L224 used try4-opt2, the model based on 2f6sA; and L225-T299 used try5-opt2 of that subdomain, which attempted to use secondary-structure and hydrogen-bond predictions for the region. The submitted model is a polish of this chimera.