Wed May 10 09:39:36 PDT 2006 Kevin Karplus

This directory has all the working notes for CASP7 predictions.
This README file will be added to as the prediction season progresses,
with notes about how to do predictions and changes to the software and
procedures.  The individual targets are in subdirectories T0283
... with a README file for each directory discussing the target.

It is probably worth reading ../casp6/README for notes on the
procedures used in the previous CASP in 2004.

------------------------------------------------------------

TABLE OF CONTENTS

Getting started
Creating a new try*.under file
Making a chimera
Handling a homodimer
Moving a helix with ProteinShop
Viewing Undertaker PDB Hbonds with ProteinShop
Moving Dimers with ProteinShop
Looking at burial predictions on real proteins
BLAST to choose among close templates
Using jmol from Firefox on the Linux machines
Preparing final models for submission
Mailing predictions (RR and TS formats)
Mailing dimer predictions
Confirming submissions of files
Adding specific templates to the set used by undertaker
To generate sheet constraints from an alignment
To focus on a particular multiple alignment
To download and score server predictions
To score server models for many targets.
To get Robetta models that were not submitted to CASP
To evaluate a model once the correct solution is released
------------------------------------------------------------


Wed May 10 09:42:23 PDT 2006 Kevin Karplus

Getting Started

New targets will generally be started by me (Kevin), unless I am out
of town when they are released.  The command to start a new target is
new-target, and takes a single argument, the target number:

casp7/scripts/new-target T0283

This script creates the working directory for the target and puts a
Makefile there, but does nothing to actually get the target from the
casp7 web site or do the prediction.  Those tasks are done with

cd T0283
(make -k >& make.log; gzip -9f make.log) &

which runs the iterated searches, the local structure predictions,
the fold-recognition searches, fragment-finding, contact prediction,
and one run of undertaker for full 3D prediction.

When it has finished, we need to look at the alignments and
predictions done, improve the cost function (and possibly the input
alignments) for undertaker and generally tweak things up until we are
satisfied.


------------------------------------------------------------

Mon Jun 26 12:14:48 PDT 2006 Kevin Karplus
Creating a new try*.under file

All our optimization runs with undertaker are done with an undertaker
command file of the form "try7.under", with the numbers increasing
sequentially as we try different things.

The first one, try1.under, is created automatically by the initial make.
It generally serves as a basis for future try*.under files.


The first thing to do to create a new try.under file is to copy an
existing one (be sure you don't step on one that already exists).
I generally do this in emacs, but you can also use "cp try1.under try2.under".

The *very* first thing to do, before you even *start* to think about
what you want to change is to do a global replacement of try1 by try2
(or try7 by try8, or whatever the names are) in the file.  Every year
someone forgets to do this and overwrites an existing good result with
a bad one.
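
The copy-and-rename step can be scripted so that it is harder to
forget.  The Python sketch below is only an illustration and is not
part of the casp7 scripts; the function names are made up here.  It
assumes that the try name appears literally in the file (as in
try1.costfcn) and it refuses to overwrite a file that already exists.

```python
import os
import re


def retarget_try_text(text, old_try, new_try):
    """Replace every occurrence of old_try (e.g. "try1") by new_try.

    The lookahead keeps "try1" from matching inside "try10".
    """
    pattern = re.compile(re.escape(old_try) + r"(?!\d)")
    return pattern.sub(new_try, text)


def copy_try_script(old_try, new_try, directory="."):
    """Copy <old_try>.under to <new_try>.under with the names replaced."""
    src = os.path.join(directory, old_try + ".under")
    dst = os.path.join(directory, new_try + ".under")
    with open(src) as infile:
        text = infile.read()
    # Mode "x" raises FileExistsError instead of stepping on an existing try.
    with open(dst, "x") as outfile:
        outfile.write(retarget_try_text(text, old_try, new_try))
    return dst
```

For example, copy_try_script("try7", "try8") would write try8.under
from try7.under.  The cost function file still has to be created
separately.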

After try1 has finished, the templates that it read from PDB files
have been saved in a more compact format in Template.atoms.gz.
Generally, you do not want to re-create this file on subsequent runs,
so you should comment out the line

	PrintTemplateAtoms Template.atoms

If you are changing what set of PDB files get used as templates (say
because you added a bunch of ideas that did not come from the HMM
searches), then you can uncomment the line for one run.

Note: because the try1.under script refers to the try1.costfcn cost
function, you need to create a new cost function for each new try.
Generally, it is in the cost function that most of our work is done.

If you want to generate new models from the alignments, then just
copying the try1.under file and replacing "try1" by the new try works
fairly well.

If you want to focus on a subset of the templates, you need to change
which alignments undertaker works from.  The default set comes from 5 lines:

Include XXX0000.t04.undertaker-align.under
Include XXX0000.t06.undertaker-align.under
Include XXX0000.t2k.undertaker-align.under
Include XXX0000.undertaker-align.under
ReadFragmentAlignment NOFILTER SCWRL all-align.a2m

If you want to eliminate a particular alignment or template from the
set that undertaker considers, you must comment these out.
The TryAllAlign commands then become useless, unless you provide a
different set of alignments to work from.  Uncommenting
// InfilePrefix 1xxxX/
//   include read-alignments-scwrl.under
(with 1xxxX replaced by the actual template chain id) will provide
undertaker with specific alignments.

If 1xxxX/read-alignments-scwrl.under doesn't exist, you can make it by
adding 1xxxX to the space-separated list of chains defining
MANUAL_TOP_HITS in the Makefile, then doing
	make extra_alignments
	make read_alignments

Sometimes you don't want to start from alignments, but from an
existing complete model or set of complete models (from previous runs
or from automatic servers).

To use existing models, comment out all the TryAllAlign commands (and
the first SCWRLConform command before OptConform has been called).
Uncomment
## InfilePrefix decoys/
## include read-pdb.under
to include all existing models in the decoys directory.
(You can do "make decoys/read-pdb.under" if it doesn't exist, but it usually
gets created automatically by "make decoys/score-all.try*.pretty")

If you want to optimize specific models, don't use the "include
read-pdb.under", but provide a  ReadConformPDB command for each model
you want to consider in the initial set.

When you are optimizing existing models, you may want to tweak the
pseudocounts for the conformation-change operators:  for example,
increasing the initial probability of CrossOver between models, and
decreasing the initial probability of InsertSpecificFragment.
This is not terribly important, as the adaptation will eventually pick
out which operators to use, but it can make the process more efficient.


------------------------------------------------------------


Sat May 13 15:20:19 PDT 2006 Kevin Karplus

Making a chimera

Sometimes one wants to combine two different predictions of a protein,
copying some parts from one conformation, some parts from another.

Right now, the easiest way to do this is to superimpose the
conformations (using a script like superimpose-best.under), then use
emacs to do cut-and-paste operations on the superimposed models.
You can control the superposition to make things match particularly
well where the cutting will occur, so that those residues line up very
precisely.

------------------------------------------------------------

Mon May 15 08:04:01 PDT 2006 Kevin Karplus

Handling a homodimer

This method assumes that you have a pretty good monomer that you want
to dimerize based on a template with an existing dimer and then
optimize.  It is not intended for creating dimers
from scratch.

1) create a subdirectory dimer/ (or 3mer/, 4mer/ ...)
2) in dimer create a target fasta file with the lengthened target
   sequence.  For example, T0284.a2m would be
    >T0284 PA4872, Pseudomonas aeruginosa PAO1, 287 res
    MHRASHHELRAMFRALLDSSRCYHTASVFDPMSARIAADLGFECGILGGSVASLQVLAAP
    DFALITLSEFVEQATRIGRVARLPVIADADHGYGNALNVMRTVVELERAGIAALTIEDTL
    LPAQFGRKSTDLICVEEGVGKIRAALEARVDPALTIIARTNAELIDVDAVIQRTLAYQEA
    GADGICLVGVRDFAHLEAIAEHLHIPLMLVTYGNPQLRDDARLARLGVRVVVNGHAAYFA
    AIKATYDCLREERGAVASDLTASELSKKYTFPEEYQAWARDYMEVKE

    MHRASHHELRAMFRALLDSSRCYHTASVFDPMSARIAADLGFECGILGGSVASLQVLAAP
    DFALITLSEFVEQATRIGRVARLPVIADADHGYGNALNVMRTVVELERAGIAALTIEDTL
    LPAQFGRKSTDLICVEEGVGKIRAALEARVDPALTIIARTNAELIDVDAVIQRTLAYQEA
    GADGICLVGVRDFAHLEAIAEHLHIPLMLVTYGNPQLRDDARLARLGVRVVVNGHAAYFA
    AIKATYDCLREERGAVASDLTASELSKKYTFPEEYQAWARDYMEVKE

 3) Copy the Makefile to the dimer/ subdirectory, and add a macro
    (before the include)
    	MONOMER_LENGTH := 287

 4) Make a dimer/decoys/ directory

 5) Create a script make-dimer.under in the main directory
    (start with "make make-dimer.under")
    This script needs to have a properly dimerized template to copy
    the positioning from and a monomer to dimerize.

 6) Create an alignment file that has the target and copies of the
    best alignment.  For example, for T0284, we have
    T0284/1mumA/1mumA.dimer-a2m modified from
    T0284-1mumA-t04-local-str2+CB_burial_14_7-1.0+0.4+0.4-adpstyle5.a2m :

>T0284 PA4872, Pseudomonas aeruginosa PAO1, 287 res
MHRASHHELRAMFRALLDSSRCYHTASVFDPMSARIAADLGFECGILGGS
VASLQVLAAPDFALITLSEFVEQATRIGRVARLPVIADADHGYGNALNVM
RTVVELERAGIAALTIEDTLLPAQFGRKSTDLICVEEGVGKIRAALEARV
DPALTIIARTNAELIDVDAVIQRTLAYQEAGADGICLVGVRDFAHLEAIA
EHLHIPLMLVTYGNPQLRDDARLARLGVRVVVNGHAAYFAAIKATYDCLR
EERGAVASDLTASELSKKYTFPEEYQAWARDYMEVKE
>1mumA
sl------HSPGKAFRAALTKENPLQIVGTINANHALLAQRAGYQAIYLS
GGGVAAGSLGLPDLGISTLDDVLTDIRRITDVCSLPLLVDADIGFGsSAF
NVARTVKSMIKAGAAGLHIEDQVGAKRCGHrPNKAIVSKEEMVDRIRAAV
DAKTDPDFVIMARTDALAvEGLDAAIERAQAYVEAGAEMLFPEAITELAM
YRQFADAVQVPIlaNITEFGATPLFTTDELRSAHVAMALYPLSAFRAMNR
AAEHVYNVLRQegtqksVIDTMQTRNELYESINYYQYEEKLDNL------
farsqvk
>1mumB
sl------HSPGKAFRAALTKENPLQIVGTINANHALLAQRAGYQAIYLS
GGGVAAGSLGLPDLGISTLDDVLTDIRRITDVCSLPLLVDADIGFGsSAF
NVARTVKSMIKAGAAGLHIEDQVGAKRCGHrPNKAIVSKEEMVDRIRAAV
DAKTDPDFVIMARTDALAvEGLDAAIERAQAYVEAGAEMLFPEAITELAM
YRQFADAVQVPIlaNITEFGATPLFTTDELRSAHVAMALYPLSAFRAMNR
AAEHVYNVLRQegtqksVIDTMQTRNELYESINYYQYEEKLDNL------
farsqvk

  7) in dimer make try1.costfcn, then edit it to have a KnownBreak
     between the chains:

     KnownBreak M288

    If you want any constraints on the optimization, it is necessary
    to make multiple copies in the cost function, renumbering the
    constraints in the later chains (a real pain).  Alternatively, you
    can compute the constraints only on the first monomer.  If the
    monomers are identical, this should not cause any problems.

    Getting the scoring for predicted alpha may be harder, as
    generating multiple alignments and predictions for the polyprotein
    chain will be harder. (We could write scripts to take the
    monomeric predictions and concatenate them with renumbering, but
    haven't yet done this.)  It may be easiest just to comment out the
    CreatePredAlphaCost commands of the costfcn, and remove the
    pred_alpha components.
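
The lengthened sequence of step 2 and the KnownBreak line of step 7
both follow directly from the monomer sequence.  The Python sketch
below is a hypothetical helper (not one of the casp7 scripts) that
shows the arithmetic.  Extending the same numbering to trimers and
larger multimers is an assumption, not something tested in these notes.

```python
def multimer_sequence(monomer, copies=2):
    """Concatenate `copies` copies of the monomer into one long chain."""
    return monomer * copies


def known_break_lines(monomer, copies=2):
    """KnownBreak lines naming the first residue of each later chain.

    Residues are numbered from 1, so the second chain of a monomer of
    length n starts at residue n + 1.
    """
    n = len(monomer)
    return ["KnownBreak %s%d" % (monomer[0], k * n + 1)
            for k in range(1, copies)]
```

For the 287-residue T0284 sequence, which starts with M, this gives
the single line "KnownBreak M288" used above.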


Once you have an acceptable dimer, you want to optimize it, keeping it
dimerized in roughly the same orientations.

If you read in a dimer with ReadConformPDB, be sure to mark it as a
dimer by following the read command with
	Multimer 2
as a separate command to label the dimer as a cyclic dimer.
Note: if the multimer is *not* cyclic then *don't* label it, as
undertaker will try to symmetrize it.

You can do the optimization as usual, but use "multimer 2" in the
OptConform arguments.  Any alignments (for fragments and the like) can
be gotten from the original monomeric runs.   You probably want to
reduce the duration of the run (by reducing num_gen, gen_size,
super_iter, and/or super_num_gen), because multimeric runs take longer
than monomeric ones.  You can also read the Template.atoms file from
the monomeric directory, avoiding duplicating that file.

You might want to turn off TweakMultimer at first if you are trying to pack a
tight interface, as it will tend to move monomers apart to reduce clashes.
But if you have a loose interface, you definitely want TweakMultimer
on to try to tighten up the interface.

It may be necessary to add some inter-chain constraints to hold the
dimer together.  Even without TweakMultimer on, undertaker may find a
way to alleviate clashes by moving parts of the dimer away from each
other as it did in try1 (of T0284/dimer).

Note: you don't always want "multimer 2" for a dimer or "multimer 4"
for a tetramer.  What the command (or option to OptConform) does is to
force the creation of a cyclic multimer.  That is the transform that
takes A to B will take B back to A for a dimer, or T(A->B) = T(B->C) =
T(C->D) = T(D->A) for a tetramer.  Not all multimers are cyclic!
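
The cyclic condition can be checked numerically: applying the
chain-to-chain transform n times must give back the identity.  The
Python sketch below is a simplified illustration that assumes the
symmetry axis passes through the origin, so that the transform is a
pure rotation; it says nothing about how undertaker represents or
enforces the symmetry.

```python
import math


def rotation_z(degrees):
    """3x3 matrix for a rotation about the z axis."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_mul(p, q):
    """Product of two 3x3 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def is_cyclic(transform, n, tol=1e-9):
    """True if applying `transform` n times gives the identity."""
    m = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(n):
        m = mat_mul(transform, m)
    return all(abs(m[i][j] - float(i == j)) <= tol
               for i in range(3) for j in range(3))
```

A 180-degree rotation passes the test for n=2 and a 90-degree rotation
passes it for n=4, but a 90-degree rotation fails for n=2, which is the
situation where "multimer 2" would be the wrong choice.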

You can still optimize non-cyclic multimers in undertaker, but you
must *not* use the multimer command or option to OptConform.  This
will cause each chain to be separately optimized but the "OptSubtree"
method will tend to rearrange the transformation between chains.

You can optimize a mixture of cyclic and non-cyclic dimers in
OptConform if they are initially labeled with Multimer commands and
OptConform has no "multimer" keyword (or, equivalently, "multimer 0").
If OptConform has "multimer 2" set, then all multimers will be set to
be cyclic dimers.

Note: you can do optimization of a tetramer with symmetry S_{2,2}
by telling OptConform to use "multimer 2".  You don't get the full
symmetry, but you will get some symmetry: chain A and chain B will be
independently optimized, but chain C and chain D will be copies of
chains A and B and T(AB->CD)= T(CD->AB).


NOTE: gromacs doesn't like big chain breaks, and it will not see the
multimer merged into a single chain as two chains.  To get gromacs to
optimize a multimer, you need to unpack the multimer into separate chains:
	cd casp7/T0332/dimer
	make decoys/T0332.try2-opt2.unpack.pdb.gz decoys/T0332.try2-opt2.unpack.gromacs0.pdb.gz

You can get this to happen for you automatically if you use
	cd casp7/T0332/dimer
	(make  T0332.mult2 >& do2.log; gzip -9f do2.log)&
instead of the monomer version
	(make  T0332.do2 >& do2.log; gzip -9f do2.log)&

Sat Jul  1 13:35:27 PDT 2006 Kevin Karplus

I made a small change to undertaker, adding
        force_alignment
        fragment_only

options to ReadFragmentAlignment, so that I could force undertaker to
treat the short fragments as being a complete alignment or not being
treated as an alignment at all (just fragments).  If neither option is
provided, then it is added to the alignment library only if it is
multiple fragments or a sufficiently long single fragment (something
like half the total protein length).

For multimers, you can include force_alignment in the
ReadFragmentAlignment command that specifies the multimer, to avoid
losing an alignment that has only a short piece aligned to show what
corresponds.
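
The rule described above can be written out as a small decision
function.  This Python sketch only restates the prose; it is not
undertaker's code.  The 0.5 threshold stands in for "something like
half" and may not be the real value, and treating the two options as
mutually exclusive is an assumption.

```python
def kept_as_alignment(fragment_lengths, protein_length,
                      force_alignment=False, fragment_only=False,
                      min_fraction=0.5):
    """Decide whether a fragment alignment joins the alignment library."""
    if force_alignment and fragment_only:
        raise ValueError("choose at most one of the two options")
    if force_alignment:
        return True
    if fragment_only:
        return False
    if len(fragment_lengths) > 1:
        return True          # multiple fragments count as an alignment
    # A single fragment must cover a large enough part of the protein.
    return sum(fragment_lengths) >= min_fraction * protein_length
```

With these assumptions a single 20-residue fragment of a 287-residue
protein would be used only as a fragment, unless force_alignment is
given.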


------------------------------------------------------------

Date: Mon, 15 May 2006 14:48:15 -0700
From: "Firas Khatib"
To: "Kevin Karplus"
Subject: ProteinShop discovery!

I finally figured out how to lock 1 secondary structure element and
select the coils on either side to only move THAT ss element and the
coils, leaving the rest of the protein intact!

Ctrl-Shift-Left Button on the coil toggles the activation state of a coil region

small victories with Proteinshop! :)

--Firas

------------------------------------------------------------

Date: Tue, 13 Jun 2006 16:17:25 -0700
From: "Firas Khatib"
To: "Kevin Karplus"
Subject: Proteinshop discovery!

I figured out a quick and easy way to get Proteinshop's hydrogen bonds
visualizer to work with Undertaker PDB files!

This can be a very useful tool, since Proteinshop can also display the
hydrogen cages and hydrogen bond sites, so moving strands with
Proteinshop can be easier!

The solution is to open undertaker's PDB in molmol, clicking RIBBONS
(which turns on molmol's ssa) and saving the file.

This new file has Hydrogens in it (determined by molmol of course) and
you can open it with Proteinshop and the hydrogen bonds will appear!

---Firas

------------------------------------------------------------


Using Dimers with ProteinShop:
Since Proteinshop does not deal with chainbreaks (it connects any gaps
in the chain with a line that cannot be shrunk) you have to do the following
if you want to move one chain of a dimer relative to the other:

You need to save the two chains of your dimer as separate files and open
them both with Proteinshop.

You will notice that if you move anything it will move BOTH chains (which
doesn't help you in any way).

Under the toolbar click on "Windows" and "Show Selection Dialog".
Now you can select the chain you want to move (but you will notice that it
still moves both!)
You must then use the knobs on the "Protein Selection Dialog" to move the
chain into the position that you want. This is tricky, but not too bad.

Note that even if you turn "Visualize Atom Collisions" on, it will not show
you any clashes BETWEEN your two chains!

When you have it aligned the way you want you save your file and then you
have to cat both chains together and run my script renumberChain.pl (which
is located in ~/casp7/scripts) to have them numbered correctly.
(you might also have to replace all the chain letters).

Then load it back into Proteinshop and turn on "Visualize Atom Collisions"
to see if you have any clashes you can quickly fix.
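
For readers who want to see what the renumbering involves, here is a
Python sketch of sequential renumbering of concatenated chains.  It is
not renumberChain.pl and may differ from it in details; it relies only
on the standard PDB column layout (chain ID in column 22, residue
number in columns 23-26, insertion code in column 27) and leaves the
chain letters alone.

```python
def renumber_residues(pdb_lines, start=1):
    """Renumber residues consecutively across all chains, in file order."""
    renumbered = []
    number = start - 1
    previous = None
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Columns 22-27: chain ID, residue number, insertion code.
            residue_key = line[21:27]
            if residue_key != previous:
                number += 1
                previous = residue_key
            # New number goes in columns 23-26; the insertion code is blanked.
            line = line[:22] + "%4d" % number + " " + line[27:]
        renumbered.append(line)
    return renumbered
```

Lines that are not ATOM or HETATM records (TER, REMARK, and so on) are
passed through unchanged.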


------------------------------------------------------------


Looking at burial predictions on real proteins.

Martin suggested looking at burial predictions on real proteins, to
see what they looked like there, before trying to modify unknown
proteins to fit some pre-conceived notion of how burial should look.

Here is my reply:

Date: Mon, 15 May 2006 18:14:06 -0700
From: Kevin Karplus
Subject: Re: Burial predictions for known 3D structures


It is certainly worthwhile to look at what the predictors are doing.
Predictions have been run for many of the proteins in the template
library (for example, the test set that Grant has been using for
fold-recognition tests).

The list of ids in that test set is in
	pcem/indexes/dunbrack-in-scop-2005-folds.ids
(pcem is a soft link I use for /projects/compbio/experiments/models.97/)

The predictions for 1w2wA, for example, would be in directory
	pcem/pdb/1w/1w2wA/nostruct-align/
with names like 1w2wA.t2k-near-backbone-11.rdb

We haven't set up rasmol scripts for them, but this would be a fairly
easy change to the pcem/Makefile.models97 file, since the perl script
for creating the rasmol scripts is called from other makefiles (such
as casp7/starter-directory/Make.main).

I agree that there are exposed residues on T0283.try4-opt2 that should be
buried, but I've not looked at what Firas has done to the model yet.
I had assumed that he had done nothing so far, since he had not put
any notes in the README file.

------------------------------------------------------------

BLAST to choose among close templates

Date: Thu, 18 May 2006 16:03:27 -0700
From: Kevin Karplus
Subject: new target in Make.main


The casp7 Make.main file has a new target
	${TARGET}.pdb.blast

This does a quick blastp of the dunbrack-pdbaa subset of the pdb
database using the target sequence and returns a short table of the
top hits.

This may be a good way to choose top templates when there are many
close templates.  The HMMs tend to pick templates that match the
consensus of the model, rather than the specific target.  This is good
for distant fold recognition, but may choose poor templates when there
are many very close ones.
------------------------------------------------------------


Using jmol under Firefox from LINUX boxes

Thu Jun 29 13:36:40 PDT 2006 Kevin Karplus

Using jmol has not been working from the PDB website on the Linux
boxes in the labs, though other machines (such as Mac OS X) running
Firefox have had no problems.

I asked the sysadmins how to fix it and got the following technique:

mkdir -p ~/.mozilla/plugins
cd  ~/.mozilla/plugins
ln -s /usr/java/jre1.5.0_06/plugin/i386/ns7/libjavaplugin_oji.so  .

(Warning: the jre version may vary depending on the computer, which
may also cause problems.)

I have not tested this yet, but will update the README file after I
have tested it.
----------------------------------------------------------------------


Preparing final models for submission

Tue Jul 11 11:29:30 PDT 2006 Kevin Karplus

The details for doing an actual submission are below, but everyone
needs to know how to get a submission ready for me to look at.

Have a superimpose-best.under file in the main directory and another
one in the dimer/ directory (for models that need a DIMER submission
as well).  There should be  exactly 5 models in the
superimpose-best.under---the five to be submitted, best first.
(If there are questions, then you can include more, but be sure to
spell out exactly what decisions are needed in the README file---the
number needs to be reduced to 5 for the final submission.)
Do "make best-models.pdb.gz" to gather the selected predictions into
one file.

Have an explanation of the history of each model to be submitted in
the README file. For example, for T0312

Probable current submission:
	try17-opt2 < try16-opt2 < try15-opt2 < try13-opt2 <try11-opt2
		< try10-opt2 < chimera-try9-try4
	try9-opt2  < try7-opt2 < 1xv2A (hand1.a2m)
	try8-opt2 <  1xv2A (hand1.a2m)
	try4-opt2 < try3-opt2 < alignments (2fug7)
	try1-opt2 automatic model < alignments (2fug7)

A more detailed explanation of *why* certain choices were made or what
was done in ProteinShop would be good, so that I can write the method file.


Make sure that Makefile defines MANUAL_TOP_HITS with the top 10 or 20
chains from T0312.best-scores.rdb (supplemented by any other templates
you considered). You can also pick up the list from
T0312.t06.best-scores.rdb (or t2k or t04) if one of those is the
alignment you chose to work most with.  Check to be sure that the
templates actually chosen by the method are included.
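
Picking the top chains out of a best-scores.rdb file can be automated.
The Python sketch below is a hypothetical helper, not a casp7 script.
It assumes the usual RDB layout (comment lines starting with #, a
tab-separated header line, a column-format line, then data rows) and
that the rows are already sorted best first.  The name of the column
holding the chain IDs is not given in these notes, so it is a
parameter; check the header of the real file.

```python
def manual_top_hits(rdb_lines, id_column, count=20):
    """Build a MANUAL_TOP_HITS line from the first `count` distinct chains."""
    rows = [line.rstrip("\n").split("\t") for line in rdb_lines
            if line.strip() and not line.startswith("#")]
    header, body = rows[0], rows[2:]    # rows[1] is the column-format line
    col = header.index(id_column)
    chains = []
    for row in body:
        if row[col] not in chains:      # keep the first occurrence only
            chains.append(row[col])
    return "MANUAL_TOP_HITS := " + " ".join(chains[:count])
```

The resulting line can be pasted into the Makefile and then
supplemented by hand with any other templates that were considered.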

I find what templates were mainly used for try3 by looking at the
try3.log.gz file, and using the "occur" command in emacs, looking for "best".

838 lines matching "best" in buffer try3.log.gz<2>.
    355:# best score in alignment pool out of 11: T0312+T0312-1xv2A-t04-local-str2+near-backbone-11-0.8+0.6+0.8-adpstyle5.a2m:1xv2A at pool[7] 420.537 cost/residue, 212 clashes 0.469407 breaks
    544:# best score in alignment pool out of 21: T0312.try3-al1+T0312-1xv2A-t06-local-str2+near-backbone-11-0.8+0.6+0.8-adpstyle5.a2m:1xv2A at pool[17] 294.094 cost/residue, 272 clashes 0.358772 breaks
    914:# best score in alignment pool out of 40: T0312.try3-al1A 294.094 cost/residue, 272 clashes 0.358772 breaks
  27240:# best score in alignment pool out of 1151: T0312.try3-al2 294.094 cost/residue, 272 clashes 0.358772 breaks
  27250:# best score in alignment pool out of 1151: T0312.try3-al3+all-align.a2m:2fug7 at pool[675] 285.663 cost/residue, 421 clashes 0.358772 breaks
  27260:# best score in alignment pool out of 1151: T0312.try3-al4 285.663 cost/residue, 421 clashes 0.358772 breaks
  27270:# best score in alignment pool out of 1151: T0312.try3-al5+all-align.a2m:2fug7 at pool[968] 285.663 cost/residue, 421 clashes 0.358772 breaks
  27280:# best score in alignment pool out of 1151: T0312.try3-al6 285.663 cost/residue, 421 clashes 0.358772 breaks
  27290:# best score in alignment pool out of 1151: T0312.try3-al7 285.663 cost/residue, 421 clashes 0.358772 breaks
 123469:# best score in initial pool out of 20: T0312.try3 at pool[10] 269.716 cost/residue, 286 clashes 0.358755 breaks
...
 135821:# best score in super_pool out of 20: T0312.try3-scwrl at pool[7] 196.84689 cost/residue, 201 clashes 0.11314 breaks

This tells me that the optimization worked mainly with 1xv2A and 2fug7
as its templates, with 2fug7 as the finally chosen one.
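
The same information can be pulled out of the log without emacs.  The
Python sketch below is an illustration based only on the log lines
quoted above: it assumes that the template chain follows ".a2m:" and
that the cost precedes "cost/residue".  Other log formats would need
different patterns.

```python
import re

TEMPLATE_RE = re.compile(r"\.a2m:(\w+)")
COST_RE = re.compile(r"(\d+(?:\.\d+)?) cost/residue")


def best_templates(log_lines):
    """Return (template, cost) for each "best score" line naming a template."""
    hits = []
    for line in log_lines:
        if "# best score in" not in line:
            continue
        template = TEMPLATE_RE.search(line)
        cost = COST_RE.search(line)
        if template and cost:
            hits.append((template.group(1), float(cost.group(1))))
    return hits
```

The log is gzipped, so it can be read with gzip.open("try3.log.gz",
"rt") and the open file passed straight to best_templates.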


----------------------------------------------------------------------

Mailing predictions (RR and TS formats)

Date: Mon, 22 May 2006 15:04:58 -0700
From: Kevin Karplus
Subject: mailing contact predictions


I have set up a new target in Make.main for mailing residue-residue
contact predictions.  To mail contact predictions to the casp7
submission site,
	make mail_contact_pred

I have mailed the predictions for T0288, to test the make target, and
to make sure that the T0288 submission was complete.

I expect George to do the mailing of contact predictions on other
targets when he is ready.

I will continue to do the mailing of the 3D files, which *can* be done with
	make email
but which is really a multi-step process:
	edit the superimpose-best.under file to select the models to submit
	make best-models.pdb.gz
	make T0232.method and edit it to be specific for target.
		(Alternatively, one can make model1.method, ... , model5.method
		and edit each separately.)
	Add a MANUAL_TOP_HITS macro to Makefile, listing the templates
		to be reported as parents.

		Selecting the top 20 or so hits from
		T0232.best-scores.rdb is the best way to do this.

	make casp_models
	edit the model1.ts ... model5.ts files to change parents, if needed
		(generally, I only edit the parents for models created
		by sidechain replacement on an alignment to a single template)
	make email

Repeat:
	I am responsible for mailing 3D (TS) files.
	George is responsible for mailing RR files.


----------------------------------------------------------------------
Mailing dimer predictions

Dimer predictions are a bit trickier to mail than monomers, since we
have to keep the chain IDs around through the whole process, and some
of the processing we use for the monomers loses the chain IDs.

First, create the dimers with separate chains (instead of one long
chain), but no TER record.  There is a target for this.  To convert
decoys/T0300.try5-opt2.pdb.gz just make decoys/T0300.try5-opt2.unpack.pdb.gz
in the dimer directory.

In the dimer/Makefile, you need to have targets for each of the dimer.ts models:
dimer1.ts:
	$(call model_to_ts,try5-opt2,1)
dimer2.ts:
	$(call model_to_ts,try4-opt2,2)
...

You then make T0300.method and edit it as usual.
make dimer_models
make email_dimers

Because of the way the dimer*.ts files are created, you have to use a
single method file for all the dimers, not a separate method file for each.

If you have a dimer (or other multimer) to submit that is *not* the
result of a standard try script (for example, a dimer that comes just
from the initial superposition, without further optimization), you can
still submit it, but the procedure is slightly different.

First, make sure that the file you wish to submit has a proper
"MODEL        1" record before the atoms.

Second, add to the dimer Makefile

dimer4.ts:
	$(call  modelfullname_to_ts,decoys/dimer-try1-2fs2A.pdb.gz,4)

and proceed as before.
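
The MODEL check in the first step is easy to automate.  The Python
sketch below is a hypothetical helper (not part of Make.main) that
inserts a "MODEL        1" record before the first atom if the file has
no MODEL record yet.  It does nothing about ENDMDL, TER, or other
records.

```python
def ensure_model_record(pdb_lines, model_number=1):
    """Return the lines with a MODEL record before the first atom."""
    lines = list(pdb_lines)
    for i, line in enumerate(lines):
        if line.startswith("MODEL"):
            return lines                # already has a MODEL record
        if line.startswith(("ATOM", "HETATM")):
            # PDB layout: "MODEL" with the serial number ending in column 14.
            return lines[:i] + ["MODEL %8d\n" % model_number] + lines[i:]
    return lines                        # no atoms found; leave unchanged
```

The lines are expected to keep their trailing newlines, as returned by
readlines().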


----------------------------------------------------------------------

Confirming submission of files

They are not sending confirmations this year---too much useless e-mail.

Instead, you can check the status for servers on
http://www2.predictioncenter.org/menu_frames.html
and for your own group on
http://predictioncenter.org/casp7/models/casp7-models.html


----------------------------------------------------------------------

Adding specific templates to the set used by undertaker


Date: Thu, 1 Jun 2006 09:51:20 -0700
From: Kevin Karplus

If you want to add some pairwise alignments to the set that are used
for undertaker, the process is

1) Add the list of PDB chains you want used to the Makefile as MANUAL_TOP_HITS.
   Warning: this list is used for identifying the PARENT in the
   submitted model file, so include all the top hits, not just the
   extras you want.

   For example, T0288/Makefile has

MANUAL_TOP_HITS:= 2fneA 1xz9A 1t2mA 1wfvA 1x6dA 2fcfA 1g9oA 1ihjA 1q3oA 1mfgA

2) run
	make extra_alignments
   which makes sure that all the MANUAL_TOP_HITS have had their
   pairwise alignments made.

3) run
	make read_alignments
   which creates the read-alignments-scwrl.under and read-alignments-noscwrl.under
   scripts in the subdirectories

4) If desired, you can
	make all-align.a2m.gz
   which ensures that most undertaker runs have access to all the
   pairwise alignments.

5) Alternatively, you can modify the try*.under undertaker script to
   use the (normally commented out)
   	InfilePrefix 1xxxX/
	include read-alignments-scwrl.under
   inputs to pick up the pairwise alignments.  You might want to move
   the reading relative to the TryAllAlign commands---perhaps moving
   it before the first TryAllAlign, though the default place in
   try1.under is ok.


----------------------------------------------------------------------

To generate sheet constraints from an alignment
Tue Jul 11 15:55:42 PDT 2006 Kevin Karplus


If you have chosen a template, and would like to get sheet constraints
from a particular alignment to guide the initial selection of models,
you can write an undertaker script, like the show-align.under script
in starter-directory/

For example, to get sheet constraints for 1eg5A and 1p3wA in T0339,
you would want the lines

InfilePrefix 1eg5A/
ReadFragmentAlignment NOFILTER SCWRL T0339-1eg5A-t2k-local-str2+near-backbone-11-0.8+0.6+0.8-adpstyle5.a2m
InfilePrefix 1p3wA/
ReadFragmentAlignment NOFILTER SCWRL T0339-1p3wA-t2k-local-str2+near-backbone-11-0.8+0.6+0.8-adpstyle5.a2m
PrintAlignmentsSheets T0339.1eg5A-1p3wA.sheets

to generate the sheet constraints from the usually best local alignment.

(You need all the usual startup stuff for undertaker---see
starter-directory/show-align.under)
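
When several templates are involved, the script lines above can be
generated rather than typed.  The Python sketch below is a hypothetical
helper that reproduces the pattern of the T0339 example; the alignment
file suffix is copied from that example and would need to be changed
for a different alignment method.  The usual undertaker startup lines
are not generated.

```python
ALIGN_SUFFIX = "t2k-local-str2+near-backbone-11-0.8+0.6+0.8-adpstyle5.a2m"


def sheet_script_lines(target, templates, suffix=ALIGN_SUFFIX):
    """Undertaker lines that read one alignment per template, then print sheets."""
    lines = []
    for chain in templates:
        lines.append("InfilePrefix %s/" % chain)
        lines.append("ReadFragmentAlignment NOFILTER SCWRL %s-%s-%s"
                     % (target, chain, suffix))
    lines.append("PrintAlignmentsSheets %s.%s.sheets"
                 % (target, "-".join(templates)))
    return lines
```

Calling sheet_script_lines("T0339", ["1eg5A", "1p3wA"]) gives back the
five lines shown in the example.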


----------------------------------------------------------------------
To focus on a particular multiple alignment

Tue Jul 11 15:58:09 PDT 2006 Kevin Karplus

Sometimes only one of the multiple alignment methods (t2k, t04, or
t06) seems to find a reasonable number of homologs. (One can have too
few and not enough evolutionary signal or too many and loss of focus
on the target sequence.)

To focus on just one alignment, there are two things to do:

1) In the Makefile (before the include) set
	PREFERRED_AL_METHOD := t2k
   The default is currently t06, so if that is your preferred
   alignment, you don't need to do this step.

   Then run "make -k" to remake everything.  This generally does very
      	little, unless something has changed since the first make, but
      	it takes a while to go through and make sure all the
      	alignments are there.  The main effect will be for the short
      	names for the rasmol scripts to be linked to the preferred AL method.

2) use only pairwise alignments based on the preferred HMM for
   testing templates:

   One way to do this would be to put all the reasonable templates
   (basically the top 10 or 20 hits in T0329.t06.best-scores.rdb) into
   MANUAL_TOP_HITS in the Makefile, do
        make extra_alignments
        make read_alignments
        foreach x  (*/read-alignments-scwrl.under)
        grep -h t06 $x > $x:s/scwrl/t06-scwrl/
        end

   Then include each of the read-alignments-t06-scwrl.under files to read
   in the alignments in the try.under script.
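
The foreach loop above is csh syntax.  For anyone not using csh, the
Python sketch below does roughly the same filtering; the function names
are made up for illustration.  Like grep, it keeps any line containing
the method name as a substring, and like the :s modifier it replaces
only the first "scwrl" in the path.

```python
import glob


def keep_method_lines(lines, method):
    """Keep only the lines that mention the alignment method."""
    return [line for line in lines if method in line]


def method_script_name(path, method):
    """read-alignments-scwrl.under -> read-alignments-<method>-scwrl.under"""
    return path.replace("scwrl", method + "-scwrl", 1)


def write_method_scripts(method, pattern="*/read-alignments-scwrl.under"):
    """Write a filtered copy of every matching script; return the new names."""
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as infile:
            kept = keep_method_lines(infile, method)
        new_path = method_script_name(path, method)
        with open(new_path, "w") as outfile:
            outfile.writelines(kept)
        written.append(new_path)
    return written
```

Running write_method_scripts("t06") in the target directory would
create the read-alignments-t06-scwrl.under files.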


----------------------------------------------------------------------

To download and score server predictions

Date: Mon, 22 May 2006 18:41:19 -0700
From: Kevin Karplus
Subject: looking at server predictions


To download and score server predictions

1) On the file server silo
	make fetch_tarball unpack_tarball

2) On a workstation
	make decoys/score-all+servers.try1.pretty

Creating the read-pdb+servers.under script on a workstation is
sometimes quite slow, so you can run
	make decoys/read-pdb+servers.under
on silo if you need to.  Don't run undertaker or anything
computationally intensive on silo---it is really only to be used as a
file server and for I/O intensive tasks that would be much too slow on
a workstation.

Note: you should add
	missing_atoms 1
to your costfcn if it is not already there, since otherwise incomplete
models may come out looking extremely good.

Further note: models that have only CA atoms (some servers return such
crummy models) will fail to produce a SCWRL'ed model, and the
NameConform command in the script will erroneously cause the -scwrl
addition to be given to the unSCWRLed model.
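
CA-only server models can be spotted before scoring.  The Python sketch
below is a hypothetical check, not part of the scoring scripts; it
relies only on the standard PDB layout, in which the atom name occupies
columns 13-16 of each ATOM record.

```python
def is_ca_only(pdb_lines):
    """True if the model has ATOM records and every one is a CA atom."""
    names = {line[12:16].strip()
             for line in pdb_lines if line.startswith("ATOM")}
    return names == {"CA"}
```

Models flagged this way could be left out of the scoring run or have
their -scwrl names treated with suspicion.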


----------------------------------------------------------------------
To score server models for many targets.

Server models are often picked up for several targets at a time, and
it is useful to score the server models for them on the farm cluster.

To do so, create a file listing one target per line (say /tmp/targ.ids)
Then run the following command to request a scoring run for each
target in the list:

para-trickle-make -many -makefile Makefile -targets 'decoys/score-all+servers.unconstrained.rdb decoys/score-all+servers.unconstrained.pretty' -no2letter -modelsdir ~/casp7 -se2log < /tmp/targ.ids
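
If the target list comes from a program rather than an editor, the two
steps can be scripted.  The Python sketch below is a hypothetical
wrapper; it uses only the para-trickle-make options that appear in the
command above and invents nothing about what those options do.

```python
import os


def write_target_list(path, targets):
    """Write one target ID per line, as para-trickle-make expects on stdin."""
    with open(path, "w") as out:
        for target in targets:
            out.write(target + "\n")


def scoring_command(make_targets, models_dir="~/casp7"):
    """Argument list equivalent to the command line above."""
    return ["para-trickle-make", "-many", "-makefile", "Makefile",
            "-targets", " ".join(make_targets),
            "-no2letter", "-modelsdir", os.path.expanduser(models_dir),
            "-se2log"]
```

The argument list can be passed to subprocess.run with stdin set to the
opened target list file.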


----------------------------------------------------------------------
To get Robetta models that were not submitted to CASP:

Firas Khatib


To get Robetta models that were not submitted to CASP go to
    http://robetta.bakerlab.org/queue.jsp?UserName=casp7&rpp=100
and click on the ID number in the left column for the target you want.
For T0361 you would click on 7991, for example.

Scroll down to "Ginzu Domain Prediction" and if there is a "Reference
Parent" then it should have 5 models (that we have already downloaded).
If under "Reference Parent" you see -- and under "Source" it says "cutpref"
then click on "domain 1" and you will see the 10 pdb models.

Under each image of the PDB model there are 3 icons: PDB, Rasmol, and a file.
Click on the file and save it!

[Tue Jul 11 16:07:15 PDT 2006 Kevin Karplus
There is a target in Make.main for fetching the robetta targets:
	make fetch_robetta
which tries to pick up the top ten models, so Firas's method above
should only be needed for picking up subdomain models.
]


----------------------------------------------------------------------

To evaluate a model once the correct solution is released

When the CASP organizers have posted the PDB id for a target, the
predictions can be evaluated by

1) defining REAL_PDB:=2gw2A in the Makefile before the include

2) making targets: decoys/evaluate.rdb decoys/evaluate.pretty
	(better, use targets decoys/evaluate.unconstrained.rdb
	decoys/evaluate.unconstrained.pretty)

If several targets need to be evaluated, they can be sent to the farm cluster.
Put the list of targets (one per line) in a temporary file (say /tmp/ids)
and use para-trickle-make:

para-trickle-make -many -makefile Makefile \
  -targets 'decoys/evaluate.unconstrained.rdb decoys/evaluate.unconstrained.pretty' \
  -no2letter -modelsdir ~/casp7 -se2log < /tmp/ids


----------------------------------------------------------------------