EMERGENCY PHONE NUMBERS
-----------------------
Saturday, I will be at Jorge's house: 423-1356
On Sunday, if I am not at Jorge's house, I will be checking my
messages at home: 454-0486.

ELECTRONIC ADDRESSES YOU SHOULD KNOW
------------------------------------
http://www.mrc-cpe.cam.ac.uk/casp2/criteria.html
This page has detailed information about submitting predictions,
including the three key email addresses listed below.

To submit: submit@sb7.llnl.gov
To test: stest@sb7.llnl.gov

In both cases, a reply is emailed stating either that the prediction
was parsed successfully or that it could not be parsed.  Replies are
not immediate: half an hour is typical.  Well-formatted predictions
take longer to get a reply than improperly formatted ones.

For system problems, email squery@sb7.llnl.gov


CONSTRUCTING THE PREDICTION
---------------------------
Prerequisites

(a) If making only whole-chain predictions, you need a library list.
The first column of a *.wholescores file will work.  The file
rev15.chainids is such a list, created with the command

	 gawk '$1!~/#/ && NF>1 {print $1}' \
		< ../t0011/T0011.relative.rev15.wholescores > rev15.chainids 

The file lists every chain in our library, one structure name (plus
chain identifier, if any) per line, as shown:

1bn21
1cbp
1croA
1eps
1hmcB
1kanA
1mli

Make sure you do not have blank lines in your library list!
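If gawk is not handy, the same filter can be sketched in Python.  These
are hypothetical helpers, not part of our scripts: chain_ids() keeps the
first field of every line that has at least two fields and no "#" in its
first field (what the gawk command does), and blank_line_numbers() flags
the blank lines that must not appear in a library list.

```python
def chain_ids(wholescores_lines):
    """Mirror the gawk filter: keep the first field of each line that has
    more than one field and whose first field contains no '#'."""
    ids = []
    for line in wholescores_lines:
        fields = line.split()  # whitespace splitting, like gawk's default FS
        if len(fields) > 1 and "#" not in fields[0]:
            ids.append(fields[0])
    return ids


def blank_line_numbers(library_lines):
    """Return the 1-based numbers of blank or whitespace-only lines."""
    return [n for n, line in enumerate(library_lines, start=1)
            if not line.strip()]
```

For example, chain_ids(open("../t0011/T0011.relative.rev15.wholescores"))
should give the same names that end up in rev15.chainids, and
blank_line_numbers(open("rev15.chainids")) should return an empty list.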

(b) If making a domain prediction, you need a domain library list as well:
	 casp2domain-library 

(c) One or more .a2m files, each containing an alignment of the target to
one or more sequences from the library.  The target must be the FIRST
sequence in the a2m file, and the aligned sequence must be a WHOLE CHAIN,
not just a domain (though only a portion of the chain needs to be in
MATCH columns).  Why?  Because the perl scripts have trouble accepting
domain identifiers.
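A quick way to confirm the target comes first is to list the a2m sequence
names in order.  A minimal Python sketch, assuming each name is the first
whitespace-delimited token after ">" (with any trailing commas from SAM
dropped); a2m_names() and target_is_first() are hypothetical helpers, and
the case-insensitive comparison is a choice made here, not a rule from
the scripts.

```python
def a2m_names(a2m_text):
    """Return the sequence names of an a2m (FASTA-style) file, in order.

    The name is the first token after '>', with trailing commas removed."""
    names = []
    for line in a2m_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            names.append(tokens[0].rstrip(",") if tokens else "")
    return names


def target_is_first(a2m_text, target):
    """True if the first sequence in the file is the target (case-insensitive)."""
    names = a2m_names(a2m_text)
    return bool(names) and names[0].upper() == target.upper()
```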

STEPS FOR A WHOLE CHAIN PREDICTION
----------------------------------
1. Make a directory for the target in experiments/casp2/submit and cd
   into it.

2. Copy the file author from one of the other target directories
   (e.g. casp2/submit/t0004) and edit it by hand if needed.  ESPECIALLY,
   check whether Liisa should be included or excluded.

3. Write a description of the method by hand and put it in the file
   comments, or copy the file comments from one of the other target
   directories.

4. Copy the target sequence from the appropriate file <target>.seq
   (e.g. t0004.seq) to a file named sequence.  You can copy it from
   /projects/compbio/experiments/casp2/<target>/<target>.seq
   If you do this, you will need to hand-edit the file sequence so
   that every line except the first begins with four spaces.

   EVEN MORE IMPORTANT: DON'T HAVE ANYTHING AFTER THE NAME!  The script
   is too stupid to stop at the end of the name and copies the rest of
   the line into the sequence.  ALSO, the name must be all uppercase
   with no punctuation after it.  (The script really needs to be fixed
   here, or we'll lose a submission because of this.)
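Because this hand edit is easy to get wrong, here is a sketch of a
hypothetical Python helper, format_sequence_file(), that applies the rules
above: the first line is reduced to the upper-cased name with trailing
punctuation removed and nothing after it, and every later non-blank line
gets exactly four leading spaces.  It assumes the first line of the copied
.seq file starts with the name (optionally preceded by ">"); compare the
result with a sequence file that has already parsed successfully.

```python
import string


def format_sequence_file(seq_text):
    """Apply the step-4 hand edits to the text of a copied .seq file."""
    lines = seq_text.splitlines()
    if not lines:
        raise ValueError("empty sequence file")
    # First line: keep only the first token, without '>' or trailing punctuation.
    first = lines[0].lstrip(">").split()
    if not first:
        raise ValueError("first line has no name")
    name = first[0].rstrip(string.punctuation).upper()
    # Later lines: drop blank lines, indent the rest with exactly four spaces.
    body = ["    " + line.strip() for line in lines[1:] if line.strip()]
    return "\n".join([name] + body) + "\n"
```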

5. Build the score file by running the following command:
   ../cline.scripts/list2score.pl <TARGET> < <library_name> >score
   where <TARGET> is the uppercase target name (e.g. T0011) and
   <library_name> is the library list described in prerequisite (a).

6. Copy the alignment of the target sequence to a sequence of known
   structure (prerequisite (c)) into this directory, then run:
   a2m.2.pdb.align.pl <a2mfile> <target> <known_struct> >align
   For example:
   a2m.2.pdb.align.pl test.a2m T0011 1eaf >align

   If at this step you get a message saying that your known structure
   is not in the file, check the labeling of the alignment.  This
   structure should be labeled >core_ or >coreC, where core is the
   four-letter structure name and C is the chain identifier.  See
   t0011/test.a2m for an example.  Note: be sure to remove trailing
   commas added by SAM, and type the known-structure name on the command
   line exactly as it appears in the file (including the trailing "_",
   if any).
   NOTE: some of this advice is obsolete, as a2m.2.pdb.align.pl is now
   much more robust about sequence names.

   Suggestion: after building the align file, check the PDB residue
   numbers against the HSSP file by hand.
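The labeling rules in step 6 can be checked mechanically.  This Python
sketch assumes a label is a four-character PDB code (a digit plus three
letters or digits) followed by "_" or a one-character chain identifier, as
in 1eaf_ or 1croA; clean_label() and is_known_structure_label() are
hypothetical helpers, and a2m.2.pdb.align.pl itself may accept more forms.

```python
import re

# Assumed convention: 4-character PDB code, then "_" or one chain character.
KNOWN_STRUCTURE_LABEL = re.compile(r"[0-9][A-Za-z0-9]{3}[A-Za-z0-9_]")


def clean_label(header_line):
    """Return the name from an a2m header line, minus SAM's trailing commas."""
    parts = header_line.lstrip(">").split()
    return parts[0].rstrip(",") if parts else ""


def is_known_structure_label(name):
    """True if the whole name matches the assumed >core_ / >coreC form."""
    return KNOWN_STRUCTURE_LABEL.fullmatch(name) is not None
```

The cleaned name is also what should be typed on the a2m.2.pdb.align.pl
command line.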
   
7. Run the command submit >submit.  submit is a C program that must be
   run on a SPARC machine.

8. Edit submit.  In the tscore section, the fourth column
   (initialized to 0.0) gives the probability we assign to each
   member of the library.  Hand-edit this column: most entries will
   remain 0.0, a few will change to some intuitively chosen value,
   and together they must add up to 1.0.
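Since the column must sum to 1.0, it is worth checking before mailing.  A
minimal Python sketch, assuming you pass in just the lines of the tscore
section and that its columns are whitespace-separated (the exact layout
written by submit is not described here); tscore_total() and
probabilities_ok() are hypothetical helpers.

```python
import math


def tscore_total(tscore_lines, column=4):
    """Sum one 1-based column over the tscore-section lines.

    Lines with too few columns, or a non-numeric value in that column
    (e.g. headers), are skipped."""
    total = 0.0
    for line in tscore_lines:
        fields = line.split()
        if len(fields) < column:
            continue
        try:
            total += float(fields[column - 1])
        except ValueError:
            continue  # not a numeric row
    return total


def probabilities_ok(tscore_lines, tol=1e-6):
    """True if the probability column sums to 1.0 within tol."""
    return math.isclose(tscore_total(tscore_lines), 1.0, abs_tol=tol)
```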

9. Mail the contents of submit to the test address (stest@sb7.llnl.gov)
   first, and once it parses successfully, to the actual submission
   address (submit@sb7.llnl.gov).


VARIATIONS FOR DOMAIN LIBRARY PREDICTIONS
-----------------------------------------
1-4. Same as above
5. Run the following command:
   ../cline.scripts/strsub.2.tscore.dom.pl <TARGET> < ../casp2domain-library \
    > score
   where <TARGET> is the uppercase target name (e.g. T0011).

6-7. Same as above

8.  Run the command
    cat ../casp2domain-library >>submit

9.  Edit submit, and change the following:
    - in the tscore section, edit the values in the 0.0 column
      as described in step 8 of the whole-chain steps.
    - in the align section, the eighth column contains the
      domain index.  This is 0 for whole proteins and is
      set to 0 by default.  Edit it to reflect the domain
      being used (e.g. for 3aahA_3, set the column to 3).
      Note: the scripts have been improved so that this should now
      be set automatically, but check it anyway!
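Checking the domain index can be sketched in Python.  This assumes the
naming convention suggested by the example 3aahA_3, that is
<PDB code><chain>_<domain number>, with names lacking a numeric "_N"
suffix treated as whole chains (index 0).  domain_index() and
align_domain_column() are hypothetical helpers, and the latter assumes the
align section has whitespace-separated columns.

```python
import re

# Assumed convention: a trailing "_<digits>" gives the domain number.
DOMAIN_SUFFIX = re.compile(r"_(\d+)$")


def domain_index(library_name):
    """Expected value of the eighth align column for a library name."""
    match = DOMAIN_SUFFIX.search(library_name)
    return int(match.group(1)) if match else 0


def align_domain_column(align_line):
    """Read the eighth whitespace-separated column of an align line."""
    return int(align_line.split()[7])
```

Comparing align_domain_column(line) with domain_index(name) for each align
line is the "check it anyway" step.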

    WATCH OUT: the CASP2 system will complain if the alignment extends
    outside the bounds of the domain!  For example, if the domain
    boundary is residue X and the alignment boundary is X+2, change one
    of them (the alignment or the domain boundary) so that the
    alignment lies inside the domain.
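The bounds check is simple interval containment.  A Python sketch,
assuming the alignment and domain bounds are given as inclusive residue
numbers; alignment_inside_domain() and clip_to_domain() are hypothetical
helpers, and clip_to_domain() only computes the trimmed range (one of the
two possible fixes): the alignment itself would still have to be edited
to match.

```python
def alignment_inside_domain(aln_start, aln_end, dom_start, dom_end):
    """True if [aln_start, aln_end] lies within [dom_start, dom_end]."""
    return dom_start <= aln_start and aln_end <= dom_end


def clip_to_domain(aln_start, aln_end, dom_start, dom_end):
    """Trim the alignment range to the domain bounds.

    Returns (start, end), or None if the ranges do not overlap at all."""
    start, end = max(aln_start, dom_start), min(aln_end, dom_end)
    return (start, end) if start <= end else None
```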