EMERGENCY PHONE NUMBERS
-----------------------

Saturday, I will be at Jorge's house: 423-1356
Sunday, if I am not to be found at Jorge's house, I will check my messages at home: 454-0486.

ELECTRONIC ADDRESSES YOU SHOULD KNOW
------------------------------------

http://www.mrc-cpe.cam.ac.uk/casp2/criteria.html
    This URL contains a wealth of information about submitting predictions, including the three key email addresses below.

To submit: submit@sb7.llnl.gov
To test:   stest@sb7.llnl.gov

In both cases, a response will be emailed stating either that the prediction was parsed successfully or that it could not be parsed. The response is not immediate: half an hour is typical. Well-formatted predictions take longer to get a response than improperly formatted ones.

For system problems, email squery@sb7.llnl.gov

CONSTRUCTING THE PREDICTION
---------------------------

Prerequisites

(a) If making only whole-chain predictions, you need a library list. The first column of the *.wholescores file will work. The file rev15.chainids is such a list, created with the command

        gawk '$1!~/#/ && NF>1 {print $1}' \
            < ../t0011/T0011.relative.rev15.wholescores > rev15.chainids

    The file consists of a list of all chains in our library. Each structure and chain is listed on a separate line, as shown:

        1bn21
        1cbp
        1croA
        1eps
        1hmcB
        1kanA
        1mli

    Make sure you do not have blank lines in your library list!

(b) If making a domain prediction, you also need a domain library list: casp2domain-library

(c) One or more .a2m files, each containing an alignment of the target to one or more sequences from the library. The target must be the FIRST sequence in the a2m file, and the aligned sequence must be a WHOLE CHAIN, not just a domain (though only a portion of the chain needs to be in MATCH columns). Why? Because the perl scripts have a problem accepting domain identifiers.

STEPS FOR A WHOLE CHAIN PREDICTION
----------------------------------

1.
   Make a directory for the target in experiments/casp2/submit, and cd to this directory.

2. Copy the file author from one of the other target directories (e.g. casp2/submit/t0004). Edit it by hand if desired (ESPECIALLY, check whether Liisa should be included or excluded).

3. Hand-edit a description of the method and place it in the file comments. Or, copy the file comments from one of the other target directories.

4. Copy the target sequence from the appropriate file .seq (e.g. t0004.seq) to the file sequence. You can copy it from the directory /projects/compbio/experiments/casp2//.seq
   If you do this, you will need to hand-edit the file sequence so that every line except the first one begins with four blank characters.
   EVEN MORE IMPORTANT: DON'T HAVE ANYTHING AFTER THE NAME. The script is too stupid to stop at the end of the name, and copies the rest of the line into the sequence! ALSO, the name must be all uppercase with no punctuation after it. (The script really needs to be fixed here, or we'll lose some submissions because of this.)

5. Build the score file by running the following command:

        ../cline.scripts/list2score.pl < >score

   where is the name of the target, uppercase (e.g. T0011), and where is the library list as described in prerequisite (a).

6. Copy into this directory the alignment of the target sequence to a sequence of known structure (part (c) of the prerequisites). Run the following command:

        a2m.2.pdb.align.pl >align

   For example:

        a2m.2.pdb.align.pl test.a2m T0011 1eaf >align

   If at this step you get a message saying that your known structure is not in the file, check the labelling of the alignment. This structure should be labeled as >core_ or >coreC, where core is the four-letter structure name and C is the chain identifier. See t0011/test.a2m for an example.
   Note: be sure to remove trailing commas added by SAM, and type the known-structure name on the command line exactly as entered in the file (including the trailing "_", if any).
   NOTE: some of this advice is obsolete, as a2m.2.pdb.align.pl is now much more robust about sequence names. Suggestion: after building the align file, check the pdb numbers against the hssp file by hand.

7. Run the command

        submit >submit

   submit is a C program that must be run on a SPARC platform.

8. Edit submit. In the tscore section, the fourth column (initialized to 0.0) reflects the probability with which we predict each member of the library. Hand-edit this column: most of it will remain 0.0, a few entries will change to some intuitively chosen value, and altogether they must add up to 1.0.

9. Mail the contents of submit to the test address, then to the actual address.

VARIATIONS FOR DOMAIN LIBRARY PREDICTIONS
-----------------------------------------

1-4. Same as above.

5. Run the following command:

        ../cline.scripts/strsub.2.tscore.dom.pl < ../casp2domain-library \
            > score

   where target is the target name (e.g. T0011).

6-7. Same as above.

8. Run the command

        cat ../casp2domain-library >>submit

9. Edit submit, and change the following:
   - In the tscore section, edit the values in the 0.0 column as described in step 8 of the whole-chain procedure.
   - In the align section, the eighth column contains the domain index. This is 0 for whole proteins, and is set to 0 by default. Edit it to reflect the domain being used (e.g. for 3aahA_3, set the column to 3).
     Note: the scripts have been improved so that this should now be set automatically, but check it anyway!
   WATCH OUT: the CASP2 system will complain if the bounds of the alignment are not inside the bounds of the domain! If, for example, the domain boundary is X and the alignment boundary is X+2, pick one of the two to change.
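Prerequisite (a) builds the library list with a gawk one-liner and warns against blank lines. For anyone who would rather do this from a script, here is a minimal Python sketch of the same filter plus a blank-line check. It keeps the first field of every line that has more than one whitespace-separated field and whose first field contains no '#'. The function names are hypothetical and are not part of the existing cline.scripts tools.

```python
from typing import Iterable, List


def chain_ids_from_wholescores(lines: Iterable[str]) -> List[str]:
    """Mirror of: gawk '$1!~/#/ && NF>1 {print $1}'

    Each line is split on whitespace (awk's default field splitting).
    The first field is kept only when the line has at least two fields
    (NF>1) and that first field contains no '#' ($1!~/#/).
    """
    ids = []
    for line in lines:
        fields = line.split()
        if len(fields) > 1 and "#" not in fields[0]:
            ids.append(fields[0])
    return ids


def check_library_list(text: str) -> None:
    """Raise ValueError if the library list contains a blank line."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            raise ValueError(f"blank line at line {lineno} of library list")
```

Because the NF>1 test already drops empty lines, a list produced this way and written out one ID per line should pass check_library_list. The check is mainly useful for lists that were assembled or edited by hand.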
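Step 4 of the whole-chain procedure lists three rules for the file sequence. Nothing may follow the name on the first line, the name must be uppercase with no trailing punctuation, and every later line must begin with four blanks. The Python sketch below applies those rules automatically. It assumes that the first non-blank line of the .seq file holds the name, possibly followed by a description, and that every later non-blank line holds residues. The actual .seq layout is not spelled out above, so check the output by hand. The function name is hypothetical.

```python
import string


def format_sequence_file(text: str) -> str:
    """Rewrite .seq contents into the layout step 4 asks for (a sketch).

    Assumption: first non-blank line = name line (may carry a description);
    all other non-blank lines = residues.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty sequence file")
    # Keep only the first token so nothing follows the name.
    name = lines[0].split()[0]
    # Drop trailing punctuation (e.g. "T0011," -> "T0011") and force uppercase.
    name = name.rstrip(string.punctuation).upper()
    # Every line after the name begins with exactly four blanks.
    body = ["    " + ln for ln in lines[1:]]
    return "\n".join([name] + body) + "\n"
```

Any leading character on the name line, such as a '>', is preserved as-is, because the notes do not say whether it belongs there.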
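Step 8 requires the hand-edited fourth column of the tscore section to add up to 1.0. A quick sanity check before mailing to the test address can save a half-hour round trip. This Python sketch assumes the tscore lines have already been cut out of submit, because how that section is delimited is not described here. It also assumes the columns are whitespace-separated. Only the fourth column is read. Lines with fewer than four fields are skipped as non-data lines, which is also an assumption. The function name is hypothetical.

```python
from typing import Iterable


def tscore_probability_sum(rows: Iterable[str], tol: float = 1e-6) -> float:
    """Sum column 4 of the tscore rows; raise if it is not 1.0 within tol."""
    total = 0.0
    for row in rows:
        fields = row.split()
        if len(fields) < 4:
            continue  # assumed not to be a data row
        p = float(fields[3])
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range in row: {row!r}")
        total += p
    if abs(total - 1.0) > tol:
        raise ValueError(f"column 4 sums to {total:g}, not 1.0")
    return total
```

A tolerance is used instead of an exact comparison because hand-entered decimals such as 0.7 and 0.2 are not exact in binary floating point.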
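Step 9 of the domain variation relies on two checks: the domain index in the eighth align column, and whether the alignment bounds lie inside the domain bounds. The Python sketch below implements both. The identifier rule is inferred from the single example given (3aahA_3 has index 3). Any identifier without an "_<digits>" suffix, including names with a bare trailing "_", is treated as a whole chain with index 0. The bounds check assumes that alignment and domain positions use the same residue numbering. Both function names are hypothetical.

```python
def domain_index(lib_id: str) -> int:
    """Domain index for the align section: '3aahA_3' -> 3, whole chain -> 0."""
    _, sep, suffix = lib_id.rpartition("_")
    if sep and suffix.isdigit():
        return int(suffix)
    return 0


def alignment_inside_domain(aln_start: int, aln_end: int,
                            dom_start: int, dom_end: int) -> bool:
    """True if [aln_start, aln_end] lies within [dom_start, dom_end]."""
    return dom_start <= aln_start and aln_end <= dom_end
```

For example, with a domain ending at residue 90 and an alignment ending at 92 (the "X versus X+2" case above), alignment_inside_domain returns False, and one of the two boundaries must be changed.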