Wed May 10 09:39:36 PDT 2006 Kevin Karplus

This directory has all the working notes for CASP7 predictions.
This README file will be added to as the prediction season progresses,
with notes about how to do predictions and changes to the software and
procedures.

The individual targets are in subdirectories T0283 ... with a README
file for each directory discussing the target.

It is probably worth reading ../casp6/README for notes on the
procedures used in the previous CASP in 2004.

------------------------------------------------------------
TABLE OF CONTENTS

    Getting started
    Creating a new try*.under file
    Making a chimera
    Handling a homodimer
    Moving a helix with ProteinShop
    Viewing Undertaker PDB Hbonds with ProteinShop
    Moving Dimers with ProteinShop
    Looking at burial predictions on real proteins
    BLAST to choose among close templates
    Using jmol from Firefox on the Linux machines
    Preparing final models for submission
    Mailing predictions (RR and TS formats)
    Mailing dimer predictions
    Confirming submissions of files
    Adding specific templates to the set used by undertaker
    To generate sheet constraints from an alignment
    To focus on a particular multiple alignment
    To download and score server predictions
    To score server models for many targets.
    To get Robetta models that were not submitted to CASP
    To evaluate a model once the correct solution is released

------------------------------------------------------------
Wed May 10 09:42:23 PDT 2006 Kevin Karplus

Getting Started

New targets will generally be started by me (Kevin), unless I am out
of town when they are released. The command to start a new target is
new-target, which takes a single argument, the target number:

    casp7/scripts/new-target T0283

This script creates the working directory for the target and puts a
Makefile there, but does nothing to actually get the target from the
casp7 web site or do the prediction.
Those tasks are done with

    cd T0283
    (make -k >& make.log; gzip -9f make.log) &

which runs the iterated searches, the local structure predictions, the
fold-recognition searches, fragment-finding, contact prediction, and
one run of undertaker for full 3D prediction.

When it has finished, we need to look at the alignments and
predictions done, improve the cost function (and possibly the input
alignments) for undertaker, and generally tweak things until we are
satisfied.

------------------------------------------------------------
Mon Jun 26 12:14:48 PDT 2006 Kevin Karplus

Creating a new try*.under file

All our optimization runs with undertaker are done with an undertaker
command file of the form "try7.under", with the numbers increasing
sequentially as we try different things. The first one, try1.under,
is created automatically by the initial make. It generally serves as
a basis for future try*.under files.

The first thing to do to create a new try.under file is to copy an
existing one (be sure you don't step on one that already exists).
I generally do this in emacs, but you can also use
"cp try1.under try2.under"

The *very* first thing to do, before you even *start* to think about
what you want to change, is to do a global replacement of try1 by try2
(or try7 by try8, or whatever the names are) in the file. Every year
someone forgets to do this and overwrites an existing good result with
a bad one.

After try1 has finished, the templates that it read from PDB files
have been saved in a more compact format in Template.atoms.gz
Generally, you do not want to re-create this file on subsequent runs,
so you should comment out the line

    PrintTemplateAtoms Template.atoms

If you are changing what set of PDB files get used as templates (say
because you added a bunch of ideas that did not come from the HMM
searches), then you can uncomment the line for one run.
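
The copy-and-rename step described above is easy to get wrong by hand.
Here is a minimal Python sketch of it. It is not part of the casp7
scripts: the function name copy_try_file and its arguments are made up
for illustration. It refuses to overwrite an existing file and replaces
the old try name throughout, taking care that try1 does not match
inside try10.

```python
import re
from pathlib import Path


def copy_try_file(directory, old="try1", new="try2", suffix=".under"):
    """Copy <old><suffix> to <new><suffix>, renaming every 'old' to 'new'.

    Hypothetical helper for illustration, not part of the casp7 scripts.
    """
    src = Path(directory) / (old + suffix)
    dst = Path(directory) / (new + suffix)
    if dst.exists():
        # Never step on a try file that already exists.
        raise FileExistsError(f"{dst} already exists; pick a new try number")
    # Match 'old' only when no digit follows, so try1 leaves try10 alone.
    pattern = re.compile(r"\b" + re.escape(old) + r"(?!\d)")
    dst.write_text(pattern.sub(new, src.read_text()))
    return dst
```

For example, copy_try_file(".", "try7", "try8") run in a target
directory would write try8.under, or stop with an error if try8.under
is already there.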

Note: because the try1.under script refers to the try1.costfcn cost
function, you need to create a new cost function for each new try.
Generally, it is in the cost function that most of our work is done.

If you want to generate new models from the alignments, then just
copying the try1.under file and replacing "try1" by the new try works
fairly well.

If you want to focus on a subset of the templates, you need to change
which alignments undertaker works from. The default set comes from 5
lines:

    Include XXX0000.t04.undertaker-align.under
    Include XXX0000.t06.undertaker-align.under
    Include XXX0000.t2k.undertaker-align.under
    Include XXX0000.undertaker-align.under
    ReadFragmentAlignment NOFILTER SCWRL all-align.a2m

If you want to eliminate a particular alignment or template from the
set that undertaker considers, you must comment these out. The
TryAllAlign commands then become useless, unless you provide a
different set of alignments to work from. Uncommenting

    // InfilePrefix 1xxxX/
    // include read-alignments-scwrl.under

(with 1xxxX replaced by the actual template chain id) will provide
undertaker with specific alignments. If
1xxxX/read-alignments-scwrl.under doesn't exist, you can make it by
adding 1xxxX to the space-separated list of chains defining
MANUAL_TOP_HITS in the Makefile, then doing

    make extra_alignments
    make read_alignments

Sometimes you don't want to start from alignments, but from an
existing complete model or set of complete models (from previous runs
or from automatic servers).

To use existing models, comment out all the TryAllAlign commands (and
the first SCWRLConform command before OptConform has been called).
Uncomment

    ## InfilePrefix decoys/
    ## include read-pdb.under

to include all existing models in the decoys directory.
(You can do "make decoys/read-pdb.under" if it doesn't exist, but it
usually gets created automatically by
"make decoys/score-all.try*.pretty")

If you want to optimize specific models, don't use the
"include read-pdb.under", but provide a ReadConformPDB command for
each model you want to consider in the initial set.

When you are optimizing existing models, you may want to tweak the
pseudocounts for the conformation-change operators: for example,
increasing the initial probability of CrossOver between models, and
decreasing the initial probability of InsertSpecificFragment. This is
not terribly important, as the adaptation will eventually pick out
which operators to use, but it can make the process more efficient.

------------------------------------------------------------
Sat May 13 15:20:19 PDT 2006 Kevin Karplus

Making a chimera

Sometimes one wants to combine two different predictions of a protein,
copying some parts from one conformation, some parts from another.
Right now, the easiest way to do this is to superimpose the
conformations (using a script like superimpose-best.under), then use
emacs to do cut-and-paste operations on the superimposed models.
You can control the superposition to make things match particularly
well where the cutting will occur, so that those residues line up
very precisely.

------------------------------------------------------------
Mon May 15 08:04:01 PDT 2006 Kevin Karplus

Handling a homodimer

This method assumes that you have a pretty good monomer that you want
to dimerize based on a template with an existing dimer and then
optimize. It is not intended for creating dimers from scratch.

1) Create a subdirectory dimer/ (or 3mer/, 4mer/ ...)

2) In dimer/ create a target fasta file with the lengthened target
   sequence.
   For example, T0284.a2m would be

>T0284 PA4872, Pseudomonas aeruginosa PAO1, 287 res
MHRASHHELRAMFRALLDSSRCYHTASVFDPMSARIAADLGFECGILGGSVASLQVLAAP
DFALITLSEFVEQATRIGRVARLPVIADADHGYGNALNVMRTVVELERAGIAALTIEDTL
LPAQFGRKSTDLICVEEGVGKIRAALEARVDPALTIIARTNAELIDVDAVIQRTLAYQEA
GADGICLVGVRDFAHLEAIAEHLHIPLMLVTYGNPQLRDDARLARLGVRVVVNGHAAYFA
AIKATYDCLREERGAVASDLTASELSKKYTFPEEYQAWARDYMEVKE
MHRASHHELRAMFRALLDSSRCYHTASVFDPMSARIAADLGFECGILGGSVASLQVLAAP
DFALITLSEFVEQATRIGRVARLPVIADADHGYGNALNVMRTVVELERAGIAALTIEDTL
LPAQFGRKSTDLICVEEGVGKIRAALEARVDPALTIIARTNAELIDVDAVIQRTLAYQEA
GADGICLVGVRDFAHLEAIAEHLHIPLMLVTYGNPQLRDDARLARLGVRVVVNGHAAYFA
AIKATYDCLREERGAVASDLTASELSKKYTFPEEYQAWARDYMEVKE

3) Copy the Makefile to the dimer/ subdirectory, and add a macro
   (before the include)

    MONOMER_LENGTH := 287

4) Make a dimer/decoys/ directory

5) Create a script make-dimer.under in the main directory
   (start with "make make-dimer.under").
   This script needs to have a properly dimerized template to copy the
   positioning from and a monomer to dimerize.

6) Create an alignment file that has the target and copies of the best
   alignment.
   For example, for T0284, we have T0284/1mumA/1mumA.dimer-a2m
   modified from
   T0284-1mumA-t04-local-str2+CB_burial_14_7-1.0+0.4+0.4-adpstyle5.a2m :

>T0284 PA4872, Pseudomonas aeruginosa PAO1, 287 res
MHRASHHELRAMFRALLDSSRCYHTASVFDPMSARIAADLGFECGILGGS
VASLQVLAAPDFALITLSEFVEQATRIGRVARLPVIADADHGYGNALNVM
RTVVELERAGIAALTIEDTLLPAQFGRKSTDLICVEEGVGKIRAALEARV
DPALTIIARTNAELIDVDAVIQRTLAYQEAGADGICLVGVRDFAHLEAIA
EHLHIPLMLVTYGNPQLRDDARLARLGVRVVVNGHAAYFAAIKATYDCLR
EERGAVASDLTASELSKKYTFPEEYQAWARDYMEVKE
>1mumA
sl------HSPGKAFRAALTKENPLQIVGTINANHALLAQRAGYQAIYLS
GGGVAAGSLGLPDLGISTLDDVLTDIRRITDVCSLPLLVDADIGFGsSAF
NVARTVKSMIKAGAAGLHIEDQVGAKRCGHrPNKAIVSKEEMVDRIRAAV
DAKTDPDFVIMARTDALAvEGLDAAIERAQAYVEAGAEMLFPEAITELAM
YRQFADAVQVPIlaNITEFGATPLFTTDELRSAHVAMALYPLSAFRAMNR
AAEHVYNVLRQegtqksVIDTMQTRNELYESINYYQYEEKLDNL------
farsqvk
>1mumB
sl------HSPGKAFRAALTKENPLQIVGTINANHALLAQRAGYQAIYLS
GGGVAAGSLGLPDLGISTLDDVLTDIRRITDVCSLPLLVDADIGFGsSAF
NVARTVKSMIKAGAAGLHIEDQVGAKRCGHrPNKAIVSKEEMVDRIRAAV
DAKTDPDFVIMARTDALAvEGLDAAIERAQAYVEAGAEMLFPEAITELAM
YRQFADAVQVPIlaNITEFGATPLFTTDELRSAHVAMALYPLSAFRAMNR
AAEHVYNVLRQegtqksVIDTMQTRNELYESINYYQYEEKLDNL------
farsqvk

7) In dimer/ make try1.costfcn, then edit it to have a KnownBreak
   between the chains:

    KnownBreak M288

   If you want any constraints on the optimization, it is necessary to
   make multiple copies in the cost function, renumbering the
   constraints in the later chains (a real pain). Alternatively, you
   can compute the constraints only on the first monomer. If the
   monomers are identical, this should not cause any problems.

   Getting the scoring for predicted alpha may be harder, as
   generating multiple alignments and predictions for the polyprotein
   chain will be harder. (We could write scripts to take the
   monomeric predictions and concatenate them with renumbering, but
   haven't yet done this.) It may be easiest just to comment out the
   CreatePredAlphaCost commands of the costfcn, and remove the
   pred_alpha components.
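
Steps 2 and 7 above (the lengthened fasta file and the KnownBreak
line) can be sketched in Python as follows. This is only an
illustration, not one of the casp7 scripts; the function names are made
up. The KnownBreak syntax is inferred from the single "KnownBreak M288"
example above (one-letter code of the first residue of the next chain,
followed by its position), and emitting one KnownBreak per chain
junction for trimers and larger is an assumption.

```python
def multimer_fasta(header, monomer, copies=2, width=60):
    """Return FASTA text with the monomer sequence repeated `copies` times.

    Hypothetical helper for illustration, not part of the casp7 scripts.
    """
    lines = [header]
    for _ in range(copies):
        # Wrap each copy separately, as in the T0284 example above.
        lines.extend(monomer[i:i + width] for i in range(0, len(monomer), width))
    return "\n".join(lines) + "\n"


def known_break_commands(monomer, copies=2):
    """One KnownBreak line per junction, naming the next chain's first residue.

    The command syntax is inferred from the example 'KnownBreak M288'.
    """
    length = len(monomer)
    return [f"KnownBreak {monomer[0]}{k * length + 1}" for k in range(1, copies)]
```

For a 287-residue monomer starting with M, known_break_commands gives
the single line "KnownBreak M288", matching MONOMER_LENGTH + 1.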

Once you have an acceptable dimer, you want to optimize it, keeping it
dimerized in roughly the same orientations.

If you read in a dimer with ReadConformPDB, be sure to mark it as a
dimer by following the read command with

    Multimer 2

as a separate command to label the dimer as a cyclic dimer.
Note: if the multimer is *not* cyclic then *don't* label it, as
undertaker will try to symmetrize it.

You can do the optimization as usual, but use "multimer 2" in the
OptConform arguments. Any alignments (for fragments and the like) can
be gotten from the original monomeric runs. You probably want to
reduce the duration of the run (by reducing num_gen, gen_size,
super_iter, and/or super_num_gen), because multimeric runs take longer
than monomeric ones. You can also read the Template.atoms file from
the monomeric directory, avoiding duplicating that file.

You might want to turn off TweakMultimer at first if you are trying to
pack a tight interface, as it will tend to move monomers apart to
reduce clashes. But if you have a loose interface, you definitely
want TweakMultimer on to try to tighten up the interface.

It may be necessary to add some inter-chain constraints to hold the
dimer together. Even without TweakMultimer on, undertaker may find a
way to alleviate clashes by moving parts of the dimer away from each
other, as it did in try1 (of T0284/dimer).

Note: you don't always want "multimer 2" for a dimer or "multimer 4"
for a tetramer. What the command (or option to OptConform) does is to
force the creation of a cyclic multimer. That is, the transform that
takes A to B will take B back to A for a dimer, or
T(A->B) = T(B->C) = T(C->D) = T(D->A) for a tetramer.
Not all multimers are cyclic!

You can still optimize non-cyclic multimers in undertaker, but you
must *not* use the multimer command or option to OptConform. This
will cause each chain to be separately optimized, but the "OptSubtree"
method will tend to rearrange the transformation between chains.
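
To make the "cyclic" condition above concrete, here is a small Python
sketch. It says nothing about how undertaker represents or enforces
the symmetry; it only illustrates the simplest case, in which the
single chain-to-chain transform T is a rotation by 360/n degrees about
an axis (taken here to be the z axis through the origin, an arbitrary
choice). Applying T n times then returns chain A to itself. All
function names are made up for the example.

```python
import math


def rotation_about_z(n):
    """3x3 matrix rotating by 360/n degrees about the z axis."""
    angle = 2.0 * math.pi / n
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply(transform, point):
    """Apply a 3x3 matrix to an (x, y, z) point."""
    return tuple(sum(transform[i][j] * point[j] for j in range(3))
                 for i in range(3))


def place_chains(point_on_a, n):
    """Positions of the equivalent atom in chains A, B, ... of a cyclic n-mer."""
    t = rotation_about_z(n)
    positions = [point_on_a]
    for _ in range(n - 1):
        # The same transform takes each chain to the next one.
        positions.append(apply(t, positions[-1]))
    return positions


def closes_up(positions, n, tol=1e-9):
    """True if the transform takes the last chain back onto chain A."""
    back = apply(rotation_about_z(n), positions[-1])
    return all(abs(a - b) < tol for a, b in zip(back, positions[0]))
```

A transform that also translates along the axis (a screw) would not
close up in this way, which is one kind of multimer that should not be
labeled as cyclic.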

You can optimize a mixture of cyclic and non-cyclic dimers in
OptConform if they are initially labeled with Multimer commands and
OptConform has no "multimer" keyword (or, equivalently, "multimer 0").
If OptConform has "multimer 2" set, then all multimers will be set to
be cyclic dimers.

Note: you can do optimization of some tetramers with symmetry S_{2,2}
by telling OptConform to use "multimer 2". You don't get the full
symmetry, but you will get some symmetry: chain A and chain B will be
independently optimized, but chain C and chain D will be copies of
chains A and B and T(AB->CD) = T(CD->AB).

NOTE: gromacs doesn't like big chain breaks, and it will not see the
multimer merged into a single chain as two chains. To get gromacs to
optimize a multimer, you need to unpack the multimer into separate
chains:

    cd casp7/T0332/dimer
    make decoys/T0332.try2-opt2.unpack.pdb.gz decoys/T0332.try2-opt2.unpack.gromacs0.pdb.gz

You can get this to happen for you automatically if you use

    cd casp7/T0332/dimer
    (make T0332.mult2 >& do2.log; gzip -9f do2.log)&

instead of the monomer version

    (make T0332.do2 >& do2.log; gzip -9f do2.log)&

Sat Jul 1 13:35:27 PDT 2006 Kevin Karplus

I made a small change to undertaker, adding force_alignment and
fragment_only options to ReadFragmentAlignment, so that I could force
undertaker to treat the short fragments as being a complete alignment
or not being treated as an alignment at all (just fragments). If
neither option is provided, then it is added to the alignment library
only if it is multiple fragments or a sufficiently long single
fragment (something like half the total protein length).

For multimers, you can include force_alignment in the
ReadFragmentAlignment command that specifies the multimer, to avoid
losing an alignment that has only a short piece aligned to show what
corresponds.
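
The rule just described for whether ReadFragmentAlignment adds its
input to the alignment library can be written out as a short Python
sketch. This is a paraphrase of the note above, not undertaker's code:
the function name is made up, the 0.5 threshold stands in for
"something like half the total protein length", and treating the two
options as mutually exclusive is an assumption.

```python
def added_to_alignment_library(fragment_lengths, protein_length,
                               force_alignment=False, fragment_only=False,
                               min_fraction=0.5):
    """Decide whether a fragment alignment is also kept as a full alignment.

    Paraphrase of the README note; threshold and error handling are assumptions.
    """
    if force_alignment and fragment_only:
        # Assumption: the two options contradict each other.
        raise ValueError("force_alignment and fragment_only are mutually exclusive")
    if force_alignment:
        return True
    if fragment_only:
        return False
    if len(fragment_lengths) > 1:
        # Multiple fragments count as an alignment.
        return True
    # A single fragment counts only if it is long enough.
    return (len(fragment_lengths) == 1
            and fragment_lengths[0] >= min_fraction * protein_length)
```

With these assumptions, a single 30-residue piece aligned to a
287-residue target is kept only as a fragment unless force_alignment is
given, which is the multimer situation described above.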

------------------------------------------------------------
Date: Mon, 15 May 2006 14:48:15 -0700
From: "Firas Khatib"
To: "Kevin Karplus"
Subject: ProteinShop discovery!

I finally figured out how to lock 1 secondary structure element and
select the coils on either side to only move THAT ss element and the
coils, leaving the rest of the protein intact!

Ctrl-Shift-Left Button on the coil toggles the activation state of a
coil region

small victories with Proteinshop! :)

--Firas

------------------------------------------------------------
Date: Tue, 13 Jun 2006 16:17:25 -0700
From: "Firas Khatib"
To: "Kevin Karplus"
Subject: Proteinshop discovery!

I figured out a quick and easy way to get Proteinshop's hydrogen bonds
visualizer to work with Undertaker PDB files! This can be a very
useful tool, since Proteinshop can also display the hydrogen cages and
hydrogen bond sites, so moving strands with Proteinshop can be easier!

The solution is to open undertaker's PDB in molmol, clicking RIBBONS
(which turns on molmol's ssa) and saving the file. This new file has
Hydrogens in it (determined by molmol of course) and you can open it
with Proteinshop and the hydrogen bonds will appear!

---Firas

------------------------------------------------------------
Using Dimers with ProteinShop:

Since Proteinshop does not deal with chainbreaks (it connects any gaps
in the chain with a line that cannot be shrunk) you have to do the
following if you want to move one chain of a dimer relative to the
other:

You need to save the two chains of your dimer as 2 different files and
open them both with Proteinshop. You will notice that if you move
anything it will move BOTH chains (which doesn't help you in any way).

Under the toolbar click on "Windows" and "Show Selection Dialog". Now
you can select the chain you want to move (but you will notice that it
still moves both!)

You must then use the knobs on the "Protein Selection Dialog" to move
the chain into the position that you want. This is tricky, but not
too bad.
Note that even if you turn "Visualize Atom Collisions" on, it will not
show you any clashes BETWEEN your two chains!

When you have it aligned the way you want, you save your file and then
you have to cat both chains together and run my script
renumberChain.pl (which is located in ~/casp7/scripts) to have them
numbered correctly. (You might also have to replace all the chain
letters.) Then load it back into Proteinshop and turn on "Visualize
Atom Collisions" to see if you have any clashes you can quickly fix.

------------------------------------------------------------
Looking at burial predictions on real proteins.

Martin suggested looking at burial predictions on real proteins, to
see what they looked like there, before trying to modify unknown
proteins to fit some pre-conceived notion of how burial should look.
Here is my reply:

Date: Mon, 15 May 2006 18:14:06 -0700
From: Kevin Karplus
Subject: Re: Burial predictions for known 3D structures

It is certainly worthwhile to look at what the predictors are doing.
Predictions have been run for many of the proteins in the template
library (for example, the test set that Grant has been using for
fold-recognition tests).

The list of ids in that test set is in
    pcem/indexes/dunbrack-in-scop-2005-folds.ids
(pcem is a soft link I use for
/projects/compbio/experiments/models.97/)

The predictions for 1w2wA, for example, would be in directory
    pcem/pdb/1w/1w2wA/nostruct-align/
with names like
    1w2wA.t2k-near-backbone-11.rdb

We haven't set up rasmol scripts for them, but this would be a fairly
easy change to the pcem/Makefile.models97 file, since the perl script
for creating the rasmol scripts is called from other makefiles (such
as casp7/starter-directory/Make.main).

I agree that there are exposed residues on T0283.try4-opt2 that should
be buried, but I've not looked at what Firas has done to the model
yet. I had assumed that he had done nothing so far, since he had not
put any notes in the README file.

------------------------------------------------------------
BLAST to choose among close templates

Date: Thu, 18 May 2006 16:03:27 -0700
From: Kevin Karplus
Subject: new target in Make.main

The casp7 Make.main file has a new target
    ${TARGET}.pdb.blast

This does a quick blastp of the dunbrack-pdbaa subset of the pdb
database using the target sequence and returns a short table of the
top hits. This may be a good way to choose top templates when there
are many close templates. The HMMs tend to pick templates that match
the consensus of the model, rather than the specific target. This is
good for distant fold recognition, but may choose poor templates when
there are many very close ones.

------------------------------------------------------------
Using jmol under Firefox from LINUX boxes

Thu Jun 29 13:36:40 PDT 2006 Kevin Karplus

Using jmol has not been working from the PDB website on the Linux
boxes in the labs, though other machines (such as Mac OS X) running
Firefox have had no problems. I asked the sysadmins how to fix it and
got the following technique:

    mkdir ~/.mozilla/plugins
    cd ~/.mozilla/plugins
    ln -s /usr/java/jre1.5.0_06/plugin/i386/ns7/libjavaplugin_oji.so .

(Warning: the jre version may vary depending on the computer, which
may also cause problems.)

I have not tested this yet, but will update the README file after I
have tested it.

----------------------------------------------------------------------
Preparing final models for submission

Tue Jul 11 11:29:30 PDT 2006 Kevin Karplus

The details for doing an actual submission are below, but everyone
needs to know how to get a submission ready for me to look at.

Have a superimpose-best.under file in the main directory and another
one in the dimer/ directory (for models that need a DIMER submission
as well). There should be exactly 5 models in the
superimpose-best.under---the five to be submitted, best first.
(If there are questions, then you can include more, but be sure to
spell out exactly what decisions are needed in the README file---the
number needs to be reduced to 5 for the final submission.)

Do "make best-models.pdb.gz" to gather the selected predictions into
one file.

Have an explanation of the history of each model to be submitted in
the README file. For example, for T0312

    Probable current submission:
    try17-opt2 < try16-opt2 < try15-opt2 < try13-opt2 .

355:# best score in alignment pool out of 11: T0312+T0312-1xv2A-t04-local-str2+near-backbone-11-0.8+0.6+0.8-adpstyle5.a2m:1xv2A at pool[7] 420.537 cost/residue, 212 clashes 0.469407 breaks
544:# best score in alignment pool out of 21: T0312.try3-al1+T0312-1xv2A-t06-local-str2+near-backbone-11-0.8+0.6+0.8-adpstyle5.a2m:1xv2A at pool[17] 294.094 cost/residue, 272 clashes 0.358772 breaks
914:# best score in alignment pool out of 40: T0312.try3-al1A 294.094 cost/residue, 272 clashes 0.358772 breaks
27240:# best score in alignment pool out of 1151: T0312.try3-al2 294.094 cost/residue, 272 clashes 0.358772 breaks
27250:# best score in alignment pool out of 1151: T0312.try3-al3+all-align.a2m:2fug7 at pool[675] 285.663 cost/residue, 421 clashes 0.358772 breaks
27260:# best score in alignment pool out of 1151: T0312.try3-al4 285.663 cost/residue, 421 clashes 0.358772 breaks
27270:# best score in alignment pool out of 1151: T0312.try3-al5+all-align.a2m:2fug7 at pool[968] 285.663 cost/residue, 421 clashes 0.358772 breaks
27280:# best score in alignment pool out of 1151: T0312.try3-al6 285.663 cost/residue, 421 clashes 0.358772 breaks
27290:# best score in alignment pool out of 1151: T0312.try3-al7 285.663 cost/residue, 421 clashes 0.358772 breaks
123469:# best score in initial pool out of 20: T0312.try3 at pool[10] 269.716 cost/residue, 286 clashes 0.358755 breaks
...
135821:# best score in super_pool out of 20: T0312.try3-scwrl at pool[7] 196.84689 cost/residue, 201 clashes 0.11314 breaks

This tells me that the optimization worked mainly with 1xv2A and 2fug7
as its templates, with 2fug7 as the finally chosen one.

----------------------------------------------------------------------
Mailing predictions (RR and TS formats)

Date: Mon, 22 May 2006 15:04:58 -0700
From: Kevin Karplus
Subject: mailing contact predictions

I have set up a new target in Make.main for mailing residue-residue
contact predictions.

To mail contact predictions to the casp7 submission site,
    make mail_contact_pred

I have mailed the predictions for T0288, to test the make target, and
to make sure that the T0288 submission was complete.

I expect George to do the mailing of contact predictions on other
targets when he is ready.

I will continue to do the mailing of the 3D files, which *can* be done
with
    make email
but which is really a multi-step process:

    edit the superimpose-best.under file to select the models to submit
    make best-models.pdb.gz
    make T0232.method
        and edit it to be specific for target.
        (Alternatively, one can make model1.method, ... , model5.method
        and edit each separately.)
    Add a MANUAL_TOP_HITS macro to Makefile, listing the templates to
        be reported as parents. Selecting the top 20 or so hits from
        T0232.best-scores.rdb is the best way to do this.
    make casp_models
    edit the model1.ts ... model5.ts files to change parents, if needed
        (generally, I only edit the parents for models created by
        sidechain replacement on an alignment to a single template)
    make email

Repeat: I am responsible for mailing 3D (TS) files.
George is responsible for mailing RR files.

----------------------------------------------------------------------
Mailing dimer predictions

Dimer predictions are a bit trickier to mail than monomers, since we
have to keep the chain IDs around through the whole process, and some
of the processing we use for the monomers loses the chain IDs.

First, create the dimers with separate chains (instead of one long
chain), but no TER record. There is a target for this. To convert
decoys/T0300.try5-opt2.pdb.gz just
    make decoys/T0300.try5-opt2.unpack.pdb.gz
in the dimer directory.

In the dimer/Makefile, you need to have targets for each of the
dimer.ts models:

    dimer1.ts:
            $(call model_to_ts,try5-opt2,1)
    dimer2.ts:
            $(call model_to_ts,try4-opt2,2)
    ...

You then make T0300.method and edit it as usual.

    make dimer_models
    make email_dimers

Because of the way the dimer*.ts files are created, you have to use a
single method file for all the dimers, not a separate method file for
each.

If you have a dimer (or other multimer) to submit that is *not* the
result of a standard try script (for example, a dimer that comes just
from the initial superposition, without further optimization), you can
still submit it, but the procedure is slightly different.

First, make sure that the file you wish to submit has a proper
"MODEL 1" record before the atoms.

Second, add to the dimer Makefile

    dimer4.ts:
            $(call modelfullname_to_ts,decoys/dimer-try1-2fs2A.pdb.gz,4)

and proceed as before.

----------------------------------------------------------------------
Confirming submission of files

They are not sending confirmations this year---too much useless
e-mail.

Instead, you can check the status for servers on
    http://www2.predictioncenter.org/menu_frames.html
and for your own group on
    http://predictioncenter.org/casp7/models/casp7-models.html

----------------------------------------------------------------------
Adding specific templates to the set used by undertaker

Date: Thu, 1 Jun 2006 09:51:20 -0700
From: Kevin Karplus

If you want to add some pairwise alignments to the set that are used
for undertaker, the process is

1) Add the list of PDB chains you want used to the Makefile as
   MANUAL_TOP_HITS.
   Warning: this list is used for identifying the PARENT in the
   submitted model file, so include all the top hits, not just the
   extras you want.
   For example, T0288/Makefile has

    MANUAL_TOP_HITS:= 2fneA 1xz9A 1t2mA 1wfvA 1x6dA 2fcfA 1g9oA 1ihjA 1q3oA 1mfgA

2) run
    make extra_alignments
   which makes sure that all the MANUAL_TOP_HITS have had their
   pairwise alignments made.

3) run
    make read_alignments
   which creates the read-alignments-scwrl.under and
   read-alignments-noscwrl.under scripts in the subdirectories

4) If desired, you can
    make all-align.a2m.gz
   which ensures that most undertaker runs have access to all the
   pairwise alignments.

5) Alternatively, you can modify the try*.under undertaker script to
   use the (normally commented out)
    InfilePrefix 1xxxX/
    include read-alignments-scwrl.under
   inputs to pick up the pairwise alignments. You might want to move
   the reading relative to the TryAllAlign commands---perhaps moving
   it before the first TryAllAlign, though the default place in
   try1.under is ok.

----------------------------------------------------------------------
To generate sheet constraints from an alignment

Tue Jul 11 15:55:42 PDT 2006 Kevin Karplus

If you have chosen a template, and would like to get sheet constraints
from a particular alignment to guide the initial selection of models,
you can write an undertaker script, like the show-align.under script
in starter-directory/

For example, to get sheet constraints for 1eg5A and 1p3wA in T0339,
you would want the lines

    InfilePrefix 1eg5A/
    ReadFragmentAlignment NOFILTER SCWRL T0339-1eg5A-t2k-local-str2+near-backbone-11-0.8+0.6+0.8-adpstyle5.a2m
    InfilePrefix 1p3wA/
    ReadFragmentAlignment NOFILTER SCWRL T0339-1p3wA-t2k-local-str2+near-backbone-11-0.8+0.6+0.8-adpstyle5.a2m
    PrintAlignmentsSheets T0339.1eg5A-1p3wA.sheets

to generate the sheet constraints from the usually best local
alignment.
(You need all the usual startup stuff for undertaker---see
starter-directory/show-align.under)

----------------------------------------------------------------------
To focus on a particular multiple alignment

Tue Jul 11 15:58:09 PDT 2006 Kevin Karplus

Sometimes only one of the multiple alignment methods (t2k, t04, or
t06) seems to find a reasonable number of homologs. (One can have too
few and not enough evolutionary signal, or too many and loss of focus
on the target sequence.)

To focus on just one alignment, there are two things to do:

1) In the Makefile (before the include) set

    PREFERRED_AL_METHOD := t2k

   The default is currently t06, so if that is your preferred
   alignment, you don't need to do this step.

   Then run "make -k" to remake everything. This generally does very
   little, unless something has changed since the first make, but it
   takes a while to go through and make sure all the alignments are
   there. The main effect will be for the short names for the rasmol
   scripts to be linked to the preferred AL method.

2) Use only pairwise alignments based on the preferred HMM for testing
   templates:

   One way to do this would be to put all the reasonable templates
   (basically the top 10 or 20 hits in T0329.t06.best-scores.rdb) into
   MANUAL_TOP_HITS in the Makefile, do

    make extra_alignments
    make read_alignments
    foreach x (*/read-alignments-scwrl.under)
        grep -h t06 $x > $x:s/scwrl/t06-scwrl/
    end

   Then include each of the read-alignments-t06-scwrl.under files to
   read in the alignments in the try.under script.
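
For anyone not using csh, the foreach/grep loop above can be sketched
in Python as follows. This is meant as an equivalent of that loop only,
not a replacement for the make targets; the function name and the
target_dir argument are made up for the example. Like grep, it keeps
every line that contains the method name anywhere in it.

```python
from pathlib import Path


def write_method_filtered_scripts(target_dir, method="t06"):
    """For each */read-alignments-scwrl.under, write the lines mentioning
    `method` to read-alignments-<method>-scwrl.under in the same directory.

    Hypothetical helper mirroring the csh loop in the README.
    """
    written = []
    for src in sorted(Path(target_dir).glob("*/read-alignments-scwrl.under")):
        dst = src.with_name(f"read-alignments-{method}-scwrl.under")
        # Same test as "grep t06": a plain substring match on each line.
        kept = [line for line in src.read_text().splitlines(keepends=True)
                if method in line]
        dst.write_text("".join(kept))
        written.append(dst)
    return written
```

Calling write_method_filtered_scripts(".", "t06") from a target
directory should leave one read-alignments-t06-scwrl.under next to each
existing read-alignments-scwrl.under.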

----------------------------------------------------------------------
To download and score server predictions

Date: Mon, 22 May 2006 18:41:19 -0700
From: Kevin Karplus
Subject: looking at server predictions

To download and score server predictions

1) On the file server silo
    make fetch_tarball unpack_tarball

2) On a workstation
    make decoys/score-all+servers.try1.pretty

Creating the read-pdb+servers.under script on a workstation is
sometimes quite slow, so you can run
    make decoys/read-pdb+servers.under
on silo if you need to.

Don't run undertaker or anything computationally intensive on
silo---it is really only to be used as a file server and for I/O
intensive tasks that would be much too slow on a workstation.

Note: you should add
    missing_atoms 1
to your costfcn if it is not already there, since otherwise incomplete
models may come out looking extremely good.

Further note: models that have only CA atoms (some servers return such
crummy models) will fail to produce a SCWRL'ed model, and the
NameConform command in the script will erroneously cause the -scwrl
addition to be given to the unSCWRLed model.

----------------------------------------------------------------------
To score server models for many targets.

Server models are often picked up for several targets at a time, and
it is useful to score the server models for them on the farm cluster.
To do so, create a file listing one target per line (say
/tmp/targ.ids)

Then run the following command to request a scoring run for each
target in the list:

    para-trickle-make -many -makefile Makefile -targets 'decoys/score-all+servers.unconstrained.rdb decoys/score-all+servers.unconstrained.pretty' -no2letter -modelsdir ~/casp7 -se2log < /tmp/targ.ids

----------------------------------------------------------------------
To get Robetta models that were not submitted to CASP:

Firas Khatib

To get Robetta models that were not submitted to CASP go to
    http://robetta.bakerlab.org/queue.jsp?UserName=casp7&rpp=100
and click on the ID number in the left column for the target you want.
For T0361 you would click on 7991, for example.

Scroll down to "Ginzu Domain Prediction" and if there is a "Reference
Parent" then it should have 5 models (that we have already downloaded)

If under "Reference Parent" you see -- and under "Source" it says
"cutpref" then click on "domain 1" and you will see the 10 pdb models.

Under each image of the PDB model there are 3 icons: PDB, Rasmol, and
a file. Click on the file and save it!

[Tue Jul 11 16:07:15 PDT 2006 Kevin Karplus
There is a target in Make.main for fetching the robetta targets:
    make fetch_robetta
which tries to pick up the top ten models, so Firas's method above
should only be needed for picking up subdomain models.
]

----------------------------------------------------------------------
To evaluate a model once the correct solution is released

When the CASP organizers have posted the PDB id for a target, the
predictions can be evaluated by

1) defining
    REAL_PDB:=2gw2A
   in the Makefile before the include

2) making targets:
    decoys/evaluate.rdb decoys/evaluate.pretty
   (better, use targets
    decoys/evaluate.unconstrained.rdb decoys/evaluate.unconstrained.pretty)

If several targets need to be evaluated, they can be sent to the farm
cluster.
Put the list of targets (one per line) in a temporary file (say
/tmp/ids) and use para-trickle-make:

    para-trickle-make -many -makefile Makefile \
        -targets 'decoys/evaluate.unconstrained.rdb decoys/evaluate.unconstrained.pretty' \
        -no2letter -modelsdir ~/casp7 -se2log < /tmp/ids

----------------------------------------------------------------------