Thu Jun 22 11:40:02 PDT 2006 T0343 Make started
Thu Jun 22 11:41:02 PDT 2006 Running on camano.cse.ucsc.edu

Thu Jun 22 11:48:57 PDT 2006 Kevin Karplus
Crystallographers tell us this forms a DIMER and that there are no
SS bonds. We need to do a multimer submission as well as a monomer one.
BLAST gets no good hits in PDB, but there are some long fragments.
ORFan!!

Thu Jun 22 15:36:30 PDT 2006 Kevin Karplus
No good hits with the HMMs either. Best hit is no better than
chance: 1p90A with E-value 6.

Fri Jun 23 05:25:45 PDT 2006 Kevin Karplus
The top alignments are not in complete agreement about the structure,
though all have a helix against a small sheet.
The try1-opt2 model seems to be based on alignment to 1p90A (judging
from the log file), but the superposition seems best with the
alignment to 2bkrA---this may be a difference in which alignment was
chosen, more than a difference in the templates.

The try1-opt2 model is plausible, but I think that this target will
need more work. Its being an ORFan makes all the usual predictions
(local structure, residue-residue contacts, HMM search, ...) rather
suspect.

Fri Jun 30 16:29:15 PDT 2006 George Shackelford
I'm going for the Usual Suspects from 'alphabetmatch', looking at both
str2 and near-backbone. Since the local structure predictions are
suspect, these selections are likely wrong. But it's a start.

id       score     per residue
5S 10N 10N
1irw     155.103   1.49137
1ycc     154.794   1.4884
2b4zA    152.795   1.46918
1ccr     152.642   1.46772
2b10B    152.167   1.46315
1ytc     152.005   1.46159
2b11B    151.925   1.46081
5cytR    151.61    1.45779
1ql3A    149.436   1.43688
1b0nA    149.166   1.43429
1co6A    149.085   1.43351
1ppjF    148.806   1.43082
1bkrA    148.526   1.42814
1aa2     148.249   1.42547
2b0zB    148.232   1.4253
1i8oA    148.176   1.42477
1k3sA    147.69    1.4201
3c2c     147.557   1.41882
1bccF    147.497   1.41824
1qn2A    147.265   1.41601

I've run extra-alignments and read-alignments.
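The "per residue" column in the alphabetmatch table appears to be the raw score divided by the target length. The following is only a sanity-check sketch under that assumption: the length of 104 is taken from the alphabetmatch header printed with the later ehl2 run in these notes, the rows are a few copied from the str2/near-backbone table, and the function names are made up for illustration (they are not part of alphabetmatch).

```python
# Sketch: check that "per residue" = score / target length.
# ASSUMPTION: target length 104, as reported by the alphabetmatch header
# for T0343 in a later entry of this notebook.
TARGET_LENGTH = 104

# (id, score, per-residue) rows copied from the str2/near-backbone table.
ROWS = [
    ("1irw", 155.103, 1.49137),
    ("1ycc", 154.794, 1.4884),
    ("2b4zA", 152.795, 1.46918),
    ("1qn2A", 147.265, 1.41601),
]


def per_residue(score, length=TARGET_LENGTH):
    """Hypothetical helper: normalize a raw match score by target length."""
    return score / length


def max_discrepancy(rows, length=TARGET_LENGTH):
    """Largest absolute difference between the listed and recomputed values."""
    return max(abs(per_residue(score, length) - listed)
               for _, score, listed in rows)


if max_discrepancy(ROWS) < 1e-4:
    print("per-residue column is consistent with score / length")
else:
    print("per-residue column does NOT match score / length")
```

If the check holds, the per-residue numbers add no information beyond the raw score for a single target; they would matter only when comparing matches across targets of different lengths.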
The usual approach is to split these ids into two groups of ten and
build two tries with them as included read-alignments-scwrl. I have
done so with try2 and try3.

I made an error in try2 that included reading all fragments. If the
results are useful then so be it. I have corrected the issue and try4
represents the best ten above.

Sat Jul 1 17:44:13 PDT 2006 George Shackelford
I looked over the results this morning and found that try2 with its
faulty include and the initial try1 scored better than try3 and try4.
This is not encouraging but I did do my search using str2,near for
matching. I've decided to do a new search using ehl2 as before. This
did require some software changes to my search code. I have completed
them and the results are as follows:

# program: alphabetmatch
# George Shackelford
#
# Target: T0343
# length: 104
# length range: 96 to 114
# alphabets used:
# ehl2
# id     score     per residue
5S 10N 10N
1k3sA    183.671   1.76607
2irfG    178.011   1.71165
1kte     177.686   1.70852
1awcA    177.241   1.70424
1kafA    176.444   1.69657
1gmxA    176.313   1.69532
1ihnA    175.811   1.69049
1d0qA    175.474   1.68725
1dcoA    174.269   1.67567
1dw0A    173.71    1.67029
1f9mA    173.503   1.6683
1iibA    173.441   1.6677
1ytc     173.418   1.66748
1l8rA    173.412   1.66742
1ccr     173.118   1.66459
1vcbB    173.032   1.66377
1g2rA    172.703   1.66061
1mfiA    172.482   1.65848
1gxqA    172.423   1.65791
1i8oA    172.326   1.65698

1k3sA 2irfG 1awcA 1d0qA 1mslA 1ytc 1g2rA 1gh6A 1hulA 1dcoA
1kte 1ccr 1ycc 1l8rA 1gxqA 1f9mA 1irw 1gmxA 1dw0A 1co6A

Ok, we repeat our usual approach of ten and ten. Now we do try5 and
try6.

Sat Jul 1 22:30:10 PDT 2006 George Shackelford
try5 looks unreal, like natural bridges. The weights for wet/dry need
to be reset. Try6 is better but way too foamy.

I'm going to take out 1kafA, the basis for try5, and rerun with normal
wet/dry weights as try7. I want something more packed. I'm going to do
the same for try6, removing 1mfiA, to get try8. Essentially try7 and
try8 are fresh.
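The "ten and ten" bookkeeping can be sketched as below. This is an illustration only: it assumes that try5 and try6 took the first and second ten ids of the ehl2 table in score order (consistent with 1kafA being in the first ten and 1mfiA in the second), and the helper names are hypothetical rather than anything in the actual Makefile or undertaker scripts.

```python
# Sketch: split the ehl2 top-20 ids into two groups of ten, then drop
# one template from each group for the reruns.
# ASSUMPTION: the groups follow the score order of the ehl2 table.
EHL2_IDS = """
1k3sA 2irfG 1kte 1awcA 1kafA 1gmxA 1ihnA 1d0qA 1dcoA 1dw0A
1f9mA 1iibA 1ytc 1l8rA 1ccr 1vcbB 1g2rA 1mfiA 1gxqA 1i8oA
""".split()


def split_in_groups(ids, size=10):
    """Hypothetical helper: consecutive groups of `size` ids."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def without(ids, dropped):
    """Hypothetical helper: the same list with one template removed."""
    return [template for template in ids if template != dropped]


try5_ids, try6_ids = split_in_groups(EHL2_IDS)
try7_ids = without(try5_ids, "1kafA")  # rerun of the try5 set
try8_ids = without(try6_ids, "1mfiA")  # rerun of the try6 set

for name, ids in [("try7", try7_ids), ("try8", try8_ids)]:
    print(name, " ".join(ids))
```

Writing the groups out this way leaves a record of which templates fed which try, which is the kind of provenance that is otherwise hard to reconstruct from the notes.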
try7 and try8 running on shaw

Sun Jul 2 09:37:22 PDT 2006 George Shackelford
Try7 and try8 look better. Try8 scores well.

I note that the target is a dimer - that makes a difference with
burial exposure. That makes me wonder if the current definition of
"burial" needs to take the dimer configuration into account. The
interface appears on the surface but its composition seems to match
the interior. This can confound the predictions. Perhaps we need a
different region representing the interface, which would help the cost
and prediction of other dimer interfaces.

With burial in mind, I may try to do a match using ehl2 and burial.
The burial alphabet represents a scalar. There may be a better way to
calculate the cost that incorporates that. I don't have any insight
other than mean and variance. Furthermore, the sequence representation
of the actual proteins shows only one exact value.

# program: alphabetmatch
# George Shackelford
#
# Target: T0343
# length: 104
# length range: 96 to 114
# alphabets used:
# ehl2 burial
# id     score     per residue
5S 10N 10N
1k3sA    211.902   2.03752
2irfG    210.828   2.02719
1awcA    205.825   1.97908
1d0qA    205.1     1.97212
1mslA    204.607   1.96737
1ytc     203.683   1.95849
1g2rA    203.385   1.95562
1gh6A    203.224   1.95408
1hulA    202.849   1.95047
1dcoA    202.575   1.94784
1kte     202.372   1.94588
1ccr     201.998   1.94229
1ycc     201.988   1.94219
1l8rA    201.292   1.9355
1gxqA    200.293   1.92589
1f9mA    200.1     1.92404
1irw     199.806   1.92121
1gmxA    199.422   1.91752
1dw0A    199.4     1.91731
1co6A    199.184   1.91523

Interesting. The scores are higher with the new value included. I see
many repeats (including the top two) and one from the first list using
str2 (1ycc) that did not appear on the ehl2 list. I am wondering why
the top two score well but don't figure as templates for models. I
need to look at the actual structures.

I took a look at 1k3sA. I like it. I'm going to do a run with just
1k3sA included and see what I get. I'm going to turn constraints back
down to 10 so they don't interfere too much.
This is a test.

try9 running on shaw

I'm monitoring the best scores. Breaks started out quite bad (~.5) but
dropped quickly to almost zero.

Sun Jul 2 22:42:00 PDT 2006 George Shackelford
try9 seems to be trying to form a sheet but fails. If it had succeeded
in forming the sheet, the score would have been the best. I followed
the pool backwards and found (as far as I can understand) that the
basis was a small local alignment with 1k3sA, or that and a global
alignment.

I could try to force a sheet between the two strands (starting with
42-46 rather than the 42-47 which makes a bad hairpin), but it appears
that I may as well enforce a sheet that involves a third strand and
see if I can get some kind of match with something in this batch. If I
can, I'm going to work on that.

I'd like to see if these matches come from some common family a la
SCOP or CATH.

Wed Jul 12 13:01:01 PDT 2006 George Shackelford
Request Id: 1093721343867021157, chain A

We need to form dimers from what we have.
I took the try1-opt2.pdb and used VAST to find the following
neighbors:
    1xebA -- dimer
    1gheB
    2yreB
    2yk3F
    2vhsA
    2vkcB

Looking at 1xebA, it forms a sandwich. It makes me think that the
other models might actually form decent dimers as well.

I need to do some more searches, using try5 as well.
Request Id: 307886608391903087, chain A

Wed Jul 12 20:25:19 PDT 2006 George Shackelford
So far I have:
    dimer-try1-1xeb.pdb
    dimer-try5-1kaf.pdb
Need to put these in super-impose.

Let's get try2.
Your VAST Search job was submitted at 07/12/2006 23:33:23(EDT).
Request ID: 1106464373513935690

Damn. 4 hits and none are dimers. I'm going to retry searching all
PDB.
Your VAST Search job was submitted at 07/12/2006 23:42:57(EDT).
Request ID: 1094361438954204740
Expected to take 30 minutes.

I think I'll not get this one in... Approaching 9pm.
I'm going to work some more on try2, but only two submissions for now.

Wed Jul 12 21:12:00 PDT 2006 Kevin Karplus
I don't see any list of monomers to submit here with notes about where
they came from! superimpose-best.under has only the dimers (which
belong in dimer/superimpose-best.under, not in the monomer directory).

I'm going to have to guess that George intended the commented-out
order for the monomers:
    ReadConformPDB T0343.try2-opt2.pdb
    ReadConformPDB T0343.try5-opt2.pdb
    ReadConformPDB T0343.try1-opt2.pdb
    ReadConformPDB T0343.try8-opt2.pdb
    ReadConformPDB T0343.try7-opt2.pdb
I'll move try1 ahead of try5.

For some reason, George has not been generating the
gromacs0.repack-nonPC files (related to his not generating
grep-best-rosetta files?). He has one for try10, but not for earlier
runs.

Wed Jul 12 21:29:42 PDT 2006 Kevin Karplus
I did make T0343.do$x for 1 through 9, and looked at
grep-best-rosetta.
Rosetta likes best try3, try2, try5, try6, try1, try4, try8, try7
(all gromacs0.repack-nonPC models).
The try1 costfcn likes best try2, try1, try5, try8, try10, try7.
Unconstrained likes best try2, try5, try1, try8, try7
(the source for George's order?).
I made a "secondary.costfcn" which is like unconstrained, but has
constraints from just the dssp-ehl2 constraints. It orders the models
try2, try8, try5, try7, try1, try3, try6.

I don't have time to look at all the models now, but it seems like the
5 George left in the comments are a reasonable set, so I'll content
myself with moving try1 up one notch.

The dimers George made have not been optimized in a dimer context, but
they hardly interact, so are unlikely to be correct dimers from the
crystal. I'll submit them anyway, though I think they are still junk.
Wed Jul 12 21:47:59 PDT 2006 Kevin Karplus
Monomers
    ReadConformPDB T0343.try2-opt2.pdb
    ReadConformPDB T0343.try1-opt2.pdb
    ReadConformPDB T0343.try5-opt2.pdb
    ReadConformPDB T0343.try8-opt2.pdb
    ReadConformPDB T0343.try7-opt2.pdb
submitted with comment

    For this preliminary submission, I do not really know how the
    models were generated or selected, as no concise summary was
    generated, and the notes in the lab notebook README file were
    insufficient for me to figure out what was done.

    George apparently used an undocumented program "alphabetmatch"
    that does something to generate more distant fold recognition
    templates. Other than that, I can't figure out what he did.

    Model 1 is try2-opt2
    Model 2 is try1-opt2, the fully automatic method.
    Model 3 is try5-opt2
    Model 4 is try8-opt2
    Model 5 is try7-opt2

--------------------------------------------------

Wed Jul 12 21:56:39 PDT 2006 Kevin Karplus
The two initial dimers submitted.

Thu Jul 13 00:25:18 PDT 2006 George Shackelford
Unfortunately I did not know that submission of the monomers was
required along with the dimers. As a result, I made no list of the
five best monomers. Kevin correctly guessed that the five monomers
that were commented out were the ones I would have suggested as
monomer submissions. He also guessed correctly that they were based on
unconstrained scoring. I agree to his moving try1 up a notch; it looks
better than those below.

Had I known to submit the monomers as well, I would have also provided
a history and explanation of their source.

Fri Jul 14 12:12:36 PDT 2006 George Shackelford
This is due on Sunday noon. Time to make the rest of the dimers. I
have dimers for Model 2 / try1 and Model 3 / try5.

For try2, I am using VAST results 1094361438954204740.
I have found that 1e80 chains B and D may dimerize. I'm going to try
them and see what I get.

The result is a dimer but the chains are too far apart. I am going to
try 1vm6, which is further down the list of the p-value sorted VAST
hits.

Ouch!
Too close. 1vm6A covers only the first domain in T0343. I think I need
one that touches on both.

I'm going to return to this later when I've been briefed on using
Protein Shop to align for dimers. The 1vm6 alignment is a good place
to start.

Focusing on the last two. try8 is next.
Your VAST Search job was submitted at 07/14/2006 17:00:28(EDT).
Request ID: 77855718860279234

May as well start a job for try7.
Your VAST Search job was submitted at 07/14/2006 17:03:12(EDT).
Request ID: 457155349145852532

Fri Jul 14 23:31:05 PDT 2006 George Shackelford
Earlier this afternoon Chris helped me get a handle on doing dimers.
One interesting note: he had stated that you couldn't load the two
chains from one file but had to load two different files. I found that
the aligned version of try2 could be loaded and the two chains could
be manipulated separately without the troubles he had experienced.

I think that Chris has been working with those dimer files that treat
the dimer as one long chain; I had two chains already. When you have
two chains, you find you have to use the selection popup to save to
two different files. The two files can be combined, but simply
renumbering won't get the job done. I think I'll write a
renumber-and-label-chains program so as to bypass having to use
undertaker and specifying where the break takes place (remember I
actually have two chains). This may only be useful where we're not
doing a lot of polishing of placement; frankly I have no idea if my
dimer alignments are even close.

[Sat Jul 15 09:25:15 PDT 2006 Kevin Karplus
 There is already a script for making a single chain into multiple
 chains and a make target for using it. If you have a multimer
 foo.pdb.gz, then "make foo.unpack.pdb.gz" will make the multi-chain
 version, assuming that you have MONOMER_LENGTH defined in your
 Makefile. You can also call unpack-multimer directly if you need to.
]

The two chains I have aligned in ProteinShop are currently saved as
try2A.pdb and try2B.pdb.
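A "renumber and label chains" step for two separately saved chains could look roughly like the sketch below. It is not the existing unpack-multimer script and does not use undertaker; it relies only on the fixed-column PDB layout (atom serial in columns 7-11, chain ID in column 22) and leaves residue numbers untouched, on the assumption that distinct chain IDs are enough to keep the two copies apart. The function names are made up for illustration.

```python
# Sketch: merge two single-chain PDB files into one two-chain file.
# ASSUMPTION: standard fixed-column ATOM/HETATM records; residue numbers
# are kept as they are and only atom serials and chain IDs are rewritten.

def relabel_chain(lines, chain_id, first_serial=1):
    """Return (records, next_serial) with chain ID set and serials renumbered."""
    records = []
    serial = first_serial
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            padded = line.rstrip("\n").ljust(80)
            # columns 7-11: atom serial; column 22: chain identifier
            padded = (padded[:6] + f"{serial:5d}" + padded[11:21]
                      + chain_id + padded[22:])
            records.append(padded.rstrip())
            serial += 1
    return records, serial


def combine_chains(chain_a_lines, chain_b_lines):
    """Label the first file's atoms as chain A and the second's as chain B."""
    chain_a, next_serial = relabel_chain(chain_a_lines, "A")
    chain_b, _ = relabel_chain(chain_b_lines, "B", next_serial)
    return chain_a + ["TER"] + chain_b + ["TER", "END"]


def combine_files(path_a, path_b, path_out):
    """Read two single-chain files and write the combined dimer file."""
    with open(path_a) as file_a, open(path_b) as file_b:
        merged = combine_chains(file_a.readlines(), file_b.readlines())
    with open(path_out, "w") as out:
        out.write("\n".join(merged) + "\n")
```

With the files named in the notes, a call such as combine_files("try2A.pdb", "try2B.pdb", "try2-dimer.pdb") would produce the merged file; the output name here is only a placeholder.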
I'll be finishing with them on Saturday.

This evening I've been using the VAST matches and trying to solve
try8. I have found a few possible dimers that may work:
1f1c - alignment has way too much space. I could use ProteinShop to
       close the gap.
1nnq - alignment is way too close. It has the two overlapping a bit.

Sat Jul 15 09:27:47 PDT 2006 Kevin Karplus
Please have a list of predictions (one for monomers, another for
dimers) by early evening, so that I can do the submissions---I had to
stay up way too late last night waiting for people to write up their
notes, and I can't stay awake that late tonight.

Fix your .cshrc file so that it doesn't use "setpaths" (use an
explicit list: /bin /usr/bin /usr/X11R6/bin /usr/local/bin) and you
can make decoys/grep-best-rosetta without messing up your KDE
initialization. Of course, the model rosetta likes best
(decoys/T0343.try3-opt2.gromacs0.repack-nonPC.pdb.gz) is truly
horrible, so this might not be that worthwhile.

I'm assuming that the monomers we submitted as the preliminary tries
are still the ones we want, and that only dimer optimization is being
done now?

Sat Jul 15 15:15:15 PDT 2006 George Shackelford
Yes, I'm just doing dimer optimization now, if you could call it that.
Mostly I'm just trying to get the monomers in reasonable positions.

Continuing to work on try8 I find that:
1pzw - actually consists of one chain though it seems to dimerize! I
can't recall the name of the database that has extra dimerizations.
pqb?? No, PQS. Found it...

I was able to get the dimerized version of 1pzw from PQS, and I
gzipped it as zpzw.pdb.gz and added it to our PDB database. It did
succeed in making try8 into a dimer but the parts overlapped.
1pzw - too close together

[Sat Jul 15 20:13:03 PDT 2006 Kevin Karplus
 We would normally put 1pzw.mmol from pqs in the 1pzwA subdirectory
]

Gotta try these next two:
1y1a - not good, but close at least.
2pol - useless

Sat Jul 15 18:20:37 PDT 2006 George Shackelford
I finally decided to use ProteinShop and see if I could adjust
try8-1fic.pdb. The program works on my home computer! Yes!
I generated try8.pdb using it. Not perfect but about as decent as I
can get it. I don't think I'll spend as much time with try7 as I have
on try8.

So far I have:
    dimer-try1-1xeb.pdb
    dimer-try5-1kaf.pdb
    try2.pdb
    try8.pdb

Now for try7. First I'll get a decent copy by using VAST
457155349145852532.
There are only five matches for the whole chain! Hope I can get a
dimer to start with.
2b99 is part of a pentamer. We start with that.

That worked pretty good. The two chains are close (actually a bit too
close, I think) and I can use ProteinShop to finish the positioning.
Totally wrong I think, but try7.pdb is ready.

Let's turn it over to Kevin.

The monomers:
    ReadConformPDB T0343.try2-opt2.pdb
    ReadConformPDB T0343.try1-opt2.pdb
    ReadConformPDB T0343.try5-opt2.pdb
    ReadConformPDB T0343.try8-opt2.pdb
    ReadConformPDB T0343.try7-opt2.pdb

The dimers:
I changed the order to try and reflect my opinion of the best dimers:
    dimer-try1-1xeb.pdb
    try2.pdb
    dimer-try5-1kaf.pdb
    try8.pdb
    try7.pdb
The derivations remain the same.

Sat Jul 15 20:14:35 PDT 2006 Kevin Karplus
OK, the monomers are what we already submitted, so nothing new to do
there.

Sat Jul 15 20:23:17 PDT 2006 Kevin Karplus
Dimers submitted with comment

    Dimer models have not been optimized in the dimer context, just
    monomers superimposed on a dimeric PDB file.

    Dimer 1 dimer-try1-1xeb, which superimposes try1-opt2 on 1xeb.
    Dimer 2 is try2, which superimposes monomer try2-opt2 on 1vm6,
        then adjusted with ProteinShop.
    Dimer 3 dimer-try5-1kaf, which superimposes try5-opt2 on 1kaf.
    Dimer 4 is try8, monomer try8-opt2 superimposed on 1fic
        then adjusted with ProteinShop.
    Dimer 5 is try7, monomer try7-opt2 superimposed on 2b99,
        then adjusted with ProteinShop.
    Note: no optimization was done in the dimer context, though it
    would have been had we had more time and more confidence in the
    monomer predictions.

Sat Jul 15 20:27:24 PDT 2006 Kevin Karplus
I looked at the dimer models and they are all far too loose to have
any hope of being right---slight clashes would have been preferable to
such loose associations. Oh well, there isn't time to do any better,
and we still have 45 targets to predict. Let's hope we do better on
another target.