13 June 2002 Kevin Karplus

T0136 seems to be an alpha/beta protein, with no very obvious homologs, though the CAFASP servers seem to be favoring SCOP domain c.14.1 (FSSP reps 1ef8A, 1dciA, 1nzyA, 1hnoA, ...). A popular template seems to be 2dubA. Some of the other folds that are predicted (like 2scuA+2scuB) also have substantial similarity.

Note: the AA-only method comes up with two different hits higher (c.1.10.2 1dosA and c.37.1.8 1efcA). The 1dosA fold scores high even with the two-track. Perhaps it is the other domain?

At 520 residues long, it is clear that this is a multi-domain protein. The 1nzyA alignment will only account for about half---we'll probably have to split somewhere around residue 280 and run separate predictions for the two domains. The whole region 260-290 seems pretty weakly conserved, without even strong 2ry predictions.

The HMM E-values are pretty strong for this family, with template 1nzyA scoring the strongest.

14 June 2002 Kevin Karplus

The best-scoring alignment to the c.14.1.3 SCOP domain is for 1ef8A (a 2-track global alignment). This alignment is spread out across the whole sequence, so I suspect it is not compact. We're probably better off using a template alignment or a local alignment to avoid grabbing stuff from the other domain.

The 1nzyA-T0136-fssp alignment is clearly aligned to the second domain, starting at S339, so perhaps there are 3 domains? The region 300-335 looks like it has a strong signal, though, so I would be a bit surprised to see it as a separate domain.

I split the sequence into two subdomains, in the subdirectories t0136-1-290 and t0136-250-523. For the first domain, the strongest hit is to 1hnuA (d3,D2-enoyl coa isomerase eci1), which is probably in the c.14.1 family (DALI Z scores of 22.7 to 1dciA and 1nzyA). For the second domain, the strongest hit is to 1ef8A, also the c.14.1.3 family, so it looks like we have a tandem repeat of the same domain, despite low similarity between the domains as reported by dotter.
I wonder if we can use the subdomain alignments in undertaker?

The templates (1hnu, 1ef8A) seem to be homotrimeric, so I wonder about the oligomerization of the target---could it be 3 domains?

The best alignments of t0136-250-523 to 1ef8A seem to start at around 318-329, and extend almost to the end.

The Pfam domain for 1ef8A is ECH, which is a 168-long HMM. This domain often occurs with domain 3HCDH_N and 1 or 2 copies of 3HCDH. It also occurs alone a lot, and with ACBP, and with formyltransferase domains.

Running Pfam on T0136, I get two hits:
    carboxyl_trans  23-521  a full-length match
    TIGR00515       6-262

Prodom seems to split the carboxyl transferase into multiple subdomains. The best scoring ones are

    Position  ProDom domain                                    Score  E value
    208-372   #PD001372                                          779  1e-83
    373-505   #PD001720 CARBOXYLASE COMPLETE BETA PROTEOME       502  1e-51
    45-102    #PD003448 BETA SUBUNIT CARBOXYLASE PROTEOME        300  3e-28
    21-211    #PD003074 CARBOXYLASE ACETYL-COA BIOTIN LIGASE     235  1e-20
    1-44      #PD376759 CARBOXYL-TRANSFERASE SUBUNIT 12S BIOTIN  211  7e-18
    155-207   #PD468765 LIGASE CARBOXYLASE COMPLETE PROTEOME     188  3e-15
    103-154   #PD382209 CARBOXYL-TRANSFERASE SUBUNIT 12S BIOTIN  160  6e-12
    103-157   #PD001571 BETA CARBOXYLASE SUBUNIT TRANSFERASE     153  4e-11
    313-420   #PD003074 CARBOXYLASE ACETYL-COA BIOTIN LIGASE      96  2e-04
    184-207   #PD013404 BETA PROPIONYL-COA CARBOXYLASE LIGASE     87  0.002
    375-514   #PD004951 TRANSFERASE ALPHA SUBUNIT CARBOXYLASE     85  0.003
    6-128     #PD001816 POLYPROTEIN PROTEASE SERINE HYDROLASE     89  0.001

The #PD001372 domain seems to occur after a sequence of PD003074 and PD471799. PD001720 almost always comes immediately after PD001372, usually preceded by PD003074 or PD003448+PD001571.

There seem to be a lot of overlapping domains here with very similar names, and it isn't clear that the prodom boundaries mean anything at all. Still, there does seem to be a hint that there may be domain breaks around 210 and 375.
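The hint about domain breaks can be checked by tallying where the ProDom hit boundaries cluster. The following is only a rough Python sketch, not part of the prediction pipeline: the hit positions are copied from the ProDom list above, while the name `boundaries_near` and the 5-residue window are arbitrary choices made for illustration.

```python
# ProDom hits for T0136 as (start, end), copied from the list above.
PRODOM_HITS = [
    (208, 372), (373, 505), (45, 102), (21, 211), (1, 44), (155, 207),
    (103, 154), (103, 157), (313, 420), (184, 207), (375, 514), (6, 128),
]


def boundaries_near(hits, position, window=5):
    """Return the sorted hit start/end positions within `window` of `position`.

    The window size is an illustrative assumption, not a tuned value.
    """
    edges = [edge for hit in hits for edge in hit]
    return sorted(e for e in edges if abs(e - position) <= window)


if __name__ == "__main__":
    for guess in (210, 375):
        print(guess, boundaries_near(PRODOM_HITS, guess))
```

Several hit edges fall within a few residues of each guess, which is all the "hint" amounts to; overlapping ProDom families can easily produce such clusters, so this should not be read as strong evidence.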
In any case, this carboxylase family seems to be well studied, so there should be some good papers about it that might give us hints about active sites and the like.

17 June 2002 Kevin Karplus

The CAFASP organizers have done a rough split into THREE domains for this protein (136N:1-120, 136MID:81-330, 136C:279-523).

The CAFASP servers do NOT agree on the first domain---they either focus on the first 3 helices or the beta hairpin.

There is almost universal agreement that the middle domain is c.14.1, with templates
    1dciA 7 1dubA 7 1nzyB 2 1nzyA 1ef8A 6 1ef9A 2dubA 1fc6A 1ey3A 1hnoA 3 1tyfN

The SAM-T02 alignments for this middle domain are basically the full length of the domain. They agree with the SAM-T99 alignments for much of the domain, but place the final helices differently. The helix placement does not agree well with the 2ry structure prediction in either, so the domain boundary may be earlier than 331---perhaps closer to 260.

The third domain again has strong consensus for SCOP c.14.1, with full-length predictions. The helix predictions match better here, so the domain boundary seems reasonable.

Date: Mon, 17 Jun 2002 10:20:59 -0700
From: Kevin Karplus
To: karplus@soe.ucsc.edu, rachelk@soe.ucsc.edu, weber@soe.ucsc.edu, learithe@cats.ucsc.edu, yael@biology.ucsc.edu, baertsch@soe.ucsc.edu, rph@soe.ucsc.edu
Subject: target 136

Target 136 is a multi-domain protein from a family about which some things are known. It would be good to do a little literature search to see what can be found out both about the transcarboxylase family and the "crotonase-like" family, which seems to make up two domains of the multi-domain protein.

The questions we'll want to answer include
1) where are the domain boundaries?
2) what might the first domain be?
3) are there any known catalytic residues that we can use to help choose alignments?
4) is there a linker region between the two crotonase-like domains?
5) how should the crotonase domains pack against each other?
6) what should we do about the region around 260-290? It doesn't seem to be a good match for the end of a crotonase domain.

Right now, my best guess at domain boundaries is 1-80, 81-260, 280-523, with an unstructured linker region for 260-280, but I could change my mind with more data.

------------------------------------------------------------
17 June 2002 Rachel Karchin/Jenny Draper

Literature search on crotonase-like domains picked up: Hofstein et al. Biochemistry 38:9508-9516 1999 "Role of Glutamate 144 and Glutamate 164 in the Catalytic Mechanism of Enoyl-CoA Hydratase". The two catalytic residues act together to hydrate a CoA substrate.

We looked through CAFASP server alignments predicted for this target and observed that templates based on 2DUB (an Enoyl-CoA Hydratase) had aspartates 20 residues apart which might be aligned to D83 and D103 of T0136 (although several servers picked 2DUB and didn't pick up on this in their alignment).

To see 2DUBE with the CoA ligand and putative catalytic side chains, in 2dub-functional-res, fire up rasmol and run the script 2dub-func-res.rasmol

The decoy structure T0136-try1-opt.pdb contains a beta hairpin on which D83 and D103 come into contact in a manner similar to E144/E164 in 2DUB's active site.

A Deepview "Magic-Fit" structural alignment of 2dubE and 1hnuA picks up a single glutamic acid, 1hnuA-E158, which is located exactly in between E144/E164 in 2dubE. (1hnuA seemed to be a good hit for the "first domain" (1-290) in the README entry, 6/14).

------------------------------------------------------------
19 Jun 2002 Kevin Karplus

Highly conserved acids in T0136: D55 D103 D301 E307 D369 E380

There are several moderately conserved acids also (such as D358, which might pair with D380).

The 2dubE GLU residues that bind the CoA are both on helices that are separated by two short strands.
Looking at the logo file pcem/pdb/2d/2dubE/nostruct-align/2dubE.t2k-w0.5-logo.eps we can find the two GLU at 109 and 129 (the numbering is different from the PDB file). Neither is strongly conserved---the strong conservation in that region is for G101, A103, G105, G107, and G133.

The GLU residues don't actually touch the CoA ligand---residues that do are I70, A68, and G106 (not using pdb numbers but the numbers in the logo---use "renumber" in rasmol to use this numbering scheme).

Looking inside the binding pocket of 2dubE, it looks like several of the polar atoms in the pocket are backbone atoms (O of A68, O of A66, O of K26, ...).

Perhaps we need to use the FSSP alignment for 1nzyA to characterize this binding pocket better. The corresponding residues are 1nzyA G117 and 1nzyA W137, neither of which is an acid. I think that the Gs are more important to the binding pocket than the acids.

------------------------------------------------------------
19 Jun 2002 Jenny Draper

Notes on catalytic residues in the crotonase family.

The crotonase family catalyzes a range of metabolic reactions, including isomerase, dehalogenase, and hydratase activity. I'm not sure which of these functionalities the crotonase domain in T0136 would perform. In both 2dub (CoA hydratase) and 1hnu (CoA isomerase), a GLU in the same position is directly involved in the catalysis.

In the HYDRATASE family (2dub): The 2 GLU's are not supposed to contact the CoA ligand. They hydrogen-bond to a water, and this water is added to the CoA molecule in the hydration reaction. According to the paper referenced on 6/17/2002 above, replacement of E129 (logo numbering) with glutamine reduced functionality in this enzyme 630,000-fold. Replacement of E109 reduced functionality 7700-fold. Thus, for hydratases, the second GLU should be conserved.
In the ISOMERASE family (1hnu) (using PDB numbering): E158 (located directly between E144/E164 in a Deepview structural alignment to 2dub) is referred to as "the only catalytic residue... involved in shuttling the proton from C2 of the substrate (CoA) to the C4 of the product. We suggest that the negatively charged transition state of the enzymatic reaction is stabilized by hydrogen bonding to the peptide NH groups of the conserved oxyanion hole residues, Ala70 and Leu126".

Mursula, Anu. d3-d2-Enoyl-CoA isomerase from the yeast Saccharomyces cerevisiae: Molecular and structural characterization. Dissertation, Department of Biochemistry, University of Oulu. (Printouts of relevant pages are in binder in BE215.)

----------------------------------------------------------
24 June Jenny Draper

Look at GLU245/265 (E225/245 in the t0136-1-290 logos). They are in the right helix-strand-helix orientation, and happen to be at positions 145/165 in the cafasp "mid-domain" prediction...

----------------------------------------------------------
26 June 2002 Kevin Karplus

The try2-opt run (with new alignments and fragments) scores MUCH better than try1-opt. There are parts of the structure that look reasonable, but other parts look like beta sheets have been blown up.

Perhaps we should generate some decoys for domains, then use them as partial conformations to be inserted? Perhaps a constraint between E225 and E245 would help, also, as they are getting blown apart in try2-opt.

-----------------------------------------------------------
26 July 2002 Jonathan Casper

I'm attempting to add SCWRL and pred-alpha to the script, starting from the conformation of try2-opt. Note that this is a new version of undertaker, and the operations JiggleSubTree and OptSubtree are available. I turned them off because this target still seems to be in the early stages.

11 Aug 2002 Kevin Karplus

I reran make with the new template library, and did not see much change.
I don't think there is much point to running undertaker on the whole protein---we should split it into domains (perhaps the 3-domain split used for CAFASP???), run each domain separately through make and undertaker, then reassemble the pieces, and run undertaker on the reassembled protein.

Fri Aug 16 00:04:30 PDT 2002 Kevin Karplus

I've changed my mind---I think there is some value in running undertaker on the whole protein, IF we seed it with alignments made from the domains. It should be able to put together alignment pieces, I think.

With this in mind, I've created t0136-1-80, t0136-1-290, t0136-80-290, t0136-250-523, each of which will be run to create alignments. When they have all finished, I'll edit undertaker.script to read all the alignments and try applying them.

Fri Aug 16 13:45:15 PDT 2002 Kevin Karplus

The first iteration of try4 using the subdomain alignments is looking pretty good. We'll probably have to add constraints to reattach V85-I92 antiparallel to P96-S101, but that should be pretty easy. (It may be necessary to redo from scratch once we've identified the constraints we need.)

17 Aug 2002 Kevin Karplus

The new best score is try4-opt-scwrl. L316-V323 are predicted to be strand, but have curled up into a helix, and V85-I92 is still loose.

    85> VVTGRGTI LGR PVHAAS

Trying to guess the register:
    vvtgrgti
    saahvprgl

Maybe we should add H-bond constraints for
    T87 A99
    R89 V97

With those constraints try4.17.60 scores better (it didn't break the sheet). The pairing there is G88-A99, V86-S101, so let's use them.

We might also want to get L8 somewhere near F133 or F191, to cover the face of the sheet. With these constraints, the new best is still try4.17.60 --- let's try reoptimizing from there.

17 Aug 2002 Kevin Karplus

oops---it looks like I started a try6 run that may have stepped on part of the try5 run.

17 Aug 2002 Kevin Karplus

The try5-opt has a new low score.
The individual domains look pretty good, but the N-terminal set of helices may need to be repacked by itself then cut-and-pasted in.

Let's get rid of the packing constraint on the first helix---I think it is wrong. (Without this constraint, try5-opt-scwrl is best.) try6.constraints removes the constraint on the first helix, but adds some hbonds to keep G229-V331 attached to its sheet. With try6.constraints, try5-opt-scwrl is the best of the existing decoys.

We should probably look up how the domain c.14.1 (2dubA, 1ef8A, 1dciA, 1nzyA, 1hnoA, ...) dimerizes, and apply the dimerization constraints to the two domains! This is a bit tedious to do from home, so I'll try looking it up on Monday.

There are 64 c.14.1 domains in SCOP59: 1tyf is a 14-mer, 1fc[679] are monomers, 1k32 is a 6-mer, 1j7X is a monomer, 1nzy is a trimer, 1jxz is a trimer, [12]dub is a hexamer, 1ey3 is a hexamer, 1dci is a trimer, 1hno and 1hnu are monomers, 1hzd is a hexamer, 1ef8 is a trimer, 1ef9 is a monomer. [Note: I may have gotten the oligomeric state too low, as I did not take crystallographic symmetries into consideration.]

In the meantime, I'll try to do more optimization of try5-opt and try5-opt-scwrl.

TO DO:
1) figure out dimerization constraints
2) Pack N-terminal helices independently (say in t0136-1-80) and cut-and-paste them in.

18 Aug 2002 Kevin Karplus

11:37 try6 is still running. The best so far is still having trouble with the N-terminal helices.

14:22 Still only 90% done. This one will always run slowly, because of the size of the protein.

16:30 try6 Done! try6-opt-scwrl is new best score. The score has improved slightly over try5-opt, but the initial helices have not been packed against the structure and there are still some holes. I wonder if I281-V292 is a hairpin, though we have not predicted it to be, since it has two straight chunks of backbone, and is currently floating out in space.
19:00 I took t0136-1-80/decoys/T0136.try1-opt.pdb and pasted it into try6-opt-scwrl.pdb, to make T0136.cut-and-paste1.pdb. It has, of course, a huge gap between the two pieces, but try7 has been set up to use no other initial conformations, and have a high probability of calling OptSubtree, so it should bring the two pieces back together quickly. After that, I'll do a longer run with more normal parameters to try to pack the helical domain against the rest.

Mon Aug 19 09:41:44 PDT 2002 Kevin Karplus

1tyf oligomerizes by packing the helices of one chain against the open face of the sheet on the next chain. One long helix, just before the two strands of the beta-beta-alpha superhelix, sticks way out to make another oligomerization interface.

1k32 has 2 of the "beta-beta-alpha" superhelices in each chain, and several other domains, so won't provide a lot of help in packing.

1nzy, 1jxz, 1dci, and 1ef8 pack helices against helices and leave the beta sheets facing out. 2dub, 1ey3, and 1hzd do the same, but stack the two trimers---that interface is again helices.

None of these structures gives much help for dimerizing the two domains of T0136.

try7-opt-scwrl is the new best scorer. It looks pretty good, except for the region from G222 to R322, which seems to be a rather random assortment of helices, not well packed.

Let's try upping the weight for dry12, and seeing if we can pack this thing any tighter---it probably won't help much, as undertaker is not exploring docking solutions for the multiple domains---it'll just settle things tighter into the current incorrect docking.

Maybe I should do another subdomain: 211-315, to try to get the middle part packed better.

Mon Aug 19 16:18:38 PDT 2002 Kevin Karplus

try8-opt is new best. It looks ok in the major domains, but I really want to try replacing L221-Y305, which I may be able to do when the subdomain run finishes.

Question: should I try to attach V79-D83 to edge of sheet??
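The cut-and-paste steps in these notes can be pictured as swapping a residue range between two PDB files. The Python sketch below is an assumption about the bookkeeping only, not the procedure actually used here: `splice_residues` is a hypothetical helper, it reads the residue number from the standard fixed PDB columns 23-26, and it does no superposition, so the pasted piece keeps the donor's coordinates (which is why a gap can appear that the optimizer then has to close).

```python
def residue_number(line):
    """Residue sequence number from fixed PDB columns 23-26."""
    return int(line[22:26])


def splice_residues(host_lines, donor_lines, first, last):
    """Replace residues first..last of the host with the donor's copy.

    Only ATOM records are kept. Coordinates are not transformed and
    atom serial numbers are not renumbered.
    """
    def atoms(lines):
        return [ln for ln in lines if ln.startswith("ATOM")]

    kept = [ln for ln in atoms(host_lines)
            if not first <= residue_number(ln) <= last]
    pasted = [ln for ln in atoms(donor_lines)
              if first <= residue_number(ln) <= last]
    # sorted() is stable, so atom order within each residue is preserved.
    return sorted(kept + pasted, key=residue_number)
```

Called with the lines of two decoy files and a range such as 221 and 305, it would return the host's ATOM records with that stretch taken from the donor; a single-chain model is assumed, since chain identifiers are ignored.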
Mon Aug 19 17:37:39 PDT 2002 Kevin Karplus

The try1 run for t0136-211-315 is done. It produces a flat cap consisting of the 4 helices. I'm a bit dubious about the packing, but I'll try cutting and pasting G223-D304 into try8-opt and see what happens. The result will be called T0138.cut-and-paste2.pdb

Tue Aug 20 14:30:18 PDT 2002

T0136.try9.5.50 and T0136.try9-opt-scwrl are new bests.

I wonder if the predicted strand V317-V323 should be on the sheet antiparallel to R326-N333.

    316> LVTAFARV N
    333< NAVIGVSR G

Let's try adding CB constraints (in try10.constraints).

21 Aug 2002 06:50 Kevin Karplus

The strand 317-323 did NOT get attached, but remained curled up into a helix. Perhaps I need to cut-and-paste Y312-N324 from a different decoy where they are straight. I might also want to try moving G325-P492 and R510-C523 away, so that they can re-hinge and repack.

Thu Aug 22 16:11:46 PDT 2002

Y312-N324 are helical in try9-opt-scwrl, try8-opt-scwrl, try7-opt-scwrl, try6-opt-scwrl, try5-opt-scwrl, try4-opt-scwrl, try3-opt-scwrl, try2-opt, but are straight in try1-opt.

T0136.cut-and-paste3 has Y312-N324 from try1-opt, and the rest from try10-opt-scwrl.

Fri Aug 23 12:49:16 PDT 2002

The best score is still try10-opt-scwrl, with try11-opt-scwrl doing worse than try6 through try10.

V317-V323 are fairly straight, but I think that domain A82-Y312 is on the wrong side of the other domain, preventing the strand from getting to the right position. Let's add some constraints to move things around:

    76> DKAVV
    89< RGTVV

    317> VTAFARV
    333< NAVIGVSR

With these constraints added T0136.try11.19.15.pdb scores best. I'll try breaking it at G84, I202, and N324 and reassembling.

23 August 22:38 Kevin Karplus

try12-opt looks terrible, with one of the domains shredded, but it scores nearly as well as try11.19.15! If we turn off all the constraints, the best score is for try10-opt-scwrl. Maybe I should try reoptimizing it with the try12.constraints.
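The strand diagrams in these notes (a `>` strand read forward over a `<` strand read backward) imply a residue-by-residue pairing for the constraints. A small Python sketch of that bookkeeping follows; `antiparallel_pairs` is a hypothetical helper, not an undertaker command, and it assumes the two strands pair in a gap-free register with no beta bulge.

```python
def antiparallel_pairs(forward_start, backward_start, length):
    """Pair residue forward_start+k with residue backward_start-k.

    Assumes a gap-free antiparallel register (no beta bulges).
    """
    return [(forward_start + k, backward_start - k) for k in range(length)]


if __name__ == "__main__":
    # 76> DKAVV over 89< RGTVV
    print(antiparallel_pairs(76, 89, 5))
    # 317> VTAFARV over 333< NAVIGVSR (the 7 positions that face each other)
    print(antiparallel_pairs(317, 333, 7))
```

In an antiparallel register the two residue numbers in every pair sum to the same constant (165 for the first diagram, 650 for the second), which is a quick check that a hand-written constraint list has not slipped by one.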
24 Aug 2002 Kevin Karplus

try13-opt-scwrl, though the new best score, seems to have unwound the last part of the first big domain (204-218). Without constraints, try13.1.50 scores best and try10-opt-scwrl second best.

Even in try10-opt-scwrl, G213-L221 are not attached as the last turn of the beta-beta-alpha superhelix. Maybe I should add constraints pairing them in parallel with F191-T195.

I wonder if I should use what I now know about the domain boundaries to redo the domains separately: 1-80, 81-221, 222-313, 314-523.

    191> FIIMT
    213> GEDVT

Adding these constraints makes try12.13.15 best, and try6-opt-scwrl second best, but try12.13.15 has blown up the first big domain, splitting the strand containing A99 from the one containing I93.

Further adding a constraint that strand V317-V323 be kept straight makes the same two be on top.

Adding some constraints to hold the first sheet together makes try6-opt-scwrl score best, and try7.4.40 second best. Let's try another run starting from them, since I'm feeling too lazy to try to do a full breakdown into domains. I'll toss in a few of the other high scores.

24 Aug 2002 16:56

It looks like try14-opt has messed up strand M200-T203, pulling it off the superhelix. Despite try14-opt having such a high score, I think it is worse than before. Try10-opt certainly had this strand right. Sigh---do I have to break into domains and re-assemble, again?

Mon Aug 26 10:57:51 PDT 2002 Kevin Karplus

I added more constraints (try15.constraints) to try to keep the strand in place, and upped the constraint weights, but try14-opt still scores best. It seems to be based mainly on try7.4.40, which has the strand in place.

Maybe I should add straightness constraints as well for each strand, and take out the dubious hairpin I tried to make between D76-V80 and V85-R89, since D76-V80 is not even predicted to be a strand.

try14-opt is still scoring best, even with all these changes to the constraints.
Let's try reoptimizing, using the new constraints, starting from several OK models from different runs: try14-opt, try8.2.50, try7-opt, try6-opt-scwrl, try9.2.50, try5-al10-scwrl.10.50.pdb, try4.17.60, try13.2.50, try11.19.15, try12.18.15, try10.3.50, try1-opt.

Mon Aug 26 16:47:23 PDT 2002

Best is now try15-opt-scwrl.

In try15-opt-scwrl, strand V317-V323 is floating free, but it looks like it may belong near P96-S101, perhaps between that and T131-Y137? More likely, it pairs with V328-N333, where I've been trying to put it.

Let's try running this model through vast: VS30815 password casp5t0136

20:25 Found a few good hits---mostly for the C-terminal domain. Saved two alignments: T0136-1nzyA.vast-hand.a2m and T0136-1bob.vast.a2m

There are some bugs in the try15.constraint files:
    T135.cb or F135   SHOULD BE T195
    L354.cb or V354   SHOULD BE V354
    M184.CB or Y184   SHOULD BE M194
try16.constraints is the same, but with these lines fixed.

I'm starting try16 from scratch, with just the three VAST alignments to set the initial conformation. I don't expect much, as the initial alignments cause some bad clashes when all applied.

I'll also try polishing try15-opt-scwrl (the current best) once more.

27 Aug 2002 Kevin Karplus

01:47 Best score is now try17-opt-scwrl. try16-opt-scwrl isn't even close---it is a very loose collection of pieces.

Since T0136 is due today, and I'm already one target behind schedule, I'll submit try17-opt-scwrl.