Wed Jun 9 10:20:50 PDT 2004 T0199 DUE 3 Aug 2004

Wed Jun 9 12:31:22 PDT 2004 Kevin Karplus
According to the documentation, the function of this protein is known:
    heat shock operon repressor HrcA
We'd expect to see a DNA-binding motif in a repressor.

Our best alignments are to 1mkmA, but we seem to be getting other hits to superfamily a.4.5.* also. None of the hits are very strong, but they are consistent with each other. This is a big protein though, so that hit is probably only for one domain (probably through L130, based on predicted helices). We may have to split this protein into separate domains and model the parts separately.

In addition to a domain break around 130 there may be another one around L195-V200 (between the predicted helices), based on the few non-full-length sequences in the t2k alignment.

Wed Jun 9 22:13:24 PDT 2004
Yes---it definitely looks like we need to split this into domains. It looks like the first domain is about 1-114 and is a fairly easy fold-recognition target. The other domains may be harder. I should set up subdirectories for each of the domains (perhaps with small overlaps, so that we can paste domains back together). I'll do 1-130, 111-210, 190-end. Maybe I'll run try2 first, to see if that helps me find domain boundaries at all.

Thu Jun 10 12:03:52 PDT 2004 Kevin Karplus
try2 has thrown away a lot of stuff that was good in the top alignment to 1mkmA---particularly some beta sheet and helix packing from the second domain. Perhaps I should copy all the beta Hbonds from that alignment. There may also be some good hbonds in the 1q48A alignment.

Thu Jun 10 23:02:55 PDT 2004 Kevin Karplus
try3 has done a much better job of preserving the good stuff. The hairpin at Y134-E157 probably belongs with the rest of the sheet, and 174-180 seems to be oriented backwards. In 1mkmA, the sheet is strictly antiparallel with order 321654, though strand3 is not a clean edge strand, having a big kink at an RLGM sequence.
If there is something similar in the target, then the alignment I have is wrong, and V135-N141 belongs antiparallel to G302-T308. The sheet order would then be 4321765.

If the hairpin is built right, I already have hbonds for V135, I137, R139, N141, leaving Y134, L136, E138, and P140 to pair---probably like
    Y134 F307
    L136 Y305
    E138 S303
    P140 I301

If we just insert the hairpin then we want Y134's partner on the hairpin to align with Y159, L136's with I161, E138's with S163. That is
    L151 Y159
    R149 I161
    I147 S163

I've modified the Hbond constraints to try to get this packing of the sheet for try4.costfn. I'll see how it comes out, then try adjusting things for a better fit.

It may not be necessary to break into domains, but if we do, it looks like E132 would be a good breakpoint. There are only 2 domains, unless the second domain has an insertion in the middle.

Fri Jun 11 07:35:47 PDT 2004 Kevin Karplus
The second domain is beginning to look pretty good in try4-opt, though one strand needs work and the region G225-V267 needs to be moved, but the first domain (which had better homology originally) has been damaged. I see three choices:
    1) add constraints taken from the models built from alignments.
    2) superimpose on the first domain, then cut-and-paste to produce a model to refine further.
    3) break into domains, and model domains separately, combining when done.

Fri Jun 11 09:23:25 PDT 2004 Kevin Karplus
I created two subdirectories with the domains: 1-133 and 131-end. I'm running make on each; try1 in each domain uses the try4 cost function from here, but with constraints restricted to the subdomain.

I'll try to put the comments for the domains into this README file.

Domain 1 is definitely matching a.4.5.* domains, with the best match to 1hqcA.

Fri Jun 11 13:52:18 PDT 2004 Kevin Karplus
Domain 2 looks like it has a high probability of matching domain d.17.4.3, with 1gs3A as the top example.
We did have this as the 9th match in the whole-chain fold recognition, but now we have many more templates to align to. The cost function being used may be inappropriate for the second domain, since I was guessing on sheet construction. (oops--I forgot to save the Template.atoms file in the subdomains)

If I get decent models for the two subdomains, I'll have to play around a bit with how to put them back together.

Fri Jun 11 14:19:33 PDT 2004 Kevin Karplus
Hmm---it looks like the homology models for the first domain only extend through about E97, or possibly only L84, with the last two helices serving as a linker to the next domain. We might want to save what is created up to L84 for use in the whole chain.

Fri Jun 11 20:41:19 PDT 2004 Kevin Karplus
I had to fix some of the scripts for creating the .rasmol and .constraints files for the second domain (built-in assumptions that residue numbering started at 1).

In the second domain, a beta sheet looks like it is starting to form, but there isn't much in the alignments---this may be worse than the whole-chain predictions! Perhaps the problem is that I started with a lot of constraints. Maybe I should try again without them.

Tue June 29 1:00 Jenny Draper
I've been doing some research on the first domain of this protein.

The first domain, approximately 1-95ish, I believe, is the DNA-binding domain. Its helix-turn-helix motif is helices 3 & 4 (39-45 and 52-64). Helix 4 (52-64) is the DNA "recognition helix", although both contact the DNA. The sequence "SATIRN*M" in this helix is almost completely conserved across all instances of this gene (it's really common in bacteria). The residues S, T, and R in that motif have been experimentally verified as key for DNA binding. The R will definitely have to be freely available on the protein surface!

The protein recognizes the CIRCE DNA sequence, which is a 9-bp repeat separated by a 9-bp spacer -- indicating that it binds as a dimer (or higher oligomer).
There is experimental evidence that it does, indeed, bind in this fashion (although dimer vs higher is not known). There is speculation that it is not very stable unless it is bound to DNA, and requires a chaperone to fold; it forms large aggregates in solution if DNA is not present to stabilize it.

In the closest full-length homolog (1mkm), the dimerization site is in the linker helix of the DNA-binding domain.

From all of this, my guess is that:
    * The domain of approximately 1-94 will be the DNA binding site, with helices 3 & 4 forming a helix-turn-helix DNA-binding motif
    * The region 95-? is a linker region, possibly involved in dimerization.
    * This protein will not be globular; it will most likely have an extended structure designed for oligomerization.

references:
--------------------------------------------------------------
Wiegert T, Schumann W. Analysis of a DNA-binding motif of the Bacillus subtilis HrcA repressor protein. FEMS Microbiol Lett. 2003 Jun 6;223(1):101-6.

Wiegert T, Hagmaier K, Schumann W. Analysis of orthologous hrcA genes in Escherichia coli and Bacillus subtilis. FEMS Microbiol Lett. 2004 May 1;234(1):9-17.

Hitomi M, Nishimura H, Tsujimoto Y, Matsui H, Watanabe K. Identification of a helix-turn-helix motif of Bacillus thermoglucosidasius HrcA essential for binding to the CIRCE element and thermostability of the HrcA-CIRCE complex, indicating a role as a thermosensor. J Bacteriol. 2003 Jan;185(1):381-5.
--------------------------------------------------------------

Fri Jul 16 5:30 Jenny Draper
I don't like the latest full-length model at all (try4), at least with respect to the first domain; try4 has completely screwed up this winged-helix DNA-binding domain. Try3 sorta has the right idea.
I'm looking for something a lot like the structure of 1mkmA, as can be seen pretty well in the best alignment structure:
    T0199-1mkmA-t04-local-str2+CB_burial_14_7-0.4+0.4-adpstyle5.a2m:1mkmA

Mon Jul 19 18:00:11 PDT 2004 Kevin Karplus
Subdirectories 1-95 and 115-end have been set up. 1-95 has completed try1 and 115-end will soon. The 1-95 is a comparative model with 1j5yA as the best template (1mkmA, which Jenny likes, is the #3 template). The 115-end is more difficult fold recognition.

Mon Jul 19 18:42:14 PDT 2004 Kevin Karplus
The "rr" contact prediction failed without error message for 115-end, probably because the residue numbering doesn't start at 1 and traincontactnn was not told what the starting column is. When George tells me what command line argument to give traincontactnn, I can fix the Make.main file.

Tue Jul 20 1:00 pm Jenny Draper
I like the structure of the second domain in try4-opt2 for the full-chain prediction. It's pretty much exactly what I'm looking for, with a good sheet in a 5-6-7-1-2-3-4 pattern, with a long set of helices wrapping around the sheet between strand 4 and strand 5.

I've set up a "strands" rasmol definition script, which also includes the helices:
    define s1 135-138
    define s2 146-154
    define s3 159-165
    define s4 171-177
    define s5 271-275
    define s6 289-295
    define s7 299-307

    define h1 2-13
    define h2 17-32
    define h3 39-45
    define h4 52-64
    define h5 81-91
    define h6 99-109
    define h7 116-130
    define h8 184-194
    define h9 200-210
    define h10 213-222
    define h11 247-266
    define h12 315-333

    define wrapper 184-266 -- includes helices 8-11; wraps around sheet

I'll hold off on this domain for a little while, try to get a good structure for the first domain, then try a scaffold setup for the full structure.
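A quick consistency check on the "strands" definitions can be sketched in Python. This is not part of the rasmol scripts; the function names parse_defines and overlaps are made up for illustration. It parses the define lines, confirms which helices fall inside "wrapper", and looks for any two strand/helix ranges that overlap.

```python
import re

# Residue ranges copied from the "strands" rasmol definitions in this entry.
DEFINES = """
define s1 135-138
define s2 146-154
define s3 159-165
define s4 171-177
define s5 271-275
define s6 289-295
define s7 299-307
define h1 2-13
define h2 17-32
define h3 39-45
define h4 52-64
define h5 81-91
define h6 99-109
define h7 116-130
define h8 184-194
define h9 200-210
define h10 213-222
define h11 247-266
define h12 315-333
define wrapper 184-266
"""


def parse_defines(text):
    """Return {name: (first, last)} for every 'define NAME A-B' line."""
    ranges = {}
    for match in re.finditer(r"define\s+(\w+)\s+(\d+)-(\d+)", text):
        ranges[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return ranges


def overlaps(a, b):
    """True if two inclusive residue ranges share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


ranges = parse_defines(DEFINES)
wrapper = ranges["wrapper"]

# Helices that lie entirely inside the wrapper region.
inside = [name for name, (first, last) in ranges.items()
          if name.startswith("h") and wrapper[0] <= first and last <= wrapper[1]]

# Any pair of strand/helix definitions that overlap (wrapper excluded).
elements = sorted((r, n) for n, r in ranges.items() if n != "wrapper")
clashes = [(n1, n2) for (r1, n1), (r2, n2) in zip(elements, elements[1:])
           if overlaps(r1, r2)]

print("helices inside wrapper:", inside)
print("overlapping definitions:", clashes)
```

Because the elements are sorted by start residue, checking only neighbouring pairs is enough to detect any overlap.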
Tue Jul 20 3:30 pm Jenny Draper
I created a merged structure of T0199.1-95.try1-opt2.pdb (res 1-95) and T0199.try4-opt2.pdb (res 96-338) by superimposing them on the alignment model #1 from T0199.t2k.undertaker-align.pdb.gz (T0199-1mkmA-t04-local-str2+CB_burial_14_7-0.4+0.4-adpstyle5) using DeepView's "Magic Fit". This is problematic, as this aligns the helix around 80-95 with the helix 315-336... instead of with the helix around 95-110 (which is right next to the end helix).

I submitted this structure to VAST, hoping maybe it can provide a better superposition.
    Job: VS59951 Pwd: T0199merged -- still running at 6:00pm

Tue Jul 20 7:45 pm Jenny Draper
The VAST run didn't buy me much; it likes 1mkmA, and can't align the middle linking region. I'll set up some scaffolding constraints and see what undertaker can do with the merged file.
... which I will have to do tomorrow, since I never uploaded the merged structure from my work desktop to compbio...

Wed Jul 21 1:00 pm Jenny Draper
I'm preparing an Undertaker run on the superimposed structure (decoys/superimposed-domains.pdb). I tried a superposition using Undertaker; it had the same result as DeepView's magic-fit. Kevin suggested just running the straight merge -- with the horrible overlap of the end of domain 1 with the final helix -- and letting Undertaker straighten it out. I'm working on some scaffolding constraints now, to keep the domains in order.

Wed Jul 21 4:00 pm Jenny Draper
Try5 is now running on croak. I only included the superimposed structure, which scores second-best (below try4-opt2) with the try5 cost function (a good sign...)

Th Jul 22 12:30 pm Jenny Draper
Try5 looks terrible; it's blown up both domains, though it holds the helix-turn-helix and sheet together. The one thing I do like is the two helices from 99-131. Maybe I can use these in my superpositioning... I'll have to try this tomorrow though, since I've got Dr appointments all day today.
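The cut-and-paste step of the merge (res 1-95 from one model, res 96-338 from the other) can be sketched in Python as below. This is only an illustration, not what DeepView or undertaker does: it assumes both models have already been superimposed into the same frame, it relies on the fixed-column PDB ATOM layout (residue number in columns 23-26), and the function names and the fake_ca demo helper are hypothetical.

```python
def residue_number(pdb_line):
    """Residue sequence number, taken from columns 23-26 of an ATOM record."""
    return int(pdb_line[22:26])


def select_residues(pdb_lines, first, last):
    """Keep only ATOM records whose residue number lies in first..last."""
    return [
        line for line in pdb_lines
        if line.startswith("ATOM") and first <= residue_number(line) <= last
    ]


def merge_domains(domain1_lines, full_chain_lines, split=95, last=338):
    """Residues 1..split from the domain model, split+1..last from the full chain."""
    return (select_residues(domain1_lines, 1, split)
            + select_residues(full_chain_lines, split + 1, last))


def fake_ca(serial, resseq, x=0.0):
    """Hypothetical helper: build a fixed-column CA record for the demo."""
    return (f"ATOM  {serial:5d}  CA  ALA A{resseq:4d}    "
            f"{x:8.3f}{0.0:8.3f}{0.0:8.3f}")


# Tiny demo: both inputs contain residues on either side of the split point.
domain1 = [fake_ca(i, i) for i in (94, 95, 96)]
full_chain = [fake_ca(i, i, x=1.0) for i in (95, 96, 97)]
merged = merge_domains(domain1, full_chain)
print([residue_number(line) for line in merged])  # [94, 95, 96, 97]
```

The hard part described in the entries above is the superposition, not the splice; the splice only makes sense once the two pieces share a coordinate frame.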
Fri Jul 23 18:26:44 PDT 2004 Kevin Karplus
I made an unconstrained.costfcn, and a try6.costfcn that is similar but adds a single domain-separation constraint. I also picked up a lot of the top hits from both this directory and the 1-95 and 135-end subdirectories and added them to MANUAL_TOP_HITS. I'm running "make extra_alignments" and "make all-align.a2m.gz" to get a rich set of fragments for the next optimization run.

Unconstrained, the try2-opt2 model scores best, followed by try1-opt2 and try5-opt2. The extra constraint in try6.costfcn changes the order to try2, try4, try1, try5.

Jenny had broken the "str2" script by editing under windows. The automatically created rasmol scripts should NOT be edited---any edits done that way are easily lost in a remake. I added the "dna" definition she was adding to the hand-created "strands" script.

It doesn't look like Jenny read in the superimposed-domains.pdb file for try5, which would explain why it did so terribly---that was an ab-initio run. I'll try to get try6 to do roughly what Jenny wanted.

Sat Jul 24 10:44:04 PDT 2004 Kevin Karplus
try6 is VERY bad---it pulled the whole model apart. The center linker is about how we want it though. Maybe I can do a superposition of the two parts of superimposed-domains on just the linker and use that superposition.

The superimpose-domains-2.under script does that fairly successfully. Now, I'll edit down the superimposed-domains2.pdb file to be a single model and put it in decoys.

Wait, there is a problem! The residue numbering is messed up in the second model, even though it was right in T0199.1-95-try1-opt2. Y87 has somehow been changed into F121, Y88 into Y134, E89 into E188, E90 into E202, ...
I think that the problem is in undertaker's reading of incomplete PDB files and the crude alignment that is done: the AlignAndSetConformation() routine in ReadPDBCommands.cc

Sat Jul 24 11:25:55 PDT 2004 Kevin Karplus
I made a crude patch to the global_align routine to use pdb numbers as hints, which seems to have fixed the problem.

Sat Jul 24 16:54:24 PDT 2004 Kevin Karplus
The try7-opt2 model does not look too bad, though try6 scores better with an unconstrained score file. I'm convinced that try6 is trash, so try7 is currently our best guess, though we could also submit superimposed-domains2.pdb (the edited one in the decoys, not the multiple-model one in this directory). Maybe I'd better rename it one-model-2domains.pdb

Rosetta really hates try7-opt2.repack-nonPC.

The try7-opt2 and one-model-2domains models are quite similar, but the linker helix is straighter in try7 and has leaned over a bit, bringing the two domains closer together.

I should probably do a polishing run to reduce breaks and clashes and call it quits for this one.

Sun Jul 25 09:15:08 PDT 2004 Kevin Karplus
Other than strand s2, try8 looks pretty good. Maybe try
    N141-K146-T166
    P140-I147-L165
    Y134-I153-Y159
It would probably have been better to fix s2 before sticking the decoys together (unless it was broken by subsequent RR constraints---I'll have to go back and check).

The unconstrained cost fcn prefers try6-opt2 (which is junk) and try7-opt2 to try8-opt2. The difference in cost between try7 and try8 is tiny, and can be accounted for by slight differences in weighting different components of the cost fcn. Rosetta prefers try6-opt2.repack-nonPC also, though try8 does better than try7.

If try9 makes a better half barrel, should I cut out 131-end and redo the superposition to get a better first model?

Sun Jul 25 12:45 pm Jenny Draper
I'm really unhappy with the linkage between the two domains. I think the helix between 109-81 should be sticking out, not wrapping around the sheet.
This structure dimerizes, and I suspect that it does so in a fashion like 1MKM, where the two linker helices cross in an "X". The way try8 is formed, this dimerization would be impossible.

Could we try superimposing this on the dimer (1mkm), making this structure a dimer? Take a look at a funky superposition of the "superimpose-domains-2.pdb" structure that I put together, for an idea of what I'm looking for:
    T0199/dimer-superimpose-domains-2.pdb

From karplus@soe.ucsc.edu Sun Jul 25 13:49:39 2004
Date: Sun, 25 Jul 2004 13:49:38 -0700
From: Kevin Karplus
To: learithe@soe.ucsc.edu
CC: karplus@soe.ucsc.edu
Subject: unhappy dimer T0199

I'm not happy with T0199 either. Take a look at
    casp6/T0199/decoys/one-model-2domains.pdb
That is the result of my superposition and meets most of your criteria. At the moment it is our first model, but if you could fix strand s2 in the second domain (see my notes in README) we could re-superimpose. It might be best to do the work in 131-end, so as not to have the first domain and linker slowing things down.

Sun Jul 25 7:30 pm Jenny Draper
Running 115-end try2, from the second-domain part of "one-model-2domains.pdb", using try9.under/costfcn, with all domain1 constraints removed.

Sun Jul 25 21:51:08 PDT 2004 Kevin Karplus
I don't see much difference between try8 and try9---strand s2 doesn't seem to have budged. I'd have to superimpose them to distinguish between them. The try9 costfcn likes try9 better, but the unconstrained one likes try8 better (of course, it loves the terrible try6).

Mon Jul 26 12:25:45 PDT 2004 Kevin Karplus
The try2 on 115-end doesn't look much better.

Sun Sep 19 10:03:55 PDT 2004 Kevin Karplus
I put 1stzA in the Makefile as REAL_PDB and evaluated our predictions. Our submitted models are ordered model5, model4, model1, model3, model2.
If we insert the robetta models, we get:
    model5, robetta3, model4, robetta1, robetta4, robetta2, model1, model3, model2, robetta5
None of these models are particularly good (22 Ang rmsd). We are not doing significantly better than robetta on whole-chain rmsd---indeed, robetta's model 1 is better than ours. The model5 rmsd is artificially good, because the model is incomplete.

The problem may be with the domain placement, though, rather than bad domains, since superimposed-domains.pdb does do slightly better than model5. It is annoying though, when the automatic methods (try4 and try5) do better than the hand-tweaked models.

We do need an evaluation that looks at the domains separately to make any real judgement of how well things worked here.

Wed Sep 22 05:11:28 PDT 2004 Kevin Karplus
I don't have separate domains yet, but I did look at undertaker-computed GDT scores:
    model5          23.58%
    model1          20.59%
    model2=model3   19.89%
    robetta3        17.49%
    robetta2        16.72%
    robetta1        15.40%
    robetta4        13.24%
    robetta5        12.62%
    model4          10.37%
None of these are great, but we are beating robetta.

The incompleteness of model5 is still skewing the results, because I'm not computing GDT score quite right---I was normalizing by the number of CA atoms that were present in BOTH conformations, rather than just the number in the real conformation. I'll fix this and rerun.

Wed Sep 22 10:28:40 PDT 2004 Kevin Karplus
Fixing the bug led to the correct ordering:
    model1, model2=model3, robetta3, robetta2, model5 15.48%, robetta1, robetta4, robetta5, model4
and model1 is the best model we created. Having model3 be slightly better than model2 on all-atom rmsd indicates that the Rosetta repacking made a small improvement.
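The effect of the normalization bug on an incomplete model can be illustrated with a small Python sketch. This is not undertaker's code: it assumes the CA-CA distances after superposition are already known, it uses the usual 1, 2, 4, 8 Ang GDT thresholds, and it skips the search over superpositions that a real GDT calculation does. The function name gdt and the example distances are made up.

```python
THRESHOLDS = (1.0, 2.0, 4.0, 8.0)  # Ang, the usual GDT_TS cutoffs


def gdt(distances, n_reference, normalize_by_common=False):
    """GDT-style score (percent) from per-residue CA-CA distances.

    distances: {residue number: distance in Ang} for residues present in
               BOTH the model and the real structure.
    n_reference: number of CA atoms in the real structure.
    normalize_by_common: if True, reproduce the bug (divide by the number
               of residues present in both conformations).
    """
    denominator = len(distances) if normalize_by_common else n_reference
    if denominator == 0:
        return 0.0
    fractions = [
        sum(d <= cutoff for d in distances.values()) / denominator
        for cutoff in THRESHOLDS
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Made-up incomplete model: only 5 of 10 reference residues were predicted.
partial_model = {1: 0.5, 2: 0.9, 3: 1.5, 4: 3.0, 5: 7.0}
buggy = gdt(partial_model, n_reference=10, normalize_by_common=True)
fixed = gdt(partial_model, n_reference=10)
print(f"buggy: {buggy:.1f}%  fixed: {fixed:.1f}%")
```

With these made-up numbers the buggy normalization gives about 70% and the corrected one about 35%, which is the same kind of inflation that put the incomplete model5 at the top of the first list.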
Fri Sep 24 12:34:41 PDT 2004 Kevin Karplus
Changing smooth_GDT leads to the following:

    name                   length  missing_atoms  rmsd     rmsd_ca  GDT       smooth_GDT
    model1.ts-submitted    338     0.0000         26.0666  25.4335  -20.4334  -19.4723
    model2.ts-submitted    338     0.0000         27.6092  27.0085  -20.4334  -19.2314
    model3.ts-submitted    338     0.0000         27.5642  27.0085  -20.4334  -19.2311
    robetta-model3.pdb.gz  338     0.0000         23.7348  23.1228  -17.4923  -16.6723
    robetta-model2.pdb.gz  338     0.0000         24.8752  24.3236  -16.7183  -15.6841
    robetta-model1.pdb.gz  338     0.0000         24.2513  23.6755  -16.0991  -14.8817
    model5.ts-submitted    338     1034.          22.8557  21.9430  -15.7121  -14.4855
    robetta-model5.pdb.gz  338     0.0000         28.4818  27.7490  -12.6161  -12.0560
    robetta-model4.pdb.gz  338     0.0000         24.2827  23.7282  -12.6935  -12.0547
    model4.ts-submitted    338     0.0000         24.0005  23.4880  -10.8359  -10.5238

None of the models are very good, but we did beat robetta. We probably have to evaluate this protein in separate domains.

Fri Nov 26 17:07:00 PST 2004 Kevin Karplus
The assessors broke this into three different domains, with different difficulties:
    Domain : T0199_1 : CM/hard : NT=74  : 14-87
    Domain : T0199_2 : FR/H    : NT=134 : 116-142,230-336
    Domain : T0199_3 : FR/A    : NT=82  : 145-226

The smooth-GDT scores for the whole chain and the 3 domains are

    #Target   best     best     model1   auto     align    robetta  robetta
    #         sam-t04  submit                              best     1
    T0199     19.4505  19.4505  19.4505  10.5169  14.5432  16.6689  14.8804
    T0199_1   73.1510  72.5353  72.5353  43.2916  53.1763  61.3333  56.4955
    T0199_2   43.3686  43.0372  43.0372  16.8240  33.0152  39.9706  34.5972
    T0199_3   25.7334  25.3971  24.6210  25.3972  15.5119  19.8216  19.8216

As expected, we did well on the CM domain, ok on the FR/H domain, and not so great on the FR/A domain. Interestingly, on T0199_3, we made the final helix longer than the real structure, which has turns at S214 and G206---the helix prediction was pretty strong for this region, so the mistake is understandable.
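For per-domain evaluation, the assessors' domain definitions can be turned into residue sets with a short Python sketch. This is an illustration only; parse_ranges and the DOMAINS table are hypothetical names, and the chain length of 338 is taken from the score table in the Sep 24 entry. It checks that each range specification matches its stated NT count and that the three domains do not share residues.

```python
from itertools import combinations

CHAIN_LENGTH = 338  # from the "length" column of the evaluation table


def parse_ranges(spec):
    """Turn a spec like '116-142,230-336' into a set of residue numbers."""
    residues = set()
    for part in spec.split(","):
        first, last = (int(x) for x in part.split("-"))
        residues.update(range(first, last + 1))
    return residues


# Domain name -> (range spec, NT value), copied from the assessors' lines.
DOMAINS = {
    "T0199_1": ("14-87", 74),
    "T0199_2": ("116-142,230-336", 134),
    "T0199_3": ("145-226", 82),
}

residue_sets = {name: parse_ranges(spec) for name, (spec, _) in DOMAINS.items()}

# Each spec should contain exactly NT residues.
sizes_match = all(len(residue_sets[name]) == nt
                  for name, (_, nt) in DOMAINS.items())

# No residue should belong to two domains.
disjoint = all(residue_sets[a].isdisjoint(residue_sets[b])
               for a, b in combinations(DOMAINS, 2))

# Residues of the chain that are not assigned to any domain.
assigned = set().union(*residue_sets.values())
unassigned = sorted(set(range(1, CHAIN_LENGTH + 1)) - assigned)

print("sizes match NT:", sizes_match)
print("domains disjoint:", disjoint)
print("unassigned residues:", len(unassigned))
```

Restricting a model and the real structure to one of these residue sets before scoring gives the per-domain numbers; note that T0199_2 is discontinuous, with T0199_3 inserted in the middle of it.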