Tue Jul 13 09:00:38 PDT 2004
T0237 DUE 16 Aug 2004

Tue Jul 13 12:03:21 PDT 2004
The t04 alignment is showing 8 hugely conserved CYS. This sounds like disulfide bridges to me. We'll have to turn on the disulfide scoring. I hope that there is enough similarity to known proteins to disambiguate the bridges; otherwise we'll have 7*5*3 = 105 different pairings to consider. (We may be able to guess them separately, which would reduce the complexity a lot.) Good, it looks like we have a strong hit on 1hn6A, so we should be ok.

Tue Jul 13 12:26:28 PDT 2004 Kevin Karplus
Not so good as all that. 1hn6A has only 3 disulfides, not 4, and it is an NMR structure with a whole lot of variability. Only a hairpin and a helix near the disulfides are conserved. Still, if the pairing of the disulfides is right for the 1hn6A alignment, then the one remaining pair is forced and we can use SSBond statements to constrain the fold significantly. This target is a 2.0 Å X-ray crystal structure, not an NMR result, so we aren't going to have that much flop in the model we are trying to match.

Tue Jul 13 14:51:02 PDT 2004 Kevin Karplus
The first model in the T0237.t2k.undertaker-align.pdb.gz file is the 1hn6A alignment, for which only a small part is reasonable. That part suggests that the disulfide pairing is C409-C392, C390-C407, C402-C346. Wait a sec---the t2k alignment has 13 conserved cys, and this is only 3 of them (390, 402, and 407). There are still C52, C120, C150, C166, C178, C205, C223, C240, C312, and C321---all of which have even more conservation than the ones we matched. We have 9*7*5*3 = 945 ways to pair these CYS residues---that's too many for us to try to build a model for each, even if we automate the ssbond construction. (OK, if we were running on the kilokluster we could do it, but it would take up the kluster for a day or two.) Maybe mutual information will help disambiguate the pairings. The cys residues are not going to get MI values, but maybe some of their neighbors will.
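Kevin's last point, that the conserved Cys themselves will not get MI values, falls straight out of the definition. Below is a minimal Python sketch of the plain (plug-in) mutual information between two alignment columns. It assumes ungapped columns given as strings with one residue per sequence, and it leaves out the sequence weighting, pseudocounts, and gap handling a real pipeline would likely need, so it is not necessarily how the MI files used for T0237 were computed:

```python
import math
from collections import Counter


def column_mi(col_a: str, col_b: str) -> float:
    """Plug-in mutual information, in bits, between two alignment columns.

    col_a[k] and col_b[k] are the residues of sequence k at the two positions.
    MI = sum over residue pairs (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y))).
    """
    if not col_a or len(col_a) != len(col_b):
        raise ValueError("columns must be non-empty and the same length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c_xy in count_ab.items():
        # With raw counts, p(x,y) / (p(x) p(y)) = c_xy * n / (c_x * c_y).
        mi += (c_xy / n) * math.log2(c_xy * n / (count_a[x] * count_b[y]))
    return mi


# A fully conserved column (like an invariant Cys) shares no information:
print(column_mi("CCCC", "DDEE"))  # 0.0
# Two columns that covary perfectly share one bit:
print(column_mi("AACC", "DDEE"))  # 1.0
```

Because a fully conserved column has zero entropy, any signal for a Cys pair has to come from covarying neighbours, which is what Kevin suggests looking for.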
Tue Jul 13 16:42:04 PDT 2004 Kevin Karplus
The try1-opt1 model is looking pretty scattered. There are bits and pieces of hairpins and helices, but nothing that will help us figure out the disulfide pairings.

Thu Jul 29 17:55:36 PDT 2004 Martina Koeva
I looked at the mutual information files, and the pairs indicated above showed up (not directly, but through neighboring residues). Two additional pairs seemed to appear: C150-C178 and C205-C223. I will attempt to put those 5 pairs in try2. Additionally, I have rescaled the hbond parameters, increased the constraints weight (from 10 to 30), and added the rr constraints from George's 280.rr.constraints file.

Sun Aug 1 15:11:28 PDT 2004 Martina Koeva
Something new: I looked at one of the original papers describing the structure of 1hn6A. Here is a link to it:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12270711
There is quite a bit of information in it. The paper mostly focuses on domain III of Apical membrane antigen 1, as expected. However, there is a schematic diagram of all 3 possible subdomains of the Pf AMA1 ectodomain, and, more importantly, it shows the positions of exactly 8 disulfide bonds (p. 2 of the paper). Assuming that the 3 pairs Kevin mentioned above are correct (with that numbering scheme) and that there are no insertions or deletions, counting residues back from those 3 cysteine pairs and mapping between the structures shows that all cysteine positions are completely conserved (relative to each other). The determination of the last three pairs (C346-C402, C390-C407, C392-C409) is documented in the paper noted above.
The determination of the previous 5 cysteine pairs is documented in the following paper:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8910611
What this means for T0237 is that the two additional pairs I thought I had found previously were incorrect, and the pairing of the 16 cysteine residues is as follows:
1.) C150 - C120
2.) C178 - C166
3.) C205 - C52
4.) C312 - C240
5.) C321 - C223
6.) C402 - C346
7.) C407 - C390
8.) C409 - C392
I will set these 8 pairs of residues up as SSBond constraints for try3 for now and see what happens. Also, I think there is more information to be found in the literature on the structure of the AMA1 protein, so I will look for anything else useful that can give us hints about T0237.

Mon Aug 2 00:07:56 PDT 2004 Kevin Karplus
Don't put too much faith in my guesses at the disulphide mapping---if you have any evidence that contradicts my guesses, go with the evidence.

From karplus@soe.ucsc.edu Mon Aug 2 00:17:40 2004
Date: Mon, 2 Aug 2004 00:17:38 -0700
From: Kevin Karplus
To: karplus@soe.ucsc.edu, sol@soe.ucsc.edu, ggshack@soe.ucsc.edu, learithe@soe.ucsc.edu, martina@soe.ucsc.edu, bbarnes@ucsc.edu, marcias@ucsc.edu, rph@soe.ucsc.edu
Subject: correction on T0237

Oops---on T0237 the number of disulphides is 8, not 4, so the search would have been over 15*13*11*9*7*5*3 = 2,027,025 pairings---definitely not feasible for us. If some of the pairings we are currently using are not supported by experimental evidence or homology, then they should be left out of the cost function, and better guesses made once the known SS bonds have been inserted.
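The pairing counts in these notes (7*5*3 = 105 for 8 Cys, 9*7*5*3 = 945 for 10, and 15*13*11*9*7*5*3 for 16) are double factorials: the first cysteine can bond to any of the other 2k-1, and the rest are paired recursively. A small Python sketch, assuming every cysteine is in a disulfide (no free Cys), that reproduces those counts and checks that Martina's 8 pairs use each of the 16 cysteines exactly once. It checks only the internal consistency of the list, not that these positions really are Cys in the T0237 sequence:

```python
def pairing_count(n_cys: int) -> int:
    """Ways to pair n_cys cysteines into disulfides: (n_cys - 1)!!."""
    if n_cys < 0 or n_cys % 2:
        raise ValueError("need a non-negative, even number of cysteines")
    count = 1
    for partners in range(n_cys - 1, 0, -2):  # (n-1) * (n-3) * ... * 1
        count *= partners
    return count


# The 8 pairings listed above, by residue number (T0237 numbering).
PAIRS = [(150, 120), (178, 166), (205, 52), (312, 240),
         (321, 223), (402, 346), (407, 390), (409, 392)]


def check_pairing(pairs):
    """Return the sorted Cys positions, or raise if a residue bonds twice."""
    seen = set()
    for a, b in pairs:
        if a == b:
            raise ValueError(f"C{a} is paired with itself")
        for res in (a, b):
            if res in seen:
                raise ValueError(f"C{res} appears in more than one bond")
            seen.add(res)
    return sorted(seen)


for n in (8, 10, 16):
    print(n, pairing_count(n))  # prints 8 105, then 10 945, then 16 2027025
cys = check_pairing(PAIRS)
print(len(cys), "cysteines, each used once")  # 16 cysteines, each used once
```

Appending the earlier MI-based guess (150, 178) to PAIRS makes check_pairing raise, because C150 and C178 are already paired in the list. This is the same conflict that ruled those two guesses out.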
From martina@soe.ucsc.edu Mon Aug 2 01:23:01 2004
MIME-Version: 1.0
Date: Mon, 2 Aug 2004 01:22:59 -0700 (PDT)
From: Martina Koeva
To: Kevin Karplus
Subject: Re: correction on T0237
In-Reply-To: <200408020717.i727Hc4A006048@cheep.cse.ucsc.edu>

The "maps" of the 3 pairings that I started off with (the ones you had suggested in the README file) were among those discussed most in the paper on the structure of 1hn6A. I guess this was because the disulfide bridges in subdomain III were the only ones that had not been confirmed experimentally up to that point. If those pairings were incorrect, the mapping probably would not have worked as well as it did. I think we can more or less safely assume that the experimental evidence supports those pairings. Now I just need to make those 3 pairs actually form bridges. I also just saw a paper, published yesterday, confirming that all of the cysteines, at least in the homologous Plasmodium falciparum AMA1 protein, form disulfide bonds (also experimental evidence).
--------------------------------------------------

Mon Aug 2 11:03:17 PDT 2004 Kevin Karplus
Try3 forms the 8 ss bonds, but has not folded the rest compactly. I'll start a try4 run on cheep that reduces the number of strand and sheet constraints to just the ones from t04.dssp-ehl2 and adds the rr constraints, in an attempt to pack this a little better. Martina might want to start an independent run with guesses about sheet topology, as I have not had time to think about that.

Mon Aug 2 14:04:14 PDT 2004 Martina Koeva
I spent some more time going through the same original paper, trying to gather structural information that we can use for T0237. Here is what I think we can use or conclude (taken directly from the paper):
1. All that follows applies only to residues E339 (approx.) to the end. This segment is classified as subdomain III of this protein.
   All the information is given for the structure of the template, but I have mapped the residue numbers to those of T0237.
2. The N-terminal region of this subdomain is unstructured over the first 4 residues (E339-L342) and starts to become more ordered after E343.
3. The structure has a turn of a helix between C346 and K350, followed by a type I beta-turn centered on E354 and R355 and stabilized by a backbone hydrogen bond.
4. The structured regions E343-R355 and F380-N408 are separated by a largely disordered loop of approximately 25-26 residues.
5. There is a completely conserved sequence P377-S382 (...PRIFIS...) among all Plasmodium sequences. The paper indicates that this stretch adopts an extended (sheet-like) structure, but does not interact with the beta-hairpin.
6. An antiparallel beta-sheet is found between residues E395-S398 and N403-V406:
      NFYV SIRE
   The residues between P394 and C407 form a beta-hairpin with a distorted type I beta-turn centered on residues S400 and T401.
7. The last residues (after N408) are largely unstructured.
8. There is another conserved region of sequence, namely S(E)NNEV, at residues 418-422. In T0237 the sequence is ENNQV. It is described as adopting a bent structure that has some features of a reverse turn but does not cause a chain reversal.
9. Surface residues include D384, S387, S400, T401, and possibly N403.
10. One face of this subdomain is highly charged and has a cluster of negative potential towards the disulphide core. There is a basic cleft centered on K359, residue 360 (an R in the template, but a Q in T0237), R362, R378, and K389, with residues coming in from both the loop and the structured region. The opposite side of the subdomain is less charged.

What does all of this mean for T0237? There is pretty much no structural information apart from the disulfide bonds for the rest of the protein (subdomains I and II). I tried using VAST on try3-opt2, which was a long shot anyway.
VAST ID: VS60344 Password: T0237try3
As I discovered, VAST was smart about the search and separated T0237 into 4 subdomains. There were no hits at all for the latter 3 subdomains, and only a few hits for subdomain 1, which looked pretty trashy to me. I will go back and look at the aligned regions just to double-check, but I am sceptical that it will lead anywhere. As for subdomain III, one of the recurring themes in the paper is the lack of secondary structure in most of this small subdomain. I am still a little reluctant to believe that. My concern (which could be completely irrational) is that such a conclusion reflects the method used (NMR) and the not-so-high resolution. Is that possible? Otherwise, I do believe in, and like, the beta-hairpin element, even though we do not even predict a strand between 395 and 398. The strand there seems to want to form on its own. I will put in explicit strand and sheet constraints for those two antiparallel strands. The template also seems to have a turn of a helix, while our predictions favor extending that helix by another couple of turns (a helix of about 12 residues). Finally, I am inclined to like the idea of the cleft of basic residues, which in T0237 does not yet show up as a cluster in try3. I will include those in try5 tonight.

Mon Aug 2 17:05:38 PDT 2004 Martina Koeva
Try4 looks like it just finished, and it strikes me as somewhat more compact, but that could be because I have been staring at this structure for too long. I am not seeing the cleft residues clustering yet. The goal for try5 will be to include explicit sheet constraints for the beta hairpin, as well as to attempt to cluster the basic residues noted above. However, I do need more secondary-structure (possibly sheet) conjectures, so I will focus next on subdomain I, since we have a couple of more or less strongly predicted strands.
If I get some more sheet conjectures tonight, I will start try5 with them too.

Mon Aug 2 21:49:32 PDT 2004 Martina Koeva
I started try5 with a few sheet constraints. It's a bit of a shot in the dark, but I will have to wait and see what the structures show. I have not yet included the cluster of positively charged residues in the small cleft in subdomain III; I might need to include that in try6.

Wed Aug 4 02:11:35 PDT 2004 Kevin Karplus
T0237.try4 looks rather horrible, lacking even the hairpins of try1. Maybe I should pick up the sheet constraints from try1-opt2.

Thu Aug 5 13:56:01 PDT 2004 Martina Koeva
I can't see any of what I was really looking for. With the try5 cost function, try5-opt2 scores worse than both try4 and try3 (with try3 scoring the best). With the try4 cost function, try4-opt2 also scores worse than try3. One thing I noticed in the robetta models is a couple of hairpins that I quite like and that fit our ehl2 predictions. I am putting those in as sheet constraints for try6. I am also including the sheet constraints from try1-opt2:
    SheetConstraint R207 K208 N212 G211 hbond K208
    SheetConstraint N301 D306 N316 K311 hbond W302
    SheetConstraint L322 N324 I329 N327 hbond I323
    SheetConstraint E395 S398 V406 N403 hbond E395
Hmm, never mind! The sheet constraints from robetta model 3 and from try1-opt2 turn out to overlap, so the 4 constraints above are the only ones included for try6. I've also increased the constraints weight from 10 to 30.

Sat Aug 7 15:45:07 PDT 2004 Kevin Karplus
try6 has some hairpins and the disulphides. Perhaps we need to increase the break costs so that the backbone is not so shattered---there are some truly horrific breaks (like 16 before K113, 19 before C150, 21 before F179, 49 before C205, 51 before P206, ...). I'll also include the T0237.t04.many.frag file in try7.under, after redoing the make to create it.
This will mean re-creating the Template.atoms file also, so after try7 we have to remember to comment out the output of Template.atoms again. I also added a few more of the very weak hits to MANUAL_TOP_HITS and remade "extra_alignments" and "all-align.*" to try to get some more long fragments to use.

Sun Aug 8 09:38:32 PDT 2004 Kevin Karplus
Although try7 scores better than try6, it still doesn't look much like a protein. Several of the helices have unwound, and nothing is compact. There are still bad breaks, though none as horrendous as in try6. Perhaps the helix constraints, dry12, and phobic_fit parameters should be increased. I'll leave this one for Martina to work on.

Mon Aug 9 16:43:05 PDT 2004 Martina Koeva
This is probably going to be the last attempt before possibly breaking the protein into subdomains. I have raised the weights of all strand, helix, sheet, and rr constraints in the try8 cost function. I have increased the break weight even further, turned down the sidechain weight, and turned up wet6.5 and all dry weights. Finally, I have also increased the phobic_fit weight. I have again commented out the output of the Template.atoms file.

Wed Aug 11 17:00:52 PDT 2004 Martina Koeva
I have decided to split the protein into subdomains as follows:
    P1-V219
    F215-E339
    L334-L445 (end)
As a first attempt, I have made the starter subdirectory for the first subdomain. If everything works out, I will do the other two later tonight.

Thu Aug 12 00:28:22 PDT 2004 Martina Koeva
All of the subdirectories have been created, and the initial runs have been started with the disulphide constraints already put in for try1. I have rescaled the hbond parameters in try1.costfcn and have included 'known_ssbond'. From now on I will be commenting in both the main README file and the subdirectory README files.

Thu Aug 12 15:47:34 PDT 2004 Martina Koeva
I started try2 in each subdirectory and looked at the initial models.
It seems that in each subdirectory the models are showing improvement in secondary structure formation, but on the downside, neither subdomain I nor subdomain III is forming its disulphide bonds, even though I had already specified explicit SSBond constraints in try1.costfcn for each subdomain.

From martina@soe.ucsc.edu Thu Aug 12 20:30:51 2004
MIME-Version: 1.0
Date: Thu, 12 Aug 2004 20:30:50 -0700 (PDT)
From: Martina Koeva
To: Kevin Karplus
Subject: T0237
In-Reply-To: <200408130256.i7D2uM2L021083@cheep.cse.ucsc.edu>

I was wondering whether you could take a look at T0237 (the big new fold with the 16 Cys) at some point. I have split it into subdomains, and I am pretty happy with subdomains II and III, given that I've done two tries on each. There is quite a bit to work on in subdomain I. Do you have any suggestions for it?
Thank you!
Martina

From karplus@soe.ucsc.edu Fri Aug 13 16:02:34 2004
Date: Fri, 13 Aug 2004 16:02:32 -0700
From: Kevin Karplus
To: martina@soe.ucsc.edu
CC: karplus@soe.ucsc.edu
In-reply-to: (message from Martina Koeva on Thu, 12 Aug 2004 20:30:50 -0700 (PDT))
Subject: Re: T0237

In domain 1, if you are having trouble forming disulfides in the T0237 subdomains, it may be because you still have
    InitMethodProbs ... InsertSSBond 0 \
        ImproveSSBond 0
The "InsertSSBond" operator is almost certainly the one that caused the ssbonds to be formed (at the expense of almost everything else) in the main directory. Set their initial weights to 1 or 2, and they should start being used.

In domain2, it looks like you might want to add a hairpin:
    SheetConstraint A287 N289 K295 N292 hbond N289
You might also want to strengthen your strand constraints relative to the helix constraints. You could probably drop known_ssbond back down to 1, but increase the wet and dry weights, and reduce the sidechain weight to 1:
    SetCost wet6.5 10 near_backbone 5 way_back 5 dry5 15 dry6.5 25 dry8 15 dry12 5 ...
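The SheetConstraint lines in these notes appear to list the first and last residue of one strand, followed by the residues they pair with on the partner strand. In each of the four try1-opt2 lines, the paired residue numbers sum to a constant, as expected for antiparallel strands. That reading is inferred from the examples, not taken from undertaker documentation, and the sketch below ignores the trailing hbond field. It is a small Python parser under that assumption: it expands one line into residue pairs and rejects strands of unequal length:

```python
import re

RESIDUE = re.compile(r"^[A-Z](\d+)$")  # e.g. "R207" -> 207


def residue_number(token: str) -> int:
    match = RESIDUE.match(token)
    if match is None:
        raise ValueError(f"not a residue token: {token!r}")
    return int(match.group(1))


def sheet_pairs(line: str):
    """Expand one SheetConstraint line into (strand 1, strand 2) residue pairs.

    ASSUMED reading: fields 1-2 are the first and last residue of one strand,
    fields 3-4 are their partners on the other strand. Any further fields
    (such as "hbond K208") are ignored here.
    """
    fields = line.split()
    if len(fields) < 5 or fields[0] != "SheetConstraint":
        raise ValueError("not a SheetConstraint line")
    a1, a2, b1, b2 = (residue_number(tok) for tok in fields[1:5])
    step_a = 1 if a2 >= a1 else -1
    step_b = 1 if b2 >= b1 else -1  # -1 for antiparallel partners
    strand_a = range(a1, a2 + step_a, step_a)
    strand_b = range(b1, b2 + step_b, step_b)
    if len(strand_a) != len(strand_b):
        raise ValueError("strands differ in length under this reading")
    return list(zip(strand_a, strand_b))


print(sheet_pairs("SheetConstraint E395 S398 V406 N403 hbond E395"))
# [(395, 406), (396, 405), (397, 404), (398, 403)]
```

Under this reading, the domain-2 hairpin suggested above (A287 N289 K295 N292) pairs a 3-residue range with a 4-residue one, so the sketch rejects it. Either the syntax means something else or that line deserves a second look before use.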
In domain3, I don't see any constraints to add, but you might want to tweak the weights as for domain 2.

Sat Aug 14 00:06:29 PDT 2004 Martina Koeva
I have incorporated all of the above suggestions into the try3 runs for the appropriate subdomains. If I manage to get better packing on the first subdomain, I will try to make try4 an optimization run from existing models, so that I have enough time to put the protein back together and optimize the whole structure.

Sat Aug 14 02:09:53 PDT 2004 Martina Koeva
Wow, that was quite fast. The third try for subdomain III is done, and the other two have already generated their opt1 models, so I will be able to start an optimization run from the existing models in the morning. In the try3 structures generated so far, all disulphides in each subdomain are forming.

Sat Aug 14 14:46:31 PDT 2004 Martina Koeva
I have started try4 on each subdomain from previous models. Once those are done, I can try putting the structure back together and optimizing. Subdomain I still doesn't look very structured, but we are getting some strands into sheets. Subdomains II and III look pretty decent for separate pieces of a whole structure (apart from their foaminess). I still need to add constraints for the cluster of positively charged residues in subdomain III.

Sat Aug 14 21:31:12 PDT 2004 Martina Koeva
Hmm, try4 for subdomain I is not done yet. The other two have finished, and they do not look very polished: the try4-opt2 models for subdomains II and III both look quite foamy. At this point there isn't much time left, so I will wait for try4 in subdomain I to finish and then superpose the models.

Sun Aug 15 05:11:33 PDT 2004 Martina Koeva
Ok, so all try4 runs have finished. I superimposed them and, by cutting and pasting, made a chimera model. It has some terrible clashes, but I am hoping that the try9 run will be able to fix them. I am starting only from the chimera model.
With its own cost function the chimera model scores the worst right now, but I am keeping my fingers crossed that try9 (running on peep) will score reasonably well. Once the undertaker run finishes, I will put suggestions for models to submit in the README file.

Sun Aug 15 13:25:49 PDT 2004 Martina Koeva
Try9 is finished, and even though the structure still looks pretty bad, it looks better than before. Try9-opt2 now scores the best with its own cost function, as well as with the unconstrained function. (I wonder whether the unconstrained cost function is all right. Should I at least keep the SSBonds in explicitly? After looking at the unconstrained function for T0238, I think I need to keep the SSBond constraints in explicitly.) For try10, I will start from all existing models. The dry weights already look high, so I will raise near_backbone and way_back too. Phobic_fit also seems to be quite high, but maybe I can raise it a little more. I can also raise the dry weights a little more. Finally, I am also including a few constraints for the cluster of basic residues.

Sun Aug 15 13:50:13 PDT 2004 Martina Koeva
Try10 is running on peep. I think we should submit:
    try10-opt2 (when it is done, it should score better than try9 with the unconstrained function)
    try1-opt2 (the fully automated one)
    T0237-1hn6A-t2k-local-str2+CB_burial_14_7-0.4+0.4-adpstyle5
    try8-opt2 (most compact model from before splitting into subdomains; it does not score as well as try7, try3, or try4 with the unconstrained function, but it does have a more compact core)
    try9-opt2 (scores best with the unconstrained function, but it will probably score slightly worse than try10-opt2 once that is done; however, it is the first model after putting the subdomains back together)
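The chimera was built by superposing the three subdomain models and cutting and pasting, but the notes do not record where the cuts were made. As an illustration only, one simple rule is to cut each overlap of the split (P1-V219, F215-E339, L334-L445) at its midpoint, so that every residue comes from exactly one subdomain model. The Python sketch below implements that rule. It does not cover the superposition itself, and cutting where the overlapping models agree best would be an equally reasonable choice:

```python
def splice_plan(domains):
    """Choose which subdomain model supplies each residue of a chimera.

    domains: (first, last) residue ranges in sequence order; consecutive
    ranges must overlap or touch. Each overlap is cut at its midpoint
    (an illustrative rule). Returns (first, last, domain_index) segments.
    """
    segments = []
    start = domains[0][0]
    for i in range(len(domains) - 1):
        this_last = domains[i][1]
        next_first = domains[i + 1][0]
        if next_first > this_last + 1:
            raise ValueError(f"gap between subdomains {i} and {i + 1}")
        cut = (next_first + this_last) // 2  # last residue taken from domain i
        segments.append((start, cut, i))
        start = cut + 1
    segments.append((start, domains[-1][1], len(domains) - 1))
    return segments


# The three-way split used for T0237 (P1-V219, F215-E339, L334-L445).
for first, last, source in splice_plan([(1, 219), (215, 339), (334, 445)]):
    print(f"residues {first}-{last} from subdomain {source + 1}")
```

For the T0237 split, this rule takes residues 1-217, 218-336, and 337-445 from subdomains I, II, and III, respectively.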
----------------------------------------------------------------------------
From karplus@soe.ucsc.edu Sun Aug 15 14:47:06 2004
Date: Sun, 15 Aug 2004 14:47:05 -0700
From: Kevin Karplus
To: martina@soe.ucsc.edu
CC: karplus@soe.ucsc.edu
Subject: T0237

I'd like to submit T0237 in the next few hours. You've given me a list:
    try10-opt2   best unconstrained
    try8-opt2    most compact before splitting into subdomains
    try1-opt2    full auto
    T0237-1hn6A-t2k-local-str2+CB_burial_14_7-0.4+0.4-adpstyle5
You also gave me try9-opt2, but that is expected to be very similar to try10-opt2. Is there another possibility that gives us more diversity---a distinctly different prediction, even if it is not quite as good?
----------------------------------------------------------------------------
Date: Sun, 15 Aug 2004 14:57:12 -0700 (PDT)
From: Martina Koeva
To: Kevin Karplus
Cc: martina@soe.ucsc.edu
Subject: Re: T0237

Try10-opt2 should hopefully be done in the next few hours. Of the earlier models (before the split into subdomains), try7-opt2 scores best with the unconstrained function. It is not as compact, and it has unwound some of the predicted helices, but it has fewer and less severe breaks than all the models before it. So maybe try7-opt2 instead of try9-opt2?
--------------------------------------------------------------------------

Sun Aug 15 16:34:34 PDT 2004 Martina Koeva
Try10-opt1 has been generated and, as expected, it already scores a little better than try9-opt2 with the unconstrained cost function.

Sun Aug 15 17:35:34 PDT 2004 Kevin Karplus
When try10 finishes, I'll do a submission, but I think I'll want to do another run with just the ssbond constraints. I'm particularly worried about the enormous weight on the rr constraints, which are not THAT reliable. Also, the .under file should include the alignments and fragments from the subdomains, not just the whole protein.
I'm unlikely at this point to find anything new, but perhaps we can get a better packing of the current domains. Before I do that, I'll add all the "reasonable" hits from the subdomains to the MANUAL_TOP_HITS lists and make extra_alignments.

Sun Aug 15 21:52:07 PDT 2004 Kevin Karplus
The improvement of try11-opt1 over try10-opt2 in the unconstrained costfcn was much bigger than that of try10-opt2 over try9-opt2. Unfortunately, I put too many jobs on abyss, so they are all running a bit slowly. I'll have to hope that try11-opt2 is ready in the morning! Otherwise I'll have to submit try11-opt1 (which is still better than try10-opt2).

From baertsch@soe.ucsc.edu Mon Aug 16 14:30:03 2004
MIME-Version: 1.0
Date: Mon, 16 Aug 2004 14:30:02 -0700 (PDT)
From: Robert Baertsch
To: Kevin Karplus
cc: Martina Koeva
Subject: malaria

Kevin,
I think residues 409-417 of target T0237 may be the active site of the protein. They form a hydrophilic alpha helix and seem unlikely to contribute to the structure. Perhaps this segment slips into some cleft in the human protein. What type of surface would bind to it strongly?
-Robert