30 July 1998

Wu-blast finds obvious homologs

1deg                              -58.34  1osa
1ajiA=1cdl[ABCD]=1cdmA=1cfc...    -58.09  1osa
2cln=3cln=1dmo                    -57.85  1osa
2mysC                             -57.38  1wdcC
1ahr                              -56.87  1osa

Damn calmodulins again! This time the homologies are very strong though, so it shouldn't be as hard a choice as it is for t74.

Double-blast gets the same sequences in the same order with the same scores.

Already by t76.t98_2 there are 34 3D structures in the alignment. Of course, t76.t98_6 finds all the EF-hands, starting with 2bbmA, 2bbnA at -136.64

The 1osa-t76-const-global alignment looks pretty good, but we may want to align to 1deg, which scores best with wu-blast.

31 July 1998

The 1osa and 1deg alignments essentially agree. The 1deg alignment has more residue identities, but is only c-alphas, so the placement of sidechains can't be any more accurate from it. Both alignments have disruptions in the first Ca-binding pocket of each domain, and both have a region in the middle of the long helix that is disrupted: PNGFDMPGDP is unlikely to form a continuous helix.

The target alignments have pulled in way too many close homologs, which swamps out the signal we want to see.

The myosins (which have the two domains close together with a loop in the middle of the long helix that joins the calmodulins) may be a better bet than the calmodulins, even though 1wdcC does not have as good a match to the EF-hand domain itself.

Perhaps we need to piece the model together from partial matches, taking the EF-hand domains from the calmodulins, and the central section from the myosins.

The 2mysC template fits very well (2mysC-hand.a2m) with 58 residue identities, and only 1 gap:

T0076 st6sp--------YKQAFSLFDRHGTGRIPKTSIGDLLRACGQNPTLAEITEIESt......
                    K AF LFDR G   I     GD  RA GQNPT AEI  I
2mysC e....FSKAAADDFKEAFLLFDRTGDAKITASQVGDIARALGQNPTNAEINKILGnpskeem

T0076 lpAEVDMEQFLQVLNRPNGFDMPGDPEEFVKGFQVFDKDATGMIGVGELRYVLTSLGEKLSNEE
        A    E FL  L         G  E FV G  VFDK   G     ELR VL  LGEK   EE
2mysC naAAITFEEFLPMLQAAANNKDQGTFEDFVEGLRVFDKEGNGTVMGAELRHVLATLGEKMTEEE

T0076 MDELLKGVPVKDGMVNYHDFVQMILAN
        EL KG     G  NY  FV  I
2mysC VEELMKGQEDSNGCINYEAFVKHIMSV

The gap is at the beginning of the long helix, and is probably closed by bringing the two domains closer together, since there is not a long helix for t76 to wrap around, as there is in 2mysC. The weakest homology is to the first binding pocket of the second domain.

Note: there is a gap in the structure for 2mysC between GQNPTNAE and INKILGN:

REMARK 6 THIS ENTRY CONTAINS THE COMPLETE COORDINATES FOR THE MYOSIN
REMARK 6 HEAVY CHAIN OF CHICKEN SKELETAL MYOSIN SUBFRAGMENT-1 AND
REMARK 6 C-ALPHA'S FOR THE TWO LIGHT CHAINS. THE LATTER ARE STILL
REMARK 6 NOT DEFINED APPROPRIATELY AT THIS TIME.

Unfortunately, 1mysC has the same tracing errors. We might need to fill in from some other structure!

No, it's worse than that---the whole chain seems to be mistraced, with residues assigned in the wrong places.

The other myosin regulatory chains in SCOP are 1wdcC and 1scmC. The 1wdcC alignment has 4 small gaps, 1 insertion, and only 44 identical residues.

Mon Aug 3 16:08:00 PDT 1998

Break into 3 regions:
    1st domain:    through PAEVD
    central helix: MEQ through VKGFQ
    2nd domain:    VFDKD to end

The t76-1osa-vit alignment has 51 conserved residues;
    1st domain:    19, 4-residue gap in second binding pocket
    central helix: 6, 2-residue insert
    2nd domain:    26, 1-residue gap in second domain.

The 2-residue insertion in the middle of the central helix looks like a turn (PNG), so I'd want to break the prediction at that point.

The 1wdcC-t76-const-global alignment has 44 conserved residues:
    1st domain:      13, gaps in both binding pockets
    central helices: 5, 1-residue insert
    2nd domain:      26, 2 gaps in 2nd binding pocket.

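The per-region identity counts above were tallied from the alignments by eye. As a cross-check, here is a minimal sketch that recounts them from the 2mysC alignment printed earlier. It assumes that only matching uppercase letters count as identities (lowercase letters, digits, '-' and '.' are treated as unaligned or elided), and it locates the region boundaries by searching for MEQ and VFDKD in the T0076 row. The function and variable names are made up for this note and are not part of any alignment tool.

```python
def count_identities(target_row: str, template_row: str) -> int:
    """Count columns where both rows hold the same uppercase residue.

    Lowercase letters, digits, '-' and '.' mark unaligned or elided
    positions in the printed alignment, so they never count.
    """
    if len(target_row) != len(template_row):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for a, b in zip(target_row, template_row)
               if a == b and a.isupper())


# Rows copied from the 2mysC alignment in these notes, with the three
# printed blocks joined end to end.
T0076 = (
    "st6sp--------YKQAFSLFDRHGTGRIPKTSIGDLLRACGQNPTLAEITEIESt......"
    "lpAEVDMEQFLQVLNRPNGFDMPGDPEEFVKGFQVFDKDATGMIGVGELRYVLTSLGEKLSNEE"
    "MDELLKGVPVKDGMVNYHDFVQMILAN"
)
MYS_C = (
    "e....FSKAAADDFKEAFLLFDRTGDAKITASQVGDIARALGQNPTNAEINKILGnpskeem"
    "naAAITFEEFLPMLQAAANNKDQGTFEDFVEGLRVFDKEGNGTVMGAELRHVLATLGEKMTEEE"
    "VEELMKGQEDSNGCINYEAFVKHIMSV"
)

# Region boundaries from the 3 August entry, found by the first
# residues of each region in the T0076 row (an assumption of this sketch).
helix_start = T0076.index("MEQ")      # central helix starts at MEQ
domain2_start = T0076.index("VFDKD")  # 2nd domain starts at VFDKD

total = count_identities(T0076, MYS_C)
per_region = (
    count_identities(T0076[:helix_start], MYS_C[:helix_start]),
    count_identities(T0076[helix_start:domain2_start],
                     MYS_C[helix_start:domain2_start]),
    count_identities(T0076[domain2_start:], MYS_C[domain2_start:]),
)
print(total, per_region)
```

The total should reproduce the 58 residue identities quoted for 2mysC-hand.a2m; the same function can be pointed at any other pair of aligned rows to get a 1st domain / central helix / 2nd domain breakdown.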
The 1wdcC central helix bends in about the right place for the PNG in the sequence, but overall is no better match than the straight helix of 1osa.

From t98_6, I extracted the subtree containing T0076 and the closest homologs. This seems to do a better job of predicting secondary structure than the other alignments I chose, though I should probably retrain it starting from t76, so that the unalignment of t76 doesn't disrupt the prediction.

T0076 is GP:YSPCDC4G_1--the closest proteins are myosins.

I'll try rerunning the scoring and the 2ry structure prediction with t98_6_subtree.retrain.a2m.gz as the main alignment.

Rats--the retrained alignment STILL has the same 13 residues unaligned, though T0076 was aligned just fine in tmp_3-a.mult during the process of building the alignments. Either the final buildmodel or the final hmmscore screwed things up. Perhaps I need to retrain again with different parameters?

With the new model, the top-scoring one is 1almC, then 2mysC, 1scmC, 1wdcC, 1clm, 1osa. I don't remember seeing 1almC before! No wonder---it is a theoretical model, based on 2mys. Still, it might provide a cleaner prediction than 1wdcC, or the mistraced 2mysC. It does seem to have a very fine alignment, with only one gap in the first domain. I'll have to check whether it agrees with the 1wdcC alignment wherever the two can be compared (to make sure that it hasn't inherited the slippage in the 2mysC file).

4 August 1998

1almC alignment has 57 conserved residues:
    1st domain:      22, 6-residue gap in second binding pocket
    central helices: 9, no gaps
    2nd domain:      26, no gaps

T0076 STDDSPYKQAFSLFDRHGTGRIPKTSIGDLLRACGQNPTLAEITEIESTLP.....AEVDMEQF
             K AF LFDR G   I     GD  RA GQNPT AEI  I               E F
1almC -EQQDDFKEAFLLFDRTGDAKITLSQVGDIVRALGQNPTNAEINKILGNPSke6naKKITFEEF

T0076 LQVLNRPNGFDMPGDPEEFVKGFQVFDKDATGMIGVGELRYVLTSLGEKLSNEEMDELLKGVPV
      L  L         G  E FV G  VFDK   G     ELR VL  LGEK   EE  EL KG
1almC LPMLQAAANNKDQGTFEDFVEGLRVFDKEGNGTVMGAELRHVLATLGEKMTEEEVEELMKGQED

T0076 KDGMVNYHDFVQMILAN
        G  NY  FV  I
1almC SNGCINYEAFVKHIMSV

From compbio.casp-request Tue Aug 4 10:05:24 1998
Return-Path: karplus@cse.ucsc.edu
Date: Tue, 4 Aug 1998 10:05:22 -0700
From: Kevin Karplus
To: compbio.casp@cse.ucsc.edu
Subject: dilemma! on t76

I have a problem on T76 that I'd like advice on.

There is a very clear homology of target T0076 to the myosin regulatory chain from chickens (2 calmodulin-style EF-hand domains joined by a bent central helix). Unfortunately the structures for this myosin (1mys, 2mys) have errors in tracing this chain (the researchers were more interested in other chains, I guess).

I have several choices:

1) I can use a more distant homology to myosin from bay scallops, which has several inserts and deletions.

2) I can take the EF hand domains from a fairly similar calmodulin and the center connecting part from a myosin.

3) I can align to 2mysC and ignore the obvious chain tracing error.

4) I can provide a prediction from 1almC, a theoretical model based on 2mys, but which I think fixes the chain tracing errors.

The 1almC match is really very close---it has 57 conserved residues:
    1st domain:      22, with 6-residue gap in second binding pocket
    central helices: 9, no gaps
    2nd domain:      26, no gaps

All the other alignments I've looked at have slightly lower residue identities, have gaps in the 2nd binding pocket of the first domain, and most also have gaps or insertions for the central helix region and gaps in the second domain.

Perhaps someone else could look at the papers mentioned in 1alm.pdb to help me decide whether to trust this model, or to align only to genuine x-ray or NMR models.

Kevin

Wed Aug 5 18:06:51 PDT 1998

I looked at 1almC today and it seems to have residue numbering that corresponds to 1osa.

I would want to chop out the alignment to STLP in the first domain (next to the gap), since I don't believe it is going to be headed off in that direction without the long insert. None of the homologs in the t76.t98_6 alignment have a matching-length loop in this region, so our best bet may just be to leave it unaligned.

Fri Aug 7 16:12:55 PDT 1998

I decided to go with 1almC and leave STLPAE unaligned.