21 July 1998 Kevin Karplus

Single blast finds no strong hits (1cmf, 1cmg, 1trcA, 1trcB at -4.9).

Double-blast gets a somewhat stronger signal:
    1cm4[ACDEG], 1trc[AB], 1lin, 1cm1A, 1cdl[ABCD], 1cll, 1cfd, 1ajiA,
    1cfc, 1cmf, 1cmg, 1cdmA, 1ctr, all at -10.77
    1tcoB, 1auiB at -10.319
    1ahr -10.03
    ...
FSSP reps
    1cm4A, 1osa, 1ncx
    1auiB
    1osa
All these are calcium-binding domains, and quite similar (except for 1cm4A, which is NOT similar to the other calcium-binding domains according to FSSP).

The t98 alignments bring in 3D sequences by t74.t98_5: 1ajiA (twice), 1ahr (twice), 1bod, 1clm (twice), 1deg (twice), 1trcA, 1trcB, 3cln (twice), 1cdlA (twice), 1cdmA2, 1trf, 1ctr2, 1cmf, 1lin (twice), 1tnp, 1tnx1, 1tnx2.

t74.t98_6 finds lots of PDB files (125 with scores better than -10). The top ones are 2bbmA, 2bbnA, and 4cln (1osa as FSSP rep).

Somewhat surprisingly, the 1cm4A, 1osa, 1ncx, 3cln, ... models are NOT the highest scoring template models. They are fairly decent scores (around -9.7), but there are several higher ones with template models:

    chain   score     fssp rep
    2scpA   -13.190   2scpA
    1symA   -13.110   1cfpA
    1cb1    -12.920   5icb
    5icb    -12.740   5icb
    1tcoB   -12.270   1auiB
    4cpv    -12.230   1rro
    5pal    -12.060   1rro
    1rro    -11.960   1rro
    1cfpA   -11.730   1cfpA
    1auiB   -11.400   1auiB
    1rec    -11.320   1rec
    1cnpA   -11.240   5icb
    4icb    -10.920   5icb
    3icb    -10.560   5icb
    1ponB   -10.500   1ponB   (not similar to rest according to FSSP)
    1cm4A    -9.950   1cm4A
    1osa     -9.730   1osa
    1ncx     -9.640   1ncx
    3cln     -9.590   1osa
    2mysB    -9.570   1wdcB
    2sas     -9.570   2sas

21 July 1998

Summing both ways puts more weight on the target model, so 1cm4A, 3cln, 1osa, 2bbmA, 2bbnA, 4cln, 1dmo, 2cln, ... come out on top. All the top hits bind calcium, and all except 1ponB have similar structures according to FSSP.

The t74.t98_6 alignment has dropped a LOT of the residues to t74, from 98 down to 23:95 (22 lost from beginning, 3 from end).
May have to redo the search with t74.t98_4, which does not have all the calcium-binding structures already in the alignment.

Using the t74.t98_4 model, which has not yet pulled in so many calcium-binding domains, the top scorers are

    chain   score     fssp rep
    1ahr    -16.410   1osa
    2bbmA   -16.390
    2bbnA   -16.390   1osa
    4cln    -16.390   1osa
    1cmf    -16.330   1ncx
    1cmg    -16.330   1osa
    1trcA   -16.330   1osa
    1trcB   -16.330   1osa
    1cfc    -15.800   1ncx
    1cfd    -15.800   1ncx
    1cll    -15.800   1osa
    1cm1A   -15.800   1osa
    1cm4A   -15.800   1cm4A
    1cm4C   -15.800   1cm4A
    1cm4E   -15.800   1cm4A
    1cm4G   -15.800   1cm4A
    1ctr    -15.800
    1lin    -15.800   1osa
    1cdlA   -15.760   1osa
    1cdlB   -15.760   1osa
    1cdlC   -15.760   1osa
    1cdlD   -15.760   1osa
    1ajiA   -15.660
    1cdmA   -15.660   1osa
    1tcf    -15.380
    1tn4    -15.380   1ncx
    2tn4    -15.380   1ncx
    1osa    -15.360   1osa
    1clm    -15.330   1osa
    1dmo    -14.610   1osa
    1tnc    -14.610
    2cln    -14.610
    3cln    -14.610   1osa
    1deg    -14.340   1osa
    1ak8    -14.130   1osa

21 July 1998

Summing both ways makes the top ones

    t74   1cm4A   -25.75   1cm4A
    t74   1osa    -25.09   1osa
    t74   1rro    -24.21   1rro
    t74   3cln    -24.2    1osa
    t74   2scpA   -23.78   2scpA
    t74   1rec    -23.65   1rec
    t74   1tcoB   -22.56   1auiB
    t74   1ncx    -22.03   1ncx
    t74   1auiB   -21.69   1auiB
    t74   5pal    -20.45   1rro
    t74   1symA   -20.25   1cfpA
    t74   1cfpA   -19.44   1cfpA
    t74   5icb    -19.12   5icb
    t74   1ponB   -18.99   1ponB
    t74   1cb1    -18.13   5icb
    t74   1wdcC   -17.35   1wdcC

There are 13 calcium-binding domains near the top of the list of hits (collected into t74-possible.ids). I'll build a constrained alignment from the fssp file for each of them, and see how each of the constrained alignments scores the sequence. To save time, I won't build joint models for each of the targets.

The constrained alignments haven't finished building yet, but using the target98 alignments, the best score for T0074 comes from 5icb-t74-global (or 5icb-t74-post). The constrained alignments that are built score t74 better than the corresponding global, post, or viterbi alignments, so we may get a better alignment later.
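
The "summing both ways" totals can be checked for chains that appear in both tables above. This is a minimal sketch, assuming the combined score is simply the score of t74 under the template's model plus the score of the template chain under the t74.t98_4 target model, with more negative meaning better. The function and variable names are illustrative and are not part of the scripts actually used.

```python
# Scores copied from the notes above; all names here are illustrative.
# Score of t74 under each template's model (template-model table).
template_model_scores = {"1cm4A": -9.950, "1osa": -9.730, "3cln": -9.590}
# Score of each template chain under the t74.t98_4 target model.
target_model_scores = {"1cm4A": -15.800, "1osa": -15.360, "3cln": -14.610}


def sum_both_ways(template_scores, target_scores):
    """Add the two scores for every chain present in both tables and
    return (chain, total) pairs sorted best (most negative) first."""
    shared = template_scores.keys() & target_scores.keys()
    totals = {
        chain: round(template_scores[chain] + target_scores[chain], 2)
        for chain in shared
    }
    return sorted(totals.items(), key=lambda item: item[1])


combined = sum_both_ways(template_model_scores, target_model_scores)
for chain, total in combined:
    print(f"t74 {chain:6s} {total:7.2f}")
```

For these three chains the totals agree with the -25.75, -25.09 and -24.2 listed for 1cm4A, 1osa and 3cln.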
22 July 1998

Here are the scores for the target with all the template models I've built:

    5icb/5icb-t74-global          T0074  -18.94  -19.50
    5icb/5icb-t74-post            T0074  -20.33  -19.50
    1cfpA/1cfpA-t74-global        T0074  -17.35  -18.36
    1cfpA/1cfpA-t74-post          T0074  -18.73  -18.36
    1cfpA/1cfpA-t74-const-global  T0074  -17.75  -16.27
    2scpA/2scpA-t74-const-global  T0074  -12.24  -15.97
    1rro/1rro-t74-const-global    T0074  -15.84  -15.36
    2scpA/2scpA-t74-vit           T0074  -16.60  -15.26
    1rro/1rro-t74-global          T0074  -16.28  -15.21
    1rro/1rro-t74-post            T0074  -17.67  -15.21
    5icb/5icb-t74-const-global    T0074  -15.33  -15.01
    2scpA/2scpA-t74-global        T0074  -13.23  -14.87
    2scpA/2scpA-t74-post          T0074  -14.62  -14.87
    1cm4A/1cm4A-t74-const-global  T0074  -12.15  -14.80
    1auiB/1auiB-t74-const-global  T0074  -12.79  -14.55
    1rec/1rec-t74-const-global    T0074  -10.73  -14.55
    1rec/1rec-t74-global          T0074  -10.97  -14.22
    1rec/1rec-t74-post            T0074  -12.36  -14.22
    2sas/2sas-t74-const-global    T0074  -11.63  -14.07
    1cm4A/1cm4A-t74-global        T0074  -11.15  -13.91
    1cm4A/1cm4A-t74-post          T0074  -12.54  -13.91
    1auiB/1auiB-t74-global        T0074  -11.73  -13.77
    1auiB/1auiB-t74-post          T0074  -13.12  -13.77
    1rro/1rro-t74-vit             T0074  -14.99  -13.72
    2scpA/2scpA-t74-fssp-global   T0074  -10.26  -13.39
    1osa/1osa-t74-const-global    T0074  -11.28  -13.29
    2sas/2sas-t74-global          T0074  -11.57  -13.12
    2sas/2sas-t74-post            T0074  -12.95  -13.12
    1osa/1osa-t74-global          T0074  -11.68  -13.10
    1osa/1osa-t74-post            T0074  -13.07  -13.10
    1ponB/1ponB-t74-global        T0074  -13.94  -12.94
    1ponB/1ponB-t74-post          T0074  -15.33  -12.94
    1ponB/1ponB-t74-const-global  T0074  -11.06  -12.93
    1auiB/1auiB-t74-vit           T0074  -14.59  -12.85
    1wdcC/1wdcC-t74-global        T0074   -8.86  -12.68
    1wdcC/1wdcC-t74-post          T0074  -10.25  -12.68
    1wdcC/1wdcC-t74-const-global  T0074   -9.00  -12.60
    1ncx/1ncx-t74-const-global    T0074   -9.57  -12.36
    1ncx/1ncx-t74-global          T0074   -9.67  -12.34
    1ncx/1ncx-t74-post            T0074  -11.06  -12.34
    1rec/1rec-t74-vit             T0074  -13.19  -12.18
    5icb/5icb-t74-vit             T0074  -13.51  -12.12
    1wdcB/1wdcB-t74-const-global  T0074   -9.37  -11.81
    1wdcB/1wdcB-t74-global        T0074   -7.54  -11.68
    1wdcB/1wdcB-t74-post          T0074   -8.93  -11.68
    2sas/2sas-t74-vit             T0074  -13.65  -11.52
    1cm4A/1cm4A-t74-vit           T0074  -11.83  -11.16
    1osa/1osa-t74-vit             T0074  -12.05  -10.99
    1cfpA/1cfpA-t74-vit           T0074  -12.48  -10.90
    1wdcC/1wdcC-t74-vit           T0074  -10.41  -10.88
    1ponB/1ponB-t74-vit           T0074  -11.52  -10.50
    1ncx/1ncx-t74-vit             T0074  -11.86  -10.47
    1wdcB/1wdcB-t74-vit           T0074   -8.54   -8.62

22 July 1998

The match with 5icb-t74-global is full length, and the gaps occur in places where the fssp alignment has gaps also, so it looks quite promising. There is a gap in a loop, the end of a helix is missing
    KPVL---LN
    KLLLQTEPS
and another 1-residue gap between a loop and the end of a helix. All together, 16 residues are conserved. The first 6 and last 24 residues are not aligned to anything.

The 5icb-t74-const-global match is very similar, but moves the first gap over one, getting 17 conserved residues.

The 1cfpA-t74-const-global alignment has 13 conserved residues, but is missing an important turn. I can extend the alignment to include the initial helix, but not with any significant number of conserved residues, though the alignment
    ....PWAVKPEDKAKYDAIFDSLSpvNGFL
    mselEKAVVALIDVFHQYSGREGD..KHKL
conserves AV in positions that touch on the dimer.

The 2scpA/2scpA-t74-const-global alignment has a big gap between the first helix and the rest, but that helix can be slid over, getting a total of 12 conserved residues. The missing first half of 2scpA makes this structure somewhat unlikely.

The 1rro/1rro-t74-const-global alignment has a nasty gap at QSDAR that would require turning the 3-10 helix back into an alpha helix to close the gap. There are 14 conserved residues, though I can get one or two more by aligning the first helix.

The 1cm4A/1cm4A-t74-const-global alignment, like the 2scpA one, has a big gap after the first helix. Sliding the first helix up gives a pretty good set of conserved residues (14 before moving the helix, 18 afterwards), but the initial residues stick out into space, since the earlier helices of 1cm4A are missing.
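
The conserved-residue counts quoted in these notes look like simple identity counts over aligned columns. Here is a minimal sketch of such a count, assuming both rows have equal length and that lowercase letters, '.' and '-' (inserts and gaps) never count as conserved. `count_conserved` is a hypothetical helper, not one of the tools used for the prediction. The example rows are the fragment quoted above for the extended 1cfpA alignment.

```python
def count_conserved(row_a: str, row_b: str) -> int:
    """Count columns where both rows hold the same uppercase residue.

    Assumes the rows are already aligned column by column (equal length).
    Lowercase letters, '.' and '-' mark inserts or gaps and are skipped.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(
        1
        for a, b in zip(row_a, row_b)
        if a == b and a.isalpha() and a.isupper()
    )


# Fragment quoted in the notes for the extended 1cfpA alignment.
top_row = "....PWAVKPEDKAKYDAIFDSLSpvNGFL"
bottom_row = "mselEKAVVALIDVFHQYSGREGD..KHKL"
print(count_conserved(top_row, bottom_row))
```

On this fragment the count picks up the A and V mentioned in the text plus the final L.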
The 1auiB/1auiB-t74-const-global alignment has 12 conserved residues, but a rather nasty mid-helix gap, followed by another gap in the loop at the end of the helix.

The 1rec/1rec-t74-const-global alignment has 9 conserved residues and some very big gaps.

Many of the other top alignments are good matches, but only for part of the sequence.

Wed Jul 22 1998

I looked at the SCOP classification for the predicted proteins, and found they were in the EF-hand superfamily.

    3. Fold: EF Hand-like
       core: 4 helices; array of 2 hairpins, opened
    4. Superfamily: EF-hand
       Duplication: consists of two EF-hand units: each is made of two helices connected with calcium-binding loop
       Families:
       1. Calbindin D9K (2)
          made of two EF-hands only
       2. S100 proteins (3)
          dimer: subunits are made of two EF-hands
       3. Osteonectin (1)
          5 helices; two EF-hands plus one of additional helices in the N-terminal part
       4. Parvalbumin (5)
          6-helices; array of 3 hairpins, closed
          made with two-helical hairpin and two EF-hands
       5. Calmodulin-like (18)
          Duplication: made with two pairs of EF-hands
       6. EF-hand modules in multidomain proteins (2)

The predicted secondary structure has only 4 helices, so Calbindin or S100 seems most appropriate.

Calbindin includes 1boc, 1bod, 1cb1, 1cdn, 1clb, 2bca, 2bcb, 3icb, 4icb, 5icb, 6icb (all with fssp rep 5icb). There are only 2 proteins here, bovine and porcine calbindin D9K, but there are many mutants of the bovine one, and there are structures with and without calcium.

The S100 proteins are 1cnp[AB], 1sym[AB], 1cfp[AB]. Of these, 1cfpA scores the best with the target model, and we've already seen that its alignment is not as good as the 5icb one.

So which of the Calbindins do we want? The highest scoring one with the target model is 1bod, but the alignment does not look any better than the 5icb one.

Wed Jul 22 12:28:29 PDT 1998

I rebuilt all the alignments using target98-pdb rather than the older target98 alignments.
I don't expect this to make much difference, but 1bod only had the newer alignments built for it. Here are the top 25 scores for the new alignments:

    5icb/5icb-t74-global          T0074  -18.93  -19.32
    5icb/5icb-t74-post            T0074  -20.32  -19.32
    1cfpA/1cfpA-t74-global        T0074  -18.85  -19.18
    1cfpA/1cfpA-t74-post          T0074  -20.24  -19.18
    1cfpA/1cfpA-t74-const-global  T0074  -17.75  -16.27
    1bod/1bod-t74-global          T0074  -17.49  -16.20
    1bod/1bod-t74-post            T0074  -18.88  -16.20
    2scpA/2scpA-t74-const-global  T0074  -12.24  -15.97
    1rro/1rro-t74-const-global    T0074  -15.84  -15.36
    5icb/5icb-t74-const-global    T0074  -15.33  -15.01
    1rro/1rro-t74-global          T0074  -15.79  -14.88
    1rro/1rro-t74-post            T0074  -17.17  -14.88
    1cm4A/1cm4A-t74-const-global  T0074  -12.15  -14.80
    1auiB/1auiB-t74-const-global  T0074  -12.79  -14.55
    1rec/1rec-t74-const-global    T0074  -10.73  -14.55
    1rec/1rec-t74-global          T0074  -10.09  -14.50
    1rec/1rec-t74-post            T0074  -11.47  -14.50
    2sas/2sas-t74-global          T0074  -11.93  -14.46
    2sas/2sas-t74-post            T0074  -13.31  -14.46
    1auiB/1auiB-t74-global        T0074  -11.95  -14.18
    1auiB/1auiB-t74-post          T0074  -13.34  -14.18
    1osa/1osa-t74-global          T0074  -11.31  -14.08
    1osa/1osa-t74-post            T0074  -12.69  -14.08
    2sas/2sas-t74-const-global    T0074  -11.63  -14.07
    2scpA/2scpA-t74-global        T0074  -10.18  -14.03

The order is identical to before except for the introduction of the 1bod alignments (the scores are slightly different).

The 5icb alignments look a bit better than the 1bod ones. I unaligned one residue to make the gap better to get 5icb-t74-const-hand.

Note: our current alignment modifies one of the Calcium-binding loops. Question: does this protein bind one calcium or two?

Hmm---just noticed that there are TWO very similar adjacent domains in EP15_HUMAN, of which t74 is the second. I'll try doing a prediction for the pair of domains (in ep15), and see if that gets a different best structure.
ep-dom-blast puts 2bbmA, 2bbnA, 4cln first (-5.52).

ep-dom-double-blast puts 1ajiA, 1cdlA, 1cdlB, 1cdlC, 1cdlD, 1cdmA, 1cfc, 1cfd, 1cll, 1cm1A, 1cm4A, 1cm4C, 1cm4E, 1cm4G, 1cmf, 1cmg, 1ctr, 1lin, 1trcA, 1trcB first, all with -10.77.

The ep-dom.t98_4 model finds
    1tcf=1tn4=2tn4 (-17.34),
    2bbmA=2bbnA=4cln (-17.19),
    1cfc=1cfd=1cll=1cm1A=1cm4A=1cm4C=1cm4G=1ctr=1lin (-16.9)
    1cdl[ABCD] (-16.87),
    1ahr (-16.83)
These are mainly 1osa and 1ncx fssp reps.

The best models for ep15-dom (based on how the sequence was scored) are

    2scpA/2scpA-ep15-dom-global        ep15-dom  -25.21  -28.36
    2scpA/2scpA-ep15-dom-post          ep15-dom  -26.60  -28.36
    2sas/2sas-ep15-dom-global          ep15-dom  -25.24  -27.16
    2sas/2sas-ep15-dom-post            ep15-dom  -26.62  -27.16
    1cm4A/1cm4A-ep15-dom-const-global  ep15-dom  -24.87  -27.04
    1auiB/1auiB-ep15-dom-global        ep15-dom  -24.99  -26.75
    1auiB/1auiB-ep15-dom-post          ep15-dom  -26.37  -26.75
    2sas/2sas-ep15-dom-const-global    ep15-dom  -24.56  -26.51
    1osa/1osa-ep15-dom-global          ep15-dom  -24.12  -26.50
    1osa/1osa-ep15-dom-post            ep15-dom  -25.50  -26.50
    1cm4A/1cm4A-ep15-dom-global        ep15-dom  -23.85  -26.13
    1cm4A/1cm4A-ep15-dom-post          ep15-dom  -25.24  -26.13
    2scpA/2scpA-ep15-dom-const-global  ep15-dom  -22.99  -25.93
    1rec/1rec-ep15-dom-global          ep15-dom  -22.52  -25.64
    1rec/1rec-ep15-dom-post            ep15-dom  -23.90  -25.64
    1osa/1osa-ep15-dom-const-global    ep15-dom  -23.61  -25.37
    2scpA/2scpA-ep15-dom-fssp-global   ep15-dom  -22.10  -24.38
    1ncx/1ncx-ep15-dom-global          ep15-dom  -21.73  -23.93
    1ncx/1ncx-ep15-dom-post            ep15-dom  -23.11  -23.93
    1auiB/1auiB-ep15-dom-const-global  ep15-dom  -21.83  -23.32
    1ncx/1ncx-ep15-dom-const-global    ep15-dom  -20.68  -22.96
    1rec/1rec-ep15-dom-const-global    ep15-dom  -20.18  -22.94

In reporting conservation below, I'm only looking at the domain that t74 is supposed to be.
The 2scpA-ep15-dom-global alignment looks pretty good (12 residues conserved, one 1-residue gap), though the gap is in the first Ca-binding pocket (of the t74 domain) and there is low conservation there, so probably only one calcium is bound (in the well-conserved pocket).

The 2sas-ep15-dom-global alignment looks ok (only 8 residues conserved), but only one of the two pockets of 2sas binds calcium, and it is the one that is disrupted, so this prediction would be better if t74 didn't bind calcium.

The 2sas-ep15-dom-const-global alignment looks ok (12 residues conserved, 11-residue insert in the only Ca-binding pocket, 2-residue deletion near the back end of a helix).

The 1cm4A/1cm4A-ep15-dom-const-global alignment has 17 conserved residues, but has an 11-residue insertion in the first calcium-binding pocket of the t74 domain, and a 2-residue deletion between the helices.

The 1auiB-ep15-dom-global alignment has 16 conserved residues, with an insertion in the same place as for 1cm4A, and with a 7-residue gap at the back between the helices.

1osa/1osa-ep15-dom-global has 18 conserved residues, with the 11-residue insertion in the first Ca-binding pocket and fairly good conservation of the second pocket. There is a 2-residue gap at the back of the two helices.

From cbarrett@cse.ucsc.edu Wed Jul 22 14:32:58 1998
Return-Path: cbarrett@cse.ucsc.edu
X-Authentication-Warning: moo.cse.ucsc.edu: cbarrett owned process doing -bs
Date: Wed, 22 Jul 1998 14:32:55 -0700 (PDT)
From: Christian Barrett
X-Sender: cbarrett@moo
To: karplus@cse.ucsc.edu
Subject: t74
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

I haven't found any information about # of calcium binding sites except for the CASP T0074 page where it states underneath "Additional Information" that "*A* calcium binding site is located in the current structure." From this I assume that there is only one site, making your deletion in the EF hand plausible.
Christian

26 July 1998

Sorting the alignments by my current belief in them:

    name                    conserved residues  gaps    Ca-binding
    2scpA-ep15-dom-global   12                  1       one pocket ok, not other
    1osa-ep15-dom-global    18                  11,2    one pocket ok, not other
    1cm4a-ep15-dom-const    17                  11,2    one pocket ok, not other
    1auiB-ep15-dom-global   16                  11,7    one pocket ok, not other
    5icb-t74-const          17                  3,3,1   non-binding pocket disrupted (other binds Mg, not Ca)
    2sas-ep15-dom-const     12                  11,2    binding pocket disrupted
    2sas-ep15-dom-global    8                   ?       binding pocket disrupted

Still have to compare with 5icb alignment, even though the two-domain nature of ep15's calcium binding makes 2scpA and 1osa somewhat more attractive. Also, it is probably worth checking 2tn4, which I just added to the list of models to try, based on its top score for the ep15-dom target model.

2tn4 scores ep15-dom ok, but not great:

    ep15-dom-2tn4-global  2tn4      155  -18.14  -21.51
    ep15-dom-2tn4-post    2tn4      155  -19.52  -21.51
    ep15-dom-2tn4-vit     2tn4      155  -14.28  -14.09
    2tn4-ep15-dom-vit     ep15-dom  228  -11.63  -11.14

This is, however, the best score with the ep15-dom model, so is probably worth looking at:

ep15-dom  ...PWAVKPEDKAKYDAIFDSLSP.VNGFLSGDKVKPVLLNSK..LPVDILGRVWELSDID
                 K          F        G                              D
2tn4      ...KEDAKGKSEEELAELFRIFDRnADGYIDAEELAEIFRASGehVTDEEIESLMKDGDKN

ep15-dom  HDGMLDRDEFAVAMFLVYCALEKEPVPMSLPPALVPPSKRKTW
           DG  D DEF
2tn4      NDGRIDFDEF---------------------------------

Hmm, only 10 residues conserved in the region we are interested in---maybe the good part is in the first domain.

The N insertion in the first loop agrees with 2scpA, which has 13 conserved residues, though two of those come from off the end of 2tn4.
I think I still like 2scpA better:

ep15-dom  PWAVKPEDKAKYDAIFDSLSP-VNGFL
              K          F
2scpA     NPEAKSVVEGPLPLFFRAVDTNEDNNI

ep15-dom  SGDKVKPVLLNSKLPVDILGRVWELSDIDHDGMLDRDEFAVAMFLVYCALEKEPVPMSLPPALv19ky
          S D          L            D   DG L   EF  A                  P
2scpA     SRDEYGIFFGMLGLDKTMAPASFDAIDTNNDGLLSLEEFVIAGSDFFMNDGDSTNKVFWGPLV

Wed Jul 29 09:52:02 PDT 1998

The 2tn4 alignment has good conservation in the short helix after the second binding loop, but very poor conservation in the helices between the binding loops. I think 2tn4 scores well because of the good match in the FIRST domain of the calmodulin, which is not part of the current crystal.

I can hand-tweak the 2tn4 alignment to get 16 residues conserved:

ep15-dom  PWAVKPEDKAKYDAIFDSLSPVNGFLSGDKVKPVLLNSKLPV--DILGRVWELSDIDHDGMLD
                E  A    IFD      G             S   V            D   DG  D
2tn4      AKGKSEEELAELFRIFD--RNADGYIDAEELAEIFRASGEHVTDEEIESLMKDGDKNNDGRID

ep15-dom  RDEFAVAMFLVYCALEKEPVPMSLPPALVPPSKRKTWVVSPAEKAKY-
           DEF   M
2tn4      FDEFLKMM----------------------------------------

I think I can improve the 2scpA alignment by moving the first helix to conserve the 2 D residues in the first binding loop, though this does make for a 6-residue insertion in the binding loop.

Fri Aug 7 14:26:38 PDT 1998

After poring over the 3D views for a while today, I finally decided to go with the 2scpA (calmodulin-like) domain, rather than the 5icb (calbindin) domain. It has a bit lower residue identity, but I liked the conserved connection from the end of the domain to the middle of the loop between the helices, and I did not like the two gaps needed to get 5icb to align.

I'm still very uncertain about the placement of the first helix, but it doesn't seem to have a good placement. I decided to have a 2-residue insert in the first Ca-binding loop, to get a conserved D, rather than a 1-residue gap or a 6-residue insert.
This was a bit arbitrary, but I thought inserts a bit better than deletions in the loop, and the 6-residue insert must disrupt the loop so much that the conservation of the Ds that coordinate the Ca is really rather irrelevant.

See
    ep15/2scpA/2scpA-t74-hand.a2m                 for 2-residue insert
    ep15/2scpA/2scpA-t74-insert6.a2m              for 6-residue insert
    ep15/2scpA/2scpA-ep15-dom-global.pw.a2m.gz    for 1-residue gap
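
The three candidate alignments differ mainly in the size of the insert or gap in the first Ca-binding loop. One way to read that off an a2m row is sketched below, assuming the usual a2m convention: uppercase letters and '-' are alignment columns, lowercase letters are insertions, and '.' is padding. `indel_runs` is a hypothetical helper, and the example rows are fragments quoted earlier in these notes, not the contents of the files listed above.

```python
import re


def indel_runs(a2m_row: str):
    """Return (insert_lengths, deletion_lengths) for one a2m row.

    Assumed a2m convention: lowercase letters are insertions relative to
    the model, '-' is a deleted alignment column, '.' is padding (ignored).
    """
    row = a2m_row.replace(".", "")
    inserts = [len(m.group()) for m in re.finditer(r"[a-z]+", row)]
    deletions = [len(m.group()) for m in re.finditer(r"-+", row)]
    return inserts, deletions


# Fragments quoted earlier in the notes (illustration only).
print(indel_runs("....PWAVKPEDKAKYDAIFDSLSpvNGFL"))
print(indel_runs("KPVL---LN"))
```

The first row has a single 2-residue insert and no deletions; the second has a single 3-column deletion.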