9 July 1998 Kevin Karplus

From the submitter's file:
    Protein is "bovine recombinant conglutinin".
    "collectin family alpha-helical neck region. related to MBL."

wu-blast finds many close homologs with very strong scores, starting with
    1hup [1234]kmb[123] 1rtm[123] [12]msb[AB] 1ytt[AB] 1rd[ijklmno][12]
    1bch[123] 1af[abd][123] 1bcj[123] ...

double-blast also gets 1hup as the top-scorer, and essentially the same
list of subsequent ones.

Target98 summing both ways gets strong scores for
    1rtm1, 1lit, 1hlj, 1htn, 1esl, 1hli, 1kje, 1hup, ...
Scoring using just the t98_6 target model puts
    1hlj, 1lit, 1hli, 1kje, 1hup, 1kmb[123], ...
at the top.

Note: 1hli, 1kje, and 1hup don't have entries in the model library, and
so don't score as well in the 2-way summing as one would expect.

    ID              sum-score   fssp    Z & IDE to 1rtm1
    1rtm1           -175.96     1rtm1   28.3
    1lit            -171.04     1lit    14.5   19% IDE
    1hlj            -170.47     ?
    1htn            -165.16     1htn    19.5   24% IDE
    1esl            -133.06     1esl    15.3   29% IDE
    1hli            -92.850     ?
    1kje            -92.850     ?
    1hup            -83.980     1rtm1   22.3   55% IDE
    [1234]kmb[123]  -83.850     1rtm1   27.0   97% IDE
    1ixx[BDF]       -83.550     ?
    1rtm[23]        -83.270     1rtm1   26.2  100% IDE
    1bch[123]       -82.440     ?
    1bcj[123]       -82.340     ?
    1kja            -82.330     ?
    1afa[123]       -82.180     1rtm1   25.5   96% IDE

So it looks like the best FSSP reps are 1rtm1, 1lit, 1htn, and 1esl.
    1rtm1   Mannose-Binding Protein A
    1lit    Human Lithostathine
    1htn    Human Tetranectin, a trimeric plasmogen
    1esl    E-selectin (Lectin and Egf domains)

The sequences not in fssp's Table2 are
    1hl[ij]         Ige Receptor (Human, Low-Affinity)
    1kje            Theoretical model...
    1ixx[ABC]       Coagulation Factors Ix X-Binding Protein
    1bc[hj][123]    Mannose-Binding Protein-A Mutant
    1kja            Theoretical model...

In the t69.t98_6.tree, the PDB chains in the smallest subtree that
contains both PDB files and the target are
    1rdi[AB], 1hup, 1afaA, 1rtmA, 1kmbA, 1ytt[AB], 2msb[AB]
[Note: NRP used 1rdiA and 1rdiB, but PDB uses 1rdi1 and 1rdi2.]

All of these are close homologs of 1rtm1, so it looks like 1rtm1 will be
the best FSSP model.
Which of the homologs in that set is best is not clear yet, though 1hup
is a strong candidate, based on wu-blast's preference for it.

I'll make another tree using just the (closer) sequences in t69.t98_2,
which may be a bit more informative. I'll also score all of PDB with a
model built from that alignment.

Using t69.t98_2, the top scorers are
    1rtm[123]           -114.100
    [1234]kmb[123]      -113.880
    1bch[123]           -112.490
    1bcj[123]           -112.280
    1af[abd][123]       -112.230
    1hup                -109.480
    [12]msb[AB]         -100.910
    1ytt[AB]            -100.910
    1hlj                -99.370
    1rd[ijklmno][12]    -94.860
    1hli                -93.760
    ...

When looking at the tree for t69.t98_2, again the closest 3d structures
are 10 homologs of 1rtm1. (Note: t55 is also a close homolog of 1rtm1,
though perhaps not quite as close.)

The alignment t69-1hup-global.pw looks good in 1D---I'll have to look at
it in 3D also. This looks like an easy alignment that no one will miss,
and so I expect this to be a comparative homology target concentrating
on minor shifts of sidechains.

The 1rtm1 alignments look pretty good also, and they almost all agree on
where the gaps are (t69 has an insertion of AQE and of DE relative to
1rtm1 or to 1hup).

I'll have to look at both the 1hup and the 1rtm1 alignments with sae, to
see which has more conserved residues in interesting places. Since the
Z-score for 1hup--1rtm1 is 22.3 in fssp, I expect that the two
alignments will be nearly interchangeable for the backbone, and the only
advantage one might have over the other is in sidechain placement.

14 July 1998

Already by t69.t98_2, part of the t69 sequence is deleted from the
alignment because of homologs that are swamping it out. There is a
subfamily containing 10 3d structures that is probably sufficient for
getting good alignments. I have separated these sequences out into
t69.close.a2m, realigned them into t69.close.retrain.a2m, and removed
the duplication caused by the retrain procedure. The retrained alignment
has very few gaps, making the alignments to the structures rather
obvious.
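
Aside: the hit lists above abbreviate groups of PDB IDs with brackets
(for example 1rtm[123] or [12]msb[AB]). Below is a minimal Python sketch
for expanding that shorthand into individual IDs. It assumes each
bracket means "any one of these characters", as in a regular-expression
character class. expand_ids is a hypothetical helper name, not part of
wu-blast, Target98, or any other tool mentioned in these notes.

```python
import itertools
import re


def expand_ids(pattern: str) -> list[str]:
    """Expand shorthand such as '[12]msb[AB]' into individual IDs.

    Assumption: '[xyz]' stands for exactly one of the characters x, y, z.
    """
    # Each match is either a bracketed class (group 1) or a literal run (group 2).
    parts = re.findall(r"\[([^\]]+)\]|([^\[\]]+)", pattern)
    choices = []
    for char_class, literal in parts:
        if char_class:
            # dict.fromkeys drops repeated characters but keeps their order.
            choices.append(list(dict.fromkeys(char_class)))
        else:
            choices.append([literal])
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    for shorthand in ["1hup", "1rtm[123]", "[12]msb[AB]", "1rd[ijklmno][12]"]:
        print(shorthand, "->", " ".join(expand_ids(shorthand)))
```

Under that reading, [1234]kmb[123] stands for 12 IDs and
1rd[ijklmno][12] for 14. Some of the expanded IDs may not exist in PDB.
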
There are 10 structures, all in one subfamily, while t0069 is in a
different subfamily. The 10 structures are 1rdiA, 1rdiB, 1hup, 1yttA,
1yttB, 2msbA, 2msbB, 1kmbA, 1rtmA, and 1afaA. FSSP picked 1rtmA (which
it calls 1rtm1) as the FSSP representative for this set.

Looking at the tree based on the alignment, we can split into two groups:
    1rdi[AB], 1hup
    1ytt[AB], 2msb[AB], 1kmbA, 1rtmA, 1afaA

The choice between 1hup and 1rtmA is basically a choice of subgroups.
Here is the alignment for a representative of each subgroup to t69
(based on t69.close.retrain):

1hup  .....--------------SERKALQTEMARIKKWLTFSLGKQVGNKFFLTNGEIMTFEKVKALC
                               Q      KK   F  G  VG K F T G          LC
T69   .....AEANALKQRVTILDGHLRRFQNAFSQYKKAVLFPDGQAVGEKIFKTAGAVKSYSDAEQLC
               A                       K   F  G   G K F T       S    LC
1rtmA .....----AIEVKLANMEAEINTLKSKLELTNKLHAFSMGKKSGKKFFVTNHERMPFSKVKALC

1hup  VKFQASVATPRNAAENGAIQNLIK---EEAFLGITDEKTEGQFVDLTGNRLTYTNWNEGEPNNA.
             A PR  AEN A           A L   D  TEG F   TG  L Y NW  GEPNN
T69   REAKGQLASPRSSAENEAVTQMVRAQEKNAYLSMNDISTEGRFTYPTGEILVYSNWADGEPNNS.
       E  G  A PR   EN A           A L   D  TEG F Y TG  L YSNW   EPN
1rtmA SELRGTVAIPRNAEENKAIQEVAK---TSAFLGITDEVTEGQFMYVTGGRLTYSNWKKDEPNDH.

1hup  ..--GSDEDCVLLLKNGQWNDVPCSTS.HLAVCEFpi.
          G  E CV     G WNDVPCS    L  CEF
T69   ..DEGQPENCVEIFPDGKWNDVPCSKQ.LLVICEF...
          G  E CV I   G WNDI C        CEF
1rtmA ..--GSGEDCVTIVDNGLWNDISCQAS.HTAVCEFpa.

Identities: 54 for 1hup, 51 for 1rtmA---very nearly the same---a slight
edge for 1hup, especially since 1hup is the shorter alignment.
Maybe I should look at 1rdiA also.

Remaking all the joint alignments using the t69.close.retrain alignment
as the seed for */t69-* alignments, and using 1hup, 1rtm1, and 1rdi1 as
the possible templates. The 1hup alignments look fairly complete, and
may not need any editing.

Fri Jul 24 14:26:24 PDT 1998

The highest scoring non-self alignment is 1rtm1/t69-1rtm1-global.pw.
It has two insertions (one of 3 residues, and one of 2), both
conveniently placed on the surface.
Nearly as high a score is 1hup/t69-1hup-global.pw, which has the same
insertions, but does not have as long an initial helix. The structures
are VERY similar, and the places where conservation differs do not seem
to be places where the structures differ. The main difference, other
than the length of the initial helix, is a slight displacement of the
first strand of the first sheet (in 1hup), making it not hydrogen-bond
in 1rtm1.

1rdi1 is missing the initial helix, so is much less attractive as a
template.

Let's go with the 1rtm1/t69-1rtm1-global.pw alignment.
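
Aside: the identity count quoted for 1hup in the 14 July entry can be
rechecked directly from the aligned rows. Below is a minimal Python
sketch. It assumes an identity is a column where both rows show the
same uppercase residue, so '.', '-', and lowercase insert characters
never count. count_identities is a hypothetical helper, and the rows are
the 1hup and T69 rows copied from that alignment.

```python
def count_identities(row_a: str, row_b: str) -> int:
    """Count columns in which both aligned rows carry the same uppercase residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # a.isupper() is False for '.', '-' and lowercase insert characters.
    return sum(1 for a, b in zip(row_a, row_b) if a == b and a.isupper())


# The three blocks of the 1hup row and the T69 row, as printed in the notes.
HUP_ROWS = [
    ".....--------------SERKALQTEMARIKKWLTFSLGKQVGNKFFLTNGEIMTFEKVKALC",
    "VKFQASVATPRNAAENGAIQNLIK---EEAFLGITDEKTEGQFVDLTGNRLTYTNWNEGEPNNA.",
    "..--GSDEDCVLLLKNGQWNDVPCSTS.HLAVCEFpi.",
]
T69_ROWS = [
    ".....AEANALKQRVTILDGHLRRFQNAFSQYKKAVLFPDGQAVGEKIFKTAGAVKSYSDAEQLC",
    "REAKGQLASPRSSAENEAVTQMVRAQEKNAYLSMNDISTEGRFTYPTGEILVYSNWADGEPNNS.",
    "..DEGQPENCVEIFPDGKWNDVPCSKQ.LLVICEF...",
]

total = sum(count_identities(h, t) for h, t in zip(HUP_ROWS, T69_ROWS))

if __name__ == "__main__":
    print("1hup identities to T69:", total)  # 54, the figure quoted for 1hup
```

The same function can be pointed at any other pair of rows, provided
both rows come from the same alignment and have the same length.
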