13 July 1998 Kevin Karplus

Despite the submitter's claim that t72 has homology, the organizers added a caveat:
"Note: Even though authors indicate that the target has homolog with known structure conventional sequence comparison methods do not produce unambiguous results."
Is this a hint that "new-fold" is wrong? Or just a typo on the submitter's part?

Indeed wu-blast and double-blast don't find any useful homologs in PDB.

The t72.t98_6 model gets weak scores (-5.31 for 1ktx).
t72-t98 finds 1vvc in library (-4.94).
t72-sum finds 1vvc (-7.73) and 1ktx (-6.3).
t72-remote_4 finds 1limA (-4.85), which is just a theoretical model, with 1ktx close behind (-4.52).

The t72.t98_6 seems to have too many subfamilies, and t72 has gotten unaligned, so let's drop back to t72.t98_3 for predictions.
t72.t98_3 has 1ktx first with -5.15.

Summing both ways gets
    t72  1vvc  -8.04   1vvc
    t72  1ktx  -6.14   1ktx
    t72  1hfh  -5.23   1hfh
    t72  2ktx  -5.150  1ktx

25 August 1998 Kevin Karplus

Trying again with new Makefile.

The CD5 domain (here CD5_HUMAN 25:134) occurs many times in different proteins. It is one of the SRCR (Scavenger Receptor Cysteine-Rich) domains, which form a large superfamily. I picked up a few abstracts from medline (srcr-medline.abstracts) to get some background on the superfamily. It seems that the CD5 domain is an extracellular domain of a transmembrane signalling protein, and that the SRCR domains in general are extracellular parts of transmembrane proteins (involved in signalling or cell adhesion).

The target98 method drifts a bit, and by t72.t98_4 the alignment no longer has a 1-1 correspondence with the residues of the target. t72.t98_3 may be the best to align to, since it has about 200 sequences but has not drifted so far. The CD5 and CD6 domains are here.

wu-blast finds nothing: top is 1tabI 1.25
double-blast finds nothing.

t72.t98_6 finds little:
    1ktx     -5.590  1ktx        kaliotoxin
    2ktx     -5.570  1sco 81%    kaliotoxin
    1agt     -4.840  1sco 76%    agitoxin
    1frfS    -4.670  ?           Ni-Fe hydrogenase
    1rfs     -4.510  1rfs        Rieske soluble fragment
    1aw6     -3.930  ?           Gal4
    1faxL    -3.820  1hcgB 98%   factor xa (stuart factor)
    1mh1     -3.710  1mh1        rac1 fragment
    1tf3A    -3.710  1aayA 39%   transcription factor iiia

1sco and homologs are toxins that are potassium-channel inhibitors.

The top hits with the target98 library are (still)
    1vvc     -5.420  1vvc        vaccinia virus complement control protein
    1ahjA    -5.020  1ahjA       nitrile hydratase
    1envA    -4.590  1envA       hiv-1 envelope protein chimera
    1pspA_1  -4.540  2pspA       Pancreatic spasmolytic polypeptide
    1ps2     -4.090  2pspA 46%
    1hfh     -4.020  1hfh        Factor h
    1hc[de]  -4.010  1hce        hisactophilin
    1pspA_2  -3.550  2pspA

Summing both ways gets:
    1vvc     -8.37   1vvc
    1ktx     -7.61   1ktx
    1rfs     -5.96   1rfs
    1gatA    -5.58   1gatA       Erythroid transcription factor gata-1
    2ktx     -5.570  1sco 81%
    1hfh     -5.34   1hfh
    [12]bds  -5.33   2bds        BDS-I (should use 1bds, since that is mean of 2bds)
    1eciA    -5.07   1eciA       ectatomin
    1ahjA    -5.020  1ahjA       nitrile hydratase

The fssp library with viterbi scoring finds:
    1rie     -5.630  rieske iron-sulphur protein fragment
    2hts     -5.000  heat shock transcription factor (DNA-binding)
    1tabI    -4.880  Trypsin complex with Bowman-Birk inhibitor
    1wit     -4.610  twitchin 18th igsf module
    1gpl     -4.570  rp2 lipase
    1hpt     -4.330  human pancreatic secretory trypsin inhibitor
    1ctt     -4.080  cytidine deaminase

I'm getting some of the same fssp files scoring well with many targets. Is the reverse-sequence normalization not turned on (or not working)? (1wit also scored well for t85.)

Note: 1rie and 1rfs have Z-score 9.3, but only 29% residue id; the two-way match on this pair may be worth pursuing.

Using t72.t98_3, we find about the same as with t72.t98_6
    1ktx     -5.460  1ktx        kaliotoxin
    2ktx     -5.430  1sco 81%    kaliotoxin
    1frfS    -5.040  ?           Ni-Fe hydrogenase
    1agt     -4.670  1sco 76%    agitoxin
    1eptA    -4.670  5ptp 90%    porcine e-trypsin
    1rfs     -4.460  1rfs        Rieske soluble fragment

Summing both ways using t72.t98_3 gets
    1vvc     -8.35   1vvc
    1ktx     -7.48   1ktx
    1rfs     -5.91   1rfs        Rieske soluble fragment
    1hfh     -5.81   1hfh
    2ktx     -5.430  1sco
    1pft     -5.36   1pft        tfiib fragment
    1eciA    -5.18               ectatomin
    1frfS    -5.040  ?           Ni-Fe hydrogenase
    1ahjA    -5.020  1ahjA       nitrile hydratase
    1gatA    -5.02   1gatA       Erythroid transcription factor gata-1
    1mtnC    -5.01   5ptp? 1bpi?

The t98-mixed library (with w0.5) finds
    1ahjA    -5.580  1ahjA       nitrile hydratase
    1vvc     -5.500  1vvc
    1hc[de]  -3.990  1hce 100%   hisactophilin
    1bak     -3.740  1bak        g-protein coupled receptor kinase 2 fragment
    4gatA    -3.440  4gatA       area fragment DNA
    7gatA    -3.440  1gatA 47%   area fragment mutant DNA
    1gatA    -3.410  1gatA
    1htn     -3.330  1htn        tetranectin fragment

Using target98 with viterbi scoring picks a slightly different set than all-paths scoring:
    1ahjA    -8.380  1ahjA       nitrile hydratase
    1pspA_1  -6.600  2pspA
    1ps2     -6.020  2pspA 46%
    1eciA    -5.590  1eciA       ectatomin
    1lucB    -5.050  1lucA 31%   bacterial luciferase
    1pspA_2  -4.930  2pspA
    2pspA    -4.850  2pspA
    1bp1     -4.740  1bp1        bacterial permeability-increasing protein
    1envA    -4.730  1envA       hiv-1 envelope protein chimera
    3rubL_2  -4.640  1burA 93%   Ribulose 1,5-bisphosphate carboxylase/oxygenase
    1lit     -4.610  1lit        lithostathine
    1rtfB    -4.600  ?
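
The "summing both ways" lists above appear to combine, for each template, the score obtained in one direction (presumably the target model scoring the library sequences) with the score from the other direction (a library model scoring the target). Below is a minimal Python sketch of that bookkeeping. It assumes a plain sum in which a missing direction counts as 0; this is consistent with entries such as 2ktx and 1ahjA keeping their one-way score in the summed lists, but it is not confirmed by these notes. The function name and the toy template ids and scores are hypothetical and are not taken from the rdb files.

```python
def sum_both_ways(target_model_scores, library_model_scores):
    """Combine two directional score tables into one ranked list.

    Each argument maps a template id to a score (more negative = better).
    Assumption: a template missing from one table contributes 0 for
    that direction.
    """
    all_ids = set(target_model_scores) | set(library_model_scores)
    summed = {
        template: target_model_scores.get(template, 0.0)
        + library_model_scores.get(template, 0.0)
        for template in all_ids
    }
    # Most negative (best) combined score first.
    return sorted(summed.items(), key=lambda item: item[1])


# Toy data with hypothetical ids and scores, for illustration only.
target_model = {"tmplA": -5.0, "tmplB": -2.0}
library_model = {"tmplB": -4.5, "tmplC": -4.0}

ranked = sum_both_ways(target_model, library_model)
for template, score in ranked:
    print(f"{template:8s} {score:7.2f}")
```

In the toy data, tmplB is only second or worse in each one-way list but ranks first once both directions are added, which is the kind of reordering seen for 1vvc in the summed lists.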

The best non-self alignments are
    1hce/1hce-t72-global            T0072  110  -2.05  -8.78
    1ahjA/1ahjA-t72-vit             T0072  110  -8.35  -8.57
    1vvc/1vvc-t72-global            T0072  110  -8.67  -6.79
    1rie/1rie-t72-fssp-global       T0072  110  -2.00  -6.72
    1ktx/t72-1ktx-vit               1ktx    37  -7.02  -6.64
    1vvc/1vvc-t72-const-global      T0072  110  -9.08  -6.48
    1ktx/t72-1ktx-global            1ktx    37  -0.11  -5.72
    1hfh/t72-1hfh-global            1hfh   120  -5.43  -5.64
    1eciA/1eciA-t72-vit             T0072  110  -4.03  -5.59
    1rfs/t72-1rfs-global            1rfs   127  -5.86  -5.36
    1bak/1bak-t72-global            T0072  110  -4.89  -5.25
    1bp1/1bp1-t72-vit               T0072  110  -4.81  -4.93
    1pft/1pft-t72-global            T0072  110  -5.15  -4.91
    1vvc/1vvc-t72-fsspt98-global    T0072  110  -7.46  -4.89
    1pft/t72-1pft-vit               1pft    50  -3.60  -4.85
    1envA/1envA-t72-vit             T0072  110  -5.33  -4.75
    1eciA/t72-1eciA-vit             1eciA   37  -4.44  -4.56
    1hfh/1hfh-t72-fsspt98-global    T0072  110  -7.91  -4.42
    1hfh/1hfh-t72-fssp-global       T0072  110  -5.89  -4.39
    1hce/1hce-t72-vit               T0072  110  -4.63  -4.21
    1rfs/t72-1rfs-vit               1rfs   127  -4.67  -4.17
    2pspA/2pspA-t72-vit             T0072  110  -4.94  -4.14
    1vvc/t72-1vvc-global            1vvc   118  -5.83  -3.88
    1vvc/1vvc-t72-vit               T0072  110  -4.23  -3.72
    1rie/1rie-t72-global            T0072  110  -1.14  -3.58
    1htn/1htn-t72-global            T0072  110  -3.07  -3.38
    1bak/1bak-t72-vit               T0072  110  -3.89  -3.10
    1eciA/1eciA-t72-fssp-global     T0072  110  -5.29  -3.09
    1vvc/1vvc-t72-fssp-global       T0072  110  -4.99  -3.02
    1eciA/1eciA-t72-global          T0072  110  -3.60  -3.01

From karplus@cse.ucsc.edu Tue Aug 25 17:05:25 1998
Return-Path: karplus@cse.ucsc.edu
Date: Tue, 25 Aug 1998 17:05:24 -0700
From: Kevin Karplus
To: cbarrett@cse.ucsc.edu
Cc: karplus@cse.ucsc.edu
Subject: t72 papers

I looked up some abstracts for CD5 (and related domains in the SRCR superfamily) and found a few papers that may be worth picking up (see pce/casp3/t72/rcrc-medline.abstracts):

This one claims to determine cystine bridges:

25. Resnick D; Chatterton JE; Schwartz K; Slayter H; Krieger M.
    Structures of class A macrophage scavenger receptors. Electron microscopic study of flexible, multidomain, fibrous proteins and determination of the disulfide bond pattern of the scavenger receptor cysteine-rich domain.
    Journal of Biological Chemistry, 1996 Oct 25, 271(43):26924-30.

The next several are a series on interaction of the domain with a ligand:

8.  Aruffo A; Bowen MA; Patel DD; Haynes BF; Starling GC; Gebe JA; Bajorath J.
    CD6-ligand interactions: a paradigm for SRCR domain function?
    Immunology Today, 1997 Oct, 18(10):498-504.

17. Bodian DL; Skonier JE; Bowen MA; Neubauer M; Siadak AW; Aruffo A; Bajorath J.
    Identification of residues in CD6 which are critical for ligand binding.
    Biochemistry, 1997 Mar 4, 36(9):2637-41.

23. Skonier JE; Bowen MA; Emswiler J; Aruffo A; Bajorath J.
    Mutational analysis of the CD6 binding site in activated leukocyte cell adhesion molecule.
    Biochemistry, 1996 Nov 26, 35(47):14743-8.

26. Skonier JE; Bowen MA; Emswiler J; Aruffo A; Bajorath J.
    Recognition of diverse proteins by members of the immunoglobulin superfamily: delineation of the receptor binding site in the human CD6 ligand ALCAM.
    Biochemistry, 1996 Sep 24, 35(38):12287-91.

47. Raab M; Yamamoto M; Rudd CE.
    The T-cell antigen CD5 acts as a receptor and substrate for the protein-tyrosine kinase p56lck.
    Molecular and Cellular Biology, 1994 May, 14(5):2862-70.

Wed Aug 26 18:00:37 PDT 1998

In the "DOMO" database the domain of interest is DM00148
http://www.infobiogen.fr/srs5bin/cgi-bin/wgetz?-e+[domo-id:DM00148]
58 occurrences of the domain are listed in the DOMO database.

The 1hce-t72-global alignment is not a bad little beta barrel, and can be "improved" to 1hce-t72-hand1.a2m. 1hce is loaded with histidine, but has only one cysteine. Somehow it seems an unlikely model for the cysteine-rich domain.

The 1ahjA-t72-vit alignment is short (33 residues with 8 conserved) and totally disagrees with the neural net on secondary structure.

The 1vvc/1vvc-t72-global alignment is interesting, in that there are several cystine bridges, some of which are preserved by the alignment. I can hand edit to preserve a few more (1vvc-t72-hand1). One unconserved bridge actually looks pretty good, since there is a cysteine one space over on the next strand which could easily be the new bridge.

It would be very valuable to know which cystine bridges form in the CD5 domain, since that could really guide the alignment.

I think 1vvc is a strong candidate!

Fri Aug 28 19:15:36 PDT 1998

1envA is just a coiled coil.

The 1vvc alignments do NOT agree with Resnick et al. about the pairing of the cysteines. They claim
    C2-C7  C3-C8  C5-C6

    C1  RLSWYDPDFQARLTRSNSKCQGQLE
    C2  VYLKDGWHMVCSQSWGRS
    C3  SKQWEDPSQASKVCQR
    C4  LNCGVPLSLGPFL
    C5  VTYTPQSSIICYGQL
    C6  GSFSNCSHS
    C7  RNDMCHS
    C8  LGLTCLE

The domain they looked at did not HAVE C1 and C4, so they couldn't say how they paired.

The 1vvc alignments pair 1-3, 2-4, 5-7, 6-8, though our alignments don't match the 8 cysteines of 1vvc exactly, so we get C2-C3, C4-C7, C5-C8. There is no way to get the two domains of 1vvc together to get C2-C7 alignments from it.

1 September 1998 Christian

I did an Entrez structure search for "disulfide receptor":

From 1itb FSSP:

t72-1cd1A.cbarrett1.a2m isn't a convincing alignment but does allow me to pair most of the cysteines by sacrificing identities.

While T0072 is the first domain of the CD5 receptor, 1hnf is the first domain from the CD2 receptor. 1hnf-t72.cbarrett1.a2m is one that should be looked at by somebody else. A full-length alignment with one 2-residue gap plus a 1-residue shift results in two structurally adjacent conserved patterns, three tight-turn conservations, and the C5-C6 disulfide bridge. Without the calcium atom, and assuming that Resnick et al. are in error about a couple of the disulfide bridges, I could see C2-C3 possibly pairing (might be too much of a twist) and C7-C8 pairing. Without the Nag1 hetero group, C1-C4 could quite easily pair.

    t72-sum98.rdb:t72 1hnf -1.270
    t72-sum98_6.rdb:t72 1hnf -1.270
    t72-t98-mixed.rdb:1hnf T0072 w0.5 0 2 -0.710
    t72-t98.rdb:1hnf T0072 varh50 0 2 -1.270

Investigating 1hnf FSSP family:

1hngA very similar to 1hnf but only 48% IDE to it.

1cdb-t72.cbarrett1.a2m pairs up C1-C8, C6-C7, C4-C5, but leaves C2 and C3 floating.

1tvdA ... building t72-1vcA.cbarrett1.a2m is not quite as convincing an alignment as 1hnf.

1igtB Probably not.

t72-1ah1.cbarrett1.a2m is a compact structure with about as much residue conservation as any of these others. A couple of the disulfides are possible without too much fussing. I probably still prefer 1hnf over this one.

2ncm low %IDE, can match up C5-C6.

2tgi This is an interesting structure but doesn't immediately provide the cysteine pairing that the paper suggests.

From the model library hits:

1agt is just a fragment.
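
Since the cysteine pairing is the main criterion used above to judge the candidate alignments, it may help to keep the bookkeeping explicit. The following is a small Python sketch, not part of the original analysis. It takes the eight segments C1..C8 as listed in the 28 August entry, checks that each holds exactly one cysteine, and compares the pairing claimed by Resnick et al. with the pairings implied by the 1vvc and 1cdb alignments. The residue numbering assumes the segments are contiguous and listed in sequence order (they add up to 110 residues, the T0072 length shown in the alignment table); the function and variable names are illustrative only.

```python
# Segments of T0072 as listed in the notes, one cysteine per segment.
SEGMENTS = {
    "C1": "RLSWYDPDFQARLTRSNSKCQGQLE",
    "C2": "VYLKDGWHMVCSQSWGRS",
    "C3": "SKQWEDPSQASKVCQR",
    "C4": "LNCGVPLSLGPFL",
    "C5": "VTYTPQSSIICYGQL",
    "C6": "GSFSNCSHS",
    "C7": "RNDMCHS",
    "C8": "LGLTCLE",
}


def cysteine_positions(segments):
    """Map each segment label to the 1-based positions of its cysteines.

    Assumption: the segments are contiguous and given in sequence order.
    """
    positions = {}
    offset = 0
    for label, residues in segments.items():
        positions[label] = [
            offset + i for i, aa in enumerate(residues, start=1) if aa == "C"
        ]
        offset += len(residues)
    return positions


def parse_pairs(text):
    """Turn 'C2-C7 C3-C8' into a set of unordered cysteine pairs."""
    return {frozenset(p.split("-")) for p in text.replace(",", " ").split()}


def shared_bridges(a, b):
    """Return the bridges present in both pairings, as sorted labels."""
    return sorted("-".join(sorted(pair)) for pair in a & b)


# Pairings as stated in the notes.
PAIRINGS = {
    "Resnick et al.": parse_pairs("C2-C7 C3-C8 C5-C6"),
    "1vvc alignments": parse_pairs("C2-C3, C4-C7, C5-C8"),
    "1cdb-t72.cbarrett1": parse_pairs("C1-C8, C6-C7, C4-C5"),
}

if __name__ == "__main__":
    for label, pos in cysteine_positions(SEGMENTS).items():
        print(label, pos)
    reference = PAIRINGS["Resnick et al."]
    for name, pairing in PAIRINGS.items():
        if name != "Resnick et al.":
            print(name, "shares", shared_bridges(reference, pairing))
```

Under these assumptions neither the 1vvc nor the 1cdb pairing shares a bridge with the Resnick et al. pattern, whereas an alignment that keeps C5-C6 (as described for 1hnf and 2ncm) shares exactly that one.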