13 July 1998 Kevin Karplus

blast and double-blast find nothing. t71.t98_6 finds 2reb, but with a score of only -5.22. t71-t98 finds 1hvc with score -3.63. t71-sum finds 2reb with score -6 and 1hvc with score -5.26, so there is some agreement in both directions! Unfortunately, 2reb and 1hvc are not structural homologs. t71.remote_4 finds 1hvsA (fssp rep 1fmb), which is a structural homolog of 1hvc. T71 is supposed to have two domains, though, so MAYBE the two matches are for different domains (we should be so lucky).

From t71-sum98.rdb:

                    score    fssp-rep
    2reb            -6       2reb
    1hvc            -5.26    1hvc
    1rea            -5.220   2reb
    2rec[ABCDEF]    -5.220   2reb
    5pal            -5.22    1rro
    1aj8A           -4.89    1aj8A

I'm going to use t71.remote_3 as the main alignment for prediction. It includes a little more than t71.t98_6 but is still manageable (remote_4 seems to blow up and misalign t71). t71.remote_3 predicts 2reb (score -5.72). When summed, 2reb is the top hit (score -6.5). This is in the score range with 60 true positives for 200 false positives (77% false), but at the good end of that range, so probably closer to 31 true and 50 false (62% false).

14 July 1998 Kevin Karplus

The evolutionary tree for t71.remote_3 is rather misleading, since many of the sequences have almost nothing in common. There are a few shared motifs (one is 34 residues long), but only the really close sequences have full-sequence matches.

7 August 1998 Christian

PMID:8550547 found that a non-clathrin-binding part of a normally clathrin-containing protein contains an SH3 domain. T71 is part of a clathrin-associated protein complex and is the only part of the complex that does not bind clathrin.

PMID:8662627 reports that Eps15, the epidermal growth factor receptor substrate, binds to T71. This suggests that the receptor and t71 may have a common structure, namely 1aojA/B. Unfortunately, the score is quite bad (about -0.04 nats). The interesting thing is that 1aojA/B contains an SH3 domain that exists as a novel intertwined dimer of two identical chains.
This is novel because all other SH3 domains of known structure are composed of a single chain.

The authors of T71 tell us that it is a two-domain protein of 238 residues, with the domain break at 125. This gives domain lengths of 125 and 113, pretty much the same length. Aligning the two T71 domains with SAM:

    t71-1st  ....EDNFARFVCKNNGVLFENQLLQIGLKSEFRQNLGRMFIFYGNKTSTQFLNFTPTLICA
    t71-2nd  fqptEMASQDFFQRWKQLSNPQQEVQNIFKAKHPMDT--------EITKAKIIGFGSALL--

    t71-1st  DDLQTNLNLQTKPVDPTVDGGAQVQQVINIECISDFTEAPVLNIQFRYGGTFQNVSVKLPIT
    t71-2nd  ------EEVDPNPAN-FVGAGIIHTKTTQIGCLLRLE--PNLQAQM-YRLTLRTSKDTVSQR

    t71-1st  LNKF
    t71-2nd  LCEL

Without any hand-editing, I find ~15% exact residue identity; slight fiddling could probably improve this.

1aojA/B is only a fragment of EPS8_MOUSE, so I built a model from the two T71 halves and used it to score and align EPS8_MOUSE and the two T71 halves. Even though EPS8_MOUSE gets a score of only -0.38 nats, the SH3 domain of the 821-residue EPS8 is aligned with only a couple of deletions! Here it is:

    t71-2nd               ................EMASQDFFQRWKQLSNPQQEVQNIFKAKHPMDT
    t71-1st               ................EDNFARFVCKNNGVLFENQLLQIGLKSEFRQNL
    sp|Q08509|EPS8_MOUSE  aysssmyhrgphadhgEAAMPFKSTPNHQVDRNYDAVKTQPK-KYAKSK

    t71-2nd               --------EITKAKIIGFGSALL--------EEVDPNPAN-FVGAGIIH
    t71-1st               GRMFIFYGNKTSTQFLNFTPTLICADDLQTNLNLQTKPVDPTVDGGAQV
    sp|Q08509|EPS8_MOUSE  ---YDFVARNSSELSVMKDDVLEILDDRRQWWKVRNASGD----SGFVP

    t71-2nd               TKTTQIGCLLRLE--PNLQAQM-YRLTLRTSKDTVSQRLCEL--lseqf
    t71-1st               QQVINIECISDFTEAPVLNIQFRYGGTFQNVSVKLPITLNKF--.....
    sp|Q08509|EPS8_MOUSE  NNILDI---MRTPESGVGRADPPYTHTIQKQRTEYGLRSADT--psaps

The obvious hypothesis to draw from this is that t71 forms an SH3 domain in the same unusual intertwined way that 1aojA/B does.

Continuing: Eps15 has N-terminal and C-terminal domains, and the C-terminal one binds to T71. PMID:7797522 finds that the N-terminal SH3 domain of CRK also binds, specifically, to the C-terminal domain of Eps15, the same place where t71 binds. The relevant structure is 1cka, which receives a pitiful score of 0.06 in t67-t98sum.
So there's an obvious literature hint here that t71 forms an SH3 domain, but we wouldn't know this from the scores: I checked all the t98-sum98.rdb entries down to -2 nats, and none of them had an SH3 domain.

10 August 1998 Christian

Since 1aojA/B don't align well, I aligned EPS8_MOUSE and trimmed it until only the 1aojA/B amino-acid sequence was left. The alignments of the two halves of t71 to this structure are 1aojA/t71.1st-1aojA.pw.a2m and 1aojA/t71.2nd-1aojA.pw.a2m. My hand-edited alignments are t71.1st-1aojA.cbarrett-hand.pw.a2m and t71.2nd-1aojA.cbarrett-hand.pw.a2m in the same directory.

Mon Aug 10 14:40:47 PDT 1998 Kevin Karplus

I looked at Christian's alignments and was not convinced. That is, this domain may well occur twice, but these alignments are not impressive.

I found 4 gapless alignments that had 8-10 conserved residues, often in pairs across a beta hairpin. gapless2 and gapless4 make a nice pair, in that there is a line of conserved residues across the beta sheet. Unfortunately, this pair crosses the domain boundary around 125. gapless3 and gapless4 are nice in that they are adjacent in the sequence, but gapless3 crosses the domain boundary. Maybe t71 has 2 FULL SH3 domains?

Unfortunately, the normal SH3 domain (as in 1ckaA) is a single barrel and can be aligned in many places in the sequence. We quite likely have SH3 domains, but are they individual barrels or interdigitated like 1aoj? If they are interdigitated, what is the interdigitation pattern?

The best-scoring non-self alignment made automatically is 1aojA/1aojA-t71-const-global:

    T0071  238  0.34  -1.11

but the alignment is terrible.

PROSITE entry PS50002 (SH3) lists a lot of SH3 domains.
There are even some proteins with 2, 3, or 4 in a row:

    SH17_HUMAN   4
    NCK_HUMAN    3
    SLA1_YEAST   3  (2 together, 1 later)
    YKA7_CAEEL   3

We looked at a lot of alignments for t71 and liked some of them: 1gbrA/1gbra-barrel1.a2m has particularly high conservation in the second domain; 1ckaA-t71-barrel33 is gapless; and barrel30 is similar but introduces a small insertion (ST) to get better residue conservation in the first half. A possible competing alignment (somewhat weaker) is 1ckaA-t71-barrel65.

Current thinking: interdigitation seems unlikely, and one SH3 domain in each of the two domains seems most likely. We still have to figure out precisely how we want to align the domains (look at the 1efnA and 1aboA alignments as well!). We may want to chop out the SH3 domains once we've decided where they are and try to match the piece that is left over, though it may be too short to do much with.

18 August 1998 Christian

I'm not hot on 1gbrA: the conservation patterns seem too scattered to suggest any conserved structure. 1ckaA-t71-barrel30 is a bit more interesting, because its conserved residues do seem a bit more correlated. Because of this, I prefer it over barrel33, which just seems like random conservation. 1aboA didn't provide anything particularly promising. I haven't looked at 1efnA, as it is still building.
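Several entries in these notes judge alignments by counting identical residues (the ~15% exact identity between the two T71 halves, the 8-10 conserved residues in the gapless alignments). As a minimal sketch, assuming SAM's A2M conventions (uppercase letters and '-' are match-state columns; lowercase letters and '.' are insert columns), the hypothetical helpers below count identities between two rows of such an alignment. The denominator is an assumption: the notes don't say whether ~15% was computed over aligned pairs or over a domain length, so the function accepts either.

```python
"""Hypothetical helpers for counting residue identity between two rows of an
A2M-style alignment (the format SAM writes). Assumed conventions:
uppercase letters and '-' occupy match-state columns; lowercase letters
and '.' occupy insert columns and are not aligned to the other rows.
"""
from typing import Optional, Tuple


def identity_stats(row_a: str, row_b: str) -> Tuple[int, int]:
    """Return (identical, aligned) over columns where both rows have an
    uppercase (match-state) residue."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    identical = aligned = 0
    for a, b in zip(row_a, row_b):
        # '-', '.', and lowercase characters all fail isupper(), so insert
        # columns and deletions are skipped here.
        if not (a.isupper() and b.isupper()):
            continue
        aligned += 1
        if a == b:
            identical += 1
    return identical, aligned


def percent_identity(row_a: str, row_b: str,
                     denominator: Optional[int] = None) -> float:
    """Percent identity. By default divides by the number of aligned pairs;
    pass e.g. a domain length as `denominator` for a different convention."""
    identical, aligned = identity_stats(row_a, row_b)
    total = aligned if denominator is None else denominator
    if total <= 0:
        return 0.0
    return 100.0 * identical / total


if __name__ == "__main__":
    # Made-up toy rows, not taken from the T71 alignments.
    row_a = "..ACDEF-GH"
    row_b = "xyACDQFKGH"
    print(identity_stats(row_a, row_b))              # (6, 7)
    print(round(percent_identity(row_a, row_b), 1))  # 85.7
```

For the wrapped alignments shown in these notes, each row's pieces would need to be concatenated first. Dividing by aligned pairs versus by a domain length (125 or 113 here) can shift the percentage by a few points, which is why the denominator is left as a parameter.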