13 July 1998 Kevin Karplus

blast and double-blast find nothing.
t71.t98_6 finds 2reb, but with score only -5.22.
t71-t98 finds 1hvc with score -3.63.
t71-sum finds 2reb with score -6 and 1hvc with score -5.26---so there is some agreement in both directions!
Unfortunately, 2reb and 1hvc are not structural homologs.

t71.remote_4 finds 1hvsA (fssp rep 1fmb), which is a structural homolog of 1hvc. T71 is supposed to have two domains though, so MAYBE the two matches are for different domains (we should be so lucky).

From t71-sum98.rdb:
                            fssp-rep
    2reb            -6      2reb
    1hvc            -5.26   1hvc
    1rea            -5.220  2reb
    2rec[ABCDEF]    -5.220  2reb
    5pal            -5.22   1rro
    1aj8A           -4.89   1aj8A

I'm going to use t71.remote_3 as the main alignment for prediction. It includes a little more than t71.t98_6, but is still manageable (remote_4 seems to blow up and misalign t71).

t71.remote_3 predicts 2reb (score -5.72). When summed, 2reb is the top hit (score -6.5). This is in the range with 60 true positives for 200 false positives (77% false), but at the good end, so probably closer to 31 true, 50 false (62% false).

14 July 1998 Kevin Karplus

The evolutionary tree for t71.remote_3 is rather misleading, since many of the sequences have almost nothing in common. There are a few shared motifs (one 34-long one), but only the really close sequences have full-sequence matches.

7 August 1998 Christian

PMID:8550547 found that a non-clathrin-binding part of a normally clathrin-containing protein contains an SH3 domain. T71 is part of a clathrin associated protein complex and is the only part of the complex that does not bind clathrin.

PMID:8662627: Eps15, the Epidermal growth factor receptor substrate, binds to T71. This suggests that the receptor and t71 may have a common structure, which is 1aojA/B. Unfortunately, the score is quite bad (~ -.04 nat). The interesting thing is that 1aojA/B contains an SH3 domain that exists as a novel intertwined dimer of two identical chains.
This is novel because all other SH3 domains are composed of a single chain.

The authors of T71 tell us that it is a two-domain protein of 238 residues, with the domain break at 125. This gives the domains lengths of 125 and 113--pretty much the same length. Aligning the two T71 domains with SAM:

    t71-1st  ....EDNFARFVCKNNGVLFENQLLQIGLKSEFRQNLGRMFIFYGNKTSTQFLNFTPTLICA
    t71-2nd  fqptEMASQDFFQRWKQLSNPQQEVQNIFKAKHPMDT--------EITKAKIIGFGSALL--

    t71-1st  DDLQTNLNLQTKPVDPTVDGGAQVQQVINIECISDFTEAPVLNIQFRYGGTFQNVSVKLPIT
    t71-2nd  ------EEVDPNPAN-FVGAGIIHTKTTQIGCLLRLE--PNLQAQM-YRLTLRTSKDTVSQR

    t71-1st  LNKF
    t71-2nd  LCEL

Without any hand-editing, I find ~15% exact residue identity. Slight fiddling could probably improve this.

1aojA/B is only a fragment of EPS8_MOUSE, so I built a model from the two T71 halves and scored/aligned EPS8_MOUSE and the two T71 halves. Even though EPS8_MOUSE gets a score of -0.38 nats, the SH3 domain of the 821-residue EPS8 is aligned with only a couple deletions! Here it is:

    t71-2nd               ................EMASQDFFQRWKQLSNPQQEVQNIFKAKHPMDT
    t71-1st               ................EDNFARFVCKNNGVLFENQLLQIGLKSEFRQNL
    sp|Q08509|EPS8_MOUSE  aysssmyhrgphadhgEAAMPFKSTPNHQVDRNYDAVKTQPK-KYAKSK

    t71-2nd               --------EITKAKIIGFGSALL--------EEVDPNPAN-FVGAGIIH
    t71-1st               GRMFIFYGNKTSTQFLNFTPTLICADDLQTNLNLQTKPVDPTVDGGAQV
    sp|Q08509|EPS8_MOUSE  ---YDFVARNSSELSVMKDDVLEILDDRRQWWKVRNASGD----SGFVP

    t71-2nd               TKTTQIGCLLRLE--PNLQAQM-YRLTLRTSKDTVSQRLCEL--lseqf
    t71-1st               QQVINIECISDFTEAPVLNIQFRYGGTFQNVSVKLPITLNKF--.....
    sp|Q08509|EPS8_MOUSE  NNILDI---MRTPESGVGRADPPYTHTIQKQRTEYGLRSADT--psaps

The obvious hypothesis to draw from this is that t71 forms an SH3 domain in the unique manner in which 1aojA/B does.

Continuing... Eps15 has an N and C terminus, the latter of which binds to T71. PMID:7797522 finds that the N-terminus SH3 domain of CRK also, and specifically, binds the C-terminus domain of Eps15, the same place where t71 binds. The relevant structure is 1cka, which receives a pitiful score of .06 in t67-t98sum.
So there's an obvious literature hint here that t71 forms an SH3 domain. But we wouldn't know this from the scores. I checked all the t98-sum98.rdb entries down to -2 nats and none of them had an SH3 domain.

10 August 1998 Christian

Since 1aojA/B don't align well, I aligned EPS8_MOUSE and trimmed it until only the 1aojA/B aa sequence was left. The two halves of t71 aligned to this structure are 1aojA/t71.1st-1aojA.pw.a2m and 1aojA/t71.2nd-1aojA.pw.a2m. My hand-edited alignments are t71.1st-1aojA.cbarrett-hand.pw.a2m and t71.2nd-1aojA.cbarrett-hand.pw.a2m in the same directory.

Mon Aug 10 14:40:47 PDT 1998 Kevin Karplus

I looked at Christian's alignments and was not convinced. That is, there may well be this domain twice, but these alignments are not impressive.

I found 4 gapless alignments that had 8-10 conserved residues, often in pairs across a beta hairpin. The pair gapless2 and gapless4 are a nice pair in that there is a line of conserved residues across the beta sheet. Unfortunately, this pair crosses the domain boundary around 125. Gapless3 and gapless4 are nice in that they are adjacent in the sequence, but gapless3 crosses the domain boundary.

Maybe t71 has 2 FULL SH3 domains? Unfortunately, the normal SH3 domain (as in 1ckaA) is a single barrel, and can be aligned in many places in the sequence.

We quite likely have SH3 domains, but are they individual barrels or interdigitated like 1aoj? If they are interdigitated, what is the interdigitation pattern?

The best scoring non-self alignment made automatically is
    1aojA/1aojA-t71-const-global  T0071 238  0.34  -1.11
but the alignment is terrible.

PROSITE pattern PS50002; SH3 lists a lot of SH3 domains.
There are even some proteins with 2, 3, or 4 in a row:
    SH17_HUMAN  4
    NCK_HUMAN   3
    SLA1_YEAST  3  (2 together, 1 later)
    YKA7_CAEEL  3

We looked at a lot of alignments for t71, and liked some of them---1gbrA/1gbra-barrel1.a2m has a particularly high conservation in the second domain, and 1ckaA-t71-barrel33 is gapless, and barrel30 is similar, but introduces a small insertion (ST) to get better residue conservation in the first half. A possible competing alignment (somewhat weaker) is 1ckA-t71-barrell65.

Current thinking---interdigitation seems unlikely, and one SH3 domain in each of the two domains seems most likely. We still have to figure out precisely how we want to align the domains (look at 1efnA and 1aboA alignments as well!). We may want to chop out the SH3 domains once we've decided where they are and try to match the piece that is left over---it may be too short to do much with.

18 August 1998 Christian

I'm not hot on 1gbrA. The conservation patterns seem too scattered to suggest any conserved structure. 1ckaA-t71-barrel30 is a bit more interesting because conserved residues do seem a bit more correlated. Because of this I prefer it over barrel33, which just sorta seems like random conservation. 1aboA didn't provide anything particularly promising. I haven't looked at 1efnA, as it is still building.

31 August 1998 Christian

Moved everything to old and copied the Makefile from t85 to rebuild the directory.

1 September 1998 Christian

After the new make, here are some new things:

1kjs
This is a cell adhesion protein that can match the first domain of t71. There is residue conservation, but I'm not certain that it is significant. See 1kjs-t71.cbarrett1.a2m. Alignment to the 2nd domain isn't convincing.

1kit-t71.cbarrett1.a2m is probably the best alignment yet. While there are some structural concerns, conservation isn't too bad and a domain break occurs in just about the right place in t71 (mid-sequence, at the "LNKFFQP").
The structure scores relatively well across the libraries and target model:
    t71-fssp-vit.rdb:1kit           T0071  varh50  1  2  -4.780
    t71-sum98.rdb:t71               1kit                 -4.08
    t71-t98-mixed.rdb:1kit          T0071  w0.5    0  2  -2.710
    t71-t98-vit.rdb:1kit            T0071  varh50  1  2  -5.460
    t71-t98.rdb:1kit                T0071  varh50  0  2  -2.540
    t71.t98_6-varh50-pdb.rdb:t71    1kit   varh50  0  2  -1.540

Similar structures to 1kit in the same region:
             Z     %IDE
    1eut     36.1  25
    2sil     35.9  21
    1nnc     16.1  12
    1tbgA    10.1   8
    1bbkH     7.8   4
    1aomB     5.7   6
    4aahA     5.2   6
    1gof      3.7   6

The only encouraging ones are:
    t71-sum98.rdb:t71          1gof_1                -2.340
    t71-t98-vit.rdb:1gof_1     T0071  varh50  1  2   -3.860
    t71-t98.rdb:1gof_1         T0071  varh50  0  2   -2.340
About the same as 1kit.

1dlhA, from the 1sebA FSSP family, has moderate residue conservation, is two domains, and it is fairly easy to get the split between the two domains correct. The target and structure like each other fairly well:
    1dlhA-t71-global.pw.dist:T0071  238  -2.14  -3.74
    t71-1dlhA-vit.pw.dist:1dlhA     180  -4.38  -4.15

Everything is starting to look equally vaguely possible. So, is it the SH3 domains, 1kit, or 1dlhA?

Wed Sep 2 09:51:31 PDT 1998 Kevin Karplus

Looking at top hits:

2reb/t71-2reb-vit looks pretty good for a short hit in the first domain.

1hvc/t71-1hvc-global is a pair of SH3 domains, but the intertwining between the domains makes for an uncomfortable alignment. Perhaps if the piece of the barrel aligned to the wrong domain were aligned to the end, instead of the beginning of the sequence, things would look better. Unfortunately, our methods don't allow for rotational symmetry, and so we can't express an alignment that moves a chunk from the beginning to the end while leaving the middle alone.

We can do a very nice job of matching the second domain of t71 to the second domain of 1hvc if we rethread the barrel, using 1hvc-t71-dom2-main and t71-1hvc-dom2-end. We can also get the first domain similarly aligned (1hvc-t71-dom1-main, t71-1hvc-dom1-end). Let's submit this!
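
The rethreading above amounts to a circular permutation of the template barrel: the strands at the start of the template get matched to the end of the target domain instead of the beginning. A minimal sketch of that bookkeeping, assuming a gapless pairing and a made-up cut point; rotate_template, pair_gapless, and the toy sequences are illustrative names and data, not part of SAM or of our alignment files.

```python
def rotate_template(template, cut):
    """Move the first `cut` template residues to the end, keeping original numbering.

    Returns a list of (original_position, residue) pairs, positions starting at 1.
    """
    if not 0 <= cut <= len(template):
        raise ValueError("cut must lie within the template")
    numbered = list(enumerate(template, start=1))
    return numbered[cut:] + numbered[:cut]


def pair_gapless(target, template, cut):
    """Pair target residues, in order, with the rotated template (no gaps)."""
    rotated = rotate_template(template, cut)
    return [(t, pos, r) for t, (pos, r) in zip(target, rotated)]


if __name__ == "__main__":
    # Toy 8-residue "barrel"; the first two positions stand in for the two leading strands.
    for t, pos, r in pair_gapless("stuvwxyz", "ABCDEFGH", cut=2):
        print(t, r, pos)
```

With cut=2 the template numbering runs 3..8 and then 1..2, which is exactly the kind of pairing an ordinary order-preserving alignment cannot express in one piece; hence the separate "main" and "end" pieces.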
1fmb-t71-global looks poor---chunks of the active site are missing, alignment is choppy. However, 1fmb and its homologs are in t71.remote_3.a2m. The alignment looks like half a mini-barrel, so could be evidence for an SH3 domain.

1kjs-t71-fssp-global is just a helix (ECISDFTEAPVLNIQFRY)

1dlhA/t71-1dlhA-global has high residue conservation, but does not respect reported domain boundaries. t71-1dlhA.cbarrett1 does respect the boundaries, but doesn't look much better than a random alignment.

1kit/1kit-t71-vit gets a single chunk, but it is not compact in 3D. 1kit-t71.cbarrett1 extends the viterbi alignment, and looks ok for the first domain, but random for the second domain. The first domain has good residue conservation, but is missing the middle strands of a beta sandwich.

Wed Sep 2 1998 Kevin Karplus

We decided this morning to go with 1hvc, taking the piece that aligns to the wrong domain and moving it to the END of the other domain---effectively rotating the barrel a little bit. The pieces are
    1hvc-t71-dom1-main
    t71-1hvc-dom1-end
    1hvc-t71-dom2-main
    t71-1hvc-dom2-end

From karplus@cse.ucsc.edu Wed Sep 2 14:39:43 1998
Return-Path: karplus@cse.ucsc.edu
Date: Wed, 2 Sep 1998 14:39:39 -0700
From: Kevin Karplus
To: cbarrett@cse.ucsc.edu
Cc: markd@cse.ucsc.edu, karplus@cse.ucsc.edu
Subject: 1hvc pdb file not read correctly

It may be that the reason that the prediction center is having trouble with our 1hvc prediction is related to the reason why 1hvc.pdb.2d is incorrect (even though 1hvc.dssp.2d is correct). It may have to do with the funky numbering:

REMARK   4 THE FIRST 99 RESIDUES WERE NUMBERED 101 - 199 IN THE          1HVC  48
REMARK   4 ORIGINAL DEPOSITION AND HAVE BEEN ASSIGNED RESIDUE NUMBERS    1HVC  49
REMARK   4 1 - 99 WITH INSERTION CODE B IN THIS ENTRY.  THE FIVE         1HVC  50
REMARK   4 LINKER RESIDUES HAVE RESIDUE NUMBERS 200 - 204.  THE LAST     1HVC  51
REMARK   4 99 RESIDUES WERE NUMBERED 1 - 99 IN THE ORIGINAL DEPOSITION   1HVC  52
REMARK   4 AND HAVE BEEN ASSIGNED RESIDUE NUMBERS 1 - 99 WITH            1HVC  53
REMARK   4 INSERTION CODE A IN THIS ENTRY.                               1HVC  54

From karplus@cse.ucsc.edu Wed Sep 2 15:08:33 1998
Return-Path: karplus@cse.ucsc.edu
Date: Wed, 2 Sep 1998 15:08:31 -0700
From: Kevin Karplus
To: karplus@cse.ucsc.edu
CC: cbarrett@cse.ucsc.edu
In-reply-to: <199809022139.OAA10925@purr.cse.ucsc.edu> (message from Kevin Karplus on Wed, 2 Sep 1998 14:39:39 -0700)
Subject: Re: 1hvc pdb file not read correctly

More thoughts---there are 11 other PDB files that are > 90% identical to 1hvc. They are all single domains---1hvc is a chimera that ties together a dimer.

             Z-score  %ID  resolution
    1tcxA    17.4     93   2.3
    1hviA    17.3     96   1.8
    1odwA    17.2     95   2.1
    1ajxA    17.2     96   2.0
    1a30A    17.1     93   2.0
    1hxbA    16.9     95   2.3
    1merA    16.8     94   1.9
    1axaA    16.7     95   2.0
    1bveB    13.7     94   NMR many
    1bvgB    13.1     94   NMR average

I'd recommend 1hviA, since it combines identity, resolution, and Z-score. We can align one domain to 1hviA and the other to 1hviB. I'll see if this will work.

OK pieces are now
    t71-1hviA-dom1-main
    t71-1hviA-dom1-end
    t71-1hviB-dom2-main
    t71-1hviB-dom2-end

I tried writing a draft of the results.t71 file, and decided it was too verbose, but will put it here in the notes:

T0071 is the ear of alpha adaptin. We were hoping to find SH3 domains, or similar barrels based on three things found in the literature:

1) In Benmerah A, et al. "The ear of alpha-adaptin interacts with the COOH-terminal domain of the Eps 15 protein." [J Biol Chem. 1996 May 17; 271(20): 12111-12116.] it was noted that T0071 binds to a domain at the end of EPS15. This suggested to us that T0071 may have a similar structure to other Eps15-associated proteins (such as EAST), which are known to contain SH3 domains. See, for example, Lohi O, et al. "EAST, an epidermal growth factor receptor- and Eps15-associated protein with src homology 3 and tyrosine-based activation motif domains."
[J Biol Chem. 1998 Aug 14; 273(33): 21408-21415.].

2) Roos J, et al. "Dap160, a neural-specific Eps15 homology and multiple SH3 domain-containing protein that interacts with Drosophila dynamin." [J Biol Chem. 1998 Jul 24; 273(30): 19108-19119.] reports that "src homology 3 domains and Eps15 homology domain[s are] motifs frequently found in proteins associated with endocytosis."

3) In Shpetner HS, et al. "A binding site for SH3 domains targets dynamin to coated pits." [J Biol Chem. 1996 Jan 5; 271(1): 13-16], it was suggested that "a protein containing an SH3 domain is involved in recruiting dynamin to coated pits and provide the first evidence for a biological role for SH3 domains in dynamin function."

Our top two scores were in the range where about 48% of the hits were false positives in our tests. The top hit (for 2reb) did not look very promising, but the second hit (for 1hvc) had barrels similar to the SH3 domains that we were hoping for. We decided to align the two domains of T0071 to the two barrels of 1hvc.

The alignment we got automatically covered most of one barrel, except for the initial two strands of the barrel, which it picked up from the other barrel. We finally decided to thread the barrel differently, with the two strands coming from the end of the T0071 domain, rather than from the beginning as in 1hvc.

Because the CASP web site was unable to understand the 1hvc pdb file, we redid the alignment to 1hviA and 1hviB, which are almost identical to 1hvc.

Thu Sep 3 11:11:22 PDT 1998 Kevin Karplus

I'll have to resubmit the 1hviA alignment, with more conciliatory remarks. I'm also doing a search with just the first domain, since the 1hvc hit really fits the second domain much better.

I looked at the genbank entry for ADAC_RAT, to see if there are any intron boundaries corresponding to either the domains or the barrel-rearrangement I'm proposing, but so far as I could tell, ADAC_RAT comes from a single exon. (I may be misunderstanding the genbank format, though).
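
The first-domain search works on the target cut at the reported domain break (residue 125 of 238, giving pieces of 125 and 113 residues). A minimal sketch of that cut, assuming a 1-based break position; the function name and the placeholder sequence are hypothetical, and how t71-first was actually generated is not recorded in these notes.

```python
def split_at_domain_break(seq, last_residue_of_first_domain):
    """Split a sequence into (first_domain, second_domain) at a 1-based break position."""
    if not 0 < last_residue_of_first_domain < len(seq):
        raise ValueError("break must fall inside the sequence")
    return seq[:last_residue_of_first_domain], seq[last_residue_of_first_domain:]


if __name__ == "__main__":
    placeholder = "X" * 238          # stand-in for the real 238-residue T0071 sequence
    first, second = split_at_domain_break(placeholder, 125)
    print(len(first), len(second))   # 125 113
```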
For t71-first, wu-blast finds nothing (best 1aa3 -0.357); double-blast finds nothing.

The top hits with t71-first.t98_6 are 2reb and its copies (including 1aa3, which is a fragment of 2reb), then come a number of homologs of 1iakA (56-59% residue id), then 1sat and its close homologs.
    2reb        -7.120  2reb
    1aa3        -5.990  2reb
    1a6aA       -4.800  ?
    1dlh[AD]    -4.780  1iakA 56%
    1seb[AE]    -4.780  1iakA 56%
    2sebA       -4.780  1iakA 56%
    ...
    1sat        -4.360  1sat
    ...

Of 2reb, 1dlhA, 1sat, 1hviA, 1iakA, the best alignments are
    2reb/t71-first-2reb-vit         2reb       303  -7.14  -8.17
    1dlhA/t71-first-1dlhA-global    1dlhA      180  -9.27  -8.09
    1sat/t71-first-1sat-global      1sat       468  -9.37  -7.95
    2reb/t71-first-2reb-global      2reb       303  -5.55  -5.87
    1dlhA/t71-first-1dlhA-vit       1dlhA      180  -5.02  -4.80
    1sat/t71-first-1sat-vit         1sat       468  -4.81  -3.86
    1hviA/1hviA-t71-first-global    t71-first  125  -1.68  -2.88
    2reb/2reb-t71-first-vit         t71-first  125  -3.43  -2.77
    1iakA/t71-first-1iakA-vit       1iakA      182  -2.70  -2.33
    1iakA/t71-first-1iakA-global    1iakA      182  -2.66  -2.11
    1dlhA/1dlhA-t71-first-vit       t71-first  125  -1.14  -1.77
    1iakA/1iakA-t71-first-vit       t71-first  125  -0.53  -0.59

The 2reb alignments all contain the same gapless alignment, but extend it more or less far. The 2reb-t71-first-vit is the shortest, but the part of the extension in t71-first-2reb-vit is believable.

The 1dlhA alignment is not very convincing--half a beta sandwich.

Fri Sep 4 09:49:14 PDT 1998 Kevin Karplus

Using the target98 library, the best hits for the first domain are
    1fnb_2    t71-first  varh50  0  2  -4.770  1fnc
    1se2_1    t71-first  varh50  0  2  -4.620  1se4
    1fnd      t71-first  varh50  0  2  -3.560  1fnc
    1fnc      t71-first  varh50  0  2  -3.520  1fnc
    2bopA     t71-first  varh50  0  2  -3.330  2bopA
    1knyA     t71-first  varh50  0  2  -3.230  1knyA
    1kanA     t71-first  varh50  0  2  -3.200  1knyA 99%
    2crd      t71-first  varh50  0  2  -3.120  2crd
    1ktx      t71-first  varh50  0  2  -3.070  1ktx
    2kauC_2   t71-first  varh50  0  2  -3.010  2kauC

Summing both ways still leaves 2reb in front:
    t71-first  2reb      -9.12   2reb
    ... [2reb copies and close homologs]
    t71-first  1ignA     -5.09   1ignA
    t71-first  5cytR     -4.9    1ycc
    t71-first  1a6aA     -4.800  ?
    t71-first  1knyA     -4.8    1knyA
    t71-first  1dlh[AD]  -4.780  1iakA 56%
    t71-first  1seb[AE]  -4.780  1iakA 56%
    t71-first  2sebA     -4.780  1iakA 56%
    t71-first  1fnb_2    -4.770  1fnc

Using Viterbi scoring with the fssp models, the top hits are
    1phr   t71-first  varh50  1  2  -6.440
    1bba   t71-first  varh50  1  2  -6.050
    1ignA  t71-first  varh50  1  2  -5.900
    1kjs   t71-first  varh50  1  2  -5.590
    1pcl   t71-first  varh50  1  2  -5.220
    1kit   t71-first  varh50  1  2  -4.780
    1qba   t71-first  varh50  1  2  -4.410

With the target98-mixed models the best hits are
    1fnc   t71-first  w0.5  0  2  -5.160  1fnc
    1quf   t71-first  w0.5  0  2  -4.970  1fnc
    1aa3   t71-first  w0.5  0  2  -3.680  2reb
    1knyA  t71-first  w0.5  0  2  -3.380  1knyA
    1se4   t71-first  w0.5  0  2  -3.190  1se4
    1kit   t71-first  w0.5  0  2  -3.170  1kit

With Viterbi scoring on the target98 models, the best hits are
    1ignA    t71-first  varh50  1  2  -5.970  1ignA
    1fnb_2   t71-first  varh50  1  2  -5.850  1fnc
    1fnc     t71-first  varh50  1  2  -5.460  1fnc
    1kit     t71-first  varh50  1  2  -5.460  1kit
    1fnd     t71-first  varh50  1  2  -5.100  1fnc
    1bglA_4  t71-first  varh50  1  2  -5.060  1bglA
    1mmoD    t71-first  varh50  1  2  -4.990  1mhyD
    2crd     t71-first  varh50  1  2  -4.850  2rcd
    1kanA    t71-first  varh50  1  2  -4.840  1knyA 99%
    1knyA    t71-first  varh50  1  2  -4.840  1knyA

The t71-first.remote_4 alignment includes 1aa3 (and the Swissprot sequences for 2reb), so naturally the 2reb sequences score very well.
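
With this many hit lists, one quick sanity check is to count how many lists each FSSP representative shows up in. A minimal sketch of such a tally; it is only an illustration and not a step of the SAM-T98 pipeline. The representatives are copied from the last column of three of the lists above (target98, target98-mixed, and Viterbi on target98), exactly as written there, and the function name is made up.

```python
from collections import Counter

# FSSP representatives (last column) copied from three of the hit lists above.
hit_lists = {
    "target98": ["1fnc", "1se4", "1fnc", "1fnc", "2bopA", "1knyA", "1knyA",
                 "2crd", "1ktx", "2kauC"],
    "target98-mixed": ["1fnc", "1fnc", "2reb", "1knyA", "1se4", "1kit"],
    "target98-vit": ["1ignA", "1fnc", "1fnc", "1kit", "1fnc", "1bglA", "1mhyD",
                     "2rcd", "1knyA", "1knyA"],
}


def recurrence(lists):
    """Count, for each representative, how many hit lists contain it at least once."""
    counts = Counter()
    for reps in lists.values():
        counts.update(set(reps))   # set(): one vote per list, however many copies
    return counts


if __name__ == "__main__":
    counts = recurrence(hit_lists)
    # Sort by descending count, then by name, so the output order is stable.
    for rep, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        print(rep, n)
```

A representative that recurs across lists is not thereby a true positive; the tally only shows which candidates deserve a look at the actual alignments.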
Widening the set of possible alignments, we now get the following top ones:
    1aa3/t71-first-1aa3-global         1aa3        63  -1.68  -8.44
    2reb/t71-first-2reb-vit            2reb       303  -7.14  -8.17
    1dlhA/t71-first-1dlhA-global       1dlhA      180  -9.27  -8.09
    1sat/t71-first-1sat-global         1sat       468  -9.37  -7.95
    1aa3/t71-first-1aa3-vit            1aa3        63  -6.55  -7.68
    1fnc/1fnc-t71-first-vit            t71-first  125  -5.81  -7.14
    1kjs/1kjs-t71-first-fssp-global    t71-first  125  -5.41  -6.32
    2reb/t71-first-2reb-global         2reb       303  -5.55  -5.87
    1kit/1kit-t71-first-vit            t71-first  125  -6.05  -5.46
    5cytR/t71-first-5cytR-vit          5cytR      103  -5.44  -5.46
    1bba/1bba-t71-first-fssp-global    t71-first  125  -6.28  -5.21
    5cytR/t71-first-5cytR-global       5cytR      103   0.43  -4.99
    1ignA/t71-first-1ignA-vit          1ignA      189  -5.12  -4.94
    1knyA/1knyA-t71-first-vit          t71-first  125  -4.67  -4.84
    1dlhA/t71-first-1dlhA-vit          1dlhA      180  -5.02  -4.80
    1knyA/t71-first-1knyA-global       1knyA      253  -4.43  -3.94
    1sat/t71-first-1sat-vit            1sat       468  -4.81  -3.86
    1phr/t71-first-1phr-global         1phr       154   0.12  -3.75

The t71-first-2reb-global alignment still looks pretty good, with 11 identical residues out of 63 aligned residues. I'd probably trim a little of the ends though, since there are no identical residues and the secondary structure doesn't match the prediction.

t71-first-1aa3-global pulls out just the interesting part of the match, though I can make the final helix look a little more convincing by adding a 2-residue insert just before it (13 identical out of 58, with one 2-residue insert: t71-first-1aa3-hand1.a2m).

The 1dlhA alignment still looks poor.

The t71-first-1sat-global alignment has several residue ids, but is not compact in 3D.

1fnc-t71-first-vit has only 6 identical residues out of 17 aligned.

1kjs/1kjs-t71-first-fssp-global gets a very nice helix match for ECISDFTEAPVLNIQFR near the end of the domain, but nothing else.

1kit-t71-first-vit gets some fairly high residue identities, but for widely separated strands of a beta sandwich.
Although we can sweep the unaligned parts of t71-first into the beta sandwich with pretty good residue conservation (1kit-t71-first-hand1.a2m), there are still too many holes in the beta sheet to make it a believable prediction.

The t71-first-5cytR-vit alignment gets 5 identical residues out of 22 aligned---too small to be useful.

The t71-first-5cytR-global alignment gets 13 identical residues in a gapless alignment of the first 60 residues. This may be better than the 2reb alignment! The cytochrome group that would hold the heme is not aligned to (which is good, since we don't expect a heme for T0071).

1bba-t71-first-fssp-global gets a 16-residue alignment with 5 identical residues.

t71-first-1ignA-vit gets 8 identical residues out of 34 aligned, but the alignment is to a non-compact surface loop, and the structure is unlikely to be preserved in a different context.

Fri Sep 4 11:20:35 PDT 1998

I think I'll resubmit t71, with the t71-1aa3-hand1 alignment for the first domain, and the 1hviB alignments for the second domain. I'll mention the 5cytR alignment and the 1kjs one, but not submit them.

From karplus@cse.ucsc.edu Tue Sep 8 09:31:01 1998
Return-Path: karplus@cse.ucsc.edu
Date: Tue, 8 Sep 1998 09:31:00 -0700
From: Kevin Karplus
To: adamz@sb9.llnl.gov
CC: karplus@cse.ucsc.edu
In-reply-to: <9809041818.ZM2552@sb9.llnl.gov> (adamz@sb9.llnl.gov)
Subject: Re: 1hvc vs. 1hivB

OK, I see how what I wrote could be misunderstood.
Since T0071 has expired, I assume I can't change anything in the prediction now, but if you want, you can change the paragraph

    Because the CASP web site was temporarily unable to understand the 1hvc pdb file, we redid the alignment to 1hviB, which is almost identical to one domain of 1hvc. These alignments are as good as the 1hvc ones, but 1hviA and 1hviB did not score nearly as well in the initial search, because they were not able to align simultaneously both parts of the barrel with only one copy of the domain.
to the paragraph

    We are submitting an alignment to 1hviB, which is the single-domain protein that the chimeric two-domain 1hvc was created from. Although this change was initially prompted by a temporary bug in the CASP web site, we decided that it made more sense to submit the single-domain protein when we were only aligning to one domain. The single-domain protein was not found directly by our methods, since the rotation of the barrel is not permissible in a simple alignment. It was only because of the duplication of the barrel in the 1hvc alignment that we were able to find this rotated alignment.

From adamz@sb9.llnl.gov Wed Sep 9 16:28:57 1998
Mail-from: From adamz@sb9.llnl.gov Wed Sep 9 16:28:57 1998
Return-Path: adamz@sb9.llnl.gov
Received: from popcorn.llnl.gov (popcorn.llnl.gov [128.115.18.60]) by services.cse.ucsc.edu (8.8.4/8.6.12) with ESMTP id QAA27674 for ; Wed, 9 Sep 1998 16:28:56 -0700 (PDT)
Received: from sb9.llnl.gov (sb9.llnl.gov [128.115.17.117]) by popcorn.llnl.gov (8.8.8/LLNL-3.0.2) with SMTP id QAA11449 for <@popcorn.llnl.gov:karplus@cse.ucsc.edu>; Wed, 9 Sep 1998 16:28:55 -0700 (PDT)
Received: by sb9.llnl.gov (951211.SGI.8.6.12.PATCH1042/930416.SGI) id QAA15214; Wed, 9 Sep 1998 16:28:54 -0700
From: "Adam Zemla"
Message-Id: <9809091628.ZM15212@sb9.llnl.gov>
Date: Wed, 9 Sep 1998 16:28:54 -0700
In-Reply-To: Kevin Karplus "Re: 1hvc vs. 1hivB" (Sep 8, 9:31am)
References: <9809041605.ZM2054@sb9.llnl.gov> <199809050017.RAA03637@purr.cse.ucsc.edu> <9809041818.ZM2552@sb9.llnl.gov> <199809081631.JAA12268@purr.cse.ucsc.edu>
X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail)
To: Kevin Karplus
Subject: Re: 1hvc vs. 1hivB
Cc: kaf@sb1.llnl.gov, adamz@sb9.llnl.gov
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Sep 8, 9:31am, Kevin Karplus wrote:
> Subject: Re: 1hvc vs. 1hivB
>
> OK, I see how what I wrote could be misunderstood.
> Since T0071 has expired, I assume I can't change anything in the > prediction now, but if you want, you can change the paragraph > > Because the CASP web site was temporarily unable to understand the > 1hvc pdb file, we redid the alignment to 1hviB, which is almost > identical to one domain of 1hvc. These alignments are as good as the > 1hvc ones, but 1hviA and 1hviB did not score nearly as well in the > initial search, because they were not able to align simultaneously > both parts of the barrel with only one copy of the domain. > > to the paragraph > > We are submitting an alignment to 1hviB, which is the single-domain > protein that the chimeric two-domain 1hvc was created from. Although > this change was initially prompted by a temporary bug in the CASP web > stie, we decided that it made more sense to submit the single-domain > protein when we were only aligning to one domain. The single-domain > protein was not found directly by our methods, since the rotation of > the barrel is not permissible in a simple alignment. It was only > because of the duplication of the barrel in the 1hvc alignment that we > were able to find this rotated alignment. >-- End of excerpt from Kevin Karplus Dear Dr. Karplus, Thank you for your decision. We replaced the appropriate paragraph in your prediction: T0071AL019_1 PIN_372391_4452 We include your prediction below: PFRMAT AL TARGET T0071 AUTHOR 9070-5088-8627 REMARK REMARK Prediction date: 4 Sept 1998 REMARK Group name: UCSC-compbio REMARK Authors: Christian Barrett, Melissa Cline, Mark Diekhans, Leslie Grate, REMARK Kevin Karplus, David Haussler, and Richard Hughey REMARK University of California, Santa Cruz REMARK METHOD Overview METHOD METHOD Fold recognition was performed using the Target98 (SAM-T98) method METHOD [3] using SAM version 2.1.1 [1], a refinement of the methods developed METHOD by this group for CASP2 [2]. 
This method attempts to find and multiply METHOD align a set of homologs to a given sequence, then create an HMM from that METHOD multiple alignment. METHOD METHOD First, a set of sequence weights is determined from the alignment. Next, METHOD Modelfromalign is used to build the model from the alignment and the METHOD sequence weights. Finally, hmmscore performs a local, all-paths scoring METHOD of the sequences, using a reversed-sequence normalization feature. METHOD METHOD The weighting method, detailed in upcoming publications [3,4], METHOD combines the Henikoffs' scheme [5], Dirichlet mixtures [6], and an METHOD entropy method to set the final weights. METHOD METHOD Alignment generation METHOD METHOD The initial step uses BLASTP to search NRP twice: once to produce a set METHOD of very close homologs, and once to produce a set of possible homologs. METHOD METHOD The method then uses multiple iterations of a selection, training, and METHOD alignment procedure. Each iteration involves an initial alignment, a set METHOD of search sequences, a threshold value, and a transition regularizer. METHOD METHOD The first iteration uses a single sequence (or seed alignment) as the METHOD initial alignment and the close homologs found by BLASTP are used as the METHOD search set. The threshold is set very strictly, so that only good matches METHOD to the sequence are considered. This iteration uses a transition regularizer METHOD that was designed to match the gap costs used by BLASTP. METHOD METHOD On subsequent iterations the input alignment is the output from the METHOD previous iteration, the search set is the larger set of possible METHOD homologs found by BLASTP, and the thresholds are gradually loosened. METHOD The second through second-from-last iteration use a ``long-match'' METHOD transition regularizer, and the final iteration uses a transition regularizer METHOD trained on FSSP alignments. METHOD METHOD References METHOD [1] R. Hughey and A. 
Krogh, CABIOS 12(2): 95-107, 1996. METHOD http://www.cse.ucsc.edu/research/compbio/sam.html. METHOD [2] K. Karplus, K. Sjolander, C. Barrett, M. Cline, D. Haussler, R. METHOD Hughey, L. Holm, and C. Sander, Proteins: Structure, Function, and METHOD Genetics, Suppl. 1, 134-9, 1997. METHOD [3] K. Karplus, C. Barrett, and R. Hughey, Technical Report UCSC-CRL-98-06, METHOD Department of Computer Engineering, Univ. of California, Santa Cruz, 1998. METHOD [4] J. Park, K. Karplus, C. Barrett, R. Hughey, D. Haussler, T. Hubbard, METHOD and C. Chothia, http://cyrah.med.harvard.edu/~jong/assess_final.html, 1998. METHOD [5] S. Henikoff and J. C. Henikoff, JMB, vol 243, pp 574-578, Nov 1994. METHOD [6] K. Sjolander, K. Karplus, M. P. Brown, R. Hughey, A. Krogh, I. S. METHOD Mian, and D. Haussler, CABIOS 12(4):327-345, 1996. METHOD METHOD METHOD Based on the commonness of SH3 domains in proteins associated with METHOD endocytosis (in particular in EGF which binds to Eps15), we were METHOD hoping that the ear-domain of the alpha adaptin would also contain SH3 METHOD domains. METHOD METHOD Our top two scores were in range the where about half the hits were METHOD false positives in our tests. The top hit (for 2reb) did not have the METHOD hoped-for barrels, but the second hit (for 1hvc) had barrels similar to METHOD SH3 domains. We first tried to align the two domains of T0071 to the METHOD two barrels of 1hvc, but later decided to use an alignment to 1aa3 (a METHOD fragment of 2reb) for the first domain, and 1hvc only for the second METHOD domain. We had two other possible hits for the first domain (5cytR METHOD and 1kjs), but preferred the better-scoring 1aa3/2reb hit. METHOD METHOD The automatic alignment for 1hvc covered the first barrel and the METHOD first two strands of the second barrel. Most of the alignment came METHOD from the second domain. 
We finally decided to thread the barrel METHOD differently, with the two strands coming from the end of the second METHOD T0071 domain (where the automatic alignment had switched to the other METHOD barrel), rather from the beginning as in 1hvc. The chain breaks are METHOD located so that the rearrangement can be made with minimal change to METHOD the structure, though the binding site in the center of 1hvc may be METHOD lost. METHOD METHOD We are submitting an alignment to 1hviB, which is the single-domain METHOD protein that the chimeric two-domain 1hvc was created from. Although METHOD this change was initially prompted by a temporary bug in the CASP web METHOD site, we decided that it made more sense to submit the single-domain METHOD protein when we were only aligning to one domain. The single-domain METHOD protein was not found directly by our methods, since the rotation of METHOD the barrel is not permissible in a simple alignment. It was only METHOD because of the duplication of the barrel in the 1hvc alignment that we METHOD were able to find this rotated alignment. 
METHOD MODEL 1 PARENT 1aa3 V 15 I 268 L 16 N 269 F 17 F 270 E 18 Y 271 N 19 G 272 Q 20 E 273 L 21 L 274 L 22 V 275 Q 23 D 276 I 24 L 277 G 25 G 278 L 26 V 279 K 27 K 280 S 28 E 281 E 29 K 282 F 30 L 283 R 31 I 284 Q 32 E 285 N 33 K 286 L 34 A 287 G 35 G 288 R 36 A 289 M 37 W 290 F 38 Y 291 I 39 S 292 F 40 Y 293 Y 41 K 294 G 42 G 295 N 43 E 296 K 44 K 297 T 45 I 298 S 46 G 299 T 47 Q 300 Q 48 G 301 F 49 K 302 L 50 A 303 N 51 N 304 F 52 A 305 T 53 T 306 P 54 A 307 T 55 W 308 L 56 L 309 I 57 K 310 D 60 D 311 D 61 N 312 L 62 P 313 Q 63 E 314 T 64 T 315 N 65 A 316 L 66 K 317 N 67 E 318 L 68 I 319 Q 69 E 320 T 70 K 321 K 71 K 322 P 72 V 323 V 73 R 324 D 74 E 325 TER PARENT 1hvi_B F 126 A 28 Q 127 D 29 P 128 D 30 T 129 T 31 E 130 V 32 M 131 L 33 A 132 E 34 S 133 E 35 Q 134 M 36 D 135 S 37 F 136 L 38 F 137 P 39 Q 138 G 40 R 139 R 41 W 140 W 42 K 141 K 43 P 146 P 44 Q 147 K 45 Q 148 M 46 E 149 I 47 V 150 G 48 Q 151 G 49 N 152 I 50 I 153 G 51 F 154 G 52 K 155 F 53 A 156 I 54 K 157 K 55 H 158 V 56 P 159 R 57 M 160 Q 58 D 161 Y 59 T 162 D 60 E 163 Q 61 I 164 I 62 K 166 L 63 A 167 I 64 K 168 E 65 I 169 I 66 I 170 C 67 G 171 G 68 F 172 H 69 G 173 K 70 S 174 A 71 A 175 I 72 L 176 G 73 L 177 T 74 E 178 V 75 E 179 L 76 V 180 V 77 D 181 G 78 P 182 P 79 N 183 T 80 P 184 P 81 A 185 V 82 N 186 N 83 F 187 I 84 V 188 I 85 G 189 G 86 A 190 R 87 G 191 N 88 I 192 L 89 I 193 L 90 T 198 T 91 Q 199 Q 92 I 200 I 93 G 201 G 94 C 202 C 95 L 203 T 96 L 204 L 97 L 210 P 1 Q 211 Q 2 A 212 I 3 Q 213 T 4 M 214 L 5 Y 215 W 6 R 216 Q 7 L 217 R 8 T 218 P 9 L 219 L 10 R 220 V 11 T 221 T 12 S 222 I 13 K 223 K 14 D 224 I 15 T 225 G 16 S 227 G 17 Q 228 Q 18 R 229 L 19 L 230 K 20 C 231 E 21 E 232 A 22 L 233 L 23 L 234 L 24 S 235 D 25 E 236 T 26 Q 237 G 27 TER END -- ********************************************************************* Adam Zemla, L-452 E-mail: adamz@llnl.gov Biology and Biotechnology Research Program Phone: (925) 423-5571 Lawrence Livermore National Laboratory Fax: (925) 422-2282 University 
of California, P.O.Box 808 Livermore, CA 94550, USA *********************************************************************
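
The MODEL section of the submission above lists aligned residue pairs as four fields each. A minimal sketch for counting identical pairs in such records, assuming the field order is target residue, target number, parent residue, parent number (inferred from the residue numbering in the blocks above, not checked against the CASP format description). The function names are made up for illustration.

```python
def parse_al_records(text):
    """Parse whitespace-separated records: target_aa target_num parent_aa parent_num."""
    tokens = text.split()
    if len(tokens) % 4 != 0:
        raise ValueError("expected four fields per record")
    records = []
    for i in range(0, len(tokens), 4):
        t_aa, t_num, p_aa, p_num = tokens[i:i + 4]
        records.append((t_aa, int(t_num), p_aa, int(p_num)))
    return records


def identity(records):
    """Return (identical_pairs, aligned_pairs) for a list of parsed records."""
    same = sum(1 for t_aa, _, p_aa, _ in records if t_aa == p_aa)
    return same, len(records)


if __name__ == "__main__":
    # First five records of the PARENT 1aa3 block in the submission above.
    sample = "V 15 I 268 L 16 N 269 F 17 F 270 E 18 Y 271 N 19 G 272"
    print(identity(parse_al_records(sample)))   # (1, 5)
```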