Wed Jul 26 14:08:22 PDT 2000
T0120 DNA repair protein XRCC4, human

wu-blast gets very weak hits:
    1qcrH, 1ltd[AB], 1fcb[AB], 1le4, 1ldc[AB], 1lco[AB], 1ddf
double-blast gets nothing.

t2k alignment finds only 2 copies of sequence.
2ry structure prediction is mixed helical and strand.

The target model finds only weak hits, to two different folds:
    1bui[AB]     3.0    5ptp    2.44.1.2.41
    1qrz[ABCD]   3.5    5ptp    2.44.1.2.41
    1ddj[ABCD]   3.6    8kme2   2.44.1.2.41
    1bml[AB]     3.6    5ptp    2.44.1.2.41

    1qriA        10.    1ckqA   3.47.1.1.1
    1qpsA        19.    1ckqA   3.47.1.1.1
    1qrhA        20.    1ckqA   3.47.1.1.1
    1cl8A        22.    1ckqA   3.47.1.1.1
    1qc9[ABC]    22.    1ckqA   3.47.1.1.1
    1eriA        22.    1ckqA   3.47.1.1.1
    1ckqA        22.    1ckqA   3.47.1.1.1

The template models add a few hits
    1quuA        0.97   1quuA   1.7.1.1.3
    1kvs         30.7   1xel    3.2.1.2.1
    1c8zA        32.6   1c8zA   4.21.1.1.1

No two-way hits.

27 July 2000
"make remote" changes the target hits.
    id           e-value  FSSP    SCOP
    1mup         3.1      1mup    2.56.1.1.10
    1nsj         8.2      1nsj    3.1.2.2.2
    1df3A        18.      ?       ?  (too new?)

Still no two-way hits.
Maybe should look at 2.44.1, 2.56.1, 1.7.1, and 3.1.2 hits.

Sat Aug 26 15:20:55 PDT 2000 Kevin Karplus
Remade 2track predictions
Still no good hits:
    % Sequence ID  Length  Simple   Reverse  E-value  SCOP
    1be3B          439     -28.56   -10.21   1.6e-01  4.111.1
    1atlA          202     -24.12   -10.07   1.6e-01  4.76.1
    1b3qA          379     -31.31   -8.90    1.1e+00  1.31.2,2.38.64.101.1
    1nfp           228     -27.91   -8.75    1.1e+00  3.1.15
    1ltm           320     -28.52   -8.59    1.1e+00  4.2.1
    1c4xA          285     -24.70   -8.35    1.1e+00  3.64.1
    1dqrA          557     -25.31   -8.29    1.1e+00  3.74.1
These are all weak, and all over structure space.
There is no agreement with the plain target model or the template models.
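A quick way to check the "all over structure space" impression is to tally the hits by fold. The sketch below is only an illustration, not part of the prediction pipeline: the function names parse_hits, fold_of, and fold_counts are made up here, and it assumes that the first two fields of a SCOP number (e.g. 3.64 in 3.64.1) identify the fold, as in the CAFASP fold column. The rows are copied from the Aug 26 2-track table.

```python
from collections import Counter

# Rows copied from the Aug 26 2-track table:
# ID, length, simple score, reverse score, E-value, SCOP number(s).
TABLE = """\
% Sequence ID Length Simple Reverse E-value SCOP
1be3B 439 -28.56 -10.21 1.6e-01 4.111.1
1atlA 202 -24.12 -10.07 1.6e-01 4.76.1
1b3qA 379 -31.31 -8.90 1.1e+00 1.31.2,2.38.64.101.1
1nfp 228 -27.91 -8.75 1.1e+00 3.1.15
1ltm 320 -28.52 -8.59 1.1e+00 4.2.1
1c4xA 285 -24.70 -8.35 1.1e+00 3.64.1
1dqrA 557 -25.31 -8.29 1.1e+00 3.74.1
"""


def parse_hits(text):
    """Parse six-column hit rows; header lines and malformed rows are skipped."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if line.startswith("%") or len(fields) != 6:
            continue
        seq_id, length, simple, reverse, evalue, scop = fields
        hits.append({
            "id": seq_id,
            "length": int(length),
            "simple": float(simple),
            "reverse": float(reverse),
            "evalue": float(evalue),
            "scop": scop.split(","),  # multi-domain chains list several numbers
        })
    return hits


def fold_of(scop_number):
    """Assumption: class.fold is the first two dot-separated fields."""
    return ".".join(scop_number.split(".")[:2])


def fold_counts(hits):
    """Count how many hit domains fall in each fold."""
    counts = Counter()
    for hit in hits:
        for number in hit["scop"]:
            counts[fold_of(number)] += 1
    return counts


if __name__ == "__main__":
    for fold, n in sorted(fold_counts(parse_hits(TABLE)).items()):
        print(fold, n)
```

If no fold is counted more than once, the hits do not agree on any fold, which is what the table shows.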

28 August 2000 Rachel Karchin
top CAFASP hits:
    fold    #server  #found  #top  #within10%
    1.7     7        13      4     0
    1.110   5        10      3     0
    8.3     5        5       1     1
    1.108   5        5       1     0
    3.64    4        6       1     0
    3.31    4        5       0     2
    1.106   3        4       0     2
    3.1     2        4       1     0
    1.49    2        3       1     1
    8.1     2        2       1     2
    2.18    1        5       1     0
    6.1     2        2       1     1
    8.4     2        2       0     0

Our top template model hit is 1.7.1.1.3
Three out of five Genthreader CAFASP hits are 1.7.1

Some information about XRCC4:
    o Has a counterpart in yeast -- LIF1 (weak but significant
      sequence homology)
    o Crystal structure has been determined by Wei Yang and Martin
      Gellert group at NIH/NDDK. It was presented at the NDDK
      mid-Atlantic crystallography workshop in May 2000 by Murray
      Junop but I haven't been able to ferret out any details through
      web searching.
    o Article about XRCC4 (Jan '98) available online at
      http://www.jbc.org/cgi/content/full/273/3/1794
      "murine XRCC4 expressed in insect cells exists primarily as a
      disulfide-linked homodimer although it can also form large
      multimers"

Tue Sep 5 10:35:24 PDT 2000
remaking 2track
Top 2-track unchanged.

Sun Sep 10 16:55:17 PDT 2000
Checking some of our top alignments:
1c4xA/T0120-1c4xA-2track-global  1c4xA 281 -3.83 -14.50 2.5e-06
    Some good residue ID and 2ry conservation, but interior strands
    of sheet missing, and insertion regions have predicted strands.
    Can't fix by moderate realignments.
1quuA/1quuA-T0120-global  T0120 336 -12.08 -12.27 1.8e-05
    Just matches large helices. Doesn't have predicted sheets.
1ltm/T0120-1ltm-2track-global  1ltm 309 -2.32 -11.47 5.0e-05
    Some nice helical matches, but skips over all predicted sheets.
1atlA/T0120-1atlA-2track-local  1atlA 200 -24.13 -10.10 1.4e-04
1be3B/T0120-1be3B-2track-local  1be3B 419 -28.60 -10.21 1.4e-04
1ltm/T0120-1ltm-2track-global  1ltm 320 -2.95 -10.45 1.4e-04
1quuA/1quuA-T0120-vit  T0120 336 -16.14 -9.94 3.7e-04
1b3qA/T0120-1b3qA-2track-local  1b3qA 379 -31.31 -8.90 1.0e-03
1c4xA/T0120-1c4xA-2track-local  1c4xA 281 -24.71 -8.42 1.0e-03
1c8zA/1c8zA-T0120-vit  T0120 336 -11.15 -8.07 1.0e-03
1ckqA/T0120-1ckqA-vit  1ckqA 261 -11.07 -8.46 1.0e-03
    7 identical residues in 18-residue gapless alignment
    Not compatible with
    t120-1-115/1ckqA/T0120-1-115-1ckqA-local (see below)
1eriA/T0120-1eriA-vit  1eriA 261 -11.07 -8.46 1.0e-03
1ltm/T0120-1ltm-2track-local  1ltm 309 -27.93 -8.67 1.0e-03
1nfp/T0120-1nfp-2track-local  1nfp 228 -27.91 -8.75 1.0e-03
1quuA/1quuA-T0120-local  T0120 336 -21.04 -8.24 1.0e-03
1ckqA/T0120-1ckqA-local  1ckqA 261 -16.89 -7.11 2.7e-03
1eriA/T0120-1eriA-local  1eriA 261 -16.89 -7.11 2.7e-03
1c4xA/1c4xA-T0120-fssp-global  T0120 336 -25.32 -6.51 7.4e-03
1c8zA/1c8zA-T0120-local  T0120 336 -17.14 -6.76 7.4e-03
1ckqA/T0120-1ckqA-local  1ckqA 276 -16.84 -6.97 7.4e-03
1eriA/T0120-1eriA-local  1eriA 276 -16.84 -6.97 7.4e-03
1c8zA/T0120-1c8zA-vit  1c8zA 265 -8.94 -5.85 2.0e-02
1ckqA/1ckqA-T0120-local  T0120 336 -14.69 -5.28 2.0e-02
1ckqA/1ckqA-T0120-vit  T0120 336 -8.93 -5.44 2.0e-02
1eriA/1eriA-T0120-local  T0120 336 -14.69 -5.28 2.0e-02
1eriA/1eriA-T0120-vit  T0120 336 -8.93 -5.44 2.0e-02
1kvs/1kvs-T0120-global  T0120 336 -7.87 -5.27 2.0e-02
1c8zA/T0120-1c8zA-local  1c8zA 265 -14.17 -3.95 1.4e-01
1kvs/1kvs-T0120-local  T0120 336 -24.02 -3.95 1.4e-01
1kvs/T0120-1kvs-2track-local  1kvs 338 -17.07 -2.74 3.6e-01

Wed Sep 13 12:46:23 PDT 2000 Kevin Karplus
The T0120.remote-t2k seems to have picked up some real signal.
The top 2-track predictions for T0120.remote-t2k are
    % Sequence ID  Length  Simple   Reverse  E-value  SCOP
    1dceA          568     -29.15   -10.05   1.6e-01  3.9.3? 1.110.6? 1.110.8?
    1dkzA          219     -29.89   -9.30    4.3e-01  5.17.1
    1a87           321     -31.50   -9.29    4.3e-01  6.1.1
    1qs2A          401     -27.40   -8.72    1.2e+00  4.144.1,4.144.1
    1be3B          439     -23.19   -8.45    1.2e+00  4.111.1
    1qs1A          462     -23.84   -8.04    1.2e+00  4.144.1,4.144.1
    1qcrA          446     -20.97   -7.79    3.2e+00  4.111.1
    1qbeA          132     -21.23   -7.76    3.2e+00  4.70.1
    1c0aA          585     -23.17   -7.72    3.2e+00  2.38.4,4.59.1,4.87.1

Remaking joints with remote-t2k as target alignment.
Top alignments now:
1a87/T0120-1a87-2track-global  1a87 297 -8.71 -29.28 7.6e-13
    not bad, and hand-editable to get pretty good residue id.
    1a87/T0120-1a87-karplus1.a2m
    Still one missing helix.
    1a87 is colicin N, a transmembrane channel-forming toxin.
    This seems an unlikely homolog for a DNA repair protein.
1dkzA/T0120-1dkzA-2track-global  1dkzA 215 -16.26 -21.62 2.3e-09
    Looks good---very nice striping across beta sheet.
    Only extends up to about 210, which I had previously
    conjectured was a domain boundary.
    This is "substrate binding domain of dnak biological_unit"---that
    is a heat-shock protein 70---a chaperone.
    Domain break around 115.
    The helices have a rather ambiguous alignment.
    1dkzA/T0120-1dkzA-karplus1.a2m
    POSSIBLE PREDICTION
1qbeA/T0120-1qbeA-2track-global  1qbeA 123 -8.33 -20.84 6.2e-09
    very low residue ID. capsid protein. Unlikely match.
1nsj/T0120-1nsj-global  1nsj 205 -10.45 -15.63 9.2e-07
    phosphoribosylanthranilate isomerase mutant. TIM barrel.
    Very good residue id, poor 2ry match.
    Large insertion predicted to be long helix (an unlikely insertion).
1nfp/T0120-1nfp-2track-global  1nfp 228 -3.25 -14.53 2.5e-06
    somewhat choppy alignment with moderate residue ID.
    flavoprotein.
1quuA/1quuA-T0120-global  T0120 336 -12.08 -12.27 1.8e-05
    long helix with turn roughly confirming 2ry structure prediction
    for 140-210 (turn slightly off).
    Not very useful for full fold prediction.
1mup/T0120-1mup-global  1mup 166 -2.83 -11.57 5.0e-05
    nice beta barrel, but with many insertions. Poor 2ry match.
1atlA/T0120-1atlA-2track-global  1atlA 200 0.17 -10.74 1.4e-04
    Nice match to coil and helix in middle---rest of alignment
    rather dubious.
    atrolysin c (hemorrhagic toxin c, form d)
1dceA/T0120-1dceA-2track-local  1dceA 567 -29.16 -10.05 1.4e-04
    Looks pretty good, and can be extended to cover entire protein
    1dceA/T0120-1dceA-karplus1.a2m
    POSSIBLE PREDICTION
    Good match on first (all beta) domain up to about 115
    Next domain (116-202) all helical, somewhat ambiguous alignment
    Third domain (203-end) beta helix.
    RAB geranylgeranyltransferase alpha subunit.
    Somehow seems like an unlikely function for the target.
1a87/T0120-1a87-2track-local  1a87 321 -31.50 -9.29 3.7e-04
1dkzA/T0120-1dkzA-2track-local  1dkzA 219 -29.89 -9.30 3.7e-04
1quuA/1quuA-T0120-vit  T0120 336 -16.14 -9.94 3.7e-04
1be3B/T0120-1be3B-2track-local  1be3B 419 -23.24 -8.47 1.0e-03
1c8zA/1c8zA-T0120-vit  T0120 336 -11.15 -8.07 1.0e-03
1mup/T0120-1mup-local  1mup 157 -18.78 -8.90 1.0e-03
1qs1A/T0120-1qs1A-2track-local  1qs1A 402 -23.97 -8.16 1.0e-03
1qs2A/T0120-1qs2A-2track-local  1qs2A 401 -27.40 -8.72 1.0e-03
1quuA/1quuA-T0120-local  T0120 336 -21.04 -8.24 1.0e-03
1c0aA/T0120-1c0aA-2track-local  1c0aA 585 -23.17 -7.72 2.7e-03
1kvs/T0120-1kvs-2track-local  1kvs 338 -20.51 -7.07 2.7e-03
1mup/T0120-1mup-vit  1mup 157 -11.42 -7.34 2.7e-03

Wed Sep 13 15:47:59 PDT 2000
Trying to predict the last domain: 211-end.
T2k alignment finds only sequence itself.
Remote search adds a few.
Nothing found in PDB with regular target model or remote target model.
Regular 2-track finds nothing.
Remote 2-track finds 2 weakly: 1cr6B, 1aocA.
Very weak hits with 4 templates: 1poiA, 1d4oA, 1skyE, 1a9cA.

Top alignments:
1cr6B/T0120-211-end-1cr6B-2track-local  1cr6B 541 -19.48 -9.62 3.7e-04
    not compact. missing 2 interior strands of beta sheet.
1aocA/T0120-211-end-1aocA-2track-local  1aocA 175 -14.60 -5.15 2.0e-02
    low residue identity. not compact.
1a8rA/T0120-211-end-1a8rA-2track-local  1a8rA 221 -13.62 -4.50 5.4e-02
    small fragment and not compact.
1aocA/1aocA-T0120-211-end-vit  T0120-211-end 126 -3.64 -2.70 3.6e-01
    1-helix match, with only identical residues before and after helix.
1cr6B/T0120-211-end-1cr6B-vit  1cr6B 541 -3.42 -1.96 8.1e-01
    match 1 strand and bit of loop.
1poiA/T0120-211-end-1poiA-local  1poiA 317 -10.93 -1.52 8.1e-01
    short, gappy alignment, poor 2ry match.

Basically, it looks like we can't find a match for this domain.

Wed Sep 13 16:16:06 PDT 2000 Kevin Karplus
Let's try doing separate searches for the first two domains:
1-115 and 116-210.

Thu Sep 14 03:25:45 PDT 2000
First domain:
    Even make remote finds no more homologs.
    remote 2track weak hits
        1svy    4.90.1
        1din    3.64.1
        1svr    4.90.1
        1azpA   4.8.2
        1d0nA   4.90.1,4.90.1,4.90.1,4.90.1,4.90.1,4.90.1
        1whtA   3.64.1
        1ekjB   ?
    target weak hits
        1ckqA   3.47.1
        1a8rA   4.80.1
        1cfe    4.92.1
    template weak hits
        1bs2A   1.28.1,3.19.1,4.54.2

    Of these, we have seen 3.64.1 (1c4xA) and 3.47.1 (1ckqA) in the
    full-length search.

Thu Sep 14 09:03:50 PDT 2000
make remoter adds some more sequences that may be related.
Top remoter target hits are
    1fcb[AB], 1lco[AB], 1ldc[AB], 1ltd[AB]
all of which are flavocytochrome b (represented in FSSP by 1gox,
glycolate oxidase, which they are 41% identical to)
I'll make a t99 alignment for 1fcbA (the best resolution of these)
and try using it as a template also.

remoter 2track gets as top hits
    1cleA   534   -18.72   -8.43   1.2e+00   3.64.1
    1crl    534   -18.05   -7.26   3.2e+00   3.64.1
    1qjwA   365   -17.34   -6.19   8.7e+00   3.5.1
    1cozA   129   -17.44   -6.06   8.7e+00   3.19.1
    3cbh    365   -15.94   -5.90   2.3e+01   3.5.1
    2bvwA   362   -16.81   -5.89   2.3e+01   ?
Hmm, we're seeing more of 3.64.1.

Top alignments are
1fcbA/T0120-1-115-1fcbA-vit  1fcbA 511 -44.14 -40.97 1.3e-17
    gapless alignment of 3 strands of mixed beta sheet.
    13 identical residues. poor 2ry match.
1fcbA/T0120-1-115-1fcbA-local  1fcbA 511 -48.40 -39.73 3.5e-17
    Essentially the same as vit alignment. slightly longer
    14 identical residues.
1svy/T0120-1-115-1svy-2track-global  1svy 102 -12.77 -15.12 9.2e-07
    looks pretty good.
    20 identical residues with one insertion.
    Pretty good 2ry match.
    (fixed the overalignment to residues not in atom list:
    1svy/T0120-1-115-1svy-karplus.a2m)
    POSSIBLE PREDICTION.
1svy/T0120-1-115-1svy-2track-local  1svy 114 -19.92 -7.86 2.7e-03
    Same alignment with less of C-terminus.
1cozA/T0120-1-115-1cozA-global  1cozA 129 -2.05 -13.15 6.8e-06
    rather gappy alignment. Missing an interior strand.
1svr/T0120-1-115-1svr-2track-global  1svr 94 -8.26 -12.40 1.8e-05
    similar to 1svy alignment, only 16 identical residues.
    1svr/T0120-1-115-1svr-karplus.a2m
1bs2A/T0120-1-115-1bs2A-vit  1bs2A 603 -10.57 -10.04 1.4e-04
    short gapless fragment, ok 2ry match, 12 identical residues
    Can be extended to 24 identical residues with one gap in a
    reasonable place.
    1bs2A/T0120-1-115-1bs2A-karplus.a2m
    POSSIBLE PREDICTION
1ckqA/T0120-1-115-1ckqA-vit  1ckqA 261 -11.26 -9.77 3.7e-04
    gapless alignment with 16 identical residues.
    DNA-binding. Missing lots of sheet.
1ckqA/T0120-1-115-1ckqA-local  1ckqA 261 -16.25 -7.85 2.7e-03
    23 identical residues with 1 1-residue deletion.
    DNA-binding.
    Can be extended (weakly) to get another couple of basic
    residues aligned.
    1ckqA/T0120-1-115-1ckqA-karplus.a2m
    POSSIBLE PREDICTION.
1cozA/T0120-1-115-1cozA-global  1cozA 126 -2.87 -9.35 3.7e-04
    rather gappy alignment. Missing interior strand.
1azpA/T0120-1-115-1azpA-2track-global  1azpA 66 0.20 -8.54 1.0e-03
    DNA-binding. poor 2ry match.
    12 identical residues with one short insertion.
1cfe/1cfe-T0120-1-115-vit  T0120-1-115 115 -10.45 -8.56 1.0e-03
    6 identical residues in very short fragment.
1cleA/T0120-1-115-1cleA-2track-local  1cleA 534 -18.72 -8.43 1.0e-03
    10 identical residues, with one gap and one insertion.
    Missing most of sheet.
1crl/T0120-1-115-1crl-2track-local  1crl 534 -18.05 -7.26 2.7e-03
    11 identical residues, with one gap and one insertion.
    Missing most of sheet.
1bs2A/T0120-1-115-1bs2A-local  1bs2A 603 -14.53 -6.17 7.4e-03
    13 identical residues in gapless alignment
    Poor 2ry match.

Thu Sep 14 03:25:45 PDT 2000
Second domain:
    make remote does find some more possible homologs, though not many.
    remote 2track weak hits
        1dl2A   1.97.2
        1hsm    1.22.1
        2kinB   3.31.1
        7aatA   3.62.1
        1aa0    8.1.14
    target weak hits
        1nfn    1.25.1
    template weak hits
        1quuA   1.7.1,1.7.1
        1bco    2.45.1,3.50.3
        1cwpA   2.9.1

    Of these, we have seen 1.7.1 (1quuA) in the full-length search.

Thu Sep 14 09:21:21 PDT 2000
make remoter target model finds nothing, as does remoter 2track.

Best current alignments:
1quuA/1quuA-T0120-116-210-vit  T0120-116-210 95 -17.03 -12.67 1.8e-05
    16 identical residues, helix turns in places different from predicted.
1quuA/1quuA-T0120-116-210-local  T0120-116-210 95 -20.80 -10.38 1.4e-04
    essentially the same.
1quuA/1quuA-T0120-116-210-global  T0120-116-210 95 -0.96 -9.98 3.7e-04
    essentially the same.
1cwpA/1cwpA-T0120-116-210-global  T0120-116-210 95 0.98 -11.07 5.0e-05
    12 identical residues with 2 gaps. DNA binding.
    Poor 2ry match.
1cwpA/1cwpA-T0120-116-210-local  T0120-116-210 95 -14.33 -6.05 7.4e-03
    9 identical residues with 1 gap
    poor 2ry match.
1cwpA/1cwpA-T0120-116-210-vit  T0120-116-210 95 -9.01 -6.63 7.4e-03
    6 identical residues in short gapless fragment.
    poor 2ry match.
1aa0/T0120-116-210-1aa0-2track-global  1aa0 113 -9.59 -10.17 1.4e-04
    7 identical residues on coiled-coil.
1aa0/T0120-116-210-1aa0-2track-local  1aa0 113 -22.38 -5.06 2.0e-02
2kinB/T0120-116-210-2kinB-2track-global  2kinB 100 -9.99 -9.20 3.7e-04
    pulls out one strand and some helices. Rest of sheet missing.
2kinB/T0120-116-210-2kinB-2track-local  2kinB 100 -17.07 -5.25 2.0e-02
1bco/1bco-T0120-116-210-local  T0120-116-210 95 -15.84 -6.14 7.4e-03
    not compact. No way to get sheet that is crucial to fold.
1bco/1bco-T0120-116-210-vit  T0120-116-210 95 -9.71 -6.58 7.4e-03
    same problems
1bco/T0120-116-210-1bco-vit  1bco 327 -3.93 -2.28 3.6e-01
    very short fragment with poor 2ry match.
1hsm/T0120-116-210-1hsm-2track-local  1hsm 79 -21.21 -5.98 2.0e-02
    10 identical residues. one turn almost in predicted place.
    One predicted turn missing, one extra predicted turn.
1hsm/T0120-116-210-1hsm-2track-global  1hsm 79 -10.15 -3.56 1.4e-01
    essentially the same.
1nfn/T0120-116-210-1nfn-vit  1nfn 132 -7.86 -5.70 2.0e-02
    11 identical residues in 19 residue fragment. One helix.
    Confirms helix prediction here.
1nfn/T0120-116-210-1nfn-2track-global  1nfn 132 -10.03 -4.69 5.4e-02
    low residue identity to cover 3 helices of 4-helix bundle.
    No conservation near turns.
1nfn/T0120-116-210-1nfn-local  1nfn 132 -12.76 -4.18 5.4e-02
    Two helices of 4-helix bundle align well, but no conservation
    near turn---just confirms the amphipathic helix predictions.
7aatA/T0120-116-210-7aatA-2track-local  7aatA 401 -22.41 -5.09 2.0e-02
    not compact. 7 identical residues
    no conservation near turns.

Thu Sep 14 14:33:25 PDT 2000 Kevin Karplus
Since I'm running out of time, I'll submit 5 models:
    1dkzA/T0120-1dkzA-karplus1.a2m
    1dceA/T0120-1dceA-karplus1.a2m
    1svy/T0120-1-115-1svy-karplus.a2m
    1bs2A/T0120-1-115-1bs2A-karplus.a2m
    1ckqA/T0120-1-115-1ckqA-karplus.a2m

For the 2ry structure, I'll piece together the "remoter" models for
the first two domains and the "remote" model for the third domain.

Tue Nov 28 16:50:27 PST 2000
Looked at the final result.
There is indeed a domain break at 116.
The second domain is just one long helix all the way to 202.
The third domain does not seem to be in the PDB file.

DALI says t120 shares weak similarity with
    1aa0 (5.8), 1ezjA (4.7), 1htmB (4.1), 2occI (4.0).
Of these, only 1aa0 came up in our searches, and that is mainly a
long-helix hit for the second domain.
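
For reference, the domain split used for the separate searches (1-115, 116-210, 211-end) can be sketched as below. This is only an illustration of the 1-based, inclusive residue ranges, not the script that was actually used: split_domains is a made-up name, and the sequence is a placeholder of the right length (336), not the real T0120 sequence.

```python
def split_domains(sequence, ranges):
    """Cut a sequence into named pieces.

    ranges maps a name to (start, end), 1-based and inclusive;
    end=None means "through the last residue".
    """
    pieces = {}
    for name, (start, end) in ranges.items():
        stop = len(sequence) if end is None else end
        if not 1 <= start <= stop <= len(sequence):
            raise ValueError(f"bad range for {name}: {start}-{stop}")
        # Convert the 1-based inclusive range to a Python slice.
        pieces[name] = sequence[start - 1:stop]
    return pieces


# Placeholder only: 336 residues of "X", NOT the real T0120 sequence.
PLACEHOLDER_SEQUENCE = "X" * 336

# Domain ranges as used in the searches above.
DOMAIN_RANGES = {
    "T0120-1-115": (1, 115),
    "T0120-116-210": (116, 210),
    "T0120-211-end": (211, None),
}

domains = split_domains(PLACEHOLDER_SEQUENCE, DOMAIN_RANGES)

if __name__ == "__main__":
    for name, piece in domains.items():
        print(name, len(piece))
```

The piece lengths (115, 95, 126) agree with the lengths reported for T0120-1-115, T0120-116-210, and T0120-211-end in the alignment tables.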