8 June 2000 Kevin Karplus
T0096 FadR, E. coli

No blast hits.
Pretty good double-blast hit to 1lea and 1leb.
Over 100 sequences in the T2K alignment (none of them in PDB).
No tight grouping (phytree gets 52 different families).

9 June 2000 Kevin Karplus

Top target model hits
    chain       E-value  FSSP rep  SCOP                              hit both ways
    1qaa[AB]    0.91     (theoretical model of LexA repressor)
    1lea, 1leb  1.5      1lea      1.4.4.2.1                         *
    1bia, 1bib  4.7      1bia      1.4.4.1.1
    1pbf        8.9      1pbe      3.3.1.2.3,4.14.1.2.1
    1dob        14.      1pbe      3.3.1.2.4,4.14.1.2.2
    1do[c-e]    15.      1pbe      3.3.1.2.4,4.14.1.2.2
    1bgn        15.      1pbe      3.3.1.2.4,4.14.1.2.2
    1pxa, 1pxc  15.      1pbe      3.3.1.2.4,4.14.1.2.2
    1iu[s-x]    15.      1pbe      3.3.1.2.4,4.14.1.2.2
    1d7lA       15.      1pbe      3.3.1.2.4,4.14.1.2.2
    1cgp[AB]    15.      ?         1.4.4.4.1,2.77.4.1.1              *
    1vhi[AB]    15.      1b3tA     4.48.8.1.4

Top template model hits
    1cgpA_1     0.062    ?         1.4.4.4.1                         *
    1bi0        0.24     1bi0      1.4.4.20.1,1.74.1.1.1,2.32.1.2.1
    2dtr        0.24     1bi0      1.4.4.20.1,1.74.1.1.1,2.32.1.2.1
    2cgpA       1.9      ?         1.4.4.4.1,2.77.4.1.1              (was previously called 2cgpC)
    4eugA       6.4      3eugA     3.14.1.1.3
    1udg        7.6      3eugA     3.14.1.1.2
    2yhx        7.9      2yhx      9.8.1.1.8
    1b78A       8.8      1b78A     3.46.3.1.1
    1qr7A       11.      1qr7A     3.1.9.4.1
    1a7j        11.      1a7j      3.31.1.5.1
    1kit        12.      1kit      (2.28.1.8.1)^2,2.63.1.1.6
    2yhx_2      20.      2yhx      9.8.1.1.8
    1bqz        22.      1bqz      1.2.2.1.2
    1lea        23.      1lea      1.4.4.2.1                         *
    1b54        27.      1ct5A     3.1.5.2.1

The two-way hits are 1lea and 2cgpA, which are in the same superfamily:
"Winged helix" DNA-binding domain.

The FSSP neighbors of 1lea that occur in the hit lists are
     1: 1lea 1lea 18.3 0.0 72  72 100 0 0 1 S Lexa repressor DNA binding domain (NMR, minimized avera
     2: 1lea 1leb 13.4 1.3 71  72 100 0 0 1 S Lexa repressor DNA binding domain (NMR, 28 structures)
     6: 1lea 1bia  6.7 2.3 61 292  15 0 0 4 S Bira bifunctional protein (acts as biotin operon repres
    12: 1lea 1bi0  5.7 2.0 63 214  18 0 0 5 S diphtheria toxin repressor (dtxr) biological_unit

According to Swissprot, this target is a DNA-binding protein, so the
winged-helix motif is a likely structure:

Function: Multifunctional regulator of fatty acid metabolism.
Represses transcription of at least eight genes required for fatty acid
transport and beta-oxidation and activates transcription of at least two
genes required for unsaturated fatty acid biosynthesis (such as faba) and
iclr, the gene encoding the transcriptional regulator of the acebak operon
encoding the glyoxylate shunt enzymes. Binds a DNA sequence in the
operator of fadb. Binding of FadR is specifically inhibited by long chain
fatty acyl-coa compounds.
Subunit: homodimer.
Similarity: Belongs to the GNTR family of transcriptional regulators.

I think we should go with some winged-helix alignment.

Of course, the people who solved the structure give lots of info away:

    Two domain protein: DNA binding and acyl-coenzyme A binding. The
    latter domain has a novel fold (no significant DALI hits). Site of
    acyl-CoA binding known from mutations, yet the structure
    determination concerns the apo form, so this could be an interesting
    ab-initio structure determination target for the last domain, but
    also a docking target to find the binding site (if you can predict
    the structure accurately, that is :-) ). The protein is a HOMODIMER,
    *both* in presence and absence of DNA. Binding of acyl-CoA presumably
    causes a conformational change, switching off DNA binding and thus
    transcriptional repression.

    For 8. below: weak homology to HTH DNA binding domains for the first
    72 residues of the structure. No homology for C-terminal domain.

I don't think we'll be able to predict the novel fold, but we should be
able to get a good model for the first 72 residues.

The alignment 1bia/T0096-1bia-global.pw, though scoring very well, does
not look so good. There is a mid-helix gap, and half the alignment seems
to have very little conservation. The alignment
1bia/1bia-T0096-fssp-global.pw looks better, but I'm not convinced the
second domain is being handled right. Also the sequence EVLATANEVAD is
strongly predicted as being helical, but the 1bia alignment puts it on a
turn and beta strand.

The alignment 1bi0/T0096-1bi0-global.pw looks good, with a cluster of
conserved residues in the second domain and a pretty good alignment with
few gaps (and those in reasonable places) for the first domain (up to
about residue 120). The alignment 1bi0/1bi0-T0096-fssp-global.pw also
looks very good in the first domain (slightly better than
1bi0/T0096-1bi0-global.pw), but the second domain may be overalignment.
The Viterbi alignment 1bi0/T0096-1bi0-vit.pw only picks up a tiny piece,
and the local alignment 1bi0/T0096-1bi0-local.pw also seems to be too
small. The EVLATANEVAD sequence does correspond to a helix in the
1bi0/1bi0-T0096-fssp-global.pw alignment, so this is definitely my
favorite alignment right now.

The alignment 1lea/T0096-1lea-global.pw is excellent for about 70
residues; this is probably the "weak homology" referred to by the
submitters of the structure. The local alignment
1lea/T0096-1lea-local.pw is the same, but without the useless alignment
to residues that aren't in the ATOM list, but the fssp alignment
1lea/1lea-T0096-fssp-global.pw seems poorer.

The alignment 2cgpA/T0096-2cgpA-vit.pw gets only a little piece of the
DNA binding site, and so is probably not useful. The local alignment
2cgpA/T0096-2cgpA-local.pw is also too small to be very useful.

For the 1bi0 alignments, the domain break seems to occur at about
FAELDYNIFRGLAFAS, so the first domain is 1-140 (or 155), and the second
domain is (140 or) 155-239. The domain split done for CAFASP is in the
region 111-128 (IFIRTAFRQHPDKAQEVL), which is reasonably compatible with
the 1bi0 alignment.

The T99 search for the second domain (stored on the CAFASP page) gets few
hits, but one comes up in both directions: 1b78A a pyrophosphatase. This
has a weak similarity (Z score 2.5) to 1bia in FSSP. The alignment
1b78A/1b78A-T0096-fssp-global.pw, though not scoring all that well,
doesn't look too bad, but our predicted "domain boundary" occurs in the
middle of a beta sheet, so this seems a bit unlikely.
Perhaps we need to do a second-domain-only set of alignments.

Mon Jun 19 13:46:24 PDT 2000 Kevin Karplus

Made a subdirectory c-term with just the C-terminal domain, and ran the
standard Makefile on it.

The target model found some weak hits:
    chain       E-value  FSSP   SCOP
    1d7eA       9.1      3pyp
    [23]pyp     9.8      3pyp   4.91.2
    [23]phy     9.8      3pyp   4.91.2
    2pyr        9.8      3pyp   4.91.2
    1elrA       23.      1a17
    2mjp[AB]    26.      1b78A  3.46.3
    1b78[AB]    26.      1b78A  3.46.3

The template models found only one hit
    1b78A       14.      1b78A  3.46.3

The cterm/3pyp/T96C-3pyp-global alignment looks pretty good. The
alignment probably starts at about FAELDY..., compatible with the
previous domain-boundary guess. Somewhat surprisingly, the global
alignment gets a much better simple-model score than the local alignment.
I don't know why this should happen; it seems unlikely to be correct
behavior.

The local alignment cterm/3pyp/T96C-3pyp-local (and the Viterbi alignment
3pyp/T96C-3pyp-vit) are just the main gapless fragment of the global
alignment. The 3pyp/3pyp-T96C-fssp-global alignment agrees in a couple of
major fragments with the T96C-3pyp-global, but is missing two strands of
the beta sheet, so is less convincing.

The 1b78A/T96C-1b78A-global.pw alignment is not very convincing, having
large chunks of the beta sheet missing, and not being very compact. The
local alignment is better, but has only one strand of a 3-strand sheet.
The alignment 1b78A/1b78A-T96C-fssp-global is fairly compact, but is
missing a long beta strand, making an uncloseable gap.

Mon Jun 26 09:52:19 PDT 2000
Remade 2ry predictions

28 June 2000, Kevin Karplus

Looked up CAFASP summary today. Fairly strong consensus on SCOP
superfamily 1.4.4:
    24  1.4.4
     9  1.74.1
     8  4.14.1
     8  3.3.1
     8  2.32.1
     6  2.77.4
     6  1.37.1
     6  1.114.1

Our top hits (1lea, 1bia, 2cgpA) are all to this superfamily, so we are
in good company here.
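
The same kind of tally can be run over our own hits. The following is only a minimal sketch, not the script behind the CAFASP summary: the function names and the regular expression are my assumptions, and the SCOP codes are the ones listed for 1lea, 1bia, and 2cgpA in the 9 June hit tables. It truncates each code to class.fold.superfamily and counts how many hits land in each.

```python
import re
from collections import Counter

# SCOP codes for the three top hits, copied from the 9 June hit tables.
TOP_HITS = {
    "1lea": "1.4.4.2.1",
    "1bia": "1.4.4.1.1",
    "2cgpA": "1.4.4.4.1,2.77.4.1.1",
}

# Assumption: a SCOP code is dot-separated integers; keep the first three
# levels (class.fold.superfamily) and ignore anything deeper.
SCOP_PREFIX = re.compile(r"(\d+\.\d+\.\d+)(?:\.\d+)*")


def superfamilies(scop_field):
    """Return the set of class.fold.superfamily prefixes in one SCOP field.

    The field may hold several comma-separated codes, and may use the
    "(code)^2" notation that appears in the hit tables.
    """
    return set(SCOP_PREFIX.findall(scop_field))


def tally(hits):
    """Count, per superfamily, how many hits have a domain in it."""
    counts = Counter()
    for scop_field in hits.values():
        counts.update(superfamilies(scop_field))
    return counts


if __name__ == "__main__":
    for superfamily, n in tally(TOP_HITS).most_common():
        print(n, superfamily)
```

Each hit is counted once per superfamily, so a multi-domain chain such as 2cgpA contributes to both 1.4.4 and 2.77.4.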
The first domain is known to have "weak homology" to HTH DNA binding
domains, and the second domain has no DALI-similar known structure, so we
should probably only predict the first domain.

Sat Aug 26 15:42:59 PDT 2000
Remade 2track predictions

Moderate hits
    % Sequence ID  Length  Simple  Reverse  E-value  SCOP
    1bi0           226     -35.32  -18.01   5.2e-05  1.4.4.20.1,1.74.1.1.1,2.32.1.2.1
    2dtr           226     -31.48  -15.09   1.0e-03  1.4.4.20.1,1.74.1.1.1,2.32.1.2.1
    1lea           84      -29.00  -13.77   7.7e-03  1.4.4.2.1
    1qbjA          81      -26.14  -13.00   7.7e-03  1.4.4.16.1
    1bib           321     -26.61  -10.86   1.6e-01  1.4.4.1.1,2.32.1.1.1,4.87.1.2.1
    1bia           321     -26.61  -10.02   1.6e-01  1.4.4.1.1,2.32.1.1.1,4.87.1.2.1

All the top hits are in the 1.4.4 superfamily.

Fri Sep 1 12:16:27 PDT 2000 Kevin Karplus

Top alignment scores:
    1lea/T0096-1lea-2track-global    1lea   72   -15.89   -23.03  3.1e-10
    1bia/T0096-1bia-global           1bia   292  -117.76  -22.24  6.6e-10
    1qbjA/T0096-1qbjA-2track-global  1qbjA  81   -13.08   -22.67  8.4e-10
    1bi0/T0096-1bi0-2track-global    1bi0   226  -4.17    -21.74  2.3e-09
    2dtr/T0096-2dtr-2track-global    2dtr   226  0.04     -19.71  1.7e-08
    1bi0/T0096-1bi0-2track-local     1bi0   226  -35.32   -18.01  4.6e-08
    2dtr/T0096-2dtr-2track-local     2dtr   226  -31.48   -15.09  9.2e-07
    1lea/T0096-1lea-global           1lea   72   -39.34   -14.88  1.0e-06

The 1lea/T0096-1lea-2track-global and 1lea/T0096-1lea-global alignments
are identical, with 14 identical residues. There is one insertion and 2
deletions, all in reasonable places. Approx 70 residues are aligned.

The 1bia/T0096-1bia-global alignment aligned the PDB seqres and atom
sequences differently. After fixing the discrepancy (in
1bia/T0096-1bia-global-fixed.a2m) the first domain has about 14 identical
residues, but the insertions are not as well placed as in 1lea. It
continues with a long prediction in the next domain of 1bia, but this is
unlikely to be correct given the "novel fold" claim.

The 1qbjA/T0096-1qbjA-2track-global alignment has only 1 insertion, with
15 identical residues.
I unaligned the residues not present in the PDB file and moved the insert
over one residue to make 1qbjA/T0096-1qbjA-karplus.a2m. I can sacrifice
one identical residue to get the insertion in a better place (away from
the DNA), giving 1qbjA/T0096-1qbjA-karplus2.a2m, which is currently my
favorite prediction.

1bi0/T0096-1bi0-2track-global needs to have the two PDB sequences
realigned. After fixing there are 9 identical residues and one insertion
in the first domain. The insertion is in the same place as for
1qbjA/T0096-1qbjA-karplus2.a2m, but is 6 residues instead of 3.

2dtr/T0096-2dtr-2track-global seems identical to
1bi0/T0096-1bi0-2track-global.

Let's go with 1qbjA/T0096-1qbjA-karplus2.a2m.

Tue Sep 5 10:35:28 PDT 2000
remaking 2track

Mon Nov 27 13:26:15 PST 2000
Superfamily 1.4.4 is correct. Don't know yet about alignment.
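
The identity, insertion, and deletion counts quoted for the 1 September alignments can be checked mechanically once an alignment is written as two gapped strings of equal length. The following is a rough sketch under that assumption: it does not read the a2m or .pw files named in these notes, the function name is hypothetical, and the two example strings are invented toy sequences, not T0096 or any template.

```python
def alignment_stats(target, template):
    """Summarize a gapped pairwise alignment.

    Both strings must have the same length, with '-' marking a gap.
    A run of '-' in the template counts as one insertion in the target;
    a run of '-' in the target counts as one deletion.
    """
    if len(target) != len(template):
        raise ValueError("aligned strings must have equal length")

    identical = aligned = insertions = deletions = 0
    prev_ins = prev_del = False
    for t, p in zip(target.upper(), template.upper()):
        is_ins = p == "-" and t != "-"
        is_del = t == "-" and p != "-"
        # Count each gap run once, at the column where it opens.
        if is_ins and not prev_ins:
            insertions += 1
        if is_del and not prev_del:
            deletions += 1
        if t != "-" and p != "-":
            aligned += 1
            if t == p:
                identical += 1
        prev_ins, prev_del = is_ins, is_del

    return {
        "identical": identical,
        "aligned": aligned,
        "insertions": insertions,
        "deletions": deletions,
    }


if __name__ == "__main__":
    # Invented toy strings, for illustration only.
    print(alignment_stats("ACDEFGHIK-LMNPQ", "ACE--GHLKRLMNPQ"))
```

Counting gap runs rather than gap columns matches the way the notes speak of "one insertion" that is 3 or 6 residues long.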