Wed Jul 26 12:53:07 PDT 2000 Kevin Karplus

T0119 Benzoate dioxygenase reductase, Acinetobacter sp.
Known homolog.

wu-blast finds an excellent match to 1cqx[AB] and very good matches to
1pfd, 1fxa[AB], 1qog[AB], 1qo[de], ...

Here are the top BLAST hits, picking just the first example of each
FSSP rep:

ID      FSSP    SCOP
1cqxA   1cqxA   1.1.1.1.48, 2.41.1.3.1, 3.18.1.4.1
1pfd    1awd    4.13.6.1.1
1fdr    1fdr    2.41.1.1.4, 3.18.1.1.4
1qfjA   1qfjA   2.41.1.1.6, 3.18.1.1.6
2pia    2pia    2.41.1.2.1, 3.18.1.2.1, 4.13.6.2.4
2cnd    1ndh    2.41.1.1.7, 3.18.1.1.7
1qg0A   1fnc    2.41.1.1.2, 3.18.1.1.2

1cqxA and 1awd are not structurally similar, so these may represent
different domains. There seem to be 2.41.1 and 3.18.1 domains present,
but we need to pick the right ones carefully.

double-blast puts them in this order:

1qfj[ABCD]  1.1e-29  1qfjA   2.41.1.1.6, 3.18.1.1.6
1cqx[AB]    1.2e-21  1cqxA   1.1.1.1.48, 2.41.1.3.1, 3.18.1.4.1
2pia        2.6e-15  2pia    2.41.1.2.1, 3.18.1.2.1, 4.13.6.2.4
2cnd        1.3e-14  1ndh    2.41.1.1.7, 3.18.1.1.7
1ndh        1.3e-14  1ndh    2.41.1.1.8, 3.18.1.1.8
1cn[ef]     1.3e-14  1ndh    2.41.1.1.7, 3.18.1.1.7
1frr[AB]    4.4e-13  1awd    4.13.6.1.7

The BLAST alignments clearly break into two regions: the first region
runs up to about residue 100, and the second region starts around
120-150. The first region matches ferredoxins (1pfd best), and the
second region matches the 2nd and 3rd domains of flavohemoprotein
(1cqxA).

2pia has both domains, so it may make a good template, even though the
individual matches are not as good as those of some of the other
templates. Unfortunately, 2pia has the ferredoxin after the other two
domains, not before, so it may not be an appropriate template.

There are so many PDB sequences in the t2k alignment that we may need
to use an earlier iteration to reduce noise.

The 1cqxA/1cqxA-T0119-global alignment, though scoring excellently,
grabs too much of the first domain. The 1cqxA/1cqxA-T0119-vit and
-local alignments have the same problem.
The 1cqxA/T0119-1cqxA-local alignment does a good job of selecting the domain
and has gaps in reasonable places, but 1cqxA/T0119-1cqxA-global again
tries to cover the first domain.

The 1awd/1awd-T0119-fssp-global alignment is to the wrong part of the
sequence, so we probably need to do a domain split to get good
alignments.

The T2k alignment seems to indicate fairly strongly that the first
domain ends at residue 100, with the second domain starting somewhere
between 95 and 125.

Sat Aug 26 15:15:43 PDT 2000 Kevin Karplus

Remade 2track predictions.
Good top hits:

% Sequence ID  Length  Simple   Reverse  E-value  SCOP
1cqxA          403     -158.69  -130.38  1.2e-53  1.1.1.1.48,2.41.1.3.1,3.18.1.4.1
1qfjA          232     -145.76  -117.30  5.3e-48  2.41.1.1.6,3.18.1.1.6
1ndh           272     -134.50  -106.70  3.2e-43  2.41.1.1.8,3.18.1.1.8
1qg0A          308     -130.74  -106.15  3.2e-43  2.41.1.1.2,3.18.1.1.2
1fdr           248     -129.31  -105.23  8.6e-43  2.41.1.1.4,3.18.1.1.4
1que           303     -127.16  -104.93  2.3e-42  2.41.1.1.3,3.18.1.1.3
1fnc           314     -127.46  -104.32  2.3e-42  2.41.1.1.1,3.18.1.1.1
1fnd           314     -126.13  -103.85  6.3e-42  2.41.1.1.1,3.18.1.1.1
1cne           270     -120.23  -100.55  1.3e-40  2.41.1.1.7,3.18.1.1.7
2pia           321     -123.87   -99.33  3.5e-40  2.41.1.2.1,3.18.1.2.1
1amoA          615     -110.65   -79.92  1.7e-31  2.41.1.4.1,3.16.4.2.1,3.18.1.3.1
1awd            94      -68.15   -52.99  8.9e-20  4.13.6.1.6
1qt9A           98      -65.93   -52.95  8.9e-20  4.13.6.1.1
1czpA           98      -65.84   -50.87  6.6e-19  4.13.6.1.1

Nothing really new here; we still need to split the sequence into two
domains and search separately with them.

27 Aug 2000 Rachel Karchin

Information about Acinetobacter benzoate dioxygenase reductase encoded
by benC (BENC_ACICA, which is given as a homolog of t119):

Known to have an N-terminal region resembling chloroplast-type
ferredoxins and a C-terminal region resembling several oxidoreductases.
Also has regions similar to certain monooxygenase components.

Source: J. Bacteriol 1991 Sep, 173:17 5385-95
www.cfsn.com/hist1.html

Also look at:
http://metallo.scripps.edu/PROMISE/ARHD.html
for general structural information about aromatic ring dioxygenases.
Those of the hydroxylase class are either (alpha-beta)n or (alpha)n
oligomers. This page also contains some detailed information about the
hydroxylase component of naphthalene 1,2-dioxygenase (NDO) from
Pseudomonas, which might be helpful. It cites a May 1998 article from
"Structure" (Kauppi et al., Vol. 6, No. 5, pp. 571-586), which
describes the 3D structure of NDO in detail. The abstract can be found
at:
http://journals.bmn.com/journals/list/browse?uid=JSTR.st6503&rendertype=abstract

The ARHD web page also contains this info:
BENC_ACICA has a ferredoxin domain, which I have bracketed in t119
below:

MSNHQVALQF EDGVTRFICI AQGETLSDA [A YRQQINIPMD CREGECGTCR AFCESGNYDM
PEDNYIEDAL TPEEAQQGYV LACQCRPTSD AVFQIQASSE] VCKTKIHHFE GTLARVENLS
DSTITFDIQL DDGQPDIHFL AGQYVNVTLP GTTETRSYSF SSQPGNRLTG FVVRNVPQGK
MSEYLSVQAK AGDKMSFTGP FGSFYLRDVK RPVLMLAGGT GIAPFLSMLQ VLEQKGSEHP
VRLVFGVTQD CDLVALEQLD ALQQKLPWFE YRTVVAHAES QHERKGYVTG HIEYDWLNGG
EVDVYLCGPV PMVEAVRSWL DTQGIQPANF LFEKFSAN

and a ferredoxin-reductase domain:

MSNHQVALQF EDGVTRFICI AQGETLSDAA YRQQINIPMD CREGECGTCR AFCESGNYDM
PEDNYIEDAL TPEEAQQGYV LACQCRPTSD AVFQIQASSE [VCKTKIHHFE GTLARVENLS
DSTITFDIQL DDGQPDIHFL AGQYVNVTLP GTTETRSYSF SSQPGNRLTG FVVRNVPQGK
MSEYLSVQAK AGDKMSFTGP FGSFYLRDVK RPVLMLAGGT GIAPFLSMLQ VLEQKGSEHP
VRLVFGVTQD CDLVALEQLD ALQQKLPWFE YRTVVAHAES QHERKGYVTG HIEYDWLNGG
EVDVYLCGPV PMVEAVRSWL DTQGIQPANF LFEKFSAN]

Tue Aug 29 11:00:20 PDT 2000 Kevin Karplus

The citation Rachel found puts the domain split at 100, consistent
with what we were expecting. Let's split there and do two searches.

The naphthalene 1,2-dioxygenases in PDB are 1eg9 and 1ndo
(2.31.1, 4.15.3, 4.108.3) [Some are 1ndoA and some 1ndoB]. (Not real

Built subdirectories t119-1-100 and t119-101-end

t119-1-100 finds many hits
with blast: 1pfd, (1fxa[AB], 1qog[AB], 1qoe, 1qod, 1qt9A, 1czp[AB]), ...
and double-blast: (1czp[AB], 1fxa[AB], 1qt9A, 1qog[AB]), ...
Top-scoring PDB sequences in T2k alignment are
1qog[AB], (1czp[AB], 1qt9A, 1fxa[AB]), ...
T2k target model finds
(2cjn, 2cjo, 1roe), (1czp[AB], 1fxa[AB], 1qt9A), 1qog[AB], (1dox, 1doy), ...
[All are ferredoxins with about 70% identity to FSSP rep 1awd.]
SAM-T99 puts 1roe, 1qogA, 1dox, 1qofA, 4fxc, ...

We have template models for 1czpA, 1frd, 1awd, 1qt9A:

1czpA  -66.1    1.71434900491769e-25  -75.77  2.3e-29  -56.43  4.2e-21
1frd   -62.615  5.59262254537874e-24  -69.74  9.4e-27  -55.49  1.1e-20
1awd   -61.89   1.15472616998171e-23  -70.76  3.5e-27  -53.02  8.4e-20
1qt9A  -57.04   1.47505171118876e-21  --      --       -57.04  1.5e-21

Top 2-track: 1czpA, 1awd, 1qt9A, 1frd

TOP-SCORING ALIGNMENT 1czpA/T0119-1-100-1czpA-global looks fine.

When the target alignment is limited to a family containing the target
(T0119-1-100-family), the top-scoring sequences are
(1roe, 2cjn, 2cjo), 1qog[AB], (4fxc, 1qof[AB], 1qt9A, 1czp[AB], 1fxa[AB])
and the top-scoring 2-track hits are 1awd, 1qt9A, 1czpA.

The 1czpA/T0119-1-100-1czpA-global alignment looks better than the
best "family alignment",
1awd/T0119-1-100-family-1awd-T0119-1-100-global.

I think that the T0119-1-100.t2k.2d secondary structure is the one to
use.

t119-101-end finds many hits
with blast: 1cqx[AB], 1fdr, 1qfj[ABCD], 2pia, ...
and double-blast: 1qfj[ABCD], 1cqx[AB], ...

The top-scoring PDB sequences in the T2k alignment are
1ndh, 1cne, (2cnd, 1cnf), 1qg0[AB], 1qga[AB]
Top-scoring with target model:
(1cnf, 2cnd), 1ndh, 1cne, 1cqx[AB], 1qfj[ABCD], ...
Top-scoring with template models:
1cqxA, 1fdr, 1ndh, 1cne, 1que, 1fnd, 1qg0A, 1fnc, 2pia, 1qfjA, 1amoA
Top-scoring with bidirectional score:
1ndh, 1cqxA, 1cne, 1qfjA, 1qg0A, 1que, 1fdr, 2pia
Top-scoring 2track:
[NOT FINISHED YET]

SCOP has these as 2-domain (or in some cases 3-domain) proteins:
2.41.1 and 3.18.1. They are in different SCOP families.
I think we want to concentrate on the reductases in 2.41.1.1 and
3.18.1.1. These are
1fn[bcd], 1bx[01], 1fr[nq], 1qfz[AB], 1qfy[AB], 1qga[AB], 1qg0[AB],
1b2r, 1bjk, 1qu[ef], 1fdr, 1a8p, 1qfj[ABCD], 2cnd, 1cn[ef], 1ndh
Of these, our top models are 1ndh, 1cne, 1qfjA, 1qg0A, 1que, 1fdr.

When the target alignment is limited to a family containing the target
(T0119-101-end-family), the top-scoring sequences are
1qfj[ABCD], 1cqx[AB], (1cnf, 2cnd), 1ndh, 1cne
and the top 2-track hits are 1qfjA, 1cqxA, 1fdr, 2pia.

Tue Sep 5 10:35:23 PDT 2000
remaking 2track

Wed Sep 6 12:10:31 PDT 2000 Kevin Karplus

Top-scoring alignments for T0119-101-end-family:

1qfjA/T0119-101-end-family-1qfjA-T0119-101-end-global
    1qfjA  232  -119.72  -133.76  5.2e-58
    This one looks very good. I made minor edits to get
    1qfjA/T0119-101-end-1qfjA-karplus.a2m
1qfjA/T0119-101-end-family-1qfjA-T0119-101-end-local
    1qfjA  232  -121.43  -111.09  1.9e-48
1ndh/T0119-101-end-family-1ndh-T0119-101-end-global
    1ndh   272   -80.64   -98.25  8.2e-43
1cne/T0119-101-end-family-1cne-T0119-101-end-global
    1cne   270   -76.10   -95.41  1.7e-41
1cqxA/T0119-101-end-family-1cqxA-T0119-101-end-local
    1cqxA  403  -105.59   -95.06  1.7e-41
1cqxA/T0119-101-end-family-1cqxA-T0119-101-end-global
    1cqxA  403   -42.64   -87.68  4.9e-38
1ndh/T0119-101-end-family-1ndh-T0119-101-end-local
    1ndh   272   -88.51   -76.74  3.0e-33
1cne/T0119-101-end-family-1cne-T0119-101-end-local
    1cne   270   -83.83   -73.48  5.9e-32

For the second domain, let's go with
    1qfjA/T0119-101-end-1qfjA-karplus.a2m
Alignment actually starts at 107.

OK
Let's submit
    1czpA/T0119-1-100-1czpA-global.pw.a2m.gz
    1qfjA/T0119-101-end-1qfjA-karplus.a2m
and paste together
    T0119-1-100.t2k.2d
    T0119-101-end.t2k.2d
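
A minimal sketch of the domain split at residue 100 used in these notes,
for anyone redoing it by hand. The sequence is the T0119 sequence as
quoted from the ARHD page (spaces and brackets removed). The helper names
split_domains and to_fasta are hypothetical and are not part of SAM or
any other tool mentioned here, and the FASTA record names simply reuse
the subdirectory naming. The final check only illustrates why per-residue
tracks for the two pieces can be pasted back together in order; it makes
no assumption about the actual .2d file format.

```python
# T0119 sequence as quoted in the notes, in groups of 10 residues.
T0119 = "".join("""
MSNHQVALQF EDGVTRFICI AQGETLSDAA YRQQINIPMD CREGECGTCR AFCESGNYDM
PEDNYIEDAL TPEEAQQGYV LACQCRPTSD AVFQIQASSE VCKTKIHHFE GTLARVENLS
DSTITFDIQL DDGQPDIHFL AGQYVNVTLP GTTETRSYSF SSQPGNRLTG FVVRNVPQGK
MSEYLSVQAK AGDKMSFTGP FGSFYLRDVK RPVLMLAGGT GIAPFLSMLQ VLEQKGSEHP
VRLVFGVTQD CDLVALEQLD ALQQKLPWFE YRTVVAHAES QHERKGYVTG HIEYDWLNGG
EVDVYLCGPV PMVEAVRSWL DTQGIQPANF LFEKFSAN
""".split())


def split_domains(seq, boundary):
    """Split seq so the first piece holds residues 1..boundary (1-based)."""
    if not 0 < boundary < len(seq):
        raise ValueError("boundary must fall strictly inside the sequence")
    return seq[:boundary], seq[boundary:]


def to_fasta(name, seq, width=60):
    """Format one FASTA record with fixed-width sequence lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Split after residue 100, matching the t119-1-100 / t119-101-end pieces.
nterm, cterm = split_domains(T0119, 100)

# Pieces concatenated in order cover the full sequence exactly once, so
# per-residue predictions for each piece can be pasted in the same order.
assert nterm + cterm == T0119

if __name__ == "__main__":
    print(to_fasta("T0119-1-100", nterm), end="")
    print(to_fasta("T0119-101-end", cterm), end="")
```

Note that the 1qfjA alignment chosen for the second domain starts at
residue 107, so the first few residues of the second piece are not
covered by that template.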