12 May 2000 Kevin Karplus

chorismate-pyruvate lyase
One sequence, also named 4-hydroxybenzoate synthetase.
This one looks hard: no good matches.

The top-scoring target match:
    1sesA   Seryl-trna synthetase (serine-trna ligase)
The portion of 1sesA that matches is a coiled-coil with a turn, and does not match the secondary structure prediction at all.

The top-scoring template hits are
    1tcoA + 1auiA   (1auiA is FSSP rep)
    1bxn[EGAC]      (1burA is FSSP rep)
Note: 1bxn is a lyase, but is a homo-4mer, while T86 is a monomer. Is this functional relationship relevant?
1tcoA and 1auiA are serine-threonine phosphatase 2B (calcineurin).

15 May 2000 Kevin Karplus

Saira sent mail suggesting reading "Xue & Lipscomb PNAS 92:10595 (1995) in terms of the types of chorismate binding residues one might expect to be conserved." Full message in "mail".

She also sent a PSI-BLAST alignment, which is a little different from the T2K alignment. Both include AF187880_4 (an unknown protein from Pseudomonas) with similar alignments, and both have a third sequence, but the third sequence differs between the two multiple alignments. T2K has a Neisseria meningitidis hypothetical protein, while PSI-BLAST has T36851, a possible transcription regulator from Streptomyces coelicolor.

I have saved the PSI-BLAST alignment as T0086.psi-blast.a2m.gz (see T0086.psi-blast.pa for the pretty-aligned version).

The two different multiple alignments give VERY different secondary structure predictions!
(possibly in part due to the gap in the psi-blast alignment)

sequence:  SHPALTQLRALRYCKEIPALDPQLLDWLLLEDSMTKRFEQQGKTVSVTMIREGFVEQNEIPEELPLLPKESRYWLREILLCADGEPWLAGRTVVPVSTLSGPELALQKLGKTPLGRYLFTSSTLTRDFIEIGRDAGLWGRRSRLRLSGKPLLLTELFLPASPLY
PSI-BLAST: CCCHHHHHHHHCCCCCCCCCCHHHHHHHHCHHHHHHHHCCCCCEEEEEEECCCC---------CCCCCHCCCCCEEEEEEECCCCCCCCCCCEEECECCCCCHHHHHHHCCCCCCCEECCCCCCCCCEEEEECCCCCCCEEEEECCCCCCEEEHHHCCCCCCCC
consensus: CCC????????CCCCCCCCCCHHHHHHHH????HHHHH??CCC?EEEEEE?CCC??????????CCCC?CCC??????????CCC???????EEEC?CCCC?HHHHHHHCCCCCCC??CCCCCCCCCEEEE?CCCCC??EE?E?CCCCC?E??HHHCCCCCCCC
T2K.2d:    CCCCCCCCCCCCCCCCCCCCCHHHHHHHHHCCCHHHHHHHCCCCEEEEEEECCCCCCCCCHHHHCCCCCCCCHHHHHHHHHHCCCEEEEEEEEEECCCCCCHHHHHHHHCCCCCCCCCCCCCCCCCCEEEECCCCCCEEEEHECCCCCCHEHHHHHCCCCCCCC

Question: should we make a new multiple alignment, forcing T36851 in with the T2K sequences, and do a new search starting with it?

17 May 2000 Kevin Karplus

I looked up "chorismate lyase" in PUBMED and got the following recent article:
    Crystallization and 1.1-Angstrom Diffraction of Chorismate Lyase from Escherichia coli
    Carrie Stover, Martin P. Mayhew, Marcia J. Holden, Andrew Howard, D. T. Gallagher
    J Struct Biol 2000 Feb;129(1):96-9
I suspect that this is the crystal structure we are trying to predict.

The authors claim that chorismate lyase is homologous to chorismate mutase (2cht and 1com). The fssp representative for these is 2chsA.

One puzzle: chorismate mutase is a trimer with the active site on the interface, while chorismate lyase is a monomer.

The paper that Saira recommended gives the active sites of chorismate mutase.

Of the weak hits we found before, the only one similar to 2chsA, according to FSSP, is a target-model score:
     NR. STRID1 STRID2    Z  RMSD  LALI  LSEQ2  %IDE  REVERS  PERMUT  NFRAG  TOPO  PROTEIN
     63: 2chsA  1cg2A   2.5   3.3    62    389    12       0       0      7  S     carboxypeptidase g2 biological_unit
which is only weakly similar to 2chsA.
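The consensus row in the 15 May comparison appears to keep a state wherever the PSI-BLAST-based and T2K-based predictions agree and to write "?" everywhere else, including the gap columns. A minimal Python sketch of that rule follows, assuming this reading of the row is right. The function names and the toy strings are hypothetical, chosen only for illustration; they are not the T0086 predictions and not part of the SAM tools.

```python
def consensus(pred_a: str, pred_b: str, unknown: str = "?") -> str:
    """Column-wise agreement of two aligned secondary-structure strings."""
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must be aligned to the same length")
    out = []
    for a, b in zip(pred_a, pred_b):
        # Keep the state only when both predictions agree on a real state;
        # disagreements and gap columns ('-') become the unknown marker.
        out.append(a if a == b and a != "-" else unknown)
    return "".join(out)


def agreement_fraction(pred_a: str, pred_b: str) -> float:
    """Fraction of columns where the two predictions agree."""
    cons = consensus(pred_a, pred_b)
    if not cons:
        return 0.0
    return sum(c != "?" for c in cons) / len(cons)


# Toy strings (hypothetical), one with a gap column.
toy_psiblast = "CCHHE-"
toy_t2k = "CCCHEE"
print(consensus(toy_psiblast, toy_t2k))           # CC?HE?
print(agreement_fraction(toy_psiblast, toy_t2k))  # 4 of 6 columns agree
```

Running the same function on the two full prediction rows would give a quick number for how far apart the two alignments push the predictor.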
The best-scoring of the forced alignments so far is
    1auiA/T0086-1auiA-global.pw.dist:1auiA 521 -145.26 -11.87 1.4e-05
which actually looks pretty good. I can't look at it with see-a2m yet, since the difference between the SEQRES and the ATOM records in PDB confuses the sae--rasmol linkage, but there are 30 conserved residues.

The best-scoring 2chsA alignment:
    2chsA/2chsA-T0086-fssp-global.pw.dist:T0086 164 -54.03 -3.33 6.9e-02
has 23 conserved residues, which do cluster nicely. One gap in the helix is easily closed by shifting part of the alignment. Two of the chunks of unaligned residues are predicted to be beta sheets, and are on adjacent loop regions in 2chsA. This prediction looks overall pretty good.

There is another chorismate mutase structure (5csmA), from yeast (YCM) rather than Bacillus subtilis (BCM) or E coli (ECM). It has a quite different structure (all helical, rather than alpha-beta), but we might as well check it for similarity also.

The best-scoring 5csmA alignment
    5csmA/5csmA-T0086-fssp-global.pw.dist:T0086 164 -72.43 -8.25 5.2e-04
has around 39 conserved residues, and doesn't look too bad. There are some gaps in mid-helix, but this may just mean that the helices are shorter (since similar gaps appear in adjacent helices).

Of the weak hits we found, the ones similar to 5csmA are all template-model matches:
     NR. STRID1 STRID2    Z  RMSD  LALI  LSEQ2  %IDE  REVERS  PERMUT  NFRAG  TOPO  PROTEIN
     12: 5csmA  1sesA   3.3   4.8    85    421    10       0       0      7  S     Seryl-trna synthetase (serine-trna ligase) complexed wi
     24: 5csmA  1cpt    2.8   3.9   118    412     8       0       0     14  S     Cytochrome p450-terp

The paper that Saira recommended compares the two different chorismate mutase structures. They have the same active-site cavities (within 1.1 Angstrom for 94 residues) but the scaffolding that supports the active site is quite different. The yeast one (5csmA) looks more promising as a template, since it does not rely on an adjacent monomer to produce the active site.
Unfortunately, our alignment only conserves one of the 4 active-site residues that Xue and Lipscomb identify as conserved between ECM and YCM.

Question: should we try making a multiple alignment of 5csmA+1sesA+1cpt from the fssp file, and use it as a seed for a t99 or t2k HMM to try to align T0086 to each of these templates?

Oops, there is a third chorismate mutase: 1ecmA. This is the one that Xue and Lipscomb compare to 5csmA. Its best alignment does not score well, perhaps because so much of the signal is amphipathic helix:
    1ecmA/T0086-1ecmA-global.pw.dist:1ecmA 109 -42.09 0.07 1.0e+00
The structure we found that is somewhat similar to 1ecmA is
     14: 1ecmA  1sesA   4.3   5.7    55    421    15       0       0      4  S     Seryl-trna synthetase (serine-trna ligase) complexed wi

I think it would be valuable to build an active-site model of chorismate mutase by running T2K on a structural alignment of 1ecmA and 5csmA. This alignment may have to be done with the Yale aligner, not DALI or VAST, since the active sites are alignable, but the scaffolding that supports them has different secondary structure elements. It should be verified against the active-site alignment given in Xue and Lipscomb's paper. I'll also extract the alignment from 1ecmA.fssp.a2m, for comparison.

We might also want to do structural alignments of the 2chs PDB files with 1ecm and 5csm, to see if there is a similarity there that DALI misses.

The FSSP alignment of 1ecmA to 5csmA has the three helix matches in the Xue and Lipscomb article, but not the match to the helix in 1ecmB (the other monomer in 1ecm).

18 May 2000 Kevin Karplus

Saira sent mail suggesting an archaeal sequence that may be a structural homolog (B69085). Full message in "mail".

19 May 2000 Kevin Karplus

I tried various ways of "improving" the 5csmA+1ecmA alignment, by modifying the multiple alignment, but I still like the 5csmA-T0086-fssp-global alignment best.
I have not yet played with Saira's multiple alignment.

Tue Jun 6 13:35:29 PDT 2000
Redid 2ry prediction with new neural net.

8 June 2000 Kevin Karplus

Looked at summary of CAFASP servers. People seem to be all over the map on folds for this protein, with no one claiming a significant hit. The most popular superfamily has only 4 hits, and most superfamilies have only one.

Most popular hits:
    #  superfamily
    4  4.79.2   Homing endonucleases
    4  2.47.1   Acid proteases
    3  1.101.1  Trp repressor

The two chorismate mutases are
    001.121.001       (1ecmA)
    004.064.001.002   (2chs*, 1com*, 2cht*)
None of the servers had either of these in the top five.

Mon Jun 26 09:48:51 PDT 2000
Remade 2ry predictions

Mon Jun 26 09:52:02 PDT 2000
Remade 2ry predictions

Tue Jul 11 16:45:05 PDT 2000 Christian

I investigated one of the top template hits, 1bxnA. It has two domains: a TIM-barrel and an alpha/beta domain. T0086 is trying to align to the TIM-barrel, but all of the alignments are bad.

1burA-T0086-vit is the only interesting 1burA alignment. There is a good 20-residue motif match. That's about it.

Mon Jul 17 Christian

I investigated the following structures that are in the (extended) structural neighborhood of 2chsA and 1sesA.
    1qu9A 1cliA 1ctt 1qf6A 1atiA 1adjA 1pysA 1c0aA 1lylA 12asA
None look promising. Right now, I think one of the alignments like 5csmA is our best bet.

Tue Jul 18 13:30:03 PDT 2000 Kevin Karplus

Using the untested, uncalibrated 2-track HMM (coeff 0.5,0.5) the top-scoring chains in fssp-in-scop are
    % Sequence ID  Length  Simple  Reverse  E-value  SCOP
    1bouB             302  -19.51    -8.35  4.7e-01  3.51.5
    1xvaA             293  -22.45    -8.30  4.9e-01  3.61.1
    1tig               94  -22.60    -7.95  7.0e-01  4.55.1
    1tyfA             193  -20.24    -7.23  1.4e+00  3.11.1
    1a4mA             349  -19.73    -7.08  1.7e+00  3.1.8
    1gfs              321  -17.40    -7.00  1.8e+00  3.2.1
Note: none of these were top-scoring before, though template 1cg2A (in 3.51.4) is at least a two-way match at the fold level.
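
The fold-level comparison in the note above (1cg2A in 3.51.4 against 1bouB in 3.51.5) can be checked mechanically. The following is a minimal Python sketch, assuming the six whitespace-separated columns shown in the 18 July table and assuming that the first two components of the SCOP code identify the fold. The names `Hit`, `parse_hits`, and `scop_fold` are hypothetical helpers, not part of SAM or of the fssp-in-scop files.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    seq_id: str
    length: int
    simple: float
    reverse: float
    e_value: float
    scop: str


def parse_hits(text: str) -> List[Hit]:
    """Parse rows of: ID, length, simple, reverse, E-value, SCOP code."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '%' header/comment lines.
        if not line or line.startswith("%"):
            continue
        seq_id, length, simple, reverse, e_value, scop = line.split()
        hits.append(Hit(seq_id, int(length), float(simple),
                        float(reverse), float(e_value), scop))
    return hits


def scop_fold(scop: str) -> str:
    """Assumed convention: class.fold are the first two components."""
    return ".".join(scop.split(".")[:2])


# Rows copied from the 18 July table.
TABLE = """
% Sequence ID Length Simple Reverse E-value SCOP
1bouB 302 -19.51 -8.35 4.7e-01 3.51.5
1xvaA 293 -22.45 -8.30 4.9e-01 3.61.1
1tig 94 -22.60 -7.95 7.0e-01 4.55.1
1tyfA 193 -20.24 -7.23 1.4e+00 3.11.1
1a4mA 349 -19.73 -7.08 1.7e+00 3.1.8
1gfs 321 -17.40 -7.00 1.8e+00 3.2.1
"""

hits = sorted(parse_hits(TABLE), key=lambda h: h.e_value)
# Chains that share a fold with 1cg2A (SCOP 3.51.4, from the note).
same_fold_as_1cg2A = [h.seq_id for h in hits
                      if scop_fold(h.scop) == scop_fold("3.51.4")]
print(same_fold_as_1cg2A)  # ['1bouB']
```

With these rows, only 1bouB shares the 3.51 fold with 1cg2A, which matches the note.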

Looking at the 1cg2A and 1bouB alignments (not using 2-track to create them):
    1cg2A/1cg2A-T0086-local spreads into two domains, missing a strand of the first domain and getting only spotty coverage in the other.
    1cg2A/1cg2A-T0086-vit is just two strands and a helix, but seems like a reasonable fragment.
    1bouB/T0086-1bouB-vit contains short helices, but does not seem to be big enough to be of much use, having an unmatched beta strand sticking out at the end.
    1bouB/T0086-1bouB-local seems to be the same.
    1bouB/1bouB-T0086-vit is just a tiny helical fragment.
The fragmentary matches may be useful for mini-threading, but don't seem to add up to a full fold.

Thu 20 July 2000 Kevin Karplus

Rachel reports new fold prediction in CAFASP: 4.48 (including 1npk)