Kevin Karplus 7 May 1998

T44 seems to be SW:RTCA_ECOLI.

Target44 is claimed to have no homologs with high sequence identity, but to have high structural similarity to some (unspecified) protein; the PhD prediction of secondary structure is also claimed to be fairly accurate.

Based on the moderately high scores for the top-scoring chain obtained when scoring in both directions with the target98 models (-8.630 and -4.210, summing to -12.840), I suspect that the protein of interest is 1eps, 5-enol-pyruvyl-3-phosphate synthase, which has no structural similarity to other PDB files (according to fssp).

[[Correction 9 May 1998---that was with the 1997 fssp database. In the latest database 1uae udp-n-acetylglucosamine enolpyruvyl transferase (mura) has Z score 24.0, but only 19% residue ID. We need to make an alignment of 1eps and 1uae to use as a template, even though 1uae did not score that well as a sequence against t44.t98_6.]]

No other chain scores well in both directions, so the second-place chain is much worse.

Question: is 5-enol-pyruvyl-3-phosphate synthase functionally related to RNA-3'terminal phosphate cyclase? Both seem to involve enzymatic reactions on phosphates.

To do:
    Fetch swissprot entry for t44 (accession U18997).
    Get the EC numbers for both t44 and 1eps (perhaps from swissprot).
    Do secondary structure prediction for t44 using PhD.
    Do secondary structure prediction for t44 with own tools.
    Build joint model of t44 and 1eps and check secondary structure prediction vs. template.

Kevin Karplus 7 May 1998

The first attempt at building a joint alignment (t44-1eps.pw.a2m) was not very successful---only 42 residues were aligned out of 347. These are probably the key positions that make 1eps be recognized as a possible remote homolog, but they are not a large enough alignment to be worth submitting. I'll try again starting with the 1eps model, but I don't expect much better from that.
We'll probably have to go to global alignment (or, more subtly, set FIMSTRENGTH to a value that makes FIMs a little less greedy). Going to FIMSTRENGTH=-0.95 isn't enough.

I noticed that 1eps and t44 are different enough that the build-joint98 script doesn't get both sets of homologs into the training set until the third (and final) iteration. I may need to change the thresholds for build-joint98, from -18,-12,-6 to -9,-5,-1. This probably means adding a new "thresholds" parameter to build-joint98.

Kevin Karplus 8 May 1998

Got the PhD predictions for t44. The single-sequence prediction looks poor (with a very long helix and an EEEHHHHHE prediction), but the multiple-sequence prediction looks reasonable, and is probably the prediction that the crystallographers thought was good.

The EC classification for 1eps is
    E.C.2.5.1.19 3-Phosphoshikimate 1-carboxyvinyltransferase.
    (CATH gives it as 2.5.1.9 --- that may just be a typo)
    Reaction: Phosphoenolpyruvate + 3-phosphoshikimate = phosphate + 5-O-(1-carboxyvinyl)-3-phosphoshikimate.
    Other name(s): 5-Enolpyruvylshikimate-3-phosphate synthase. Epsp synthase. 3-ENOL-Pyruvoylshikimate-5-phosphate synthase.

1EPS is in SCOP family
    1. Root: scop
    2. Class: Alpha and beta (a+b)
       Mainly antiparallel beta sheets (segregated alpha and beta regions)
    3. Fold: IF3-like
       beta-alpha-beta-alpha-beta(2); 2 layers; mixed sheet 1243, strand 4 is antiparallel to the rest
    4. Superfamily: Enolpyruvate transferase
       duplication: 6 repeats of this fold are organised in two similar domains
    5. Family: Enolpyruvate transferase
    6. Protein: 5-enol-pyruvyl shikimate-3-phosphate(EPSP) synthase
    7. Species: Escherichia coli

In the same family are 1uae and 1naw. There are no others in the same superfamily. In the same fold class are 1ife and 1tig.

NOTE: can't get secondary structure from DSSP, since 1eps is alpha carbons only. May have to fake the DSSP data somewhat to compare secondary structure info.
--------------------------------------------------

T44 is RTCA_ECOLI
RNA 3'-TERMINAL PHOSPHATE CYCLASE (EC 6.5.1.4) (RNA-3'-PHOSPHATE CYCLASE) (RNA CYCLASE).

ENZYME: EC 6.5.1.4
Official Name: RNA-3'-PHOSPHATE CYCLASE.
Alternative Name(s): RNA 3'-TERMINAL PHOSPHATE CYCLASE. RNA CYCLASE.
Reaction catalysed: ATP + RNA 3'-TERMINAL-PHOSPHATE <=> AMP + DIPHOSPHATE + RNA TERMINAL-2',3'-CYCLIC-PHOSPHATE

There is one PDB entry for EC 6.5.1.* in http://www.biochem.ucl.ac.uk/bsm/enzymes

1a0i E.C.6.5.1.1 Dna ligase (ATP).
    Reaction: Atp + (deoxyribonucleotide)(N) + (deoxyribonucleotide)(M) = amp + pyrophosphate + (deoxyribonucleotide)(N+M).
    Other name(s): Polydeoxyribonucleotide synthase (ATP). Polynucleotide ligase. Sealase. Dna repair enzyme. Dna joinase.
    Associated disease/disorder: Bloom disease
    Comments: Catalyses the formation of a phosphodiester at the site of a single-strand break in duplex dna. Rna can also act as substrate to some extent.

There is 1 PDB entry in enzyme class E.C.6.5.1.1
    1a0i Structure: 1. Dna ligase.
         Source: 1. Bacteriophage t7. Gene: lig. Expression_system_vector: pet22

We should probably look for alignments to 1a0i as well as to 1eps.
Hmm---no need, 1a01 is all helical, and the PhD prediction has definite strands:

>T0044
LLLEEEEELLLLLLLHHHHHHHHHHHHLLLLLLEEEEEELLLLLLLLLHHHHHHHHHHHH
HHHLLEEEEELLLLEEEEELLLEELLLLEEEELLLLLLEEEEEELLLLLEELLLLLLEEE
EELLLLLLLLLLHHHHHHHHHHHHHHLLLLLEELLLLLLLLLLLLLEEEEEELLLLLLLL
LLLLLLLLEEEEEEEEEEELLLLLLLHHHHHHHLLLLLLLLLLLLLLLLLLLLLLLEEEE
ELLLLLLELEEELLLLELLHHHHHHHHHHHHHHHHHHLLLLLHHHHLLLEEEEELLLLLL
LLLLLLLEEEELEEEEEEEEELLEEEEEEELLLLLLLLLLLELLLLL

--------------------------------------------------

From karplus@cse.ucsc.edu Fri May 8 16:39:44 1998
Return-Path: karplus@cse.ucsc.edu
Date: Fri, 8 May 1998 16:39:43 -0700
From: Kevin Karplus
To: markd@cse.ucsc.edu
CC: karplus@cse.ucsc.edu, cbarrett@cse.ucsc.edu
Subject: DSSP needed for 1eps

We need a DSSP file for 1eps, one of the alpha-only proteins, because it may be our prediction for t44. Could you create one?

--------------------------------------------------

From compbio.casp-request Fri May 8 17:10:42 1998
Return-Path: karplus@cse.ucsc.edu
Date: Fri, 8 May 1998 17:10:23 -0700
From: Kevin Karplus
To: palm@crysv1.ncifcrf.gov
Cc: compbio.casp@cse.ucsc.edu, casp3@sb7.llnl.gov
Subject: RNA-3'terminal phosphate cyclase

Thanks for submitting your protein to the CASP3 contest. I noticed one discrepancy that you might want to clear up with the organizers of the contest.
You gave the accession number as U18997, but that has sequence

        1 mlepllakig ihqqttllrh gfypagggvv atevspvasf ntlqlgergn ivqmrgevll
       61 agvprhvaer eiatlagsfs lheqnihnlp rdqgpgntvs levesenite rffvvgekrv
      121 saevvaaqlv kevkrylast aavgeyladq lvlpmalaga geftvahpsc hlltniavve
      181 rflpvrfsli etdgvtrvsi e

not

    MVKRMIALDG AQGEGGGQIL RSALSLSMIT GQPFTITSIR AGRAKPGLLR QHLTAVKAAT
    EICGATVEGA ELGSQRLLFR PGTVRGGDYR FAIGSAGSCT LVLQTVLPAL WFADGPSRVE
    VSGGTDNPSA PPADFIRRVL EPLLAKIGIH QQTTLLRHGF YPAGGGVVAT EVSPVASFNT
    LQLGERGNIV QMRGEVLLAG VPRHVAEREI ATLAGSFSLH EQNIHNLPRD QGPGNTVSLE
    VESENITERF FVVGEKRVSA EVVAAQLVKE VKRYLASTAA VGEYLADQLV LPMALAGAGE
    FTVAHPSCHL LTNIAVVERF LPVRFSLIET DGVTRVSIEG SHHHHHH

I suspect that you meant Swissprot sequence RTCA_ECOLI, which is accession number P46849. Here is the beginning of the Swissprot entry:

LOCUS       2507354 339 aa 01-NOV-1997
DEFINITION  RNA 3'-TERMINAL PHOSPHATE CYCLASE (RNA-3'-PHOSPHATE CYCLASE) (RNA CYCLASE).
ACCESSION   2507354
PID         g2507354
DBSOURCE    SWISS-PROT: locus RTCA_ECOLI, accession P46849
            class: standard.
            extra accessions:P46848,Q47349,created: Nov 1, 1995.

Could you check to see whether the accession number was incorrectly posted on the CASP3 site? (Or worse, was the sequence posted incorrectly?)

Thanks,
Kevin Karplus

--------------------------------------------------

From markd@Grizzly.COM Fri May 8 18:32:12 1998
Return-Path: markd@Grizzly.COM
Date: Fri, 8 May 1998 18:34:21 -0700 (PDT)
From: Mark Diekhans
To: karplus@cse.ucsc.edu
CC: cbarrett@cse.ucsc.edu
In-reply-to: <199805082339.QAA11519@purr.cse.ucsc.edu> (message from Kevin Karplus on Fri, 8 May 1998 16:39:43 -0700)
Subject: Re: DSSP needed for 1eps
References: <199805082339.QAA11519@purr.cse.ucsc.edu>

>We need a DSSP file for 1eps, one of the alpha-only proteins, because
>it may be our prediction for t44. Could you create one?

I will see what I can do.

From compbio-request Fri May 8 19:10:41 1998
Return-Path: markd@Grizzly.COM
Date: Fri, 8 May 1998 19:12:44 -0700 (PDT)
From: markd@cse.ucsc.edu
To: karplus@cse.ucsc.edu
CC: cbarrett@cse.ucsc.edu, compbio@cse.ucsc.edu
In-reply-to: <199805082339.QAA11519@purr.cse.ucsc.edu> (message from Kevin Karplus on Fri, 8 May 1998 16:39:43 -0700)
Subject: Re: DSSP needed for 1eps
References: <199805082339.QAA11519@purr.cse.ucsc.edu>

>We need a DSSP file for 1eps, one of the alpha-only proteins, because
>it may be our prediction for t44.

It's there, in the DSSP directory, along with the other 4 alpha-only PDBs in FSSP.

Following up on Christian's MaxSprout suggestion, it turns out that EBI keeps precomputed PDB files from MaxSprout for all PDB alpha-only entries at:

    http://www2.ebi.ac.uk/dali/maxsprout/MODEL.html

From PALM@CRYSV1.NCIFCRF.GOV Fri May 8 20:13:00 1998
Return-Path: PALM@CRYSV1.NCIFCRF.GOV
From: PALM@CRYSV1.NCIFCRF.GOV
Date: Fri, 8 May 1998 23:12:03 -0400 (EDT)
To: karplus@cse.ucsc.edu
Subject: P46849 is correct

Dear Dr. Karplus

Thanks for the notice about the error in the CASP entry. I took the accession number U18997 from a cyclase comparison paper (Genschik et al., EMBO J., 1997, 16(10), 2955-2967), but didn't check it. This contains the N-terminal and C-terminal part separately as two ORFs (because there first was a sequencing error). Swissprot (P46849) now contains the full correct ORF (ca. 340 aa). I will notify CASP.

Gottfried Palm

--------------------------------------------------

Kevin Karplus 9 May 1998

I noticed that I had checked the old, not the new version of FSSP for structural homologs to 1eps. There IS one in fssp-24-4-98: 1uae. This has only a 19% sequence identity with 1eps and does not come up in the target98 search.
1uae is udp-n-acetylglucosamine enolpyruvyl transferase, and has EC code 2.5.1.7

I built the a2m file from the fssp alignment of 1eps (couldn't for 1uae, because of a bug in the fssp2a2m script), and used it as a seed for a target98 alignment, then used that to build a model for aligning 1eps and t44.

The fssp-t98 alignment disagrees with the fssp alignment in several ways--many of the differences are just small shifts in where insertions or deletions are done (something fssp is not that reliable on), but there is also a large alignment of residues near the beginning that fssp regards as unaligned, and some major shifts in the aligned ones in that region. I suspect that this may be caused by flexing the hinge between the two domains---a transformation that DALI can't handle very well (I haven't looked at 1uae yet, so this is just conjecture). I'll have to look at the two structures with the two alignments, and see which one I like better.

From compbio.casp-request Sat May 9 14:42:22 1998
Return-Path: karplus@cse.ucsc.edu
Date: Sat, 9 May 1998 14:42:20 -0700
From: Kevin Karplus
To: mark@cse.ucsc.edu
Cc: compbio.casp@cse.ucsc.edu
Subject: constrained alignments

Mark,

I'd like you to produce constrained alignments for 1eps and 1uae using your best guess at a method---I think the alignments I got for t44 using 1eps-fssp-t98 as a template are better than the ones using 1eps alone. [Caveat: I have not looked at the alignment using any of the visualization tools yet, and it may be much worse than I think.] The structural alignment with 1uae gives me a much better idea where the variability can occur.

(Alex, here is an example where showing two molecules superimposed would be good---if we could superimpose 1uae and 1eps according to their alignment, then look at a putative alignment of t44 with them, it would be quite useful.)

Kevin

--------------------------------------------------

From markd@Grizzly.COM Sat May 9 17:11:08 1998
Return-Path: markd@Grizzly.COM
Date: Sat, 9 May 1998 17:13:14 -0700 (PDT)
From: Mark Diekhans
To: karplus@cse.ucsc.edu
In-reply-to: <199805092346.QAA15507@purr.cse.ucsc.edu> (message from Kevin Karplus on Sat, 9 May 1998 16:46:48 -0700)
Subject: Re: more fssp2a2m bugs
References: <199805092346.QAA15507@purr.cse.ucsc.edu>

>The fixes to fssp2a2m are moderately important---I can wait a couple
>of days, but not a couple of weeks.

No problem

>The 1eps alignment should only use 1eps and 1uae as constrained
>sequences, then try to align their homologs to that constrained seed.
>I did something similar, but without constraints to create the
>fssp-t98 alignment that I think is giving a good alignment.

Underway.

--------------------------------------------------

Kevin Karplus 9 May 1998

The second-highest scorers (after 1eps) in t44.t98_*-varh50-pdb.rdb are all kinases represented by 1hcl in FSSP, though the 1hcl model did not score the t44 sequence well. I'll try building joint models of t44 with the kinases, though I don't expect much (the number of kinases will swamp out t44).

The second-highest scorer in t44-t98.rdb is 1rcd, a ferritin represented by 2fha in FSSP. This has a very weak structural relationship to the third scorer, 1ecmA (endo-oxabicyclic transition state analogue). None of these look very promising as templates.

Note: the 1eps score summed over both directions (-12.84) is in the range where 54 more true positives got 10 more false positives for chothia/domains and 103 more true positives got 31.33 more false positives for the fssp test, so the probability that the hit is false (and that we should predict "new fold") should be about 16-23% (even without the hint that the fold is in the database).

I have two promising alignments:
    1eps/1eps-t44-fssp-global.pw.a2m.gz
    1eps/1eps-t44-const-global.pw.a2m.gz
based on the 1eps fssp alignment file.
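
The 16-23% range in the note above can be reproduced with a little arithmetic. This is only a sketch, assuming the estimate is simply false/(true + false) for the hits falling in that score range; the function name is made up for illustration and is not part of any of our scripts.

```python
def false_positive_fraction(true_pos: float, false_pos: float) -> float:
    """Fraction of the hits in a score range that were false positives.

    Assumption: the estimate is false / (true + false) within the range.
    """
    return false_pos / (true_pos + false_pos)


# Counts quoted in the note for the score range around -12.84.
domains = false_positive_fraction(54, 10)     # chothia/domains test
fssp = false_positive_fraction(103, 31.33)    # fssp test

print(f"domains: {domains:.1%}  fssp: {fssp:.1%}")
```

The two tests bracket the estimate, which is why the note gives a range rather than a single number.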
I also need to create alignments for the 1uae file, but Mark Diekhans has been having some trouble parsing that file.

--------------------------------------------------

From markd@Grizzly.COM Sat May 9 21:20:06 1998
Return-Path: markd@Grizzly.COM
Date: Sat, 9 May 1998 21:22:11 -0700 (PDT)
From: Mark Diekhans
To: karplus@cse.ucsc.edu
In-reply-to: <199805092346.QAA15507@purr.cse.ucsc.edu> (message from Kevin Karplus on Sat, 9 May 1998 16:46:48 -0700)
Subject: Re: more fssp2a2m bugs
References: <199805092346.QAA15507@purr.cse.ucsc.edu>

>The 1eps alignment should only use 1eps and 1uae as constrained
>sequences, then try to align their homologs to that constrained seed.
>I did something similar, but without constraints to create the
>fssp-t98 alignment that I think is giving a good alignment.

A constrained target 98 alignment of FSSP 1eps is in:

    /projects/compbio3/usr/markd/constrain/casp3/models/1eps.constr-t98.a2m.gz

Attempting to build one from FSSP 1uae showed a new format weirdness in the FSSP equivalence section, so I will have to deal with that first.

Also, I am getting close on fssp2a2m. Since I suspect I will uncover even more FSSP weirdness when running on the full FSSP distribution, I would like to install it and run it myself. If I could do that in the models 97 directory, the end result would be all of the fssp alignments being built. What is an easy way to do this without triggering a ton of other things from being built? Also, can I replace fssp2a2m without disturbing the builds you are currently running with it? If this isn't straightforward, I can do it in a parallel directory and then we can just move them in place.

Mark

--------------------------------------------------

I looked at 1eps-t44-fssp-global.pw and 1eps-t44-const-global.pw with saee this morning. I like the 1eps-t44-fssp-global alignment---the pieces that are omitted tend to be coherent units, with the dangling ends not too far apart.
I need to see what the alignment does with the secondary structure predictions, and I also need alignments to 1uae (and 1naw).

NEWS---in the latest FSSP 3-5-98 there is another chain in this family: 1a2n. I'll want to make sure this gets included in the 1uae fssp alignment.

I looked at the 1uae-t44-const-global alignment this afternoon--it looks ok, but there are a few somewhat awkward gaps to bridge. This target looks like an alignment challenge!

------------------------------------------------------------

25 June 1998

Remade the alignments with the newest version of SAM. 1eps is still best with sum score -12.7, which is in the range

    domains    45 T    10 F       18% probability false
    fssp      103 T    31.33 F    23% probability false

wu-blast doesn't find anything.
double-blast failed because NRP was corrupted.

8 July 1998 Kevin Karplus

double-blast still doesn't find anything.

The 1eps-t44-fssp-global alignment doesn't seem to get good alignment of secondary structure (using PhD). Possibly one of the other alignments is better.

10 July 1998 Christian

The subsequence "HGF YPAGGGVV" of t44 is the catalytic site of the enzyme. It is the Prosite signature sequence (entry PS01287) and is "reminiscent of various ATP, GTP or AMP glycine-rich loops".

10 July 1998 Kevin Karplus

Made a one-residue shift in 1eps-t44-fssp-global.pw to match the good part of t44-1eps.pw and called the result 1eps-t44-hand.pw. This is probably our best alignment.

I tried using the 1uae constrained model to align 1eps and t44, but the result was not very good---at least, it didn't look any better to me than the 1eps-t44-fssp-global.pw alignment.

15 July 1998 Kevin Karplus

I re-created the 1eps alignments today, using t44.t98_6 as the target alignment. I also created a posterior alignment. There seem to be a few incompatible alignments. I did an all-against-all measure-shift and clustered the alignments according to which had shift scores >= 0.3.
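
One way to do that kind of threshold clustering, as a minimal sketch only: link any two alignments whose pairwise score is at least 0.3 and take the connected components. The notes do not record which clustering rule was actually used, so the single-linkage choice is an assumption, as is treating the score as one number per unordered pair. The alignment names and scores below are made-up placeholders, and measure-shift itself is not reimplemented here.

```python
from itertools import combinations


def cluster_by_threshold(names, score, threshold=0.3):
    """Group names into connected components, linking pairs with score >= threshold."""
    parent = {n: n for n in names}

    def find(n):
        # Walk up to the root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if score(a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    # Order clusters by the position of their first member in the input list.
    return sorted(clusters.values(), key=lambda c: names.index(c[0]))


# Placeholder data (hypothetical): pairs not listed are treated as score 0.
names = ["aln_a", "aln_b", "aln_c", "aln_d", "aln_e"]
pair_scores = {
    frozenset(("aln_a", "aln_b")): 0.62,
    frozenset(("aln_b", "aln_c")): 0.35,
    frozenset(("aln_d", "aln_e")): 0.48,
}


def lookup(a, b):
    return pair_scores.get(frozenset((a, b)), 0.0)


clusters = cluster_by_threshold(names, lookup)
print(clusters)
```

With single linkage, aln_a and aln_c end up together through aln_b even though they are not directly linked; a stricter rule (every pair above threshold) would split such chains.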
There were 3 clusters:

One cluster of alignments with high similarity is
    1eps-t44-joint
    1eps-t44-vit
    t44-1eps-joint
    t44-1eps-vit
The most dissimilar in this cluster are the two joint alignments. The "center" of the cluster is t44-1eps-vit. These alignments are all very short.

Another cluster is
    1eps-t44-constr-global
    1eps-t44-global=1eps-t44-post
    1uae-1eps-t44-constr
    1eps-t44-fssp-global
The center of the cluster is 1eps-t44-fssp.

The remaining cluster is
    t44-1eps-global
    t44-1eps-post.

15 July 1998 Christian

I have looked at the 1eps-t44-fssp-global alignment and found the alignment to the second half of the structure to be encouraging. My only change was to unalign three residues that followed a proline in a helix. My hand alignment is 1eps-t44-fssp-global-cbarrett-hand.pw.a2m. The first half of the structure possesses the most questionable piece of the alignment, but it even seems plausible given the one large deletion.

1eps-t44-global.pw.a2m has two difficult-to-justify deletions in the structure's second half. The breaks essentially leave no plausible way to reconnect the backbone