Wed Jun 24 14:20:10 PDT 1998 Kevin Karplus

The score-target script seems to be failing. I ran hmmscore directly and
got a very clear hit:

    % Sequence ID  Length   Simple    Reverse
      1rmg         422      -139.14   -123.48

This makes it difficult to believe the "no known homologous sequence"
claim. 1rmg is rhamnogalacturonase, while t68 is polygalacturonase, which
sounds like a pretty close functional match to me. In fact, 1rmg is
SW:RHGA_ASPAC, which appears in the t68.t98_6 alignment (it already
appears in t68.t98_2).

OK, score-target fixed. t68-sum98.rdb now has one obvious hit:

    1rmg -245.11

The alignment t69-1rmg.pw looks pretty good, with most of the insertions
and deletions on the pieces that stick out from the main "right-handed
parallel beta helix".

Note: 1rmg has been in PDB since 26-FEB-97, but does not seem to be in
CATH or SCOP yet.

NOTE: blast does NOT get the 1rmg hit---top hits are

    pdb|1GAL|  Glucose Oxidase (E.C.1.1.3.4)                      66  0.082  1
    pdb|2HAD|  Haloalkane Dehalogenase (pH 6.2) /pdb|1EDB| ...    60  0.42   1
    pdb|1HDE|A Xanthobacter autotrophicus /pdb|1HDE|B Xanthob...  55  0.92   1
    pdb|1LMN|  Lysozyme (E.C.3.2.1.17) /pdb|1LMQ| Mol_id: ...     49  0.97   2

fssp reps (1gal, 2had->1ede, 1hde->1ede, 1lmn->3lzt)
None of these is a structural homolog of 1rmg, but 1rmg IS in the
database being searched (it finds itself).

Hmm---using wu-blastp locally gets 1rmg as the highest scorer, so there
is something wrong with BLASTP 1.4.11 [24-Nov-97] on
http://www.ncbi.nlm.nih.gov

Double-blast gets two hits:

    T0068 1rmg  -48.0919  6.7e-22  1.3e-21  SW:MPA2_CRYJA_146:414
    T0068 2mtaC -4.42285  3.8e-10  0.012    GP:ANPGAE_1_114:340

Again 1rmg is the obvious correct hit.

From compbio.casp-request Wed Jun 24 16:26:05 1998
Date: Wed, 24 Jun 1998 16:26:03 -0700
From: Kevin Karplus
To: compbio.casp@cse.ucsc.edu
Subject: can't trust those target submissions

T0068 was listed as "no known homolog", but it has a very obvious one:
1rmg, a right-handed parallel beta helix.
This is a clear winner (score -245.11 with SAM-T98), and is findable with
single-blast (-4.51) and double-blast (-48.1).

The template could also be found by name search:

    t68 is polygalacturonase
    1rmg is rhamnogalacturonase

Interestingly, 1rmg is NOT found by

    http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-blast?Jform=0

nor

    http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-newblast

(1rmg IS in the database NCBI searches, as it can be found using 1rmg as
a query). It looks like BLASTP 1.4.11 and BLASTP 2.0.4 are NOT as good
for this search as wu-blastp.

The initial alignments look pretty good, with conserved residues
clustering near each other on the beta sheets.

------------------------------------------------------------
24 June 1998

It might be worth trimming t68.t98_6_sorted.a2m to exclude the branches
of the tree that are outside the subtree containing t68 and 1rmg
(getting rid of subfamily 12), then retraining that alignment. That
might produce very slightly better alignments. One could probably get a
similar effect by retraining t68.t98_2, but using the subtree is a
little more elegant. Starting a target98 alignment with that subtree
alignment as a seed and very strict thresholds might also be useful.

14 July 1998

The latest t68.t98_6.tree requires the entire tree to get RHGA_ASPAC
into the same subtree as t0068, so there is no point in trimming to a
subtree.

21 July 1998 Kevin Karplus

The 1rmg alignments are in fairly good, but not excellent, agreement.
The one that agrees most with the rest is 1rmg-t68-vit, whose worst
measure_shift score is 0.6838 (with t68-1rmg-joint), though that rises
to 0.735 if 46 columns are dropped from t68-1rmg-joint, or to 0.7337 if
45 are dropped from 1rmg-t68-vit. The best agreement with 1rmg-t68-vit
is 1rmg-t68-post = 1rmg-t68-global (0.912, going to 0.917 with 3 fewer
columns).

21 July 1998 Christian

The probable active site for t68 is the His of the subsequence FGTGHGMS.
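The 21 July entry picks the alignment that "agrees most with the rest" by looking at each alignment's worst agreement score with the others. The exact definition of the measure_shift score is not reproduced here; the Python below is only a simplified, hypothetical stand-in. It assumes each pairwise alignment is given as two equal-length gapped strings ('-' for gaps), and it defines agreement as the fraction of residue pairs aligned by one alignment that the other also aligns. The function names and toy sequences are made up for illustration, and the sketch does not model the effect of dropping columns.

```python
"""Simplified agreement between alternative pairwise alignments (hypothetical stand-in)."""


def aligned_pairs(row_a: str, row_b: str) -> set[tuple[int, int]]:
    """Return the (i, j) residue-index pairs matched by one pairwise alignment.

    row_a and row_b are equal-length gapped strings with '-' as the gap
    character; i and j count residues (not columns) from 0 in each sequence.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))  # both residues present: an aligned pair
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def agreement(ref: set[tuple[int, int]], other: set[tuple[int, int]]) -> float:
    """Fraction of the pairs aligned by `ref` that `other` also aligns."""
    if not ref:
        return 0.0
    return len(ref & other) / len(ref)


def worst_agreement(name: str, alignments: dict[str, set[tuple[int, int]]]) -> tuple[float, str]:
    """Lowest agreement of alignment `name` with any other one, and which one it was."""
    ref = alignments[name]
    return min(
        (agreement(ref, pairs), other)
        for other, pairs in alignments.items()
        if other != name
    )


if __name__ == "__main__":
    # Toy alignments of the same two sequences (ACDEFG vs ACEFG); names are made up.
    alns = {
        "a": aligned_pairs("ACDEFG", "AC-EFG"),
        "b": aligned_pairs("ACDEFG", "ACE-FG"),
        "c": aligned_pairs("ACDEFG", "-ACEFG"),
    }
    for name in alns:
        print(name, worst_agreement(name, alns))
    # The alignment whose worst-case agreement is highest "agrees most with the rest".
    best = max(alns, key=lambda n: worst_agreement(n, alns)[0])
    print("most consistent:", best)
```

Run as a script, this prints worst agreements of (0.6, 'c') for a, (0.4, 'c') for b and (0.4, 'b') for c, so "a" is reported as the most consistent alignment. Taking the minimum over partners rewards an alignment with no strong outlier disagreement, which matches how the notebook entry reasons about 1rmg-t68-vit.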
From cbarrett@cse.ucsc.edu Tue Jul 21 14:59:09 1998
Date: Tue, 21 Jul 1998 14:59:07 -0700 (PDT)
From: Christian Barrett
To: karplus@cse.ucsc.edu
Subject: t68

I have looked at the alignments for this target and have produced a
hand alignment that is essentially 1rmg-t68-vit with minor tweaking.
I unaligned the match to the surface helix that occurs in the middle of
the long insert and made two 1-residue shifts near the end of the
alignment. If we want to be more conservative with this prediction, we
may want to unalign residues at the end of the alignment.

My hand alignment is 1rmg-t68-vit.cbarrett-hand.pw.a2m.

Christian

From karplus@cse.ucsc.edu Tue Jul 21 15:12:58 1998
Date: Tue, 21 Jul 1998 15:12:57 -0700
From: Kevin Karplus
To: cbarrett@cse.ucsc.edu
Cc: karplus@cse.ucsc.edu
In-reply-to: (message from Christian Barrett on Tue, 21 Jul 1998 14:59:07 -0700 (PDT))
Subject: Re: t68

I'll look at 1rmg-t68-vit.cbarrett-hand.pw.a2m tomorrow. I was most
interested in what happened in a couple of places near the ends of beta
hairpins.

Wed Jul 22 19:03:31 PDT 1998

I looked at Christian's alignment and thought it was much too
unaggressive in aligning things. I started over with 1rmg-t68-global,
which makes a prediction for almost everything.

23 July 1998

Draft of results:

T0068, polygalacturonase, has a very clear homology to 1RMG,
rhamnogalacturonase, which has a fairly clear functional similarity as
well. Wu-blast scored 1RMG highest, but with a score that has about 75%
false positives for superfamilies. Double-blast scored 1RMG highest, in
a score range with only 5% false positives. Our method also scored 1RMG
highest (at -245.3), well into the range where we had no false
positives.
Interestingly, blastp (gapped or ungapped) run on NCBI's web site failed
to score 1RMG at all---perhaps because the web site is set up for speed
rather than thoroughness.

We found many conserved residues in stripes across the beta sheets, and
hand-tweaked the alignments in weakly aligned areas to continue this
conservation pattern. The C terminus of the alignment required the most
adjustment and is probably still misaligned.

From venc@september.llnl.gov Thu Jul 30 18:54:37 1998
Date: Thu, 30 Jul 1998 18:53:51 -0700
From: venc@september.llnl.gov (Ceslovas Venclovas)
To: karplus@cse.ucsc.edu
Subject: Update for target T0068

Dear Predictor,

In the target description form, to facilitate initial classification,
we have asked crystallographers to provide some information regarding
homology to known structures. In some cases the distinction is not
entirely clear, or only part of the target is homologous. This may lead
to some confusion. For example, it has come to our attention that,
although the sequence similarity is weak, T0068 has been classified
(Swiss-Prot Q00001) as a glycosyl hydrolase homologous to PDB 1RMG.
Please see http://www.expasy.ch/cgi-bin/lists?glycosid.txt

Sincerely,
Ceslovas Venclovas
for CASP3 organizers
--
Protein Structure Prediction Center,
Lawrence Livermore National Laboratory, Livermore, CA 94550
E-mail: venclovas1@llnl.gov Phone: (925) 422-3097 Fax: (925) 423-3608

From karplus@cse.ucsc.edu Thu Jul 30 20:40:19 1998
Date: Thu, 30 Jul 1998 20:40:18 -0700
From: Kevin Karplus
To: venc@september.llnl.gov
Cc: karplus@cse.ucsc.edu
In-reply-to: <199807310153.SAA07562@september.llnl.gov> (venc@september.llnl.gov)
Subject: Re: Update for target T0068

Thank you for the clarification.

The 1RMG homology is not weak---it is extremely strong. Only a poor set
of default parameters causes NCBI blast to miss it.
All the other tools we tried (including default wu-blast) found the
homology with no difficulty. There is some difficulty in getting a good
alignment, so t0068 is still an excellent fold-recognition target.

I have a question on t0066. I have a prediction of the interaction of
t0064 and t0065 which consists of aligning them both to the same
structure, but I see no way to submit this as a prediction with the
current format. Since the prediction is the same as for the chains
separately, should I just submit a file with METHOD but no MODEL for
t0066? Please advise me on how to submit this. (We've already submitted
T0064 and T0065, if you need to look at our submissions to figure out
what I'm talking about.)

Kevin Karplus