Kevin Karplus
9 May 1998

T43 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase (HPPK)
appears to be SW:HPPK_ECOLI (does this match P26281?).

TO DO: get the swissprot record for HPPK_ECOLI and look up its EC number.

Started the standard 2-way searches using target98 today.

The top-scorers with the target alignment were the homologs of 2end (Endonuclease V (E.C.3.1.25.1)):
    1enj, 1vasA, 1eni, 2end, 1enk.
Of these, 1enj seems to score the best, though none score super well.
Close behind are homologs of 1bv1 (birch pollen allergen bet v 1): 1btv and 1bv1.
Next: 1jsg, a proto-oncogene.
Next: 1ccd, 1utrA, and 1utrB; but not their fssp representative 1utg.
Next: 1gtqA and 1gtqB.

The top scorers among the template models are
    1aorA, 1ris, 1har, 1bcpA, 1prtA,
though there are no strong scores.
1aorA is Aldehyde ferredoxin oxidoreductase protein, with no structural homologs in fssp.
1ris is ribosomal protein s6, with some structural similarity to 1psdA, 1pysB, and 1spbP.

Looking at the sum in both directions, the top scores are for
    1ris, (2end, 1enj, 1vasA, 1eni), (1btv, 1bv1), 1aorA.

The two highest scores are only -4.34 and -4.21, which are in a range where the chothia/domains test got 4 more true positives for 14 more false positives (or 21 more true for 114 more false), and 23.5 more true for 127.73 more false in the fssp test. This means that the probability for predicting a new fold should be about 84% (from the fssp numbers: 127.73 / (23.5 + 127.73)).

Looking at alignments for 1ris and 2end, I see a common motif being matched:

    T0043 LETSLAPEELLNHTQRIELQQGRVRKAERWGPRTLD-
    1ris  ----------------LENYGARVEKVEELGLR----   short match
    1ris  LDQSQLALEKEIIQRALENYGARVEKVEELGLRRLA-   long match
    2end  ---------------------GAVRKHVANGKRVRDI   short match
    2end  LVSELADQHLMAEYRELPRVFGAVRKHVANGKRVRDI   long match

The question is---is this a common motif? Is it in PROSITE, for example? Is it of any use in predicting the structure?
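
A minimal sketch for putting a number on how close the two long matches above are to the target: it counts identical residues over the columns where neither row has a gap. The rows are copied from the alignment above. The function name and the choice to skip gapped columns are assumptions made for illustration, not part of the target98 searches.

```python
def identity(row_a, row_b, gap="-"):
    """Return (identical, compared) over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    identical = compared = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either row
        compared += 1
        if a == b:
            identical += 1
    return identical, compared


# Rows copied from the motif alignment in the notes (long matches only).
T0043 = "LETSLAPEELLNHTQRIELQQGRVRKAERWGPRTLD-"
LONG_MATCHES = {
    "1ris": "LDQSQLALEKEIIQRALENYGARVEKVEELGLRRLA-",
    "2end": "LVSELADQHLMAEYRELPRVFGAVRKHVANGKRVRDI",
}

for name, row in LONG_MATCHES.items():
    same, total = identity(T0043, row)
    print(f"{name}: {same}/{total} identical ({100.0 * same / total:.0f}%)")
```

On these rows both long matches come out at 11 identical residues over 36 compared columns, so raw identity alone does not separate the 1ris hit from the 2end hit.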
This motif is really about all we're finding in 1ris and 2end, though we can set parameters to grow this seed into a more complete alignment.

Doing a search with just this motif as a seed gets a fairly small set of sequences---indeed the 1ris sequence is rejected from motif.t98 up to iteration 5. The motif seems to be fairly characteristic of the 2end homologs, and not a common motif.

--------------------------------------------------
11 May 1998 Kevin Karplus

None of these global alignments look very good---there are large gaps in awkward places. Maybe it is necessary to make the fssp alignments.

The structures for 1ris, 2end, and 1bv1 are quite different, though all combine alpha helices and beta strands in a single small protein.

From karplus@cse.ucsc.edu Mon May 11 18:13:30 1998
Return-Path: karplus@cse.ucsc.edu
Date: Mon, 11 May 1998 18:13:29 -0700
From: Kevin Karplus
To: markd@cse.ucsc.edu
Cc: karplus@cse.ucsc.edu
Subject: three more alignments 1ris, 2end, 1bv1

--------------------------------------------------
26 May 1998

top score -4.34 means

                       TRUE     FALSE    new-fold
    chothia-domains    58       479      89%
    fssp               23.5     127.73   84%

--------------------------------------------------
Kevin Karplus 16 June 1998

If I push the method to extremes (with remote_4.a2m), I get

    1gcl[ABCD]  -5.160  (1gcmA is FSSP rep) leucine zipper
    1enj        -4.550  (2end is FSSP rep)
    1eni        -4.520  (2end is FSSP rep)

The 1gclA, 1gcmA match is just to a single helix in the middle of t43---it is a leucine zipper, and there is a fair chance that this helix is doing something interesting with the other copies of the target in the crystal, but this is not enough to make a fold prediction.

The 2end alignment is mainly to the same helix. It seems to support a single helix for PEELLNHTQRIELQQGRVRKAER, merging two adjacent helices in the current neural net prediction. (The 1gcmA alignment also suggested a single helix there.)
Given that all our best hits are to this helix, I think we should probably go with a "new-fold" prediction, perhaps with a statement that our methods found homologs to this helix.

------------------------------------------------------------
22 June 1998 Kevin Karplus

Looked at the

The best scoring hits for target 43 were as follows:
    1ris
    1enj and its homologs 2end, 1vasA, and 1eni
    1bv1 and its homolog 1btv
    1aorA

However, the best scores were a modest -4.34 and -4.21. In that range, false positives outnumber true positives at a rate of approximately 8 to 1.

When we tried loosening the thresholds for building the target model, we had additional hits to 1gclA and 1gcmA.

The 1gcmA alignment suggests that EELLNHTQRIELQQGRVRK is a long helix. The somewhat better alignment to 2end suggests that the helix extends a bit further:

    T0043 VALETSLAPEELLNHTQRIELQQGRVRKAERWGPRTLD
          LSLGGGSLHHHHHHHHHHTTHHHHHHHHHHHTTLLGGG
    2end  LTLVSELADQHLMAEYRELPRVFGAVRKHVANGKRVRD

(There is a squished place at a proline in the 2end helix, which DSSP labels as a turn, but there is no change in the axis of the helix there, and T0043 lacks the proline, so most likely has a single continuous helix.)

The 1ris alignment, on the other hand, suggests a strand for the second half of what the 2end alignment predicts as a helix:

    T0043 NHTQRIELQQGRVRKAERWGPRTLDLDI.
          HHHHHHHHTTLEEEEEEEEEEEEEEEEE
    1ris  IIQRALENYGARVEKVEELGLRRLAYPI

None of these remote-homolog hits are sufficient to make a fold prediction. The residue identities are fairly large for both the 2end and 1ris alignments, so even secondary structure prediction is difficult from these alignments. We favor the long-helix interpretation, because the proline (in WGPRT) seems more likely to be at the end of a helix than in the middle of a beta strand. The sequence RVRKAER does seem to be a good candidate for a "chameleon" sequence that can be either helix or strand depending on environment.
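
A small sketch that makes the helix-versus-strand conflict for RVRKAER concrete: it finds the segment in each T0043 row of the 22 June alignments and reads off the structure labels in the same columns. The strings are copied from those two alignments, with the trailing "." of the second T0043 row dropped. The helper name is hypothetical, and reading labels column by column assumes the rows of each block line up one-to-one as written.

```python
def labels_for(segment, target_row, label_row):
    """Return the labels aligned to the first occurrence of segment in target_row."""
    if len(target_row) != len(label_row):
        raise ValueError("target and label rows must have equal length")
    start = target_row.find(segment)
    if start < 0:
        raise ValueError(f"{segment} not found in target row")
    return label_row[start:start + len(segment)]


# (T0043 row, structure-label row) for each template alignment in the notes.
ALIGNMENTS = {
    "2end": ("VALETSLAPEELLNHTQRIELQQGRVRKAERWGPRTLD",
             "LSLGGGSLHHHHHHHHHHTTHHHHHHHHHHHTTLLGGG"),
    "1ris": ("NHTQRIELQQGRVRKAERWGPRTLDLDI",
             "HHHHHHHHTTLEEEEEEEEEEEEEEEEE"),
}

for name, (target_row, label_row) in ALIGNMENTS.items():
    print(name, labels_for("RVRKAER", target_row, label_row))
```

With the rows as copied, the 2end block labels all seven residues H and the 1ris block labels them all E, which is the disagreement the chameleon remark refers to.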
------------------------------------------------------------
19 August 1998 Christian

Moved everything to old and remaking with new Makefile.