PFRMAT TS
TARGET T0052
AUTHOR 9070-5088-8627
REMARK
REMARK Prediction date: Wednesday June 10, 1998
REMARK Group name: UCSC-compbio
REMARK Authors: Christian Barrett, Melissa Cline, Mark Diekens, Kevin Karplus,
REMARK David Haussler and Richard Hughey
REMARK University of California, Santa Cruz
REMARK
METHOD
METHOD UCSC Computational Biology
METHOD
METHOD All experiments were performed with SAM version 2.1.1 [1], using a
METHOD refinement of the methods used by this group in CASP2 [2].
METHOD
METHOD Overview of the method
METHOD
METHOD Fold recognition was performed using the Target98 (SAM-T98) method
METHOD [3]. This method attempts to find and multiply align a set of
METHOD homologs to a given sequence, then create an HMM from that multiple
METHOD alignment.
METHOD
METHOD First, a set of sequence weights is determined from the alignment. Next,
METHOD modelfromalign is used to build the model from the alignment and the
METHOD sequence weights. Finally, hmmscore performs a local, all-paths scoring
METHOD of the sequences, using a reversed-sequence normalization feature.
METHOD
METHOD The weighting method, detailed in upcoming publications [3,4],
METHOD combines the Henikoffs' scheme [5], Dirichlet mixtures [6], and an
METHOD entropy method to set the final weights.
METHOD
METHOD Alignment generation
METHOD
METHOD The initial step uses WU-BLAST (BLASTP version 2.0aMP from Washington
METHOD University) to select potential homologs from the non-redundant protein
METHOD database (NRP). NRP is searched twice to produce two sets of homologs:
METHOD one of very close homologs (E<0.00003) and one of possible homologs
METHOD (E<500).
METHOD
METHOD The Target98 method then uses multiple iterations of a selection,
METHOD training, and alignment procedure. Each iteration needs an
METHOD initial alignment, a set of sequences to search, a threshold value,
METHOD and a transition regularizer.
METHOD Alignments in the library were built
METHOD with 4 iterations, with thresholds -40, -30, -24, and -16, but the target
METHOD alignment was built with 6, with thresholds -50, -40, -30, -22, -16, and
METHOD -14.
METHOD
METHOD On the first iteration the single sequence (or seed alignment) passed
METHOD to the method is used as the initial alignment and the close homologs
METHOD found by WU-BLAST are used as the search set. The threshold is set
METHOD very strictly, so that only very good matches to the sequence are
METHOD considered. This iteration uses a transition regularizer that was set
METHOD up to approximate the gap costs used by WU-BLAST.
METHOD
METHOD On subsequent iterations the input alignment is the output from the
METHOD previous iteration and the search set is the larger set of possible
METHOD homologs found by WU-BLAST. The thresholds are gradually loosened.
METHOD For the second through second-from-last iteration, a ``long-match''
METHOD transition regularizer is used, and for the final iteration a
METHOD transition regularizer trained on FSSP structural alignments is used.
METHOD
METHOD References
METHOD [1] R. Hughey and A. Krogh, CABIOS 12(2): 95-107, 1996.
METHOD     http://www.cse.ucsc.edu/research/compbio/sam.html.
METHOD [2] K. Karplus, K. Sjolander, C. Barrett, M. Cline, D. Haussler, R.
METHOD     Hughey, L. Holm, and C. Sander, Proteins: Structure, Function, and
METHOD     Genetics, Suppl. 1, 134-9, 1997.
METHOD [3] K. Karplus, C. Barrett, and R. Hughey, Technical Report UCSC-CRL-98-06,
METHOD     Department of Computer Science, University of California, Santa Cruz, 1998.
METHOD [4] J. Park, K. Karplus, C. Barrett, R. Hughey, D. Haussler, T. Hubbard,
METHOD     and C. Chothia, http://cyrah.med.harvard.edu/~jong/assess_final.html, 1998.
METHOD [5] S. Henikoff and J. C. Henikoff, JMB, vol 243, pp 574-578, Nov 1994.
METHOD [6] K. Sjolander, K. Karplus, M. P. Brown, R. Hughey, A. Krogh, I. S.
METHOD     Mian, and D. Haussler, CABIOS, vol 12, pp 327-345, Aug 1996.
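The per-iteration settings described under "Alignment generation" can be summarized as a small schedule. The following is only an illustrative sketch, not part of SAM or the Target98 scripts: the names `IterationPlan`, `build_schedule`, and the string labels for search sets and regularizers are hypothetical, chosen here to mirror the prose. Only the threshold values and the ordering rules come from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IterationPlan:
    """Settings for one selection/training/alignment iteration (hypothetical type)."""
    number: int        # 1-based iteration number
    threshold: float   # score threshold; more negative is stricter
    search_set: str    # which WU-BLAST hit set is searched
    regularizer: str   # which transition regularizer is used


def build_schedule(thresholds: List[float]) -> List[IterationPlan]:
    """Assign a search set and regularizer to each threshold, following the text.

    Iteration 1: close homologs, regularizer approximating WU-BLAST gap costs.
    Iterations 2 .. n-1: possible homologs, "long-match" regularizer.
    Iteration n: possible homologs, regularizer trained on FSSP alignments.
    """
    n = len(thresholds)
    if n < 2:
        # The text only describes runs with a first and a final iteration.
        raise ValueError("at least two iterations are required")

    plans = []
    for number, threshold in enumerate(thresholds, start=1):
        if number == 1:
            search_set, regularizer = "close_homologs", "blast_gap_like"
        elif number < n:
            search_set, regularizer = "possible_homologs", "long_match"
        else:
            search_set, regularizer = "possible_homologs", "fssp_trained"
        plans.append(IterationPlan(number, threshold, search_set, regularizer))
    return plans


# Threshold lists quoted in the text.
LIBRARY_THRESHOLDS = [-40, -30, -24, -16]
TARGET_THRESHOLDS = [-50, -40, -30, -22, -16, -14]

if __name__ == "__main__":
    for plan in build_schedule(TARGET_THRESHOLDS):
        print(plan)
```

Calling `build_schedule(LIBRARY_THRESHOLDS)` gives the four-iteration plan used for library alignments, and `build_schedule(TARGET_THRESHOLDS)` the six-iteration plan used for the target. In both cases only the first iteration searches the close-homolog set.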
METHOD
METHOD Results
METHOD
METHOD The Target98 method found no homologs in NRP for T52 other than
METHOD itself, so the model built from the Target98 alignment is not
METHOD likely to be very powerful in finding remote homologs.
METHOD
METHOD The top scoring possible homologs in PDB were as follows:
METHOD     chain   score   found by model
METHOD     1pmd    -6.21   t52
METHOD     1hsq    -4.51   1hsq library model
METHOD     1pdgA   -3.25   t52
METHOD     1broA   -2.83   t52
METHOD
METHOD 1pmd did get a fairly good score, though its structural homologs 1btl,
METHOD 2bltA, and 3pte did not, and a score of -6.2 is in the range where the
METHOD probability of a match being a false positive is about 70%. The
METHOD alignment of T52 to 1pmd was only moderately compact and included an
METHOD unsupported helix at one end. Also, the two known cystine bridges did
METHOD not map to close positions in 1pmd, so we decided that this match was
METHOD unlikely to be correct.
METHOD
METHOD The alignment of T52 to 1hsq seemed to match only a tiny fragment:
METHOD     WQPSNFIE
METHOD     WFPSNYVE
METHOD with a few other very short matches scattered along the chain.
METHOD This is a motif with a strand and tight turn or short helix. While it
METHOD is an interesting bit of secondary structure, it is too small to be
METHOD suitable for fold recognition. There is no similar motif in 1pmd.
METHOD
MODEL 1
PARENT NONE
TER
END