From compbio.casp-request Sat May 9 21:55:08 1998
Return-Path: karplus@cse.ucsc.edu
Date: Sat, 9 May 1998 21:55:07 -0700
From: Kevin Karplus
To: compbio.casp@cse.ucsc.edu
Subject: t43,t44,t52

I've started work on 3 chains: the earliest targets (t43 and t52) and the one where we got a big hint (t44).

So far t44 is the most promising, with a score strong enough that we would have only bet 16-23% on "new fold" even without the hint that the fold was in the database. I am 80% certain that 1eps (and 1uae and 1naw) is the fold we want, but am having trouble getting a good alignment. So far, I think the best alignment is one done using 1eps-fssp-t98 as the basis for a model.

T43 and T52 are harder. The scores are poor, and I'd bet 84% on "new fold" for T43 and 70% for "new fold" on T52 (based on the percentage of false positives near the best summed score for the target in the SAM-T98 tests). I could be induced to change those estimates if it turns out that the pieces that align most consistently are core parts of the templates, and if the functions of the templates and the target are sufficiently similar. I have not yet done a search for proteins of similar function (except for t44, which had only one possible hit in the same enzyme class, and it was easily rejected based on the hints we were given---the secondary structure was clearly different).

From compbio.casp-request Mon May 11 02:11:51 1998
Return-Path: karplus@cse.ucsc.edu
Date: Mon, 11 May 1998 02:11:50 -0700
From: Kevin Karplus
To: compbio.casp@cse.ucsc.edu
Subject: t43--t52 summary

I looked at the ten available targets t43--t52. Here is a quick summary of each---see the individual README files for details:

t43: New fold 84%
    other possible: 1ris 2end (& maybe 1bv1)

t44: (hints: NOT NEW FOLD, PhD fairly accurate)
    would have predicted new fold at only 20% without hint.
    Only 1 high scorer: 1eps=1uae=1naw
    Have fssp and constrained alignments to try using for joint models.

t45 new fold 84-89%
    other possible: 1smnA, 1aq6A, 1spgA, 1aqzA
    none of these are compatible folds.

t46 new fold 85-92%
    other possible: immunoglobulin light chains or 1cid_1, which may have a structural similarity to part of an immunoglobulin.
    May be worth building fssp models for 1cid and restricting to 1cid_1, or using fssp model for appropriate immunoglobulins.

t47 known homolog (1mup, 1bebA, 1obpA, 1epaA)
    I've asked markd to build the fssp and constrained alignments for these 4 chains, in the hope of getting a better alignment.

t48 known homolog 1dcpA
    no point to building fssp alignment---all significant structural similarity is to chains with 100% residue ID.

t49 known homologs (3pte, 2bltA) [compatible structures]
    I've asked markd to build the fssp and constrained alignments for these, as they have significant structural homologs with low sequence id.

t50 known homologs 1reqA and 1bmtA have different structures
    Choosing one or merging the predictions will be tough.
    Note: homologous to 1bmtA_2, not 1bmtA_1, so SCOP classification is 3.13.5.1.1
    I've asked markd to build the fssp and constrained alignments for these.

t51 Claim for homology in target release probably bogus (in same crystal as t50, which may help in predicting both).
    new fold 85-92%
    other possible (all inconsistent): 1svb, 1aei[ABCDEF], 1mdl, 1hrdA, 1af2A, ...
    I'm tempted to leave this as "new fold", unless some strong hint comes from t50.

t52 new fold 60-75%
    other possible: 1pmd, 1hsq, 1pdgA, 1broA
    Of these, only 1pmd currently looks promising. Models built from FSSP alignments are unlikely to help, as 1pmd is large and multi-functional, while t52 is only 101 residues. (Scoring with t5 models finds 1pmd, but not vice versa).

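[Editorial sketch, not part of the original mail.] The "new fold" bets in the two messages above are described as the percentage of false positives near the target's best summed score in the SAM-T98 tests. A minimal Python sketch of that kind of estimate follows. The function name `new_fold_bet`, the `window` parameter, and the calibration data are all hypothetical illustrations; the actual SAM-T98 test sets and the way "near" was defined are not given in the mail.

```python
def new_fold_bet(calibration, best_score, window=1.0):
    """Fraction of false positives among calibration hits scoring near best_score.

    calibration: list of (score, is_false_positive) pairs from a test set
                 where the right answer is known (hypothetical data here).
    window:      how close a calibration score must be to count as "near"
                 (an assumption; the mail does not say how this was chosen).
    """
    near = [is_fp for score, is_fp in calibration
            if abs(score - best_score) <= window]
    if not near:
        raise ValueError("no calibration hits near this score")
    # True counts as 1 and False as 0, so this is the false-positive fraction.
    return sum(near) / len(near)


# Made-up calibration data: (summed score, was the hit a false positive?)
calibration = [(-10.0, False), (-9.5, True), (-9.0, True),
               (-8.0, True), (-20.0, False)]
bet = new_fold_bet(calibration, best_score=-9.0, window=1.0)
print(f"bet on new fold: {bet:.0%}")
```

Varying the window is one plausible way a range such as "84-89%" could arise, though that is a guess about the original procedure.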
From compbio.casp-request Mon May 11 19:10:40 1998
Return-Path: karplus@cse.ucsc.edu
Date: Mon, 11 May 1998 19:10:32 -0700
From: Kevin Karplus
To: compbio.casp@cse.ucsc.edu
Subject: update on targets

t43: none of the alignments I've looked at so far look very good---moderately high residue identity, but only for short stretches, and no consistency between the different possible targets. This one looks tough.

t44: Very nice alignment to 1eps and pretty good one to 1uae. Waiting for fssp-t98 to finish to see if that gives better alignment for 1uae.

t45: no news today---still no great templates.

t46: all the hits are to immunoglobulins (1cid_1 is an immunoglobulin too). The 1tetL alignments (not using fssp information) look pretty bad---tiny fragments or non-compact regions. I'll have to try some fssp alignments---the target98 alignments take forever on immunoglobulins, because there are over 10,000 in NRP.

t47: Alignment is trivial---the alignment with 1mup is gapless and has a huge residue identity. Everyone will get this, and it will become purely a homology modeling problem, where we have no expertise.

t48: Alignment is trivial---the alignment with 1dcpA is gapless and has a huge residue identity. Everyone will get this, and it will become purely a homology modeling problem, where we have no expertise.

t49: 3pte-t49-const-global alignment looks pretty good. Still waiting for other alignments to be done. This has a high enough sequence id for anyone to find, but the alignment has gaps and so can be easily screwed up. This will be a real test of our alignment techniques.

t50: The alignments here all look pretty good, but there are minor differences. I think this is another one everyone will get, but where the quality of alignment will vary.

t51: No progress---nothing looks good here. Maybe we need to see if the 1bmtA_2 domain for t50 occurs in other multi-domain proteins, to see if there are possible templates that we haven't looked at.
Neither 1bmtA nor 1reqA looks very promising.

t52: The t52-1pmd and t52-1pmd-global alignments are only moderately compact and cross a domain boundary. I'd like to try with 1pmd fssp alignments, to see if they align any better. 1pmd-t52 is a tiny fragment and 1pmd-t52-global is a scattered set of fragments, though the biggest one is consistent with the t52-pmd alignments. The 1hsq alignments seem to match only a tiny fragment:

    WQPSNFIE
    WFPSNYVE

A rather fancy strand, one-turn helix, strand motif. There doesn't seem to be a corresponding structure in 1pmd. This one is going to be a tough call.

From compbio.casp-request Tue May 26 17:35:07 1998
Return-Path: karplus@cse.ucsc.edu
Date: Tue, 26 May 1998 17:35:05 -0700
From: Kevin Karplus
To: compbio.casp@cse.ucsc.edu
Cc: leslie@cse.ucsc.edu
Subject: status report

Still haven't gotten any word on the format for predictions.

Secondary-structure predictor just about ready for training on large dataset, but scripts specifying chains not written yet. Don't have any secondary-structure predictions for targets yet.

t43 still no strong prediction 85-89% new fold
t44 1eps or 1uae. want 2ry structure to help align.
t45 still no strong prediction 84-89% new fold
t46 still no strong prediction 85-92% new fold
t47 1mup or 1bebA or 1obpA---most likely gapless alignment to 1mup
t48 1dcpA gapless
t49 I like t49-3pte-global or t49-2bltA-global, 2ry structure might help choose an alignment
t50 1reqA-t50-const-global and 1bmtA-t50-const-global both look good (How different are they?)
t51 no hits yet 85-92% new fold. need to look for proteins with the domain of t50 for possible homologs.
t52 1pmd only decent hit. Alignments are ok, but not great. 65-75% new fold
t53 1pfkA is best current hit (60-69% new fold) no great alignments. t53-1djxB currently most promising.
t54 1prcC (70-82% new fold) haven't looked at alignments
t55 1esl, 1rtm1, 1hlj all good hits, may want to look for galactose-specificity in lectin to choose model.
    1esl-t55-fssp-global alignment looks good, preserves cystine bonds.
t56 76-82% new fold. Secondary structure hint given, so far no matches on 2ry structure.
t57 1gypA-t57-fssp-global looks good in places, but has some suspect regions. 1dapA-t57-fssp-global doesn't look quite as good. Probably need the constrained alignments.

I've been using Leslie's new version of sae (saen), which works a bit better than the old one for looking at alignments in 3D.

Fixes needed:
1) structure strings should be read like other FASTA files.
2) should look at PDB file name to pick default guide sequence.
3) residue specifications to PDB should include chainID (as in 15:A-23:A)
4) should give "center" command for aligned region
5) should have "restrict" button to restrict rasmol to guide chain

From compbio.casp-request Thu Jun 11 12:40:35 1998
Return-Path: karplus@cse.ucsc.edu
Date: Thu, 11 Jun 1998 12:40:34 -0700
From: Kevin Karplus
To: compbio.casp@cse.ucsc.edu
Subject: limited summary

Now that we've finished t52, I've been looking at three chains:
    the earliest deadline targets t54 and t43
    the most recently released target t62

Nothing new on t43 yet---I made a 2d prediction, but have not looked yet to see if it matches any of the alignments. There are some fairly high confidences on the 2d prediction, so it should be useful.

T54 is interesting, in that several of the top hits are related, increasing my confidence somewhat in there being a match in there, despite the low scores. I'm also interested in trying a match to a pair of chains, 1prcL + 1prcC, so Melissa and Mark should probably start thinking about how to create the file to submit if that works.

T62 looks like fun---we get excellent scores for all the reductases, though blast only finds a few of them. I've been trying to decide which reductase family to use as a model, and have pretty much settled on 2pia, even though it doesn't get the highest scores.
When I build evolutionary trees from any of the alignments, the smallest subtree that includes the target and a sequence with known structure always includes only 2pia. I'll probably end up using a restriction of 2pia.constr-t98 to just the interesting subtree as the model to align to.

From compbio.casp-request Wed Jun 17 10:37:52 1998
Return-Path: karplus@cse.ucsc.edu
Date: Wed, 17 Jun 1998 10:37:50 -0700
From: Kevin Karplus
To: compbio.casp@cse.ucsc.edu
Subject: targets t43 and t54

Targets t43 and t54 need to go out early next week. Both look to me like we'll be using the "NONE" prediction again. We had no strong hits, and the alignments we did get don't look so great in 3D. I do need to look at the t43-1ris hits in 3D still (I ran out of time yesterday), so I might still change my mind on that one.

Melissa, could you draft the METHODS sections again, using the t54 METHODS as a base, and incorporating any notes that look relevant from the README files?

For t43 a lot of the hits came just from a single, long helix in the middle of the chain that has a good amphipathic helix pattern (neural net predicts helix with 98% probability or higher).

For t54, we had a hint that it was probably a new fold, and all we got were small pieces of secondary (or sometimes super-secondary) structure. There were some hints about the structure in a published abstract, and none of the structures came close to those hints.

From compbio.casp-request Thu Jun 18 22:16:50 1998
Return-Path: karplus@cse.ucsc.edu
Date: Thu, 18 Jun 1998 22:16:48 -0700
From: Kevin Karplus
To: compbio.casp@cse.ucsc.edu
Subject: new targets

T64 very similar to repressor proteins (sum score -100)
    We'll need constrained model for 1r69 to get decent alignment.
T65 no good finds with target model, library models still running
T66 = T64+T65
    We probably can't do a thing with this, unless we decide to try to map t64+t65 to a homodimer.
T67 no good finds: 84-89% bet on false positive.

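[Editorial sketch, not part of the original mail.] The template choice for T62 in the 11 June message rests on finding the smallest subtree of an evolutionary tree that contains both the target and at least one sequence of known structure. A minimal Python sketch of that search follows. The nested-tuple tree encoding, the function names, the leaf names `seqA` and `seqB`, and the toy topology are all hypothetical; the real trees and the tool that built them are not described in the mail.

```python
def leaves(tree):
    """All leaf names under a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def smallest_subtree_with_structure(tree, target, known):
    """Leaves of the smallest subtree holding target plus a known-structure leaf.

    Assumes target itself is not in known. Returns None if no such subtree.
    """
    if isinstance(tree, str):
        return None  # a single leaf cannot hold both the target and a template
    # Prefer the deepest qualifying subtree: try the child containing the target.
    for child in tree:
        if target in leaves(child):
            found = smallest_subtree_with_structure(child, target, known)
            if found is not None:
                return found
    here = leaves(tree)
    if target in here and any(leaf in known for leaf in here):
        return here
    return None


# Toy topology (invented for illustration only).
toy_tree = ((("t62", "seqA"), "2pia"), ("2cnd", "seqB"))
subtree = smallest_subtree_with_structure(toy_tree, "t62", {"2pia", "2cnd"})
templates = [leaf for leaf in subtree if leaf in {"2pia", "2cnd"}]
print(templates)
```

In this toy tree the smallest qualifying subtree contains 2pia but not 2cnd, which mirrors the reasoning in the message without reproducing the actual data.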
From compbio.casp-request Thu Jul 16 15:34:15 1998
Return-Path: karplus@cse.ucsc.edu
Date: Thu, 16 Jul 1998 15:34:14 -0700
From: Kevin Karplus
To: compbio.casp@cse.ucsc.edu
Subject: status report

I'm keeping a one-line-per-target status report in ~/pce/casp3/list-of-hits

Since we have a bunch of targets due at once on 31 August, I propose that we first finish the ones due earlier (t49, t68, t55), then finish all the other ones with "easy" homologies (t47, t48, t50, t57, t58, t60, t62, t64, t688, t69, t70). That will give us the most time to agonize over the hard ones. It looks like t44 is the only one that falls in the twilight zone, so I hope we did well on it.

Here is the current status report:

TARGET  CHAIN         easy?  sent?  comments
t43     new           hard   x
t44     1eps          mid           in twilight zone---tough to align
t45     new?          hard
t46     new?          hard
t47     1mup          easy          maybe 1bebA or 1obpA---1mup is gapless
t48     1dcpA         easy          gapless
t49     3pte          easy          may need to twiddle alignment
t50     1bmtA         easy          But may need alignment to 1reqA as well
t51     new?          hard          in same crystal as t50. Any help?
t52     new           hard   x      decided not to use weak alignments
t53     1pfkA?        hard
t54     new           hard   x      got some super-secondary helix matches
t55     1esl          easy          redo using t55.t98_5
t56     1wqjA?        hard          check reported 2ry structure.
t57     1gadP         easy          alignment may need tweaking
t58     1akz          easy          maybe use t58.t98_2
t59     ?             hard
t60     1gifA         easy          one 1-residue insertion
t61     ?             hard          7catA? 1iphA?
t62     2pia          easy          used trees to pick 2pia rather than 2cnd
t63     1pex          hard   x      dubious, but pretty prediction
t64     1adr          easy?         structure of second half not determined!
t65     ?             hard
t66     two sequences t64+t65
t67     8abp?         hard
t68     1rmg          easy          ncbi blast mysteriously misses this!
t69     1hup          easy          use t69.close.retrain 1hup, 1rtm1, or 1rdi1
t70     1gfn          easy          maybe 2omf, 2por, or other porin
t71     2reb? 1hvc?   hard          use t71.remote_3?
t72     1vvc? 1ktx?
                      hard

From compbio.casp-request Thu Aug 27 10:52:19 1998
Return-Path: karplus@cse.ucsc.edu
Date: Thu, 27 Aug 1998 10:52:15 -0700
From: Kevin Karplus
To: compbio.casp@cse.ucsc.edu
Subject: 10 targets left, need to do ~1 a day

We are getting close to the end of the CASP contest, and we still have 10 targets left (plus three that are done but not submitted yet).

Here are the ones left to predict, sorted by due date:

1sep   t56  2mhr?        hard  matches reported 2ry.
2sep   t77  1tif?        hard
4sep   t67  1rhi2?       hard  paper suggests 3pgk, I don't believe it
5sep   t80  1t7pA+B      hard  DNA-binding, probably long helix in minor groove
6sep   t81  1auoA?       hard  maybe 1oyc instead
7sep   t71  1ckaA?       hard  use t71.remote_3?
8sep   t72  1vvc?        hard
8sep   t83  1lmb3+       hard  2 domains? why match a DNA-binding site for this enzyme?
14sep  t84               hard  piece together peptide from 4blmA+1ft1A+1fgjA
?sep   t85  2cthA+1ycc?  mid?  2 domains? which cytochrome(s)?

Here is a status report on the 10 remaining ones:

I'm ready to give up on t56, and submit the 2mhr (4-helix bundle) alignment.

I see that t77 is still being worked on by Christian.

I'm willing to go with a 1rhi2 alignment for t67, though not with great confidence.

The t80 alignments still need some work, but I have reasonable hope for 1t7pB or a chimera of 1t7pA and 1t7pB.

Christian and I are still looking at alignments for t81---we have some info about the active residues that is helping weed out the poor alignments.

We still have alignments to look at for t71---we're relying heavily on conservation patterns to get the right alignment of some SH3 domains.

For t72, we have a very promising hit, but we need to check the reported cystine-bridges, to see if we can match them up well.

For t83, we have a good-scoring hit for the first half of the protein, but one which is not functionally very sensible---there is no expectation that this enzyme binds DNA. This is probably a new fold, but we might use the first half-alignment anyway.

For t84, we have a fair idea how the peptide folds (helix, turn, helix), but will have to piece together the prediction from three different proteins---a rather ugly submission for such a small peptide. We'll have to rely mainly on text, or borrow the use of Insight to piece together a 3D model.

For t85, we have several rather different cytochrome hits. Choosing the right one(s) is going to be difficult. I suspect that the overall topology of the protein is a little different from any of the existing cytochromes, and the best we can do is to provide unconnected cytochrome domain hits.

From compbio.casp-request Sat Aug 29 11:15:06 1998
Return-Path: karplus@cse.ucsc.edu
Date: Sat, 29 Aug 1998 11:15:04 -0700
From: Kevin Karplus
To: compbio.casp@cse.ucsc.edu
Subject: only 5 left!

We're down to five predictions left to go. It would be REALLY nice if we could get these off by the time Christian leaves for vacation on Wednesday.

The remaining ones are

t71 We are fairly sure there are some SH3 domains, based on what this binds to, despite poor scores. Figuring out exactly where the SH3 domains are is hard, because residue identity is low. We currently favor one of the 1ckaA alignments. This prediction is basically Christian's work, so I'd like him to make the final (or nearly final) decision on which alignment to submit.

t72 no good hits. I liked 1vvc until I looked up the published cystine bridges, which don't match. We might submit it anyway, with appropriate caveats that it doesn't match the reported cystine bridges. The cystines are very important to the CD5 domain, so failure to get them right is pretty damning.

t80 DNA-binding. We have only rather remote hits. The DNA glycosylases seem to be a rather diverse group of proteins. There are a few DNA-binding proteins among our best hits, and we may choose one of them somewhat arbitrarily.

t83 All our top scores were repressor-like DNA-binding domains, which is a bit strange, since t83 is not expected to bind DNA.
One of the close homologs found in building the target98 model is SINR_BACLI, a close homolog of T0064. We will probably report the alignment of the first domain to 1lmb3, and either not report on the second domain or give one of our possible alignments. I still need to look at the top 8 candidates (see the README).

t84 This is a short peptide, with no full-length matches in the database. I have a fair idea about what it looks like, with a secondary structure assignment and alignments to helices and the turn.

    TLLHHHHHHHTTLLTTHHHHHHHHHHHHHHHHHHT

I'll probably have to piece this together with Insight, since the matches are too small to be meaningful by themselves.

From compbio.casp-request Wed Sep 2 18:31:25 1998
Return-Path: karplus@cse.ucsc.edu
Date: Wed, 2 Sep 1998 18:31:23 -0700
From: Kevin Karplus
To: compbio.casp@cse.ucsc.edu
Subject: CASP almost done

Christian and I sent off 4 more predictions today, leaving only 1 (t83) still to do. I may try to provide some 3d coordinates for t84 also, as the alignment was in too many pieces to be easily evaluated.
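
[Editorial sketch, not part of the original mail.] The t84 secondary-structure string in the 29 August message can be broken into runs to make the "helix, turn, helix" description easier to check against candidate fragments. A minimal Python sketch follows. Reading H as helix, T as turn, and L as loop is an assumption, since the mail gives no legend; the names `T84_SS` and `segments` are hypothetical.

```python
from itertools import groupby

# Secondary-structure assignment for t84, copied from the message above.
T84_SS = "TLLHHHHHHHTTLLTTHHHHHHHHHHHHHHHHHHT"


def segments(ss):
    """Run-length encode a secondary-structure string into (state, length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


# Print each run with its 1-based residue range.
start = 1
for state, length in segments(T84_SS):
    print(f"{state} {start}-{start + length - 1}")
    start += length
```

Each helix run is a candidate for alignment to a separate template fragment, which is consistent with the plan to piece the prediction together from several proteins.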