Mon Jun 12 10:45:51 PDT 2000 Kevin Karplus

T0098 Spo0A protein, B. stearothermophilus, is very similar to other sporulation proteins, but that is all that the T2K alignment includes.

Blast and double-blast don't find any promising targets, and neither does the target model searching PDB.

The template model library finds only one weak hit:
    chain   E-value  FSSP rep
    1nseA   36       1nseA

The weak blast hits are
    1pog 1au7A 1bxm 1bxm

The predicted secondary structure is mainly helical, with fairly strong helix predictions.

The 1au7A alignments get good raw scores, but are almost completely canceled with the reverse-null model.

The alignment 1au7A/T0098-1au7A-global.pw has only 6 conserved residues and several gaps in awkward places (including omitting the helix that goes in the DNA major groove in 1au7A).

The alignment 1au7A/1au7A-T0098-fssp-global.pw breaks in reasonable places, but has NO conserved residues. This is a pretty flimsy alignment to base a prediction on.

The alignment 1nseA/T0098-1nseA-vit.pw has 9 conserved residues with no gaps, but only covers 31 residues, and is not compact in 3-space.

The alignment 1bxm/T0098-1bxm-vit.pw has 8 conserved residues in a gapless alignment of 21 residues. Not enough for fold recognition, but a nice piece of secondary (or supersecondary) structure. The secondary structure corresponds nicely with the prediction for the chunk that is aligned.

Mon Jun 26 09:52:21 PDT 2000
Remade 2ry predictions

Tue Jul 11 12:13:06 PDT 2000

No strong CAFASP consensus. Top hits:
    8   1.77.1   1jwe 1b79A
    7   1.41.1   4cln 1cll 3cln 1cdmA 1rro 5icb 1ahr
    7   1.23.1   1aoi[BD] 1tafA 1bfmA
    5   1.75.1
    5   1.36.1
    5   1.1.1
    4   3.72.1
    4   3.31.1
    4   1.4.1

Wed Jul 26 16:13:40 PDT 2000

CAFASP Fold summary
    fold   #server  #times  #first  signif
    1.77   8        8       6       1
    1.41   4        7       3       0
    1.75   5        5       0       0
    1.23   4        7       0       0
    1.36   4        5       1       0
    1.1    4        5       0       0
    3.31   4        4       0       0
    3.72   3        4       1       0
    1.4    3        4       1       0

Extending the search (make remote) still doesn't find anything with the target model.

27 July 2000 Kevin Karplus

make 2track finds one target hit: 1di2A 36.
    1di2A   4.42.1.1.1

Possible hits found so far by any technique (including CAFASP)
    1nseA   4.152.1
    1di2A   4.42.1
    1pog    1.4.1
    1au7A   1.4.1, 1.36.1
    1bxm    1.125.1    beta-cryptogein
    1b79A   1.77.1     DNA helicase
    1cll    1.41.1     calmodulin
    1aoiB   1.23.1     histone

31 July 2000 Kevin Karplus

Remade "make 2track" using new hmmscore
get 1 hit:
    % Sequence ID  Length  Simple  Reverse  E-value  X count
    1di2A          69      -15.60  -4.00    3.6e+01

4 August 2000 Kevin Karplus

Remade "make 2track" with fixed model (having a GENERIC node matching approx background probs) on t99 sequences.

Top 2track hits are now
    % Sequence ID  Length  Simple  Reverse  E-value
    1a41           234     -27.02  -8.44    1.1e+00   4.141.1
    1bxm           99      -19.16  -7.80    3.1e+00   1.125.1
    1ebmA          317     -24.93  -7.79    3.1e+00   ?
    1qgiA          259     -22.32  -6.98    8.5e+00   4.2.1
    1beo           98      -17.54  -6.52    8.5e+00   1.125.1
    1unkA          87      -20.29  -6.50    8.5e+00   1.29.2
    1dkzA          219     -27.28  -6.34    8.5e+00   5.17.1
    1qpzA          340     -16.83  -5.98    2.3e+01   1.36.1, 3.88.1
    1gumA          222     -25.59  -5.93    2.3e+01   1.47.1, 3.42.1
    1cei           94      -19.55  -5.91    2.3e+01   1.29.2
    2ezk           99      -22.98  -5.87    2.3e+01   1.4.1
    2ezl           99      -22.83  -5.81    2.3e+01   1.4.1
    1bg1A          722     -26.79  -5.75    2.3e+01   1.49.1, 2.2.5, 4.77.1
    1frvB          536     -24.32  -5.53    2.3e+01   5.15.1
    1bouA          139     -19.80  -5.49    2.3e+01   1.84.1
    1gln           468     -25.35  -5.45    2.3e+01   1.92.1, 3.19.1
    1gnwA          211     -24.69  -5.41    2.3e+01   1.47.1, 3.42.1

Of the CAFASP folds, we still have no 1.77, 1.41, 1.75, 1.23, 1.1, 3.31, 3.72 hits, but we do have 1.36 (1qpzA) and 1.4 (2ezk, 2ezl).

The best-scoring alignments are now

1a41/T0098-1a41-2track-local 234 -27.02 -8.44 1.0e-03
    Secondary structure match is excellent, gets 13 conserved residues.
    Template is isomerase (so is DNA-binding).
    Problem: structure has tight turn that is omitted (we predict long helix).
1unkA/T0098-1unkA-2track-global 87 -8.81 -8.49 1.0e-03
    Only 8 conserved residues, long predicted helix truncated.
1nseA/1nseA-T0098-vit 121 -12.16 -7.84 1.2e-03
    10 conserved residues in 31-residue fragment.
    Has strand where we predict helix.
    Conserved RGNL is good turn after helix.

Mon Aug 7 14:02:36 PDT 2000 Kevin Karplus

Continuing to look at alignments:

1bxm/1bxm-T0098-vit 121 -8.84 -7.09 2.5e-03
    12 conserved residues in 46-residue gapless alignment.
    Very good 2ry match.
1bxm/T0098-1bxm-2track-local 99 -19.16 -7.80 2.7e-03
    Extends vit alignment by a few residues at the beginning (to the beginning of the helix).
1ebmA/T0098-1ebmA-2track-local 314 -24.94 -7.79 2.7e-03
    DNA-binding protein.
    11 conserved residues over about 100 residues, with one gap (shrinking one helix near DNA).
    Good 2ry structure match.

Sun Aug 13 2000 Christian

1nseA/1nseA-T0098-local 121 -16.30 -6.45 4.7e-03
    Short match with strong conservation. 2ary structure conflicts and structure is not compact.
1au7A/T0098-1au7A-global 146 -50.69 -6.35 5.2e-03
    Somewhat broken alignment that does cover two domains that are both DNA binding.
1beo/1beo-T0098-vit 121 -8.55 -6.36 7.4e-03
    Short but strong match to a helix-loop-helix region. 10 conserved residues in an unbroken motif alignment of 33 residues.
1beo/T0098-1beo-2track-local 98 -17.54 -6.52 7.4e-03
    Same as one above.
1bxm/T0098-1bxm-2track-global 99 -1.22 -6.24 7.4e-03
    8 conserved residues in a 100-residue alignment that contains only one 3-residue insertion. The secondary structure prediction is quite close. 1bxm is binding some small-molecule substrate that is not DNA, but its structure is reminiscent of the other DNA-binding structures here.
1qgiA/T0098-1qgiA-2track-local 259 -22.32 -6.98 7.4e-03
    Gapless 55-residue alignment with 6 conserved residues. 2ary structure match is very close.
1qpzA/1qpzA-T0098-fssp-global 121 -0.22 -6.67 7.4e-03
    Interesting but probably not the correct template. 1qpzA is a 3-domain structure: DNA binding, "adapter", and other. The alignment is to the adapter piece, even though the "other" domain is sequentially in the middle of the adapter domain. There are some secondary structure conflicts. Worth a quick look.
1unkA/T0098-1unkA-2track-local 87 -20.29 -6.50 7.4e-03
    Short three-helix piece. Not convincing.
1bxm/1bxm-T0098-local 121 -13.44 -5.48 1.2e-02
1bxm/1bxm-T0098-global 121 -4.19 -5.24 1.6e-02
1beo/1beo-T0098-local 121 -12.93 -5.07 2.0e-02

Fri Aug 11 15:36:56 PDT 2000 Kevin Karplus

Using the alignments and fragments found by Rachel's fragfinder, I tried running undertaker. Run rasmol, loading und/try1/t98-opt-scwrl.pdb to see the result. (And have started a new run with more superiterations.)

The undertaker-produced model is not quite as compact as I would like to see, but is not much worse than the alignments. Neither the alignments nor the undertaker-constructed model have the predicted long helix, but it may be that the prediction there has merged adjacent helices.

13 August 2000 Kevin Karplus

There are three undertaker-produced models from three different runs
    und/try[123]/t98-opt-scwrl.pdb
with very similar scores. These need to be looked at. If they are similar, we could try to submit one of them as our prediction. If they are very different, we should probably not submit. The submission should have a warning that the score function used for selecting the conformation is very new and untested.

13 August 2000 Christian, cont'd. from above

I currently favor one of the 1bxm alignments for a prediction. The undertaker-produced models are similar in that they are all made of short helices and are compact. If we were to predict one of them, I would go with 2 or 3 since they are the most compact.

14 August 2000 Kevin Karplus

The C-terminus of the undertaker structures (particularly try2) is apparently from the 1bxm alignments, and the N-termini of the undertaker structures are essentially the same in all three tries. I think I'd like to go with und/try2/t98-opt-scwrl.pdb as our prediction.
[ON SECOND THOUGHT, und/try4/t98-opt.pdb is very similar but has no clashes, so let's use that.]
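
The 13 August note asks whether the three undertaker models are similar enough to submit one of them. One superposition-free way to check is a distance RMSD (dRMSD) over C-alpha atoms. The sketch below is not how the comparison was done in these notes; it assumes standard fixed-column PDB ATOM records and that both models contain the same residues in the same order. The function names are made up for illustration.

```python
import math

def ca_coords(pdb_lines):
    """Collect C-alpha coordinates from fixed-column PDB ATOM records."""
    coords = []
    for line in pdb_lines:
        # Columns 13-16 hold the atom name; 31-54 hold x, y, z.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    return coords

def drmsd(a, b):
    """Root-mean-square difference of all pairwise CA-CA distances.

    Needs no superposition, so it is unaffected by how each model
    happens to be rotated or translated.
    """
    if len(a) != len(b):
        raise ValueError("models must have the same number of CA atoms")
    if len(a) < 2:
        raise ValueError("need at least two CA atoms")
    total = 0.0
    pairs = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
            total += diff * diff
            pairs += 1
    return math.sqrt(total / pairs)
```

To compare two runs, read each file (for example und/try2/t98-opt-scwrl.pdb and und/try3/t98-opt-scwrl.pdb) into a list of lines, pass each list to ca_coords, and pass the two results to drmsd. A value near zero means the two backbones have nearly the same internal geometry; dRMSD does not distinguish a structure from its mirror image.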

Here is some text to go with the methods:

Our HMM-based searches found no strong hits, so we examined several weak hits including 1nseA, 1bxm, 1au7A, 1di2A, 1a41, 1ebmA, 1qgiA, 1beo, 1unkA, 1qpzA, and 2ezk. None of the alignments were very long; perhaps the best of them was a super-secondary-structure match of the C-terminus of T0098 to the N-terminus of 1bxm:

T0098 NTTASRVERAIRHAIEVAWSRGNLESISSLFGYTVSVSKAKPTNSEFIAMVADKLRLEHKAS
1bxm  ----ATQQTAAYHTLVSILSDASFNQCSTDSGYSMLTAKALPTTAQYKLMCA----------

Because we did not have a good alignment long enough to do fold recognition, we decided to try our experimental mini-threading program (undertaker). We gave it all the alignments we had found, plus 630 10-residue fragment matches (approximately 5 starting at each residue), plus a generic library of 1-, 2-, 3-, and 4-residue fragments.

Conformations were generated by splicing together fragments and evaluated with a (still-untested) score function which includes terms for burial of hydrophobic atoms, exposure of hydrophilics, rotamer preferences, secondary structure preferences (indirectly), clashes, and chain breaks.

Three runs of optimization with a genetic algorithm evaluating approx 20,000, 38,000, and 68,000 conformations yielded very similar results. In all cases the "best" conformation was not packed as tightly as a real protein should be, which is probably due to flaws in the score function.

We are submitting the result of the 38,000 conformation run, which seems to be the most compact. The C-terminus appears to have been taken from an alignment with 1bxm, and the N-terminus from 1ebmA, but the history feature of undertaker has not been implemented yet, so we can't trace the origins of the conformation very well.

We ran the chosen conformation through SCWRL to get more standardized sidechains, and to reduce sidechain conflicts. Reoptimization of that conformation made only slight changes, but did remove the remaining sidechain clashes.
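
The notes judge each alignment by its number of conserved residues and its aligned length. As a rough illustration, the sketch below counts both for the T0098/1bxm alignment quoted in the methods text. It treats "conserved" as exact identity, which is an assumption: the notes do not say how conserved residues were counted. The function name is made up for illustration.

```python
def alignment_stats(target, template, gap="-"):
    """Return (aligned_columns, identical_residues) for two aligned rows.

    Columns where either row has a gap are skipped. zip() stops at the
    shorter row, so a stray length mismatch does not raise an error.
    """
    aligned = 0
    identical = 0
    for t, s in zip(target, template):
        if t == gap or s == gap:
            continue
        aligned += 1
        if t == s:
            identical += 1
    return aligned, identical

# The two rows as quoted in the methods text.
T0098 = "NTTASRVERAIRHAIEVAWSRGNLESISSLFGYTVSVSKAKPTNSEFIAMVADKLRLEHKAS"
BXM = "----ATQQTAAYHTLVSILSDASFNQCSTDSGYSMLTAKALPTTAQYKLMCA----------"

aligned, identical = alignment_stats(T0098, BXM)
print(f"{identical} identical residues in {aligned} aligned columns")
```

On these two rows the count is 12 identical residues in 48 gap-free columns, in line with the 12 conserved residues noted for the 1bxm vit alignment on Aug 7.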
Sat Aug 26 15:46:20 PDT 2000 Remade 2track predictions