20 July 1998

The 4hb1.target98-dssp.a2m file can't be made: target98 fails when
constructing it (the initial model can't even recognize itself).
Of course, 4hb1 doesn't belong in FSSP, since it is a wholly synthetic
"de novo design".

The phdset alignments still use target98 and hssp.2d, since the set has
not been re-created with target98-dssp alignments.

1cm4A has totally incorrect DSSP strings, probably because it contains
multiple chains superimposed on each other.

3mra is hard to predict because it is a transmembrane helix, isolated
and crystallized in a non-polar solvent.

26 July 1998 Kevin Karplus

I removed 1cm4A from the training set by manually removing it from
fssp-3-5-98-5.ids.

15 Nov 1999 Kevin Karplus

Difficult to predict (and so potentially mis-labeled) chains from t99 dataset:

         bits    Q3      SOV     SOV(E)  SOV(H)  SOV(L)  reason for problem
1by0     3.9512  0.0963  0.0575  0.0000  1.0000  0.0579  multiple models without proper separator
1cm4B    2.5838  0.3333  0.1722  0.0000  0.4000  0.7500  4 conformations with 0.25 occupancy
3ezmA    1.7383  0.3564  0.3642  0.2931  0.5500  0.4330  ? non-compact structure
3monB    1.5467  0.4200  0.4121  0.2657  0.2941  0.7569  strand part of sheet with other chains
1qqp2    2.3793  0.4213  0.3868  0.3030  0.2521  0.4917  ?
1aa0     1.8788  0.4425  0.5151  1.0000  0.2539  0.9464  ?
1rip     1.8242  0.4444  0.2046  0.3095  1.0000  0.2028  unusual knotted structure (real?)
1molA    1.5984  0.4468  0.4861  0.4800  0.2941  0.7392  dimerization interface
1mctI    1.4698  0.4643  0.3302  0.0000  0.0000  0.4490  ?
1pij     1.5862  0.4658  0.3624  0.1667  0.2019  0.4364  ?
1sso     1.4752  0.4677  0.4188  0.4422  0.6000  0.3924  ? hairpins not quite seen by DSSP
1lpbA    1.4907  0.4706  0.3233  0.3509  0.0000  0.3513  ?
1octC    1.7599  0.4733  0.3404  1.0000  0.3580  0.3021  ? two chains, not properly separated
1ytfC    1.4272  0.4783  0.4814  0.4654  1.0000  0.5833  dimerization interface
1gatA    1.3447  0.4833  0.3567  0.0000  0.2564  0.4162  ?
1pfsA    1.6772  0.4872  0.5883  0.5572  1.0000  0.6409  ? dimerization interface
1ltsC    1.4561  0.4878  0.5330  1.0000  0.5588  0.4075  ?
1xtcC    1.3631  0.4889  0.2373  1.0000  0.2361  0.2708  ? same as 1ltsC
1cfh     1.5314  0.4894  0.1856  1.0000  1.0000  0.1656  ?
1gps     1.3681  0.4894  0.3732  0.3529  0.3000  0.4091  ?
2mev4    1.8402  0.5000  0.6970  1.0000  0.8571  0.6852
2mprA    1.7077  0.5131  0.4957  0.3825  0.0000  0.7237
1efm     1.9456  0.5190  0.2968  0.5143  0.7441  0.2424
1erd     1.8663  0.5250  0.5177  1.0000  0.3631  0.6722
2erl     1.9448  0.5250  0.3971  1.0000  0.3082  0.6294
1qa7B    1.7487  0.5411  0.4928  0.4082  0.3956  0.6007
1erp     1.7351  0.5526  0.4693  1.0000  0.3333  0.6562
1lghA    2.2812  0.5714  0.5753  1.0000  0.5932  0.4435
1ab3     1.7190  0.5795  0.5892  1.0000  0.5525  0.6101
1kzuA    2.0247  0.6327  0.7973  1.0000  0.8378  0.6722

23 Nov 1999

Difficult to predict in dunbrack-395 data set:

         bits    Q3      SOV     objective  SOV(E)  SOV(H)  SOV(L)
2erl     2.0972  0.5000  0.3809  1.1306     1.0000  0.2941  0.6077  helices joined by disulphides
3ezmA    1.7854  0.4455  0.4628  0.8377     0.5000  0.2200  0.4709  ? not compact
1molA    1.6171  0.4468  0.4235  0.6951     0.3949  0.1765  0.8037  ? dimerization interface
1kveA    1.6963  0.5079  0.4323  0.6826     0.6000  0.6923  0.2467  ? edge strands of sheet (middle is 1kveB)
1mctI    1.4668  0.5714  0.2645  0.4817     0.0000  0.0000  0.3897  lots of disulphide bridges
1aho     1.3976  0.5156  0.4355  0.3709     0.7460  0.0000  0.3988  lots of disulphide bridges
1vfyA    1.3918  0.5522  0.5647  0.2234     0.2500  0.5556  0.7222  lots of disulphide bridges
1b0y     1.2581  0.5882  0.3040  0.2218     0.2500  0.4453  0.2765  ?
1vqb     1.3734  0.5814  0.5059  0.2053     0.5120  0.0000  0.5756  ? helix interrupting beta hairpin?
9wgaA    1.2322  0.6059  0.3236  0.1575     0.0000  0.1122  0.4190  lots of disulphide bridges
2sn3     1.1879  0.5385  0.4035  0.1516     0.3881  0.0000  0.4467  lots of disulphide bridges
1arb     1.3508  0.6008  0.5191  0.1464     0.5482  0.1385  0.6079  ?
6rlxB    1.4297  0.5926  0.6883  0.1183     0.0000  0.8676  0.4792  peptide joined to others with disulphides
1g3p     1.2666  0.6406  0.3785  0.1048     0.2623  0.3846  0.4960  ?
3vub     1.2816  0.5545  0.5904  0.0921     0.4842  0.7273  0.6106  ?
2por     1.3496  0.6113  0.5944  0.0777     0.6201  0.2100  0.6142  porin (membrane protein)
2a0b     1.2903  0.5847  0.5731  0.0704     1.0000  0.5659  0.6223  ? bent helices
6rlxA    1.3323  0.5833  0.6404  0.0674     0.0000  0.8304  0.3472  disulphides to other chains
1koe     1.2532  0.6105  0.4736  0.0670     0.4110  0.6293  0.4552  ? many 1-long strands
1tca     1.2426  0.6088  0.4841  0.0513     0.4444  0.3858  0.5430  ?
1kp6A    1.2878  0.5949  0.5875  0.0436     0.6000  0.5556  0.6037  lots of disulphide bridges
1fglB    1.3155  0.6667  0.4931  0.0370     1.0000  0.0000  0.7396  short peptide--DSSP helix not really there
1cv8     1.2782  0.5954  0.6169  0.0129     0.6250  0.4681  0.6893  ?

25 Nov 1999

Hard-to-predict chains in full t99
Using t99-1984-IDaa13-9-6-11-9-5-8-7-ehl-seeded-trained

ID       bits    Q3      SOV     object  SOV(E)  SOV(H)  SOV(L)
1by0     2.5216  0.2593  0.1852  2.0290  1.0000  1.0000  0.1852  DSSP wrong, multiple models
1spf     3.0118  0.5714  0.4295  1.9111  1.0000  0.3154  1.0000  lipoprotein---valine-rich helix
1ba4     2.3077  0.3000  0.4100  1.6007  1.0000  0.0000  0.9111  membrane protein
1psm     1.8694  0.3158  0.1496  1.3226  1.0000  0.1873  0.1379  crystallographers say helix, DSSP says turn
1bl1     2.2619  0.5161  0.5753  1.1366  1.0000  0.5217  0.7292  ?
1lghA    2.3258  0.5536  0.5965  1.1333  1.0000  0.6201  0.4226  *
2erl     2.0972  0.5000  0.3809  1.1306  1.0000  0.2941  0.6077  helices joined by disulphides
1rip     1.7982  0.4321  0.1963  1.0559  0.3095  1.0000  0.1943  unusual knot
1fzgD    1.9253  0.4528  0.3422  1.0518  1.0000  0.3263  0.3664
1efm     1.9611  0.5253  0.3026  1.0138  0.5143  0.7441  0.2490  *
1bm4     1.8211  0.5486  0.2412  0.8842  0.0000  1.0000  0.2559
1nubA    1.7518  0.4912  0.2854  0.8645  0.4720  0.0970  0.4761
2hipA    1.7936  0.5070  0.3239  0.8570  0.1667  0.2000  0.3650
1aml     1.8825  0.4500  0.5650  0.8570  1.0000  0.0000  0.9040
3ezmA    1.7854  0.4455  0.4628  0.8377  0.5000  0.2200  0.4709  ? not compact
1pij     1.7125  0.4521  0.3564  0.8302  0.1667  0.1731  0.4340  ?
1ba6     1.7307  0.4250  0.4421  0.8263  1.0000  0.0000  0.5526
1aa0     1.7128  0.4779  0.3195  0.8201  1.0000  0.1347  0.9474  ?
1erp     1.8077  0.5000  0.4349  0.8033  1.0000  0.2967  0.6250  *
1hvc     1.9300  0.5665  0.5150  0.7764  0.4074  0.6364  0.6430
3monB    1.5329  0.4000  0.3164  0.7514  0.2004  0.1765  0.7611  strand part of sheet with other chains
1cfh     1.5329  0.4681  0.1822  0.7500  1.0000  0.8571  0.1656  ?
1molA    1.6171  0.4468  0.4235  0.6951  0.3949  0.1765  0.8037  ? dimerization interface
1kveA    1.6963  0.5079  0.4323  0.6826  0.6000  0.6923  0.2467  ? edge strands (middle is 1kveB)
1qa7B    1.7535  0.5274  0.4897  0.6724  0.4359  0.3736  0.5739  *
1erd     1.7120  0.5000  0.5136  0.6525  1.0000  0.2917  0.8466  *
1bnx     1.8603  0.5758  0.6017  0.6331  1.0000  0.3684  0.9184
1lpbA    1.5649  0.4941  0.3808  0.6066  0.5088  0.0000  0.3925  ?
1b9wA    1.6107  0.5281  0.3864  0.6009  0.3167  0.0000  0.5080  *
2mev4    1.7915  0.5000  0.7069  0.5967  1.0000  1.0000  0.6852

5 Feb 2000

Current best training set for building a neural net:
        overrep-2260.ids
For train/test, fssp-1929.ids is split into
        fssp-1929-1.ids and fssp-1929-2.ids

                sequences  columns
fssp-1929-1           965   210699
fssp-1929-2           964   215797
fssp-1929            1929   426496
overrep-2260         2260   497210

overrep-2260 consists of fssp-1929 plus a number of high-resolution
X-ray files, culled by Dunbrack.
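
The notes do not record how fssp-1929.ids was divided into its two
halves. As a minimal sketch only, assuming the .ids files are
whitespace-separated lists of chain IDs and assuming a simple
alternating split (both are assumptions, and the helper names are
hypothetical), a split of this kind could be written in Python as:

```python
def read_ids(path):
    """Read chain IDs from a file, assuming whitespace-separated entries."""
    with open(path) as f:
        return f.read().split()


def split_ids(ids):
    """Alternating split: entries 0, 2, 4, ... go to the first half,
    entries 1, 3, 5, ... to the second. This is an assumed scheme,
    not necessarily the one used to make fssp-1929-1/-2."""
    return ids[0::2], ids[1::2]


def write_ids(path, ids):
    """Write one chain ID per line."""
    with open(path, "w") as f:
        for chain_id in ids:
            f.write(chain_id + "\n")
```

For a list of 1929 IDs an alternating split gives halves of 965 and 964
entries, which matches the sequence counts in the table, but other
splitting schemes would give the same sizes.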
11 April 2002 Kevin Karplus

stride-bad.ids consists of the 107 sequences that have the worst
disagreements between STRIDE and DSSP labeling (bits/residue<=0.75)
using the program
        ~karplus/dna/predict-2d/compare-real/compare-real-report-scores
out of the current set of 5844 templates (pcem/indexes/t2k.ids)

Here are some intersections of stride-bad.ids with tough sets:

tough.ids:
        1fglB 1kveA 1ltsC 3monB 6rlxA
tough2.ids:
        1fglB 1kveA 1opp
tough3.ids:
        1fglB 1kveA 1gw4 1qjkA
tough4.ids:
        1ryp1 1ryp2 2ltnB
tough5.ids:
        1a1rC 1aw8A 1axcB 1cq4B 1dwnA 1edxA 1ek1A 1fh1A 1g65K 1lenB
        1opp 1ryp1 1ryp2 1rypH 1rypI 1rypJ 1rypK 1rypL 1tbaA 1vppX
        2ltnB
../stride/t2k-hard-stride.ids:
        1a1rC 1bh1 1bomA 1ckwA 1cq4B 1edxA 1g65K 1h7dA 1hymB 1jeiA
        1kkoA 1lenB 1qgeE 1rfbA 1ryp2 1rypH 1rypI 1rypJ 1rypK 1rypL
        1tahB 2ltnB 3monB

Note that of the 107 biggest disagreements with DSSP and 110 hardest
to predict sequences, there are 23 sequences in common.
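
The intersections listed above can be recomputed with a few lines of
Python. This is only a sketch: it assumes each .ids file is a
whitespace-separated list of chain IDs, and the function names are
hypothetical, not part of any existing script.

```python
def read_id_set(path):
    """Read a .ids file into a set, assuming whitespace-separated IDs."""
    with open(path) as f:
        return set(f.read().split())


def intersect_with(bad_ids, tough_sets):
    """For each named tough set, return the sorted IDs it shares with bad_ids.

    bad_ids    -- set of chain IDs (e.g. the contents of stride-bad.ids)
    tough_sets -- dict mapping a set name to a set of chain IDs
    """
    return {name: sorted(bad_ids & ids) for name, ids in tough_sets.items()}


def report(bad_path, tough_paths):
    """Print each intersection in the layout used in these notes."""
    bad_ids = read_id_set(bad_path)
    tough_sets = {path: read_id_set(path) for path in tough_paths}
    for name, common in intersect_with(bad_ids, tough_sets).items():
        print(name + ":")
        print("        " + " ".join(common))
```

A call such as report("stride-bad.ids", ["tough.ids", "tough2.ids"])
would print one block per tough set; the file names here are the ones
mentioned in the notes, and their location on disk is not assumed.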