Tue Nov 16 10:45:57 PST 2010 Kevin Karplus
This directory attempts to design amino acid sequences based on one of
Grant Thiltgen's best combined alphabets: str2uc-near-backbone-11-20,
which combines a backbone measure (str2uc) and a burial measure
(near-backbone-11) into a 20-letter alphabet.

My first attempt uses a symmetric network structure, with

    input  window  output
     20      5       15
     15      7       15
     15      7       15
     15      5       20

which has 6150 weights, though I misnamed it 5-15-7-13-7-15-5
(I had *intended* to make the middle layer 13, not 15, units).

In preliminary results, this structure does not seem to be doing as well
as the bys_rev training, which used the structure

    input  window  output
     11      3       11
     11      5       11
     11      7       11
     11      9       20

which has 3795 weights.

I wonder if I should kill the job now (it probably has about another
10 CPU days to run to get to the next breaking point) and start over
with a properly named network that has gradually increasing window width.
Of course, since my networks have twice as many parameters as the
bystroff-input ones, it may just be that learning is slower and the
final result will be better.

Tue Nov 16 11:21:24 PST 2010 Kevin Karplus
I will kill the jobs and restart them with the intended network shape,
saving the results so far in misnamed-training.

Tue Nov 16 11:34:49 PST 2010 Kevin Karplus
Job restarted with

    input  window  output
     20      5       15
     15      7       13
     13      7       15
     15      5       20

which has 5730 weights.

Tue Nov 16 12:11:45 PST 2010 Kevin Karplus
If I could get 30 CPUs working at once (unlikely, since the web server
will sweep up most of the CPUs), it would take about 14.5 hours to do
the first round of training 300 networks.  It is likely to take a couple
of days instead, as I've only gotten 4 CPUs so far in 40 minutes.

When I've finished training this round of networks, I should probably
experiment a bit with different-shaped networks of roughly similar size.
For example,

    20 3 14
    14 5 14
    14 7 14
    14 9 20

has 5712 weights and gradually widening windows;

    20 9 14
    14 7 14
    14 5 14
    14 3 20

has 5712 weights and gradually narrowing windows;

    20 3 20
    20 5 20
    20 7 20

has 6000 weights and gradually widening windows, with no bottleneck of
fewer hidden units than the inputs or outputs; and

    20 7 20
    20 5 20
    20 3 20

has 6000 weights and gradually narrowing windows, with no bottleneck of
fewer hidden units than the inputs or outputs.

Thu Nov 18 09:33:45 PST 2010 Kevin Karplus
The first round of training with the 5-15-7-13-7-15-5 networks is done.
The average and best values are

            avg       best
    tr12    0.111521  0.2067
    tr23    0.10887   0.1954
    tr31    0.110366  0.2036

These are not as good as the bystroff networks (which averaged
0.210485, 0.205511, and 0.201046), but they show some potential.
I'll try initial training with 7-20-5-20-3 networks and see how that
goes.  If it is no worse, I'll try 3-20-5-20-7.  If either of those is
good, I might also redo bys with the good architecture.

Thu Nov 18 20:26:22 PST 2010 Kevin Karplus
The problem does seem to be the network structure, as the 7-20-5-20-3
networks are doing much better than the 5-15-7-13-7-15-5 ones.
I wonder if I should try 9-20-5, 7-25-5, or 7-30-3 also.
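The weight counts quoted above are consistent with counting
window * inputs * outputs for each layer (with any bias terms not
included in the count).  A quick Python sketch using that assumed
formula to recompute the counts and to size the candidate shapes just
mentioned; count_weights is just a throwaway helper, not part of
predict-2nd:

    def count_weights(layers):
        """layers: list of (inputs, window, outputs) triples, one per layer."""
        return sum(inputs * window * outputs for inputs, window, outputs in layers)

    # the misnamed 5-15-7-13-7-15-5 network (20-letter input alphabet)
    print(count_weights([(20, 5, 15), (15, 7, 15), (15, 7, 15), (15, 5, 20)]))  # 6150

    # candidate 2-layer shapes, assuming 20 inputs and 20 outputs
    print(count_weights([(20, 9, 20), (20, 5, 20)]))   # 9-20-5: 5600
    print(count_weights([(20, 7, 25), (25, 5, 20)]))   # 7-25-5: 6000
    print(count_weights([(20, 7, 30), (30, 3, 20)]))   # 7-30-3: 6000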
Fri Nov 19 06:32:28 PST 2010 Kevin Karplus
With about half the first-round training done, the best result on
7-20-5-20-3 so far is

net043/logs/str2uc-near-backbone-11-20-Gstr2uc_near_backbone_11_20-7-20-5-20-3-mult50-TR31.log:Epoch: 50 3.97199 cross_bits=3.97071 objective=0.416769 new best

which compares with bys_rev's best first-round training result for the
bys alphabet:

net84/logs/bys-Gbys-3-11-5-11-7-11-9-mult250-TR23.log:Epoch: 50 3.81188 cross_bits=3.81158 objective=0.641748 new best

So while the 3-layer architecture is doing better than the 4-layer one
I started with, it is still not doing as well as the bys alphabet.
Is the difference from the alphabet choice, from wider windows, or from
using increasing vs. decreasing window widths from layer to layer?
Maybe I should try some 2-layer designs to try to tease that out:
    7-30-3, 11-20-3, 3-20-11, 3-30-7

Fri Nov 19 23:45:35 PST 2010 Kevin Karplus
The 7-30-3 architecture is doing terribly, but the 7-20-5-20-3
architecture looks ok.  Maybe I should try 3-20-5-20-7 next, instead of
trying to find a 2-layer architecture that works.

There is at least one network missing from the 7-20-5-20-3 list:

    Can't open tr31 training file net051/quality-reports/dunbrack-30pc-1763-str2uc-near-backbone-11-20-Gstr2uc_near_backbone_11_20-7-20-5-20-3-tr31-aa-mult50-from-empty.train for net051

I wonder if the Makefile has any easy way to do targeted repair:
redoing just the missing files.

Mon Nov 22 12:17:55 PST 2010 Kevin Karplus
It doesn't look like there is a huge difference between the different
3- and 4-layer architectures, but the 2-layer one failed miserably.

bit gain after 50 epochs (from quality-reports/)
                        tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
7-20-5-20-3             0.16936  0.2156    0.16684  0.2069    0.1665   0.2079
3-20-5-20-7             0.15297  0.2060    0.15356  0.2027    0.14861  0.1990
5-15-7-13-7-15-5        0.11152  0.2067    0.10887  0.1954    0.11037  0.2036
7-30-3                 -0.1054   0.0149   -0.1157   0.0145   -0.1225  -0.0032
bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640

It does look like wide-early is better than wide-late, but the bystroff
alphabet did better, with the best of the 7-20-5-20-3 networks only just
beating the average for the bys alphabet.
Mon Nov 22 15:44:25 PST 2010 Kevin Karplus
I wonder if I should go for a wider initial window:
    9-20-3-20-3 or 9-17-5-20-3 or 9-17-7-17-3

Thu Dec 2 15:36:21 PST 2010 Kevin Karplus
9-17-5-20-3 seems to be worse:

bit gain after 50 epochs (from quality-reports/)
                        tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
7-20-5-20-3             0.16936  0.2156    0.16684  0.2069    0.1665   0.2079
3-20-5-20-7             0.15297  0.2060    0.15356  0.2027    0.14861  0.1990
5-15-7-13-7-15-5        0.11152  0.2067    0.10887  0.1954    0.11037  0.2036
9-17-5-20-3             0.12181  0.1619    0.12046  0.1620    0.12102  0.1618
7-30-3                 -0.1054   0.0149   -0.1157   0.0145   -0.1225  -0.0032
bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640

It looks like more hidden units is better than wider windows, so let me
try 5-24-5-20-3 or 5-30-3-20-3.

Fri Dec 3 11:52:06 PST 2010 Kevin Karplus
Hmm, 5-30-3-20-3 seems to be worse:

bit gain after 50 epochs (from quality-reports/)
                        tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
7-20-5-20-3             0.16936  0.2156    0.16684  0.2069    0.1665   0.2079
3-20-5-20-7             0.15297  0.2060    0.15356  0.2027    0.14861  0.1990
5-15-7-13-7-15-5        0.11152  0.2067    0.10887  0.1954    0.11037  0.2036
5-30-3-20-3             0.11576  0.1754    0.11483  0.1824    0.10934  0.1798
9-17-5-20-3             0.12181  0.1619    0.12046  0.1620    0.12102  0.1618
7-30-3                 -0.1054   0.0149   -0.1157   0.0145   -0.1225  -0.0032
bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640

Is uniformly 20-wide a sweet spot, or would making the network bigger
help?  Let me try 7-30-5-30-3.

Wed Dec 15 15:03:01 PST 2010 Kevin Karplus
bit gain after 50 epochs (from quality-reports/)
                        tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
7-20-5-20-3             0.16936  0.2156    0.16684  0.2069    0.1665   0.2079
3-20-5-20-7             0.15297  0.2060    0.15356  0.2027    0.14861  0.1990
5-15-7-13-7-15-5        0.11152  0.2067    0.10887  0.1954    0.11037  0.2036
5-30-3-20-3             0.11576  0.1754    0.11483  0.1824    0.10934  0.1798
7-30-5-30-3             0.10937  0.1678    0.10654  0.1673    0.11016  0.1677
9-17-5-20-3             0.12181  0.1619    0.12046  0.1620    0.12102  0.1618
7-30-3                 -0.1054   0.0149   -0.1157   0.0145   -0.1225  -0.0032
bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640

It looks like increasing the number of hidden units does NOT help, at
least after very short training runs.  Maybe I should try 9-20-5-20-3
also.

Thu Dec 16 10:56:13 PST 2010 Kevin Karplus
bit gain after 50 epochs (from quality-reports/)
                        tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
7-20-5-20-3             0.16936  0.2156    0.16684  0.2069    0.1665   0.2079
3-20-5-20-7             0.15297  0.2060    0.15356  0.2027    0.14861  0.1990
5-15-7-13-7-15-5        0.11152  0.2067    0.10887  0.1954    0.11037  0.2036
5-30-3-20-3             0.11576  0.1754    0.11483  0.1824    0.10934  0.1798
7-30-5-30-3             0.10937  0.1678    0.10654  0.1673    0.11016  0.1677
9-17-5-20-3             0.12181  0.1619    0.12046  0.1620    0.12102  0.1618
9-20-5-20-3             0.11536  0.1587    0.11430  0.1521    0.11719  0.1690
7-30-3                 -0.1054   0.0149   -0.1157   0.0145   -0.1225  -0.0032
bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640

The 9-20-5-20-3 has a lower average than some of the others and a much
lower best.  Maybe I should try 5-20-5-20-5.

Sat Dec 18 07:49:25 PST 2010 Kevin Karplus
5-20-5-20-5 has the best average so far, and very nearly the best best,
though still nowhere near as good as the bys alphabet was after 50
epochs.
bit gain after 50 epochs (from quality-reports/)
                        tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
5-20-5-20-5             0.16974  0.2155    0.16786  0.2007    0.17164  0.2064
7-20-5-20-3             0.16936  0.2156    0.16684  0.2069    0.1665   0.2079
3-20-5-20-7             0.15297  0.2060    0.15356  0.2027    0.14861  0.1990
5-15-7-13-7-15-5        0.11152  0.2067    0.10887  0.1954    0.11037  0.2036
5-30-3-20-3             0.11576  0.1754    0.11483  0.1824    0.10934  0.1798
7-30-5-30-3             0.10937  0.1678    0.10654  0.1673    0.11016  0.1677
9-17-5-20-3             0.12181  0.1619    0.12046  0.1620    0.12102  0.1618
9-20-5-20-3             0.11536  0.1587    0.11430  0.1521    0.11719  0.1690
7-30-3                 -0.1054   0.0149   -0.1157   0.0145   -0.1225  -0.0032
bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640

I have a choice now of doing the next stage of training for 5-20-5-20-5
or of doing more searches for an architecture.  I think I'll start the
next level of training, just to see if further training helps the
networks improve a lot.  If it does, I might try a bigger network, which
would be slower to train, but might come out better in the end.

Sat Dec 18 15:44:11 PST 2010 Kevin Karplus
bit gain (from quality-reports/)
epochs  arch                    tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
50      5-20-5-20-5             0.16974  0.2155    0.16786  0.2007    0.17164  0.2064
150     5-20-5-20-5             0.28647  0.2937    0.2799   0.2860    0.28202  0.2891
50      bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640
150     bys-3-11-5-11-7-11-9    0.33736  0.3509    0.33729  0.3522    0.33606  0.3437
250     bys-3-11-5-11-7-11-9    0.37357  0.3759    0.3717   0.3762    0.3639   0.3605

The advantage for bys (of about 0.04 bits avg, 0.06 bits best) has not
changed much with the additional training; in fact the gap has increased
a little for average bit gain.  The bys training leveled out on the next
250 epochs, so I'll see if the str2uc-near-backbone-11-20 training
catches up.

Sat Dec 18 19:48:15 PST 2010 Kevin Karplus
bit gain (from quality-reports/)
epochs  arch                    tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
50      5-20-5-20-5             0.16974  0.2155    0.16786  0.2007    0.17164  0.2064
150     5-20-5-20-5             0.28647  0.2937    0.2799   0.2860    0.28202  0.2891
250     5-20-5-20-5             0.31713  0.3178    0.31117  0.3114    0.3123   0.3175
test    5-20-5-20-5             0.3072             0.3075             0.3123
50      bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640
150     bys-3-11-5-11-7-11-9    0.33736  0.3509    0.33729  0.3522    0.33606  0.3437
250     bys-3-11-5-11-7-11-9    0.37357  0.3759    0.3717   0.3762    0.3639   0.3605
test    bys-3-11-5-11-7-11-9    0.3647             0.3663             0.3665
50      nb11-3-11-5-11-7-11-9   0.1285   0.2118    0.13003  0.2147    0.12271  0.2136
150     nb11-3-11-5-11-7-11-9   0.24283  0.2478    0.24245  0.2512    0.24144  0.2478
250     nb11-3-11-5-11-7-11-9   0.25907  0.2605    0.26177  0.2622    0.25847  0.2611
test    nb11-3-11-5-11-7-11-9   0.2555             0.2569             0.2539
50      str2-3-13-5-13-7-13-9   0.09692  0.1536    0.09414  0.1485    0.09189  0.1432
150     str2-3-13-5-13-7-13-9   0.20104  0.2092    0.2003   0.2079    0.2007   0.2070
250     str2-3-13-5-13-7-13-9   0.22793  0.2291    0.22697  0.2307    0.2277   0.2299
test    str2-3-13-5-13-7-13-9   0.2209             0.2244             0.2250
50      dssp-3-8-5-8-5-8-7      0.14064  0.1677    0.1399   0.1597    0.13785  0.1567
150     dssp-3-8-5-8-5-8-7      0.19263  0.1965    0.19206  0.1966    0.19065  0.1961
250     dssp-3-8-5-8-5-8-7      0.20693  0.2084    0.207    0.2093    0.20757  0.2098
test    dssp-3-8-5-8-5-8-7      0.1993             0.2048             0.2089

There is still a 0.06 advantage in average or best bits saved for bys,
but the current alphabet is doing better than the network trained on
just str2 (though how much of that is alphabet and how much is network
architecture is not clear).
If I don't get predict-2nd modified to take multiple alphabets as input,
I wonder whether there is any advantage to averaging the outputs of
several networks, either in probability space (which would tend to blur
things) or in bit-gain space (log p_network/prob_background), which
would tend to sharpen things (see the sketch at the end of this file).
This would require outputting the full distribution in an rdb file,
which I've been planning to do for evaluating different sequence-recovery
metrics anyway.

Mon Dec 20 17:24:35 PST 2010 Kevin Karplus
I'm going to try a narrower window, still with about 6000 parameters to
learn: 3-29-3-29-3.
My gut feeling is that this will do worse than 5-20-5-20-5, but my gut
feelings have not been too good at predicting what will work and what
won't.

I've also started predictions for the pb alphabet (in ../pb_rev) and
want to redo the bys ones with a different architecture, since they
seem too good to be true.

Wed Dec 22 11:52:08 PST 2010 Kevin Karplus
bit gain (from quality-reports/)
epochs  arch                    tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
50      5-20-5-20-5             0.16974  0.2155    0.16786  0.2007    0.17164  0.2064
150     5-20-5-20-5             0.28647  0.2937    0.2799   0.2860    0.28202  0.2891
250     5-20-5-20-5             0.31713  0.3178    0.31117  0.3114    0.3123   0.3175
test    5-20-5-20-5             0.3072             0.3075             0.3123
50      3-29-3-29-3             0.06209  0.1285    0.06321  0.1221    0.05884  0.1206

OK, 3-29-3-29-3 is definitely worse than 5-20-5-20-5, so there is no
need to explore that further.  But should I look at 7-20-7-20-7 (a much
larger network)?  Or should I go through 2 intermediate checks
(7-20-5-20-5 and 7-20-7-20-5)?

Sun Dec 26 09:58:02 PST 2010 Kevin Karplus
sequence recovery
test  bys-3-11-5-11-7-11-9     19.28%  19.41%  19.58%
test  bys-5-20-5-20-5          19.33%  19.23%  19.39%
test  5-20-5-20-5              16.73%  16.68%  16.96%
test  str2-3-13-5-13-7-13-9    15.85%  16.06%  16.02%
test  dssp-3-8-5-8-5-8-7       15.55%  15.79%  15.84%
test  nb11-3-11-5-11-7-11-9    14.58%  14.77%  14.68%

I suspect that I may get the best results by trying to combine different
profiles from different networks.  But do I want to average the profiles
(blurring the signal) or multiply them (sharpening the signal, but
perhaps overdoing it, as some of the networks will produce very similar
profiles)?  Perhaps the thing to do is to take a linear combination of
the profiles from a bunch of different networks and optimize the weights
to maximize sequence recovery.  (Sounds like another neural net to me;
a crude weight-fitting sketch is at the end of this file.)

Sun Dec 26 15:47:39 PST 2010 Kevin Karplus
Sequence recovery
test  pb-5-20-5-20-5           16.75%  16.94%  16.91%

pb does just a little worse than str2uc-near-backbone-11-20 at sequence
recovery.
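A minimal sketch of the two profile-combining schemes mentioned above.
None of this exists in predict-2nd yet; the function names, the
(n_networks, n_positions, 20) array layout, and the background vector
are assumptions for illustration.  Averaging in probability space is an
arithmetic mean; averaging in bit-gain space amounts to a weighted
geometric mean of p/background, renormalized, which sharpens.

    import numpy as np

    def combine_probability_space(profiles, weights=None):
        # Arithmetic (or weighted) mean of per-position profiles: blurs.
        # profiles: array of shape (n_networks, n_positions, 20)
        profiles = np.asarray(profiles, dtype=float)
        w = np.full(len(profiles), 1.0 / len(profiles)) if weights is None else np.asarray(weights)
        return np.tensordot(w, profiles, axes=1)

    def combine_bit_gain_space(profiles, background, weights=None):
        # Average log2(p/background) across networks, then renormalize.
        # Equivalent to a weighted geometric mean of p/background: sharpens.
        profiles = np.asarray(profiles, dtype=float)
        w = np.full(len(profiles), 1.0 / len(profiles)) if weights is None else np.asarray(weights)
        bit_gain = np.log2(profiles / background)
        combined = background * 2.0 ** np.tensordot(w, bit_gain, axes=1)
        return combined / combined.sum(axis=-1, keepdims=True)

    # Illustration only: a uniform background over the 20 amino acids.
    # The real background would be the amino-acid composition of the training set.
    # background = np.full(20, 1.0 / 20.0)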
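And for fitting the linear-combination weights to maximize sequence
recovery (fraction of positions whose most probable amino acid matches
the true residue): since that objective is not differentiable, a crude
random search over simplex weights is one way to start.  This is only a
hypothetical sketch, not the neural-net approach mused about above.

    import numpy as np

    def fit_combination_weights(profiles, true_seq, n_trials=2000, seed=0):
        # profiles: (n_networks, n_positions, 20); true_seq: integer indices (n_positions,)
        # Random search over simplex weights for a linear combination of profiles,
        # keeping whichever weighting gives the best sequence recovery.
        rng = np.random.default_rng(seed)
        profiles = np.asarray(profiles, dtype=float)
        true_seq = np.asarray(true_seq)
        best_w, best_recovery = None, -1.0
        for _ in range(n_trials):
            w = rng.dirichlet(np.ones(len(profiles)))      # weights summing to 1
            combined = np.tensordot(w, profiles, axes=1)   # linear combination of profiles
            recovery = float(np.mean(np.argmax(combined, axis=-1) == true_seq))
            if recovery > best_recovery:
                best_w, best_recovery = w, recovery
        return best_w, best_recovery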