Tue Nov 16 10:45:57 PST 2010 Kevin Karplus
This directory attempts to design amino acid sequences based on one of
Grant Thiltgen's best combined alphabets: str2uc-near-backbone-11-20,
which combines a backbone measure (str2uc) and a burial measure
(near-backbone-11) into a 20-letter alphabet.

My first attempt uses a symmetric network structure, with

    input  window  output
     20      5       15
     15      7       15
     15      7       15
     15      5       20

which has 6150 weights, though I misnamed it 5-15-7-13-7-15-5
(I had *intended* to make the middle layer 13, not 15, units).

In preliminary results, this structure does not seem to be doing as well
as the bys_rev training, which used the structure

    input  window  output
     11      3       11
     11      5       11
     11      7       11
     11      9       20

which has 3795 weights.

I wonder if I should kill the job now (it probably has about another
10 CPU days to run to get to the next breaking point) and start over
with a properly named network that has gradually increasing window width.
Of course, since my networks have twice as many parameters as the
bystroff-input ones, it may just be that learning is slower and the
final result will be better.

Tue Nov 16 11:21:24 PST 2010 Kevin Karplus
I will kill the jobs and restart them with the intended network shape,
saving the results so far in misnamed-training.

Tue Nov 16 11:34:49 PST 2010 Kevin Karplus
Job restarted with

    input  window  output
     20      5       15
     15      7       13
     13      7       15
     15      5       20

which has 5730 weights.

Tue Nov 16 12:11:45 PST 2010 Kevin Karplus
If I could get 30 CPUs working at once (unlikely, since the web server
will sweep up most of the CPUs), it would take about 14.5 hours to do
the first round of training 300 networks.  It is likely to take a couple
of days instead, as I've only gotten 4 CPUs so far in 40 minutes.

When I've finished training this round of networks, I should probably
experiment a bit with different-shaped networks of roughly similar size.
For example,

    20 3 14
    14 5 14
    14 7 14
    14 9 20

has 5712 weights and gradually widening windows;

    20 9 14
    14 7 14
    14 5 14
    14 3 20

has 5712 weights and gradually narrowing windows;

    20 3 20
    20 5 20
    20 7 20

has 6000 weights and gradually widening windows, with no bottleneck of
fewer hidden units than the inputs or outputs; and

    20 7 20
    20 5 20
    20 3 20

has 6000 weights and gradually narrowing windows, with no bottleneck of
fewer hidden units than the inputs or outputs.

Thu Nov 18 09:33:45 PST 2010 Kevin Karplus
The first round of training with the 5-15-7-13-7-15-5 networks is done.
The average and best values are

            avg       best
    tr12    0.111521  0.2067
    tr23    0.10887   0.1954
    tr31    0.110366  0.2036

These are not as good as the bystroff networks (which averaged
0.210485, 0.205511, and 0.201046), but they show some potential.
I'll try initial training with 7-20-5-20-3 networks and see how that
goes.  If it is no worse, I'll try 3-20-5-20-7.  If either of those is
good, I might also redo bys with the good architecture.

Thu Nov 18 20:26:22 PST 2010 Kevin Karplus
The problem does seem to be the network structure, as the 7-20-5-20-3
networks are doing much better than the 5-15-7-13-7-15-5 ones.
I wonder if I should try 9-20-5, 7-25-5, or 7-30-3 also.
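The weight counts quoted above are consistent with counting
window * inputs * outputs for each layer (with any bias terms not
included in the count).  A quick Python sketch using that assumed
formula to recompute the counts and to size the candidate shapes just
mentioned; count_weights is just a throwaway helper, not part of
predict-2nd:

    def count_weights(layers):
        """layers: list of (inputs, window, outputs) triples, one per layer."""
        return sum(inputs * window * outputs for inputs, window, outputs in layers)

    # the misnamed 5-15-7-13-7-15-5 network (20-letter input alphabet)
    print(count_weights([(20, 5, 15), (15, 7, 15), (15, 7, 15), (15, 5, 20)]))  # 6150

    # candidate 2-layer shapes, assuming 20 inputs and 20 outputs
    print(count_weights([(20, 9, 20), (20, 5, 20)]))   # 9-20-5: 5600
    print(count_weights([(20, 7, 25), (25, 5, 20)]))   # 7-25-5: 6000
    print(count_weights([(20, 7, 30), (30, 3, 20)]))   # 7-30-3: 6000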
Fri Nov 19 06:32:28 PST 2010 Kevin Karplus
With about half the first-round training done, the best result on
7-20-5-20-3 so far is

net043/logs/str2uc-near-backbone-11-20-Gstr2uc_near_backbone_11_20-7-20-5-20-3-mult50-TR31.log:Epoch: 50 3.97199 cross_bits=3.97071 objective=0.416769 new best

which compares with bys_rev's best first-round training result for the
bys alphabet:

net84/logs/bys-Gbys-3-11-5-11-7-11-9-mult250-TR23.log:Epoch: 50 3.81188 cross_bits=3.81158 objective=0.641748 new best

So while the 3-layer architecture is doing better than the 4-layer one
I started with, it is still not doing as well as the bys alphabet.
Is the difference from the alphabet choice, from wider windows, or from
using increasing vs. decreasing window widths from layer to layer?
Maybe I should try some 2-layer designs to try to tease that out:
    7-30-3, 11-20-3, 3-20-11, 3-30-7

Fri Nov 19 23:45:35 PST 2010 Kevin Karplus
The 7-30-3 architecture is doing terribly, but the 7-20-5-20-3
architecture looks ok.  Maybe I should try 3-20-5-20-7 next, instead of
trying to find a 2-layer architecture that works.

There is at least one network missing from the 7-20-5-20-3 list:

    Can't open tr31 training file net051/quality-reports/dunbrack-30pc-1763-str2uc-near-backbone-11-20-Gstr2uc_near_backbone_11_20-7-20-5-20-3-tr31-aa-mult50-from-empty.train for net051

I wonder if the Makefile has any easy way to do targeted repair:
redoing just the missing files.

Mon Nov 22 12:17:55 PST 2010 Kevin Karplus
It doesn't look like there is a huge difference between the different
3- and 4-layer architectures, but the 2-layer one failed miserably.

bit gain after 50 epochs (from quality-reports/)
                        tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
7-20-5-20-3             0.16936  0.2156    0.16684  0.2069    0.1665   0.2079
3-20-5-20-7             0.15297  0.2060    0.15356  0.2027    0.14861  0.1990
5-15-7-13-7-15-5        0.11152  0.2067    0.10887  0.1954    0.11037  0.2036
7-30-3                 -0.1054   0.0149   -0.1157   0.0145   -0.1225  -0.0032
bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640

It does look like wide-early is better than wide-late, but the bystroff
alphabet did better, with the best of the 7-20-5-20-3 networks only just
beating the average for the bys alphabet.
Mon Nov 22 15:44:25 PST 2010 Kevin Karplus
I wonder if I should go for a wider initial window:
    9-20-3-20-3 or 9-17-5-20-3 or 9-17-7-17-3

Thu Dec 2 15:36:21 PST 2010 Kevin Karplus
9-17-5-20-3 seems to be worse:

bit gain after 50 epochs (from quality-reports/)
                        tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
7-20-5-20-3             0.16936  0.2156    0.16684  0.2069    0.1665   0.2079
3-20-5-20-7             0.15297  0.2060    0.15356  0.2027    0.14861  0.1990
5-15-7-13-7-15-5        0.11152  0.2067    0.10887  0.1954    0.11037  0.2036
9-17-5-20-3             0.12181  0.1619    0.12046  0.1620    0.12102  0.1618
7-30-3                 -0.1054   0.0149   -0.1157   0.0145   -0.1225  -0.0032
bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640

It looks like more hidden units is better than wider windows, so let me
try 5-24-5-20-3 or 5-30-3-20-3.

Fri Dec 3 11:52:06 PST 2010 Kevin Karplus
Hmm, 5-30-3-20-3 seems to be worse:

bit gain after 50 epochs (from quality-reports/)
                        tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
7-20-5-20-3             0.16936  0.2156    0.16684  0.2069    0.1665   0.2079
3-20-5-20-7             0.15297  0.2060    0.15356  0.2027    0.14861  0.1990
5-15-7-13-7-15-5        0.11152  0.2067    0.10887  0.1954    0.11037  0.2036
5-30-3-20-3             0.11576  0.1754    0.11483  0.1824    0.10934  0.1798
9-17-5-20-3             0.12181  0.1619    0.12046  0.1620    0.12102  0.1618
7-30-3                 -0.1054   0.0149   -0.1157   0.0145   -0.1225  -0.0032
bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640

Is uniformly 20-wide a sweet spot, or would making the network bigger
help?  Let me try 7-30-5-30-3.

Wed Dec 15 15:03:01 PST 2010 Kevin Karplus
bit gain after 50 epochs (from quality-reports/)
                        tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
7-20-5-20-3             0.16936  0.2156    0.16684  0.2069    0.1665   0.2079
3-20-5-20-7             0.15297  0.2060    0.15356  0.2027    0.14861  0.1990
5-15-7-13-7-15-5        0.11152  0.2067    0.10887  0.1954    0.11037  0.2036
5-30-3-20-3             0.11576  0.1754    0.11483  0.1824    0.10934  0.1798
7-30-5-30-3             0.10937  0.1678    0.10654  0.1673    0.11016  0.1677
9-17-5-20-3             0.12181  0.1619    0.12046  0.1620    0.12102  0.1618
7-30-3                 -0.1054   0.0149   -0.1157   0.0145   -0.1225  -0.0032
bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640

It looks like increasing the number of hidden units does NOT help, at
least after very short training runs.  Maybe I should try 9-20-5-20-3
also.

Thu Dec 16 10:56:13 PST 2010 Kevin Karplus
bit gain after 50 epochs (from quality-reports/)
                        tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
7-20-5-20-3             0.16936  0.2156    0.16684  0.2069    0.1665   0.2079
3-20-5-20-7             0.15297  0.2060    0.15356  0.2027    0.14861  0.1990
5-15-7-13-7-15-5        0.11152  0.2067    0.10887  0.1954    0.11037  0.2036
5-30-3-20-3             0.11576  0.1754    0.11483  0.1824    0.10934  0.1798
7-30-5-30-3             0.10937  0.1678    0.10654  0.1673    0.11016  0.1677
9-17-5-20-3             0.12181  0.1619    0.12046  0.1620    0.12102  0.1618
9-20-5-20-3             0.11536  0.1587    0.11430  0.1521    0.11719  0.1690
7-30-3                 -0.1054   0.0149   -0.1157   0.0145   -0.1225  -0.0032
bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640

The 9-20-5-20-3 has a lower average than some of the others and a much
lower best.  Maybe I should try 5-20-5-20-5.

Sat Dec 18 07:49:25 PST 2010 Kevin Karplus
5-20-5-20-5 has the best average so far, and very nearly the best best,
though still nowhere near as good as the bys alphabet was after 50
epochs.
bit gain after 50 epochs (from quality-reports/)
                        tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
5-20-5-20-5             0.16974  0.2155    0.16786  0.2007    0.17164  0.2064
7-20-5-20-3             0.16936  0.2156    0.16684  0.2069    0.1665   0.2079
3-20-5-20-7             0.15297  0.2060    0.15356  0.2027    0.14861  0.1990
5-15-7-13-7-15-5        0.11152  0.2067    0.10887  0.1954    0.11037  0.2036
5-30-3-20-3             0.11576  0.1754    0.11483  0.1824    0.10934  0.1798
7-30-5-30-3             0.10937  0.1678    0.10654  0.1673    0.11016  0.1677
9-17-5-20-3             0.12181  0.1619    0.12046  0.1620    0.12102  0.1618
9-20-5-20-3             0.11536  0.1587    0.11430  0.1521    0.11719  0.1690
7-30-3                 -0.1054   0.0149   -0.1157   0.0145   -0.1225  -0.0032
bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640

I have a choice now of doing the next stage of training for 5-20-5-20-5
or of doing more searches for an architecture.  I think I'll start the
next level of training, just to see if further training helps the
networks improve a lot.  If it does, I might try a bigger network, which
would be slower to train, but might come out better in the end.

Sat Dec 18 15:44:11 PST 2010 Kevin Karplus
bit gain (from quality-reports/)
epochs  arch                    tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
50      5-20-5-20-5             0.16974  0.2155    0.16786  0.2007    0.17164  0.2064
150     5-20-5-20-5             0.28647  0.2937    0.2799   0.2860    0.28202  0.2891
50      bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640
150     bys-3-11-5-11-7-11-9    0.33736  0.3509    0.33729  0.3522    0.33606  0.3437
250     bys-3-11-5-11-7-11-9    0.37357  0.3759    0.3717   0.3762    0.3639   0.3605

The advantage for bys (of about 0.04 bits avg, 0.06 bits best) has not
changed much with the additional training; in fact the gap has increased
a little for average bit gain.  The bys training leveled out on the next
250 epochs, so I'll see if the str2uc-near-backbone-11-20 training
catches up.

Sat Dec 18 19:48:15 PST 2010 Kevin Karplus
bit gain (from quality-reports/)
epochs  arch                    tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
50      5-20-5-20-5             0.16974  0.2155    0.16786  0.2007    0.17164  0.2064
150     5-20-5-20-5             0.28647  0.2937    0.2799   0.2860    0.28202  0.2891
250     5-20-5-20-5             0.31713  0.3178    0.31117  0.3114    0.3123   0.3175
test    5-20-5-20-5             0.3072             0.3075             0.3123
50      bys-3-11-5-11-7-11-9    0.21049  0.2630    0.20551  0.2719    0.20105  0.2640
150     bys-3-11-5-11-7-11-9    0.33736  0.3509    0.33729  0.3522    0.33606  0.3437
250     bys-3-11-5-11-7-11-9    0.37357  0.3759    0.3717   0.3762    0.3639   0.3605
test    bys-3-11-5-11-7-11-9    0.3647             0.3663             0.3665
50      nb11-3-11-5-11-7-11-9   0.1285   0.2118    0.13003  0.2147    0.12271  0.2136
150     nb11-3-11-5-11-7-11-9   0.24283  0.2478    0.24245  0.2512    0.24144  0.2478
250     nb11-3-11-5-11-7-11-9   0.25907  0.2605    0.26177  0.2622    0.25847  0.2611
test    nb11-3-11-5-11-7-11-9   0.2555             0.2569             0.2539
50      str2-3-13-5-13-7-13-9   0.09692  0.1536    0.09414  0.1485    0.09189  0.1432
150     str2-3-13-5-13-7-13-9   0.20104  0.2092    0.2003   0.2079    0.2007   0.2070
250     str2-3-13-5-13-7-13-9   0.22793  0.2291    0.22697  0.2307    0.2277   0.2299
test    str2-3-13-5-13-7-13-9   0.2209             0.2244             0.2250
50      dssp-3-8-5-8-5-8-7      0.14064  0.1677    0.1399   0.1597    0.13785  0.1567
150     dssp-3-8-5-8-5-8-7      0.19263  0.1965    0.19206  0.1966    0.19065  0.1961
250     dssp-3-8-5-8-5-8-7      0.20693  0.2084    0.207    0.2093    0.20757  0.2098
test    dssp-3-8-5-8-5-8-7      0.1993             0.2048             0.2089

There is still a 0.06 advantage in average or best bits saved for bys,
but the current alphabet is doing better than the network trained on
just str2 (though how much of that is alphabet and how much is network
architecture is not clear).
If I don't get predict-2nd modified to take multiple alphabets as input,
I wonder whether there is any advantage to averaging the outputs of
several networks, either in probability space (which would tend to blur
things) or in bit-gain space (log p_network/prob_background), which
would tend to sharpen things (see the sketch at the end of this file).
This would require outputting the full distribution in an rdb file,
which I've been planning to do for evaluating different sequence-recovery
metrics anyway.

Mon Dec 20 17:24:35 PST 2010 Kevin Karplus
I'm going to try a narrower window, still with about 6000 parameters to
learn: 3-29-3-29-3.
My gut feeling is that this will do worse than 5-20-5-20-5, but my gut
feelings have not been too good at predicting what will work and what
won't.

I've also started predictions for the pb alphabet (in ../pb_rev) and
want to redo the bys ones with a different architecture, since they
seem too good to be true.

Wed Dec 22 11:52:08 PST 2010 Kevin Karplus
bit gain (from quality-reports/)
epochs  arch                    tr12avg  tr12best  tr23avg  tr23best  tr31avg  tr31best
50      5-20-5-20-5             0.16974  0.2155    0.16786  0.2007    0.17164  0.2064
150     5-20-5-20-5             0.28647  0.2937    0.2799   0.2860    0.28202  0.2891
250     5-20-5-20-5             0.31713  0.3178    0.31117  0.3114    0.3123   0.3175
test    5-20-5-20-5             0.3072             0.3075             0.3123
50      3-29-3-29-3             0.06209  0.1285    0.06321  0.1221    0.05884  0.1206

OK, 3-29-3-29-3 is definitely worse than 5-20-5-20-5, so there is no
need to explore that further.  But should I look at 7-20-7-20-7 (a much
larger network)?  Or should I go through 2 intermediate checks
(7-20-5-20-5 and 7-20-7-20-5)?

Sun Dec 26 09:58:02 PST 2010 Kevin Karplus
sequence recovery
test  bys-3-11-5-11-7-11-9     19.28%  19.41%  19.58%
test  bys-5-20-5-20-5          19.33%  19.23%  19.39%
test  5-20-5-20-5              16.73%  16.68%  16.96%
test  str2-3-13-5-13-7-13-9    15.85%  16.06%  16.02%
test  dssp-3-8-5-8-5-8-7       15.55%  15.79%  15.84%
test  nb11-3-11-5-11-7-11-9    14.58%  14.77%  14.68%

I suspect that I may get the best results by trying to combine different
profiles from different networks.  But do I want to average the profiles
(blurring the signal) or multiply them (sharpening the signal, but
perhaps overdoing it, as some of the networks will produce very similar
profiles)?  Perhaps the thing to do is to take a linear combination of
the profiles from a bunch of different networks and optimize the weights
to maximize sequence recovery.  (Sounds like another neural net to me;
a crude weight-fitting sketch is at the end of this file.)

Sun Dec 26 15:47:39 PST 2010 Kevin Karplus
Sequence recovery
test  pb-5-20-5-20-5           16.75%  16.94%  16.91%

pb does just a little worse than str2uc-near-backbone-11-20 at sequence
recovery.
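A minimal sketch of the two profile-combining schemes mentioned above.
None of this exists in predict-2nd yet; the function names, the
(n_networks, n_positions, 20) array layout, and the background vector
are assumptions for illustration.  Averaging in probability space is an
arithmetic mean; averaging in bit-gain space amounts to a weighted
geometric mean of p/background, renormalized, which sharpens.

    import numpy as np

    def combine_probability_space(profiles, weights=None):
        # Arithmetic (or weighted) mean of per-position profiles: blurs.
        # profiles: array of shape (n_networks, n_positions, 20)
        profiles = np.asarray(profiles, dtype=float)
        w = np.full(len(profiles), 1.0 / len(profiles)) if weights is None else np.asarray(weights)
        return np.tensordot(w, profiles, axes=1)

    def combine_bit_gain_space(profiles, background, weights=None):
        # Average log2(p/background) across networks, then renormalize.
        # Equivalent to a weighted geometric mean of p/background: sharpens.
        profiles = np.asarray(profiles, dtype=float)
        w = np.full(len(profiles), 1.0 / len(profiles)) if weights is None else np.asarray(weights)
        bit_gain = np.log2(profiles / background)
        combined = background * 2.0 ** np.tensordot(w, bit_gain, axes=1)
        return combined / combined.sum(axis=-1, keepdims=True)

    # Illustration only: a uniform background over the 20 amino acids.
    # The real background would be the amino-acid composition of the training set.
    # background = np.full(20, 1.0 / 20.0)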
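And for fitting the linear-combination weights to maximize sequence
recovery (fraction of positions whose most probable amino acid matches
the true residue): since that objective is not differentiable, a crude
random search over simplex weights is one way to start.  This is only a
hypothetical sketch, not the neural-net approach mused about above.

    import numpy as np

    def fit_combination_weights(profiles, true_seq, n_trials=2000, seed=0):
        # profiles: (n_networks, n_positions, 20); true_seq: integer indices (n_positions,)
        # Random search over simplex weights for a linear combination of profiles,
        # keeping whichever weighting gives the best sequence recovery.
        rng = np.random.default_rng(seed)
        profiles = np.asarray(profiles, dtype=float)
        true_seq = np.asarray(true_seq)
        best_w, best_recovery = None, -1.0
        for _ in range(n_trials):
            w = rng.dirichlet(np.ones(len(profiles)))      # weights summing to 1
            combined = np.tensordot(w, profiles, axes=1)   # linear combination of profiles
            recovery = float(np.mean(np.argmax(combined, axis=-1) == true_seq))
            if recovery > best_recovery:
                best_w, best_recovery = w, recovery
        return best_w, best_recovery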