Wed Dec 22 12:02:05 PST 2010 Kevin Karplus

I got the permissions fixed for the files and directories that Rachel
Bingham left.

The networks she tested did an astonishingly good job:

bit gain (from quality-reports/)			
epochs	arch			tr12avg	tr12best	tr23avg	tr23best	tr31avg	tr31best
50	bys-3-11-5-11-7-11-9	0.21049	0.2630		0.20551	0.2719		0.20105	0.2640
150	bys-3-11-5-11-7-11-9	0.33736	0.3509		0.33729	0.3522		0.33606	0.3437
250	bys-3-11-5-11-7-11-9	0.37357	0.3759		0.3717	0.3762		0.3639	0.3605
test	bys-3-11-5-11-7-11-9		0.3647			0.3663			0.3665

I'm trying a different architecture with only 3 layers, 20 hidden
nodes on each hidden layer, and a uniform window size of 5:
	5-20-5-20-5
This architecture was one that did well for ../str2uc-near-backbone-11-20_rev/
I'm particularly interested in seeing whether bys continues to do well
with that architecture, or if there may be some subtle difference in
predict-2nd (training or testing) that caused Rachel's results to be
anomalously good.


Wed Dec 22 18:23:24 PST 2010 Kevin Karplus

I had to restart the 5-20-5-20-5 run again, since I had messed up the
edits on the empty network.

Thu Dec 23 19:06:31 PST 2010 Kevin Karplus

bit gain (from quality-reports/)			
epochs	arch			tr12avg	tr12best	tr23avg	tr23best	tr31avg	tr31best
50	bys-3-11-5-11-7-11-9	0.21049	0.2630		0.20551	0.2719		0.20105	0.2640
150	bys-3-11-5-11-7-11-9	0.33736	0.3509		0.33729	0.3522		0.33606	0.3437
250	bys-3-11-5-11-7-11-9	0.37357	0.3759		0.3717	0.3762		0.3639	0.3605
test	bys-3-11-5-11-7-11-9		0.3647			0.3663			0.3665

50	bys-5-20-5-20-5		0.23355	0.2789		0.23424	0.2752		0.23221	0.2656

It looks like the 5-20-5-20-5 architecture may well be better, so I
should do the next step.  This indicates that Rachel was not just
lucky: the Bystroff alphabet seems to be a good one for protein design.


Sat Dec 25 20:25:28 PST 2010 Kevin Karplus

50	bys-5-20-5-20-5		0.23355	0.2789		0.23424	0.2752		0.23221	0.2656
150	bys-5-20-5-20-5		0.33455	0.3466		0.33488	0.3443		0.3338	0.3387


Hmm, the 150-epoch training is not quite as good for the 5-20-5-20-5
architecture as for the 3-11-5-11-7-11-9 architecture, despite the
training at 50 epochs having been substantially better.

Sun Dec 26 09:20:59 PST 2010 Kevin Karplus

50	bys-5-20-5-20-5		0.23355	0.2789		0.23424	0.2752		0.23221	0.2656
150	bys-5-20-5-20-5		0.33455	0.3466		0.33488	0.3443		0.3338	0.3387
250	bys-5-20-5-20-5		0.36823	0.3706		0.36423	0.3671		0.3619	0.3650
test	bys-5-20-5-20-5			0.3622			0.3585			0.3639

sequence recovery
test	bys-5-20-5-20-5			19.33%			19.23%			19.39%
test	bys-3-11-5-11-7-11-9		19.28%			19.41%			19.58%

On the training data (250 epochs), the 5-20-5-20-5 network does about
0.005 bits worse than the 3-11-5-11-7-11-9 network, even though the
5-20-5-20-5 network has 5100 weights and the 3-11-5-11-7-11-9 network
has only 3795.
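
A rough sketch of how the two weight counts can be reproduced from the
architecture strings.  It assumes the strings alternate window sizes
and hidden-layer sizes, that the bys input has 11 units, that the
output layer has 20 units (one per amino acid), and that bias terms
are not counted.  These are assumptions chosen because they reproduce
5100 and 3795, not something taken from predict-2nd, and the function
name is hypothetical.

```python
def count_weights(arch, n_inputs=11, n_outputs=20):
    """Count weights for an architecture string such as '5-20-5-20-5'.

    Assumed layout: window-hidden-window-hidden-...-window, where the
    last window feeds the output layer.  Bias terms are not counted.
    n_inputs=11 and n_outputs=20 are assumptions, not values read from
    the network files.
    """
    fields = [int(x) for x in arch.split("-")]
    windows = fields[0::2]   # window size of each layer
    hidden = fields[1::2]    # units in each hidden layer
    sizes = [n_inputs] + hidden + [n_outputs]
    # Each layer has window * (units in) * (units out) weights.
    return sum(w * sizes[i] * sizes[i + 1] for i, w in enumerate(windows))


for arch in ("5-20-5-20-5", "3-11-5-11-7-11-9"):
    print(arch, count_weights(arch))
```

Under these assumptions most of the extra weights in 5-20-5-20-5 come
from the two 5x20x20 layers.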

On the test data, the avg bits are 0.36583 for the smaller networks
(3-11-5-11-7-11-9) and 0.36153 for the larger (5-20-5-20-5), about the
same difference as on the training data.
The sequence recovery averages 19.423% (3-11-5-11-7-11-9) and 19.317%
(5-20-5-20-5), showing a similar small difference.
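
The averages above are plain means over the three cross-validation
folds (tr12, tr23, tr31).  A small sketch that recomputes them from
the test rows of the tables; the dictionary and function names are
only illustrative.

```python
from statistics import mean

# Test-set values copied from the tables above, in fold order
# tr12, tr23, tr31.
test_bits = {
    "3-11-5-11-7-11-9": [0.3647, 0.3663, 0.3665],
    "5-20-5-20-5": [0.3622, 0.3585, 0.3639],
}
test_recovery_pct = {
    "3-11-5-11-7-11-9": [19.28, 19.41, 19.58],
    "5-20-5-20-5": [19.33, 19.23, 19.39],
}


def fold_averages(table):
    """Return the mean over folds for each architecture."""
    return {arch: mean(vals) for arch, vals in table.items()}


bits_avg = fold_averages(test_bits)
rec_avg = fold_averages(test_recovery_pct)
for arch in test_bits:
    print(f"{arch}\t{bits_avg[arch]:.5f} bits\t{rec_avg[arch]:.3f}%")
```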

If I look just at the mutual information between AA and bys (in ../../compare-real/comparisions/)
    dunbrack-30pc-1763.aa-bys.compare	Mutual information = 0.25955 bits
    old/dunbrack-50pc-2621.aa-bys.compare Mutual information = 0.251873 bits
    dunbrack-in-scop.aa-bys.compare	Mutual information = 0.239064 bits
So the networks are gaining a little over just using the
single-character bys code at the position of the residue, but not a lot.
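
For reference, a minimal sketch of a single-position mutual
information calculation between two aligned alphabets.  It is the
simple plug-in estimate from observed counts; the .compare files may
apply small-sample corrections or other adjustments, so this is not
necessarily how those numbers were produced.  The function name and
the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Plug-in estimate of mutual information (in bits).

    pairs is an iterable of (aa, code) tuples, one per residue.
    No small-sample correction is applied.
    """
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
        mi += (c / n) * log2(c * n / (left[a] * right[b]))
    return mi


# Toy data only: two residues whose codes are perfectly tied to the
# amino acid, and four where they are independent.
tied = [("A", "x"), ("G", "y")]
independent = [("A", "x"), ("A", "y"), ("G", "x"), ("G", "y")]
print(mutual_information(tied), mutual_information(independent))
```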

Bys seems to be the most informative alphabet about AA at the single-character level.
The next highest that have been tested so far in the comparisons are 
    dunbrack-in-scop.r_rot_psi_omega_14_mod-aa.compare:68:Mutual information = 0.240437 bits
    old/dunbrack-in-scop.aa-burial-6.5-9.compare:86:Mutual information = 0.240267 bits
    dunbrack-in-scop.aa-phizetagauss7_hbond_21.compare:86:Mutual information = 0.229149 bits
    dunbrack-in-scop.aa-phizetagauss10.compare:86:Mutual information = 0.228915 bits

The "burial-6.5-9" is an old alphabet that has not been looked at in
quite a while.  The other alphabets are ones that Grant created, but
they were not his favorites.  Like bys, they mainly look at the
backbone conformation for the current residue, though
phizetagauss7_hbond_21 also includes hydrogen-bonding info.  Some of
the newer alphabets have not yet been tested in compare-real.

More on choosing burial alphabets for design in ../near-backbone-11_rev/README