Wed Dec 22 12:02:05 PST 2010 Kevin Karplus
    I got the permissions fixed for the files and directories that
    Rachel Bingham left.  The networks she tested did an
    astonishingly good job:

    bit gain (from quality-reports/)
    epochs arch                  tr12avg  tr12best tr23avg  tr23best tr31avg  tr31best
    50     bys-3-11-5-11-7-11-9  0.21049  0.2630   0.20551  0.2719   0.20105  0.2640
    150    bys-3-11-5-11-7-11-9  0.33736  0.3509   0.33729  0.3522   0.33606  0.3437
    250    bys-3-11-5-11-7-11-9  0.37357  0.3759   0.3717   0.3762   0.3639   0.3605
    test   bys-3-11-5-11-7-11-9      0.3647            0.3663            0.3665

    I'm trying a different architecture with only 3 layers, 20 hidden
    nodes on each hidden layer, and a uniform window size of 5:
        5-20-5-20-5
    This architecture was one that did well for
    ../str2uc-near-backbone-11-20_rev/

    I'm particularly interested in seeing whether bys continues to do
    well with that architecture, or whether some subtle difference in
    predict-2nd (training or testing) caused Rachel's results to be
    anomalously good.

Wed Dec 22 18:23:24 PST 2010 Kevin Karplus
    I had to restart the 5-20-5-20-5 run again, since I had messed up
    the edits on the empty network.

Thu Dec 23 19:06:31 PST 2010 Kevin Karplus
    bit gain (from quality-reports/)
    epochs arch                  tr12avg  tr12best tr23avg  tr23best tr31avg  tr31best
    50     bys-3-11-5-11-7-11-9  0.21049  0.2630   0.20551  0.2719   0.20105  0.2640
    150    bys-3-11-5-11-7-11-9  0.33736  0.3509   0.33729  0.3522   0.33606  0.3437
    250    bys-3-11-5-11-7-11-9  0.37357  0.3759   0.3717   0.3762   0.3639   0.3605
    test   bys-3-11-5-11-7-11-9      0.3647            0.3663            0.3665

    50     bys-5-20-5-20-5       0.23355  0.2789   0.23424  0.2752   0.23221  0.2656

    It looks like the 5-20-5-20-5 architecture may well be better, so
    I should do the next step.  This indicates that Rachel was not
    just lucky: the Bystroff alphabet seems to be a good one for
    protein design.
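    Note on reading the architecture strings: the sketch below is one
    way to count connection weights from a string such as 5-20-5-20-5.
    It rests on assumptions that are not stated in these notes: the
    string alternates window widths and hidden-layer sizes, the bys
    input alphabet has 11 letters, the output is the 20 amino acids,
    and bias terms are not counted.  The function name count_weights
    is made up for illustration and is not part of predict-2nd.  Under
    these assumptions the totals come out to 3795 and 5100, matching
    the weight counts quoted for these two networks.

```python
def count_weights(arch, n_inputs, n_outputs):
    """Count connection weights for a string like '5-20-5-20-5'.

    Assumed convention: window-units-window-units-...-window, where the
    last window feeds the output layer.  Bias terms are NOT counted.
    """
    fields = [int(x) for x in arch.split("-")]
    windows = fields[0::2]   # window width of each layer
    hidden = fields[1::2]    # number of units in each hidden layer
    if len(windows) != len(hidden) + 1:
        raise ValueError("expected window-units-...-window, got " + arch)
    layer_inputs = [n_inputs] + hidden     # units feeding each layer
    layer_outputs = hidden + [n_outputs]   # units produced by each layer
    # Each layer has window * inputs-per-position * outputs weights.
    return sum(w * i * o
               for w, i, o in zip(windows, layer_inputs, layer_outputs))


BYS_LETTERS = 11   # assumed size of the bys input alphabet
AA_LETTERS = 20    # assumed size of the amino-acid output alphabet

for arch in ("3-11-5-11-7-11-9", "5-20-5-20-5"):
    print(arch, count_weights(arch, BYS_LETTERS, AA_LETTERS))
```

    If the real networks count biases or use a different input size,
    the totals would differ, so this is only a consistency check.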

Sat Dec 25 20:25:28 PST 2010 Kevin Karplus
    50     bys-5-20-5-20-5       0.23355  0.2789   0.23424  0.2752   0.23221  0.2656
    150    bys-5-20-5-20-5       0.33455  0.3466   0.33488  0.3443   0.3338   0.3387

    Hmm, the 150-epoch training is not quite as good for the
    5-20-5-20-5 architecture as for the 3-11-5-11-7-11-9 architecture,
    despite the training at 50 epochs having been substantially
    better.

Sun Dec 26 09:20:59 PST 2010 Kevin Karplus
    50     bys-5-20-5-20-5       0.23355  0.2789   0.23424  0.2752   0.23221  0.2656
    150    bys-5-20-5-20-5       0.33455  0.3466   0.33488  0.3443   0.3338   0.3387
    250    bys-5-20-5-20-5       0.36823  0.3706   0.36423  0.3671   0.3619   0.3650
    test   bys-5-20-5-20-5           0.3622            0.3585            0.3639

    sequence recovery
    test   bys-5-20-5-20-5       19.33%  19.23%  19.39%
    test   bys-3-11-5-11-7-11-9  19.28%  19.41%  19.58%

    On the training data, the 5-20-5-20-5 network does about 0.005
    bits worse than the 3-11-5-11-7-11-9 network, though the
    5-20-5-20-5 network has 5100 weights and the 3-11-5-11-7-11-9
    network has 3795.

    On the test data, the avg bits are 0.36583 for the smaller
    (3-11-5-11-7-11-9) networks and 0.3615 for the larger
    (5-20-5-20-5), about the same difference as on the training data.

    The sequence recovery averages 19.423% (3-11-5-11-7-11-9) and
    19.32% (5-20-5-20-5), showing a similar small difference.

    If I look just at the mutual information between AA and bys
    (in ../../compare-real/comparisions/):
        dunbrack-30pc-1763.aa-bys.compare      Mutual information = 0.25955 bits
        old/dunbrack-50pc-2621.aa-bys.compare  Mutual information = 0.251873 bits
        dunbrack-in-scop.aa-bys.compare        Mutual information = 0.239064 bits

    So the networks are gaining a little over just using the
    single-character bys code at the position of the residue, but not
    a lot.

    Bys seems to be the most informative alphabet about AA at the
    single-character level.
    The next highest that have been tested so far in the comparisons
    are
        dunbrack-in-scop.r_rot_psi_omega_14_mod-aa.compare:68:Mutual information = 0.240437 bits
        old/dunbrack-in-scop.aa-burial-6.5-9.compare:86:Mutual information = 0.240267 bits
        dunbrack-in-scop.aa-phizetagauss7_hbond_21.compare:86:Mutual information = 0.229149 bits
        dunbrack-in-scop.aa-phizetagauss10.compare:86:Mutual information = 0.228915 bits

    The "burial-6.5-9" is an old alphabet that has not been looked at
    in quite a while.  The other alphabets are ones that Grant
    created, but they were not his favorites.  Like bys, they mainly
    look at the backbone conformation for the current residue, though
    phizetagauss7_hbond_21 also includes hydrogen-bonding info.

    Some of the newer alphabets have not yet been tested in
    compare-real.

    More on choosing burial alphabets for design in
    ../near-backbone-11_rev/README
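
    For reference, the single-character mutual information quoted
    above is the standard quantity
        I(X;Y) = sum over (x,y) of p(x,y) log2( p(x,y) / (p(x) p(y)) )
    The sketch below estimates it from raw counts of paired letters.
    It is only an illustration: I do not know how the .compare files
    are actually produced, and this version applies no pseudocounts or
    small-sample correction, so its numbers need not match them.  The
    function name and the toy sequences are made up.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from (x, y) observations."""
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one observation")
    joint = Counter(pairs)                      # counts of (x, y)
    x_counts = Counter(x for x, _ in pairs)     # marginal counts of x
    y_counts = Counter(y for _, y in pairs)     # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with counts
        mi += (c / n) * log2(c * n / (x_counts[x] * y_counts[y]))
    return mi


# Toy data only: two made-up aligned strings, one letter per position.
seq_a = "AAGGAAGG"
seq_b = "xxyyxxyy"
print(mutual_information(zip(seq_a, seq_b)))   # perfectly coupled: 1.0 bit
```

    With real data the two strings would be the amino-acid sequence
    and the structural-alphabet string for the same residues,
    concatenated over all chains in the set.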