30 November 2001 Kevin Karplus

The "stride" directory contains networks, scripts, and quality reports
for neural nets attempting to predict the secondary structure defined by
STRIDE, reduced to a 6-letter alphabet (EBGHTL), from a multiple
alignment.

Most of the scripts are still set up to refer to an older subdirectory
organization (in which the subdirectories of testing/stride/ were just
subdirectories of testing/).

The first sections are old reports on the quality of the different
networks.  The stride EBGHTL predictions are currently our most polished
set of neural nets.  Later sections will update for newer networks.

------------------------------------------------------------
9 Jan 2001

Updated the quality reports and unit usage.  The quality reports now
have bits_saved as the third column, allowing comparison between
different alphabets.  The objective is now something to be maximized,
rather than minimized.  The unit usage previously had a bug in reporting
E(Phat(i)P(j)) / E(P(j)), which has now been fixed.

It looks like having a richer alphabet makes for more informative
predictions, though the Q measure drops:

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EHL2      0.7815      0.7810  0.7337  0.7801
    EHTL2     0.8299      0.6842  0.7592  0.7847
    EBGHTL    0.9065      0.6667  0.8090  0.8646

It is interesting that splitting L into T and L improves SOV(E) and
SOV(H), though their definitions are unchanged.  The split of E into EB
and of H into GH naturally improves the E and H SOV scores, since B and
G are the hardest to predict.

23 Jan 2001

Using t2k-thin90 alignments, the best EHL2 network is now
overrep-2500-IDaa13-7-10-11-10-11-6-5-ehl2-seeded-stride-trained.net
(3419 parameters), which was built by adding an additional layer to the
best EBGHTL network,
overrep-2500-IDaa13-7-10-11-10-11-ebghtl-seeded-stride-trained.net
(3326 parameters), and retraining.

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EHL2      0.7980      0.7864  0.7331  0.7813
    EBGHTL    0.9232      0.6712  0.8075  0.8642

I will add an extra final layer to the EBGHTL network and see if I can
improve the EBGHTL savings some more.

24 Jan 2001

overrep-2500-IDaa13-7-10-11-10-11-6-7-ebghtl-seeded-stride-trained.net
(3584 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9360      0.6754  0.8102  0.8673

3 Feb 2001

overrep-2500-IDaa13-7-10-11-10-11-6-7-ebghtl-seeded2-stride-trained.net
(3584 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9361      0.6756  0.8118  0.8654

7 Feb 2001

overrep-2500-IDaa13-7-10-11-10-11-6-7-ebghtl-seeded3-stride-trained.net
(3584 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9415      0.6765  0.8121  0.8717

30 April 2001

Retrained the ebghtl network on a larger data set (3617 chains, 806942
columns), but the quality of the predictions on the training set was not
much different between the initial and final model:

overrep-3617-IDaa13-7-10-11-10-11-6-7-ebghtl-seeded-stride-trained.net
(3584 parameters)

            alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    initial EBGHTL    0.9328      0.6749  0.8121  0.8660
    final   EBGHTL    0.9325      0.6744  0.8154  0.8654

(Training for 60 iterations made no improvements, then equilibration
kicked the network away from the optimum and training for another 180
got it back to about the same quality.)

4 June 2001

Increased the size of the ebghtl network and trained on 3617 chains.
Now get a somewhat better network (most of the improvement is in bits
saved).
overrep-3617-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded-stride-trained.net
(3737 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9449      0.6779  0.8156  0.8657

7 June 2001

overrep-3617-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded2-stride-trained.net
is better, but I'm not sure by how much, as the .quality file was not
properly written (probably due to the disk filling up).  I'll do an
evaluation and some retraining to see how good it is.

12 June 2001

overrep-3617-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded2-stride-trained.net
(3737 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9473      0.6781  0.8120  0.8675

overrep-3617-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded3-stride-trained.net
(3737 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9482      0.6785  0.8139  0.8697

23 August 2001

I noticed that the NMR chains in the overrep-3617 set were much worse
predicted than the X-ray chains.  There are two possible explanations:

1) the X-ray chains are over-represented, and the network is overtrained
2) the NMR chains have a lot of incorrect 2ry structure assigned by
   STRIDE, because of insufficient data to resolve the structure fully

I favor the second explanation, so I built a new training set from just
the Dunbrack culled X-ray sequences (dunbrack-2752).

The first run with this training set (actually with dunbrack-2751, since
one of the t2k alignments had run out of memory and not been built yet)
improved from

overrep-3617-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded3-stride-trained.net
(3737 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9610      0.6814  0.8091  0.8708

to

dunbrack-2751-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded-stride-trained.net
(3737 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9687      0.6838  0.8125  0.8724

The beta bridges (B) are the hardest thing to predict---when something
is really a B, it is most likely to be predicted as T, C, E, H, G, B (B
is the least likely prediction, and only H has a probability lower than
the background).  G is also hard, with order THCGEB, and H is the
easiest.  E gets a bit confused with C, and C and T get very confused
with each other.

The same network, retrained on the full dunbrack-2752 set, doesn't
change, and the results are almost identical:

dunbrack-2752-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded-stride-trained.net
(3737 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9680      0.6836  0.8124  0.8723

15 Nov 2001 Kevin Karplus

Trying to use the new dist.20comp regularizer and reducing bits saved to
1.0 does not help.  After 120 epochs, the quality for
dunbrack-2752-IDaa10-7-10-11-10-11-7-7-ebghtl-seeded-stride-trained.net
(3737 parameters) is only

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9571      0.6810  0.8082  0.8659

(not bad, but not as good as the IDaa13 network it was retrained from).
It looks like it would take another 1000 epochs to get as good as the
IDaa13 network.

16 Nov 2001 Kevin Karplus

Eliminating the regularizer and just using Henikoff sequence weighting
hurts even more.  After 120 epochs, the quality for
dunbrack-2752-IDaa-7-10-11-10-11-7-7-ebghtl-seeded-stride-trained.net
(3737 parameters) is only

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9484      0.6769  0.8087  0.8705

17 Nov 2001 Kevin Karplus

New best network for EBGHTL:
# dunbrack-2752-IDaa13-7-10-11-10-11-7-9-ebghtl-seeded-stride-trained.net (3821 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9697      0.6834  0.8096  0.8693

The main gains seem to be in G and L.  Further training would most
likely result in further improvements---this does not seem to have
converged yet.
It might be worth trying to remove the H2 hidden unit from the
penultimate layer, as it does not seem to be doing much.

19 Nov 2001 Kevin Karplus

New best network for EBGHTL:
# dunbrack-2752-IDaa13-7-10-11-10-11-6-11-ebghtl-seeded-stride-trained.net (3728 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9718      0.6847  0.8143  0.8712

This seems close to convergence.  To get further improvements, it might
be worth adding a 5th layer, perhaps narrowing the windows on the 2nd
and 3rd layers.

21 Nov 2001 Kevin Karplus

New best network for stride EBGHTL:
# dunbrack-2752-IDaa13-7-10-9-10-9-6-11-6-11-ebghtl-seeded-stride-trained2.net (3810 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9719      0.6855  0.8115  0.8708

This one has a slightly better Q6 and slightly worse SOV score than the
previous best.

24 Nov 2001 Kevin Karplus

I tried removing the pseudocounts from the best network for stride
EBGHTL (it only had pseudocounts on the first layer).  After 150 epochs,
the network had almost, but not quite, recovered from the modification.
It will probably take another 300 epochs to recover fully (if it can).

26 Nov 2001 Kevin Karplus

New best network (slightly better SOV than before, which outweighs
slightly worse bits and Q_n):
# dunbrack-2752-IDaa13-7-10-11-10-9-6-9-6-9-ebghtl-seeded-stride-trained.net (3866 parameters)

    alphabet  bits_saved  Q_n     SOV     SOV(E)  SOV(H)
    EBGHTL    0.9712      0.6852  0.6867  0.8103  0.8718

Note: this number of parameters is misleading, as this network has all 0
pseudocounts and has been centered.

27 Nov 2001 Kevin Karplus

New best network for STRIDE.  SOV slips a little, but bits saved and Q6
are new bests.
# dunbrack-2752-IDaa13-7-10-11-10-9-6-9-6-9-ebghtl-seeded-stride-trained2.net (3366 degrees of freedom)

    alphabet  bits_saved  Q_n     SOV     SOV(E)  SOV(H)
    EBGHTL    0.9733      0.6857  0.6860  0.8089  0.8749

Note: now reporting degrees of freedom rather than number of parameters,
but this network has identical structure to the previous best---just
more training.

30 Nov 2001 Kevin Karplus

New best network for STRIDE.  All measures of quality are up.
# dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-seeded-stride-trained.net (3521 degrees of freedom)

    alphabet  bits_saved  Q_n     SOV     SOV(E)  SOV(H)
    EBGHTL    0.9762      0.6867  0.6880  0.8135  0.8747

1 Dec 2001 Kevin Karplus

Starting some experiments in training neural nets completely from
scratch (no seeding).  My first attempt is a 3-level network
(IDaa13-7-14-7-12-11-ebghtl) slightly larger than my best 5-level
network, but I'll also try training a 5-level network with the same
architecture as the best current network, to see how much benefit is
gained from the seeding and very long training that the best network
has.

3 Dec 2001 Kevin Karplus

After 260 epochs, the network trained from scratch has still not really
converged:
# dunbrack-2752-IDaa13-7-14-7-12-11-ebghtl-stride-trained.net (3740 degrees of freedom)

    alphabet  bits_saved  Q_n     SOV     SOV(E)  SOV(H)
    EBGHTL    0.9269      0.6735  0.6687  0.8047  0.8552

It may be that the learning rate is too slow.  I may have to try again
with faster learning.

I tried training a network from IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-empty
with faster parameters, and already at 42 epochs it is doing better than
dunbrack-2752-IDaa13-7-14-7-12-11-ebghtl-stride-trained.net after 260.
I don't know whether this improvement is due to the structure or the
learning parameters, as both were changed at once.
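[Aside: bits_saved and Q_n come up in every comparison above; a minimal
sketch of how such per-position measures can be computed, assuming
bits_saved is the mean log2 of the predicted probability of the true
class over its background frequency, and Q_n is the fraction of
positions whose most-probable predicted class matches the STRIDE label.
This is an illustration only, not the code that generated the quality
reports.]

    import numpy as np

    def prediction_quality(pred_probs, true_idx, background):
        """pred_probs: (N, 6) predicted probability vectors over EBGHTL,
        true_idx:   (N,) index of the STRIDE label at each position,
        background: (6,) background frequencies of the 6 classes.
        Returns (bits_saved, Q_n) under the assumptions stated above."""
        p_true = pred_probs[np.arange(len(true_idx)), true_idx]
        bits_saved = np.mean(np.log2(p_true / background[true_idx]))
        q_n = np.mean(np.argmax(pred_probs, axis=1) == true_idx)
        return bits_saved, q_n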
6 Dec 2001 Kevin Karplus

After 260 epochs, the
# dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-from-empty.net (3521 degrees of freedom)
network has gotten pretty good---about as good as the best network in
August or September, before I started tweaking the architecture and
doing really long training runs.

    alphabet  bits_saved  Q_n     SOV     SOV(E)  SOV(H)
    EBGHTL    0.9623      0.6846  0.6880  0.8101  0.8782

I should try doing a second training run on this network, perhaps with a
slightly slower learning rate, to see if I can match the best so far.

I should also try a couple of other empty networks---say one with more
hidden units and one with wider windows, but about the same number of
parameters.  Maybe IDaa13-5-14-7-10-9-8-11-ebghtl and
IDaa13-9-8-11-8-11-8-11-8-11-ebghtl.

It would also be interesting to see how jurying several independently
trained networks compares to one network of larger size.  Do the
independently trained networks learn the same things?  Obviously some of
the patterns, such as the middle of an amphipathic helix, will be
learned by almost any network, but do the networks differ on the more
difficult cases?

Assuming that each network has been trained long enough that its
assessment of probabilities is fairly accurate, how should we combine
multiple probability vectors?  Some possibilities (a sketch of both
schemes appears after the 11 Dec table below):

* average the probability vectors.
  advantages:    result is a probability vector.
                 Can use linear algebra to optimize weighting of
                 multiple predictors.
  disadvantages: does not give more weight to more confident
                 predictions.  There is a big difference between being
                 80% sure and 99% sure that something is a helix.
                 Predictions will never be stronger than the strongest
                 single predictor (though which predictor is strongest
                 may vary from position to position, so the overall
                 result could still be better than a single predictor).

* average the log P(x|data)/P(x) scores, add log P(x), exponentiate,
  and rescale to make a probability vector.
  advantages:    large deviations from background frequencies are taken
                 advantage of.
  disadvantages: predictions will never be stronger than the best single
                 predictor.  Figuring out the weighting for different
                 predictors may be a bit messier, because of the
                 rescaling.

11 Dec 2001 Kevin Karplus

I now have several independently trained networks (on the same training
set) with different architectures:

    Epochs   bits_saved  Q6      SOV     "overall"  SOV(E)  SOV(H)

[trained from old networks---lots of training]
dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-seeded-stride-trained
    many     0.9762      0.6867  0.6880  2.0068     0.8135  0.8747

[Same structure as best network, but trained from random start]
dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-from-empty
    260      0.9623      0.6846  0.6880  1.9909     0.8101  0.8782   (faster params)

dunbrack-2752-IDaa13-7-14-7-12-11-ebghtl-stride-trained
    260      0.9269      0.6735  0.6687  1.9347     0.8047  0.8552   (fast params)

[The following 2 are related---the second is trained from the first.]
dunbrack-2752-IDaa13-5-14-7-10-9-8-11-ebghtl-stride-from-empty
    260      0.9634      0.6832  0.6822  1.9877     0.8179  0.8710   (faster params)

dunbrack-2752-IDaa13-5-14-7-10-9-8-11-ebghtl-stride-2 (from previous)
    260+250  0.9696      0.6850  0.6845  1.9968     0.8119  0.8723

It looks like total training time is more important than how the network
is structured, though the total number of parameters may be
important---these networks were deliberately chosen to have roughly the
same number of parameters.
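[Aside: a minimal sketch of the two jury-combination schemes listed
under 6 Dec, using numpy; the probability vectors and background
frequencies below are toy values for illustration, not measured ones.]

    import numpy as np

    def combine_by_averaging(net_probs):
        """Scheme 1: arithmetic mean of the probability vectors."""
        return np.mean(net_probs, axis=0)

    def combine_by_log_odds(net_probs, background):
        """Scheme 2: average the log P(x|data)/P(x) scores, add back
        log P(x), exponentiate, and rescale to a probability vector."""
        log_odds = np.log(np.asarray(net_probs) / background)
        combined = background * np.exp(np.mean(log_odds, axis=0))
        return combined / combined.sum()

    # toy example: two networks, 6-letter EBGHTL alphabet
    background = np.array([0.20, 0.01, 0.04, 0.30, 0.12, 0.33])
    net_probs = [np.array([0.70, 0.01, 0.02, 0.10, 0.07, 0.10]),
                 np.array([0.55, 0.02, 0.03, 0.15, 0.10, 0.15])]
    print(combine_by_averaging(net_probs))
    print(combine_by_log_odds(net_probs, background))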
Perhaps I should go to a "simulated annealing" style of training---after
each epoch, keep the new network if it is better on the cross-training
set than the old one, or with probability dependent on the loss of
quality if it is worse.

13 Dec 2001 Kevin Karplus

Further training of
dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-from-empty
to produce
dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-2
has improved it to be almost as good as
dunbrack-2752-IDaa13-5-14-7-10-9-8-11-ebghtl-stride-2,
but at the current rate of improvement, it looks like it would take
another 1000 epochs to get as good as the best trained network.

    Epochs   bits_saved  Q6      SOV     "overall"  SOV(E)  SOV(H)

[Same structure as best network, but trained from random start]
dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-from-empty
    260      0.9623      0.6846  0.6880  1.9909     0.8101  0.8782   (faster params)

dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-2
    260+270  0.9660      0.6855  0.6887  1.9958     0.8073  0.8786

I have implemented the "simulated annealing" style of training and will
test it soon.  I'd like to set up a train/cross-train/test set first.

From compbio-request@services.cse.ucsc.edu Mon Dec 17 10:41:37 2001
Date: Mon, 17 Dec 2001 10:41:25 -0800
From: Kevin Karplus
To: compbio@soe.ucsc.edu
Subject: secondary structure prediction test set

I have set up a train/cross-train/test split for testing secondary
structure predictors (or other local properties of proteins).

The set consists of 1759 proteins, with 20% or less pairwise residue
identity, and with X-ray structures having resolution <= 3.0 and
R-factor <= 1.0 [taken from Dunbrack's Culled-PDB website].  Dunbrack's
set has been further pruned to eliminate any sequences which are
fragmentary (all fragments less than 20 residues long).  All these
chains have t2k alignments in the pcem/pdb subdirectories.

The set has been randomly partitioned into three equal-sized sets:
    dunbrack-1759-1
    dunbrack-1759-2
    dunbrack-1759-3

To use the set, you can do 3-fold cross-validation in one of two ways:

1) train on 1+2, test on 3
   train on 1+3, test on 2
   train on 2+3, test on 1
   Results should be reported as the overall performance on the test
   set.

2) If you have a "cross training" method, where you train on one set of
   data and select the best performing network (or other trained
   machine) on a different set of data, you should do 6 runs, for the
   six permutations of the sets:
      train on 1, select on 2, test on 3
      train on 1, select on 3, test on 2
      train on 2, select on 1, test on 3
      train on 2, select on 3, test on 1
      train on 3, select on 1, test on 2
      train on 3, select on 2, test on 1
   Again, results should be reported as the average performance on the
   test sets.
--------------------

I have done one run training a neural net for the STRIDE EBGHTL
classification (train on 1, select on 2, test on 3).  After 320 epochs,
I got

data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C    # null bits=0
train       0.9123      0.6762  0.6719  1.9244     0.8110  0.3639  0.2035  0.8512  0.5860  0.4378
crosstrain  0.8148      0.6472  0.6472  1.7856     0.7721  0.3913  0.2193  0.8342  0.5694  0.4239
test        0.8057      0.6452  0.6430  1.7724     0.7780  0.3746  0.2310  0.8348  0.5656  0.4205

There is more overtraining than I had expected, and the neural network
has converged to within 0.0007 of the final result (on the crosstrain
set) within 160 epochs.  There is also more difference between the
crosstrain set and the test set than I expected.  I'll have to do the
other permutations, to make sure that I haven't just picked up a
difference in difficulty between the three sets of chains.
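[Aside: the "objective" column in these quality tables (and the
"overall" column in the 11 Dec table) is consistent with
bits_saved + Q6 + SOV/2.  That is an observation from the reported
numbers, not a definition taken from the training code.  A quick check
against the rows above:]

    def objective(bits_saved, q6, sov):
        # apparent weighted sum behind the "objective"/"overall" columns
        # (inferred from the tables in this log, not from the tools)
        return bits_saved + q6 + sov / 2.0

    print(objective(0.9123, 0.6762, 0.6719))   # ~1.9244 (train)
    print(objective(0.8148, 0.6472, 0.6472))   # 1.7856  (crosstrain)
    print(objective(0.8057, 0.6452, 0.6430))   # 1.7724  (test)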
I'll probably also pick one (or all) of the resulting networks to
retrain on the entire dataset.  The resulting network(s) will, of
course, look like they do better than networks trained with a clean
separation of training and testing data, and should generalize to
previously unseen data at least as well.

20 Dec 2001 Kevin Karplus

The IDaa13-5-14-7-10-9-8-11-ebghtl network structure does better than
the IDaa13-7-10-11-11-9-6-9-6-9-ebghtl network structure, at least on
the dunbrack-1795 set with dunbrack-1795-1 as the training set and
dunbrack-1795-2 as the cross-training set.  I should probably try a
variety of different architectures, allowing the size to vary, before
picking the best one on the cross-training data.

# dunbrack-1795-12-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-from-empty-try2.net (3521 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9123      0.6762  0.6719  1.9244     0.8110  0.3639  0.2035  0.8512  0.5860  0.4378
crosstrain  0.8148      0.6472  0.6472  1.7856     0.7721  0.3913  0.2193  0.8342  0.5694  0.4239
test        0.8057      0.6452  0.6430  1.7724     0.7780  0.3746  0.2310  0.8348  0.5656  0.4205

# dunbrack-1795-12-IDaa13-5-14-7-10-9-8-11-ebghtl-stride-from-empty.net (3382 degrees of freedom)
train       0.9091      0.6717  0.6658  1.9136     0.8163  0.3639  0.2632  0.8428  0.5838  0.4285
crosstrain  0.8322      0.6517  0.6484  1.8081     0.7886  0.3917  0.2603  0.8338  0.5773  0.4156
test        0.8260      0.6506  0.6472  1.8003     0.7917  0.3750  0.2661  0.8364  0.5679  0.4144

21 Dec 2001 Kevin Karplus

Making the windows of uniform width (about the same number of
parameters) doesn't help:

# dunbrack-1795-12-IDaa13-7-14-7-10-7-8-7-ebghtl-stride-from-empty.net (3654 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9087      0.6749  0.6706  1.9189     0.8186  0.3639  0.2152  0.8419  0.5944  0.4328
crosstrain  0.8203      0.6483  0.6446  1.7909     0.7861  0.3913  0.2300  0.8292  0.5768  0.4130
test        0.8172      0.6492  0.6428  1.7878     0.7941  0.3746  0.2422  0.8324  0.5614  0.4199

There seem to be two directions to go---try all windows wide and try all
windows narrow.  I'll try narrow first, since it will train quicker.

22 Dec 2001 Kevin Karplus

Making the windows narrower hurts:

# dunbrack-1795-12-IDaa13-5-14-5-10-5-8-5-ebghtl-stride-from-empty.net (2610 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
crosstrain  0.8162      0.6469  0.6420  1.7842     0.7801  0.3913  0.2399  0.8257  0.5779  0.4396

Note: because of disk problems on /projects/compbio2, the final network
was not written, and the test and final training results are not
available.  Still, it is clear that the crosstraining topped out much
lower here, so an all-narrow network is not a winner (unless it is much
bigger, maybe).

I could try an all-wide network next, or I could try a tapered network
with more hidden units.  I think that increasing the number of hidden
units is more likely to help, so I'll try that first.

23 Dec 2001 Kevin Karplus

Increasing the number of hidden units does not seem to have helped---it
just increased the gap between the training and cross-training sets:

# dunbrack-1795-12-IDaa13-5-16-7-16-9-12-11-ebghtl-stride-from-empty.net (5574 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9214      0.6758  0.6730  1.9337     0.8174  0.3639  0.2385  0.8436  0.5969  0.4431
crosstrain  0.8276      0.6493  0.6492  1.8015     0.7852  0.3913  0.2444  0.8316  0.5861  0.4213
test        0.8170      0.6481  0.6445  1.7874     0.7853  0.3746  0.2499  0.8345  0.5752  0.4163

Let's try a network with about half as many weights per layer.
# dunbrack-1795-12-IDaa13-5-9-7-15-9-8-17-ebghtl-stride-from-empty.net (3387 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9074      0.6735  0.6690  1.9155     0.8143  0.3639  0.2197  0.8451  0.5771  0.4504
crosstrain  0.8344      0.6516  0.6489  1.8105     0.7830  0.3913  0.2277  0.8401  0.5586  0.4311
test        0.8269      0.6505  0.6432  1.7990     0.7876  0.3746  0.2431  0.8329  0.5527  0.4270

This is the best so far on the cross-training set, but the test set
doesn't do as well as the previous best, indicating that we may be in
the region of diminishing returns.  Still, it is unfair to use the test
set to make any decisions---that contaminates the test.  So we'll ignore
the test set result and just concentrate on improving the cross-training
set.

We could try again with only 3 layers, but about the same number of
parameters per layer.

25 Dec 2001

# dunbrack-1795-12-IDaa13-5-9-7-15-11-ebghtl-stride-from-empty.net (2587 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.8932      0.6693  0.6568  1.8909     0.8033  0.3639  0.2285  0.8308  0.5816  0.4402
crosstrain  0.8165      0.6483  0.6468  1.7881     0.7786  0.3913  0.2391  0.8297  0.5797  0.4334
test        0.8079      0.6460  0.6391  1.7735     0.7810  0.3750  0.2389  0.8297  0.5721  0.4283

Hmm: not nearly as good, but I don't know if the problem is having fewer
layers, or just fewer parameters.  Let's try a smaller 4-layer network.

26 Dec 2001

# dunbrack-1795-12-IDaa13-5-7-7-14-9-6-21-ebghtl-stride-from-empty.net (2463 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.8873      0.6656  0.6617  1.8838     0.8004  0.3639  0.2283  0.8403  0.5753  0.4300
crosstrain  0.8051      0.6428  0.6419  1.7689     0.7753  0.3913  0.2357  0.8341  0.5663  0.4130
test        0.8007      0.6417  0.6379  1.7614     0.7831  0.3746  0.2452  0.8293  0.5547  0.4115

This smaller 4-layer network is terrible---worse overtraining than the
three-layer network.  Maybe I should try a bigger 3-layer network:

27 Dec 2001

# dunbrack-1795-12-IDaa13-5-11-7-16-13-ebghtl-stride-from-empty.net (3295 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.8969      0.6704  0.6625  1.8986     0.8065  0.3641  0.2154  0.8390  0.5803  0.4520
crosstrain  0.8271      0.6501  0.6450  1.7997     0.7817  0.3913  0.2272  0.8291  0.5746  0.4384
test        0.8202      0.6495  0.6407  1.7901     0.7807  0.3746  0.2389  0.8297  0.5617  0.4333

This 3-layer network is respectable, but not as good as the 4-layer
network of about the same size.  Perhaps I should try a 5-layer network
of about the same size:

28 Dec 2001

# dunbrack-1795-12-IDaa13-5-7-7-15-9-6-11-11-13-ebghtl-stride-from-empty.net (3396 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9140      0.6716  0.6707  1.9209     0.8211  0.3639  0.2417  0.8496  0.5798  0.4280
crosstrain  0.8261      0.6469  0.6466  1.7962     0.7884  0.3913  0.2399  0.8310  0.5733  0.4118
test        0.8081      0.6448  0.6419  1.7738     0.7861  0.3742  0.2502  0.8337  0.5483  0.4170

WARNING: two files (4mt2 and 1lst, both in the test set) were not
included in this run because their info directories were messed up in
the latest rebuild of the template libraries.  This doesn't affect the
training or cross-training results, though.  Of course, changes in some
of the template alignments may affect accuracy slightly.

29 Dec 2001

This is again respectable, but not as good as the 4-layer one.  The
overtraining (difference between train and crosstrain) is greater, so
making the network larger probably wouldn't help.
Let's try a different approach, making a network with 4 layers and
tapered window sizes, but keeping the number of hidden units roughly
equal, rather than the number of weights.  Let's try 11 hidden units
with 5,7,9,11 windows.

# dunbrack-1795-12-IDaa13-5-11-7-11-9-11-11-ebghtl-stride-from-empty.net (3465 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9173      0.6773  0.6729  1.9311     0.8147  0.3639  0.2486  0.8454  0.5911  0.4419
crosstrain  0.8260      0.6493  0.6470  1.7988     0.7834  0.3913  0.2451  0.8298  0.5754  0.4214
test        0.8153      0.6484  0.6392  1.7833     0.7829  0.3746  0.2487  0.8306  0.5597  0.4206

Again, respectable, but not as good as the best 4-layer network, which
actually has fewer weights.

The cross-training objective function (evaluated on set dunbrack-1795-2)
has ranged from 1.7640 (for IDaa13-5-7-7-14-9-6-21-ebghtl) to 1.8105
(for IDaa13-5-9-7-15-9-8-17-ebghtl).  The three networks with objective
> 1.8 are

    dunbrack-1795-12-IDaa13-5-16-7-16-9-12-11-ebghtl-stride-from-empty  1.8015
    dunbrack-1795-12-IDaa13-5-14-7-10-9-8-11-ebghtl-stride-from-empty   1.8081
    dunbrack-1795-12-IDaa13-5-9-7-15-9-8-17-ebghtl-stride-from-empty    1.8105

All are 4-level networks with windowing 5,7,9,(11 or 17).  The test
result objectives vary from 1.7614 to 1.8003---not exactly matching the
ranking by cross-training results, though the top two for crosstraining
are also the top two for testing.

Let's try one more 4-layer network with roughly equal numbers of
weights, but make the windows 3,5,7,17 instead of 5,7,9,17.

30 Dec 2001

The narrower windows seem to work quite well, with objective function
> 1.8 (third best network, so far).

# dunbrack-1795-12-IDaa13-3-14-5-13-7-10-17-ebghtl-stride-from-empty.net (3367 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9222      0.6762  0.6691  1.9330     0.8109  0.3639  0.2478  0.8514  0.5757  0.4518
crosstrain  0.8313      0.6516  0.6465  1.8062     0.7829  0.3913  0.2553  0.8374  0.5671  0.4296
test        0.8254      0.6497  0.6436  1.7969     0.7833  0.3746  0.2595  0.8388  0.5529  0.4232

I suspect that the last window is too wide though, and we may do better
by narrowing it down to 11, or even 9.  I wonder how we would do with
even narrower windows: 1,3,5,7 or 1,3,5,9.  Perhaps
1-40 3-8 5-22 7-ebghtl?  Hmm---that doesn't make much sense---why recode
a single position into more than the 22 inputs?  How about
3-16 5-13 7-10 11-ebghtl?

It seems that this narrower network does not do as well:

# dunbrack-1795-12-IDaa13-3-16-5-13-7-10-11-ebghtl-stride-from-empty.net (3319 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.8905      0.6677  0.6626  1.8894     0.8141  0.3639  0.2432  0.8436  0.5813  0.4274
crosstrain  0.8269      0.6494  0.6449  1.7988     0.7828  0.3913  0.2578  0.8322  0.5810  0.4131
test        0.8205      0.6487  0.6396  1.7890     0.7881  0.3750  0.2536  0.8293  0.5652  0.4172

Next thing to try---widening the window on the penultimate layer?
Maybe 3-14 5-13 9-10 13-ebghtl?
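[Aside: the architecture names above read as alternating window-size /
hidden-unit counts, with the final window feeding the 6-unit EBGHTL
output layer (so 5-14-7-10-9-8-11-ebghtl = windows 5,7,9,11 with
14,10,8 hidden units).  A small decoder under that assumption---this is
my reading of the naming convention, not taken from the tools that wrote
these files:]

    import re

    def parse_architecture(name):
        """Decode a network name like
        'dunbrack-1795-12-IDaa13-5-15-7-15-9-15-13-ebghtl-stride-from-empty'
        into a list of (window, hidden_units) layer specs, assuming the
        naming convention described above."""
        m = re.search(r'-((?:\d+-)+)ebghtl', name)
        if m is None:
            raise ValueError("no window/unit spec found in " + name)
        nums = [int(x) for x in m.group(1).strip('-').split('-')]
        layers = [(nums[i], nums[i + 1]) for i in range(0, len(nums) - 1, 2)]
        layers.append((nums[-1], 6))    # last window -> 6 EBGHTL outputs
        return layers

    print(parse_architecture(
        "dunbrack-1795-12-IDaa13-5-15-7-15-9-15-13-ebghtl-stride-from-empty"))
    # [(5, 15), (7, 15), (9, 15), (13, 6)]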
1 Jan 2002

# dunbrack-1795-12-IDaa13-3-14-5-13-9-10-13-ebghtl-stride-from-empty.net (3401 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.8945      0.6697  0.6664  1.8974     0.8107  0.3639  0.2327  0.8407  0.5877  0.4371
crosstrain  0.8303      0.6505  0.6473  1.8045     0.7853  0.3913  0.2379  0.8272  0.5830  0.4250
test        0.8149      0.6469  0.6453  1.7844     0.7852  0.3746  0.2456  0.8261  0.5713  0.4193

Very good, but not as good as IDaa13-3-14-5-13-7-10-17-ebghtl, which is
still not as good as the best (IDaa13-5-9-7-15-9-8-17-ebghtl), though
overtraining is less on the network with the narrower last layer.

Hmm, how about narrowing the last layer on the best network, and adding
hidden units to earlier layers?  Maybe 5-10-7-15-9-10-13-ebghtl?

YES!  After only 70 epochs,
dunbrack-1795-12-IDaa13-5-10-7-15-9-10-13-ebghtl-stride-from-empty.net
is already the best network on the cross-training set.

So where next?  Keep the 5,7,9,13 windows and gradually increase the
number of hidden units---maybe IDaa13-5-15-7-15-9-15-13-ebghtl?

2 Jan 2002

# dunbrack-1795-12-IDaa13-5-10-7-15-9-10-13-ebghtl-stride-from-empty.net (3835 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9291      0.6812  0.6775  1.9491     0.8124  0.3639  0.2412  0.8583  0.6023  0.4345
crosstrain  0.8364      0.6518  0.6512  1.8138     0.7827  0.3913  0.2405  0.8374  0.5926  0.4126
test        0.8209      0.6500  0.6472  1.7945     0.7857  0.3746  0.2537  0.8401  0.5781  0.4044

Best so far on crosstrain, but overtraining is getting big, so (as was
predictable) the test set shows no improvement.

Hmm, is that really predictable?  Let's look at the results for the 14
different networks trained for dunbrack-1795-12.

If we do a linear fit on the bits, (cross - test) = m*(train - cross) + b,
we get
    m = 0.0489 +- 0.112
    b = 0.005  +- 0.009
So the data doesn't really support predicting overtraining on the test
set based on the difference between the crosstrain and the training set.

Doing a linear fit on the bits, test = m*cross + b, we get
    m = 0.8085 +- 0.1396
    b = 0.1489 +- 0.115
We would ideally like m=1, b=0, but at least m is fairly large.

Doing the fit on the bits, test = m*train + b, we get
    m = 0.2181 +- 0.166
    b = 0.6183 +- 0.151
with a much smaller slope, so selecting on the crosstraining set does
help on the test set.

Doing a linear fit on the objective, test = m*cross + b, we get
    m = 0.867544 +- 0.1276
    b = 0.225452 +- 0.2294
    rms residual 0.00539263
The fit looks pretty good, but the highest value on the test set occurs
for the third highest value on the crosstraining set.

Doing a linear fit on the objective, test = m*train + b, we get
    m = 0.24079 +- 0.1526
    b = 1.32433 +- 0.2921
    rms residual 0.0108099
with a much lower slope and a poorer fit, so selection on the
crosstraining set seems to help.

I can improve the fit very slightly for the objective with
test = m*(cross - 0.051*train) + b:
    m = 0.908658 +- 0.1318
    b = 0.240243 +- 0.2241
    rms residual 0.00533249
It isn't clear that adding the extra parameter is worth the tiny
reduction in the residual.  Selecting on cross - 0.051*train does not
change the order of the top few networks, so it would have no practical
effect.
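[Aside: the fits above are ordinary least squares over the per-network
summaries.  A sketch of the test = m*cross + b fit using numpy's
polyfit and the (crosstrain, test) objective pairs quoted in this log;
the set of 14 networks reconstructed here may not match the one actually
used, so the coefficients should only come out near the values reported
above.]

    import numpy as np

    # (crosstrain, test) objective values quoted for the
    # dunbrack-1795-12 networks above (the 22 Dec all-narrow network is
    # omitted because its test value was never written).
    cross = np.array([1.7856, 1.8081, 1.7909, 1.8015, 1.8105, 1.7881,
                      1.7689, 1.7997, 1.7962, 1.7988, 1.8062, 1.7988,
                      1.8045, 1.8138])
    test  = np.array([1.7724, 1.8003, 1.7878, 1.7874, 1.7990, 1.7735,
                      1.7614, 1.7901, 1.7738, 1.7833, 1.7969, 1.7890,
                      1.7844, 1.7945])

    # least-squares fit test = m*cross + b
    m, b = np.polyfit(cross, test, 1)
    residuals = test - (m * cross + b)
    print(m, b, np.sqrt(np.mean(residuals ** 2)))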
3 January 2002

New best on cross training set!

# dunbrack-1795-12-IDaa13-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9263      0.6780  0.6725  1.9405     0.8130  0.3639  0.2321  0.8480  0.5864  0.4494
crosstrain  0.8376      0.6522  0.6488  1.8142     0.7832  0.3913  0.2416  0.8398  0.5747  0.4231
test        0.8254      0.6517  0.6456  1.7999     0.7877  0.3746  0.2475  0.8378  0.5571  0.4259

4 Jan 2002

Increasing the windows (and thus the number of parameters) doesn't help:

# dunbrack-1795-12-IDaa13-7-15-9-15-11-15-13-ebghtl-stride-from-empty.net (7331 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9205      0.6759  0.6724  1.9326     0.8120  0.3639  0.2068  0.8443  0.6097  0.4128
crosstrain  0.8249      0.6515  0.6493  1.8010     0.7868  0.3913  0.2211  0.8254  0.6069  0.4011
test        0.8212      0.6519  0.6473  1.7968     0.7891  0.3746  0.2330  0.8330  0.6003  0.3956

We could try increasing the number of hidden units further.

6 Jan 2002

Increasing by 2 on each layer doesn't help---overtraining gets worse:

# dunbrack-1795-12-IDaa13-5-17-7-17-9-17-13-ebghtl-stride-from-empty.net (7217 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9348      0.6805  0.6715  1.9510     0.8195  0.3639  0.2303  0.8447  0.5979  0.4439
crosstrain  0.8284      0.6510  0.6459  1.8023     0.7854  0.3913  0.2377  0.8335  0.5758  0.4162
test        0.8209      0.6502  0.6465  1.7944     0.7923  0.3746  0.2450  0.8368  0.5712  0.4148

I just noticed a naming error in all the tests so far: the "IDaa13"
should be "IDaa14", since they have actually been trying to save 1.4
bits per position, not 1.3.

Let's try using the best architecture so far, but changing the sequence
weighting.

7 Jan 2002

# dunbrack-1795-12-IDaa10-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9269      0.6779  0.6727  1.9411     0.8078  0.3639  0.2432  0.8402  0.5972  0.4498
crosstrain  0.8320      0.6516  0.6511  1.8091     0.7822  0.3913  0.2463  0.8333  0.5834  0.4283
test        0.8296      0.6515  0.6505  1.8063     0.7844  0.3746  0.2592  0.8395  0.5708  0.4284

Using lower sequence weights hurts a little on crosstraining (but
actually helps on the test set, getting a new best there).

8 Jan 2002

Trying intermediate weights gets intermediate results:

# dunbrack-1795-12-IDaa12-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
crosstrain  0.8347      0.6522  0.6473  1.8106     0.7870  0.3913  0.2441  0.8333  0.5824  0.4111

(training and testing set results are not available, because of
file-write problems)

NOTE: I have renamed all the quality report and network files to
correctly have IDaa14 rather than IDaa13 in the names, but the contents
of the files have not been changed, so they still have the incorrect
IDaa13 internally.

Let's try increasing the bit savings more (increasing the weights).

9 Jan 2002

# dunbrack-1795-12-IDaa16-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9268      0.6786  0.6711  1.9409     0.8160  0.3639  0.2595  0.8514  0.5796  0.4258
crosstrain  0.8402      0.6538  0.6541  1.8210     0.7930  0.3913  0.2537  0.8398  0.5756  0.4191
test        0.8315      0.6530  0.6433  1.8061     0.7869  0.3746  0.2631  0.8362  0.5601  0.4109

Increasing the weights to 1.6 bits/position is a new best.  Maybe the
weights should be increased even more---perhaps even using just relative
and not absolute weighting?
Already at 1.6 bits/position we have 1131 sequences clipping so that the
average weight is 1.  We should try three other networks: one that uses
Henikoff weighting with the number of sequences as the total weight
(thin90 alignment), one that uses Henikoff weighting with no regularizer
(thin90 alignment), and one that uses a thin62 alignment.

10 Jan 2002

I tried using Henikoff weighting with total weight equal to the number
of sequences and no regularizer.  For the first 70 epochs, it followed
the other networks with the same architecture quite closely, but on
epoch 70, the "CenterWeights" command made a big change to the network
which it never recovered from.  There is really gross overtraining
happening also, which may be why it couldn't recover on the
cross-training set.

# dunbrack-1795-12-IDaaH-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9530      0.6837  0.6777  1.9756     0.8130  0.3639  0.2404  0.8691  0.5861  0.4302
crosstrain  0.8080      0.6460  0.6378  1.7729     0.7703  0.3913  0.2426  0.8288  0.5696  0.4068
test        0.7950      0.6455  0.6372  1.7591     0.7727  0.3746  0.2452  0.8409  0.5606  0.3973

11 Jan 2002

Using a ReRegularizer after doing the Henikoff weighting seems to be
more robust, and results in a new BEST score on the cross-training set
(already best by epoch 90).

12 Jan 2002

# dunbrack-1795-12-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9442      0.6836  0.6738  1.9648     0.8115  0.3639  0.2769  0.8528  0.5842  0.4503
crosstrain  0.8465      0.6549  0.6545  1.8287     0.7837  0.3913  0.2722  0.8349  0.5700  0.4375
test        0.8330      0.6538  0.6446  1.8091     0.7802  0.3746  0.2736  0.8351  0.5575  0.4333

The next thing to do may be to try different regularizers, trained for
more or less generalization with weight=1, to see if that makes a
difference.  Let's go for less generalization first---say with one of
the mixtures I trained for Rolf Olsen, maybe rolfT1AG.26comp.

13 Jan 2002

# dunbrack-1795-12-IDaaHt-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9402      0.6815  0.6743  1.9589     0.8130  0.3639  0.2590  0.8514  0.5873  0.4415
crosstrain  0.8377      0.6537  0.6494  1.8161     0.7837  0.3913  0.2566  0.8287  0.5764  0.4274
test        0.8205      0.6501  0.6413  1.7913     0.7824  0.3746  0.2648  0.8311  0.5629  0.4256

The rolfT1AG.26comp regularizer seems to have done worse.  Let's try one
that generalizes MORE, like dist.20comp.

14 Jan 2002

# dunbrack-1795-12-IDaaHd-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9342      0.6804  0.6724  1.9508     0.8064  0.3639  0.2743  0.8509  0.5837  0.4454
crosstrain  0.8392      0.6532  0.6525  1.8187     0.7825  0.3913  0.2735  0.8320  0.5678  0.4359
test        0.8255      0.6516  0.6417  1.7979     0.7816  0.3746  0.2745  0.8323  0.5583  0.4247

Nope, dist.20comp is also worse than recode3.20comp, though slightly
better than rolfT1AG.26comp.  I suppose I could check recode4.20comp or
recode5.20comp.
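[Aside: several of these runs hinge on Henikoff sequence weighting; a
minimal sketch of the standard Henikoff & Henikoff position-based
weighting scheme.  This is the textbook version, not necessarily the
exact variant, gap handling, or total-weight rescaling used here.]

    from collections import Counter

    def henikoff_weights(alignment, total_weight=None):
        """alignment: list of equal-length aligned sequences (strings).
        Each column contributes 1/(r*s) to a sequence's weight, where r
        is the number of distinct residue types in the column and s is
        the count of that sequence's residue.  Weights sum to 1 by
        default; passing total_weight=len(alignment) mimics "total
        weight equal to the number of sequences" mentioned above."""
        nseq = len(alignment)
        ncol = len(alignment[0])
        weights = [0.0] * nseq
        for col in range(ncol):
            column = [seq[col] for seq in alignment]
            counts = Counter(column)
            r = len(counts)
            for i, residue in enumerate(column):
                weights[i] += 1.0 / (r * counts[residue])
        total = sum(weights)
        target = total_weight if total_weight is not None else 1.0
        return [w * target / total for w in weights]

    print(henikoff_weights(["ACDE", "ACDE", "ACFE"], total_weight=3))
    # [0.9375, 0.9375, 1.125]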
15 Jan 2002

recode4.20comp doesn't do as well as recode3.20comp:

# dunbrack-1795-12-IDaaHr4-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9225      0.6779  0.6681  1.9345     0.8116  0.3639  0.2593  0.8436  0.5834  0.4326
crosstrain  0.8413      0.6537  0.6489  1.8194     0.7902  0.3913  0.2557  0.8306  0.5676  0.4226
test        0.8312      0.6533  0.6413  1.8052     0.7827  0.3746  0.2687  0.8370  0.5566  0.4187

17 Jan 2002

Dropping the insert-delete information hurts quite a bit:

# dunbrack-1795-12-aaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5735 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9311      0.6782  0.6703  1.9443     0.8111  0.3639  0.2411  0.8586  0.5887  0.4267
crosstrain  0.8262      0.6489  0.6418  1.7961     0.7821  0.3913  0.2401  0.8352  0.5775  0.3991
test        0.8177      0.6499  0.6406  1.7879     0.7807  0.3750  0.2475  0.8416  0.5635  0.3980

19 Jan 2002

Oops, in attempting to test thinning to 62% and no thinning, I
accidentally overwrote the best network.  I'll have to redo
dunbrack-1795-12-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net.

Thinning to 62% is a new best on the cross-training set, with less
overtraining than before.  It doesn't do any better on the test set
though, so the difference is probably not important.

# dunbrack-1795-12-IDaaHr62-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9303      0.6794  0.6693  1.9443     0.8112  0.3639  0.2527  0.8511  0.5846  0.4268
crosstrain  0.8469      0.6562  0.6537  1.8300     0.7843  0.3913  0.2552  0.8344  0.5879  0.4137
test        0.8309      0.6513  0.6427  1.8035     0.7820  0.3746  0.2659  0.8391  0.5638  0.4126

21 Jan 2002

Not thinning is NOT a good idea:

# dunbrack-1795-12-IDaaHrall-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9344      0.6811  0.6775  1.9543     0.8147  0.3639  0.2529  0.8590  0.5885  0.4348
crosstrain  0.8374      0.6521  0.6469  1.8130     0.7844  0.3913  0.2432  0.8273  0.5825  0.4207
test        0.8221      0.6511  0.6448  1.7956     0.7823  0.3746  0.2552  0.8317  0.5712  0.4213

So should we thin to 62%?  50%?  75%?

One minor point---I just found out today that 1fznD is not a good
structure (mis-solved).  It is currently in set dunbrack-1795-2.

22 Jan 2002

After removing the bad 1fznD from set 2, retraining the best thin90
network on set 1 gives a new best on the cross-training set (and on the
test set):

# dunbrack-1795-12-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9372      0.6814  0.6725  1.9549     0.8152  0.3639  0.2603  0.8522  0.5952  0.4207
crosstrain  0.8496      0.6557  0.6523  1.8315     0.7924  0.3920  0.2611  0.8303  0.5857  0.4126
test        0.8338      0.6539  0.6434  1.8093     0.7862  0.3746  0.2655  0.8330  0.5707  0.4120

8 Feb 2002

Training on set 2 and cross-training on set 3:

# dunbrack-1795-23-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9430      0.6794  0.6783  1.9615     0.8095  0.3920  0.2522  0.8621  0.5890  0.4584
crosstrain  0.8268      0.6525  0.6480  1.8033     0.7869  0.3746  0.2513  0.8395  0.5593  0.4355
test        0.8421      0.6584  0.6534  1.8273     0.7922  0.3639  0.2330  0.8273  0.5709  0.4354

Hmm, it looks like set 3 is harder, since the test on set 1 gets a
better score than the cross-train on set 3.
9 Feb 2002

Training on set 3, cross-training on set 1:

# dunbrack-1795-31-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9314      0.6790  0.6715  1.9461     0.8019  0.3750  0.2625  0.8655  0.5899  0.4419
crosstrain  0.8379      0.6579  0.6536  1.8226     0.7888  0.3639  0.2332  0.8237  0.5750  0.4248
test        0.8352      0.6524  0.6473  1.8113     0.7790  0.3920  0.2498  0.8309  0.5736  0.4319

10 Feb 2002

Training on set 2, cross-training on set 1:

# dunbrack-1795-21-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9338      0.6764  0.6744  1.9475     0.8084  0.3920  0.2481  0.8628  0.5885  0.4437
crosstrain  0.8440      0.6583  0.6537  1.8292     0.7933  0.3639  0.2329  0.8338  0.5769  0.4224
test        0.8280      0.6526  0.6470  1.8041     0.7890  0.3746  0.2505  0.8398  0.5650  0.4254

Note that the results on sets 1 and 3 are very similar to (but slightly
better than) when 3 was used as the cross-training set and 1 as the test
set.

11 Feb 2002

Training on set 3, cross-training on set 2:

# dunbrack-1795-32-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9247      0.6774  0.6685  1.9364     0.8008  0.3750  0.2624  0.8621  0.5856  0.4445
crosstrain  0.8354      0.6524  0.6487  1.8122     0.7784  0.3922  0.2496  0.8357  0.5681  0.4367
test        0.8366      0.6568  0.6538  1.8204     0.7859  0.3639  0.2336  0.8253  0.5685  0.4288

12 Feb 2002

Training on set 1, cross-training on set 3:

# dunbrack-1795-13-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9310      0.6790  0.6700  1.9450     0.8143  0.3639  0.2644  0.8446  0.5773  0.4436
crosstrain  0.8338      0.6536  0.6448  1.8098     0.7873  0.3746  0.2679  0.8308  0.5521  0.4384
test        0.8485      0.6541  0.6517  1.8284     0.7917  0.3920  0.2625  0.8303  0.5651  0.4352

Objective values (rows = training set, columns = test set; the diagonal
holds the training-set objectives of the two runs sharing that training
set):

                    test 1          test 2          test 3
    train 1         1.9549,1.9450   1.8284          1.8093
    train 2         1.8273          1.9615,1.9475   1.8041
    train 3         1.8204          1.8113          1.9461,1.9364

Bits saved:

                    test 1          test 2          test 3
    train 1         0.9372,0.9310   0.8485          0.8338
    train 2         0.8421          0.9430,0.9338   0.8280
    train 3         0.8366          0.8352          0.9314,0.9247

13-14 Feb 2002

Training on 2/3 of the data, no cross-training.  The over-training is
expected to be more severe with no early termination from a
cross-training set, but the greater diversity of the training set may
compensate for that.

# dunbrack-1795-1+2-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9962      0.6934  0.6889  2.0340     0.8210  0.3779  0.3148  0.8750  0.5941  0.4630
test        0.8414      0.6585  0.6526  1.8262     0.7875  0.3746  0.3025  0.8466  0.5613  0.4333

Training on the combination of 1 and 2 IS better than training on one
set and cross-training on the other, so I should probably train networks
for 1+3 and 2+3, and make these be the networks for testing
fold-recognition.
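[Aside: the 17 Dec email says six-permutation results should be reported
as the average performance on the test sets; collecting the test rows
quoted above into a tiny aggregation sketch.]

    # test-set results for the six permutation runs quoted above,
    # keyed by (train set, cross-train set); values (bits_saved, objective).
    test_results = {
        (1, 2): (0.8338, 1.8093),   # 22 Jan, tested on set 3
        (1, 3): (0.8485, 1.8284),   # 12 Feb, tested on set 2
        (2, 1): (0.8280, 1.8041),   # 10 Feb, tested on set 3
        (2, 3): (0.8421, 1.8273),   #  8 Feb, tested on set 1
        (3, 1): (0.8352, 1.8113),   #  9 Feb, tested on set 2
        (3, 2): (0.8366, 1.8204),   # 11 Feb, tested on set 1
    }

    mean_bits = sum(b for b, _ in test_results.values()) / len(test_results)
    mean_obj = sum(o for _, o in test_results.values()) / len(test_results)
    print(mean_bits, mean_obj)   # roughly 0.837 bits saved, 1.817 objective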
15 Feb 2002

# dunbrack-1795-2+3-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9707      0.6866  0.6847  1.9996     0.8046  0.3833  0.2917  0.8763  0.5806  0.4827
test        0.8730      0.6677  0.6638  1.8726     0.7846  0.3639  0.2696  0.8418  0.5564  0.4653

# dunbrack-1795-1+3-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9824      0.6914  0.6854  2.0165     0.8166  0.3693  0.3146  0.8700  0.5957  0.4650
test        0.8753      0.6640  0.6617  1.8701     0.7866  0.3920  0.3030  0.8448  0.5794  0.4454

Again, much better results than the train/cross-train splitting of the
training data.

19 Feb 2002

Training on the FULL dunbrack-1795 set (1794 chains) has not quite
converged after 320 epochs, but over-training is less than on the 1+2
set, so I would expect the quality to be at least as good as that
network.

# dunbrack-1795-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9895      0.6918  0.6903  2.0263     0.8176  0.3775  0.3064  0.8755  0.5992  0.4845

The 13 worst chains with this training set are

# chainID  Bits_saved  Q6      SOV     object   count  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
1f83B      -0.8095     0.2083  0.1806  -0.5109  24     1.0000  1.0000  1.0000  1.0000  0.0000  0.3611
1rypL      -0.7921     0.4198  0.4479  -0.1484  212    0.8277  0.0000  0.0000  1.0000  0.3740  0.3200
1fjgV      -0.2364     0.0833  0.0417  -0.1322  24     1.0000  0.0000  1.0000  0.0000  0.0000  0.2000
1cq4B      -0.4601     0.2609  0.1920  -0.1032  23     0.0000  0.0000  1.0000  1.0000  0.1429  0.3638
1fjhA      -0.7586     0.4746  0.4339  -0.0670  236    0.6667  1.0000  0.0000  0.4754  0.3025  0.3806
1jjuC      -0.2531     0.1899  0.2039  0.0387   79     1.0000  0.0000  0.0000  0.3529  0.2028  0.2626
1ryp2      -0.5921     0.4592  0.4791  0.1067   233    0.8986  1.0000  0.0000  1.0000  0.3824  0.3833
1lpbA      -0.3850     0.3294  0.3333  0.1111   85     0.4015  0.0000  0.0000  0.5833  0.5200  0.0600
1ijvA      -0.3956     0.3611  0.3315  0.1312   36     0.3750  1.0000  1.0000  0.0000  0.5000  0.3815
1i50L      -0.4090     0.3696  0.4826  0.2019   46     1.0000  0.0000  1.0000  1.0000  0.8941  0.2800
1qqp4      -0.1203     0.2391  0.2464  0.2420   46     1.0000  1.0000  1.0000  0.5000  0.3333  0.1333
1f32A      -0.2387     0.4016  0.3757  0.3507   127    0.3029  1.0000  0.1667  0.3953  0.4990  0.4119
1en2A      -0.1560     0.3721  0.2845  0.3583   86     0.1667  0.0000  0.2667  1.0000  0.4236  0.2708

For 1rypL, the problem seems to be that STRIDE sees the long helices as
mixtures of 3-10 helices and turns.  DSSP may have been a better choice
for labeling this structure!

25 Feb 2002

Trained a network on all 5893 T2K alignments.  Several turned out to be
difficult to predict---108 had predicted probabilities worse than
background probabilities.  I have saved these in t2k-hard-stride.ids.

Looking quickly at the stride labelings, it seems that a lot of these
sequences are "turn-rich".  Perhaps the problem is with stride labeling
some helices as turns and 3-10 helices.  It would be useful to see
whether DSSP has the same number of problem sequences, and whether the
sequences are the same ones.

Question: should I retrain the neural net excluding these 108
problematic sequences?  It would do much better on the OTHER sequences,
but even worse on this set.

I can look at the EBGHTL logos for the old network
dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-seeded-stride-trained.net
and see if they are bad.

1a0hA   lots of turns.  Turns predicted, but weakly.
1a1rC   predicted strand on short peptide.  Peptide DOES seem to be in
        sheet with different chain.
        (STRIDE labeling wrong.)
1a1tA   very weak predictions.  Lots of T and G labeling.
1ab3    strongly predicted helices, long runs of T and G.
        Rather ugly NMR structure---probably helices.
1ac0    strands properly predicted, but mislabeled by stride.
1av3    peptide with very weak prediction---misses short beta
        connections.
1awj    prediction not perfect, but strand prediction better than stride
        labeling.
1b29A   theoretical model!  YECH!  Will remove all theoretical models
        from the template library (I thought I had already, but there
        are 49 still there!)
1b4g    spaghetti of incompatible NMR models---not useful for training.
1b8wA   weak prediction, don't know why it is wrong.
1bcg    strong wrong predictions, don't know why.  Structure looks good.
1be3    long helix, but predicted strand!  Don't know why.
1befA   some errors in prediction at N-terminus.  The rest seems to use
        a lot of B labeling where Es are predicted.  Could there be some
        slippage in the multiple alignment?  (Some of the turns are
        offset somewhat from the prediction.)
1bh1    prediction seems better than STRIDE labeling.
...
1qbfA   predicted helices, long T--G--T
...
3itr    another theoretical model!
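[Aside: a small sketch of pulling "hard" chains like the ones above out
of a per-chain quality report automatically, assuming whitespace-
separated rows in the format of the 19 Feb table (chainID, Bits_saved,
Q6, SOV, objective, count, per-class SOVs); the file name in the example
is hypothetical.]

    def hard_chains(quality_file, bits_threshold=0.0):
        """Return (chainID, bits_saved) for chains predicted worse than
        the threshold, sorted worst first.  Assumes whitespace-separated
        rows like the per-chain table above; lines starting with '#' are
        treated as headers."""
        hard = []
        with open(quality_file) as f:
            for line in f:
                fields = line.split()
                if not fields or fields[0].startswith("#"):
                    continue
                chain_id, bits_saved = fields[0], float(fields[1])
                if bits_saved < bits_threshold:
                    hard.append((chain_id, bits_saved))
        return sorted(hard, key=lambda x: x[1])

    # e.g. hard_chains("per-chain.quality") would list chains such as the
    # 13 shown under 19 Feb (hypothetical file name).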