30 November 2001 Kevin Karplus

The "stride" directory contains networks, scripts, and quality reports
for neural nets attempting to predict the secondary structure defined by
STRIDE, reduced to a 6-letter alphabet (EBGHTL), from a multiple
alignment.

Most of the scripts are still set up to refer to an older subdirectory
organization (in which the subdirectories of testing/stride/ were just
subdirectories of testing/).

The first sections are old reports on the quality of the different
networks.  The stride EBGHTL predictions are currently our most polished
set of neural nets.  Later sections will update for newer networks.

------------------------------------------------------------
9 Jan 2001

Updated the quality reports and unit usage.  The quality reports now
have bits_saved as the third column, allowing comparison between
different alphabets.  The objective is now something to be maximized,
rather than minimized.  The unit usage previously had a bug in reporting
E(Phat(i)P(j)) / E(P(j)), which has now been fixed.

It looks like having a richer alphabet makes for more informative
predictions, though the Q measure drops:

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EHL2      0.7815      0.7810  0.7337  0.7801
    EHTL2     0.8299      0.6842  0.7592  0.7847
    EBGHTL    0.9065      0.6667  0.8090  0.8646

It is interesting that splitting L into T and L improves SOV(E) and
SOV(H), though their definitions are unchanged.  The split of E into EB
and of H into GH naturally improves the E and H SOV scores, since B and
G are the hardest to predict.

23 Jan 2001

Using t2k-thin90 alignments, the best EHL2 network is now
overrep-2500-IDaa13-7-10-11-10-11-6-5-ehl2-seeded-stride-trained.net
(3419 parameters), which was built by adding an additional layer to the
best EBGHTL network,
overrep-2500-IDaa13-7-10-11-10-11-ebghtl-seeded-stride-trained.net
(3326 parameters), and retraining.

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EHL2      0.7980      0.7864  0.7331  0.7813
    EBGHTL    0.9232      0.6712  0.8075  0.8642

I will add an extra final layer to the EBGHTL network and see if I can
improve the EBGHTL savings some more.

24 Jan 2001

overrep-2500-IDaa13-7-10-11-10-11-6-7-ebghtl-seeded-stride-trained.net
(3584 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9360      0.6754  0.8102  0.8673

3 Feb 2001

overrep-2500-IDaa13-7-10-11-10-11-6-7-ebghtl-seeded2-stride-trained.net
(3584 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9361      0.6756  0.8118  0.8654

7 Feb 2001

overrep-2500-IDaa13-7-10-11-10-11-6-7-ebghtl-seeded3-stride-trained.net
(3584 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9415      0.6765  0.8121  0.8717

30 April 2001

Retrained the ebghtl network on a larger data set (3617 chains, 806942
columns), but the quality of the predictions on the training set was not
much different between the initial and final model:

overrep-3617-IDaa13-7-10-11-10-11-6-7-ebghtl-seeded-stride-trained.net
(3584 parameters)

            alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    initial EBGHTL    0.9328      0.6749  0.8121  0.8660
    final   EBGHTL    0.9325      0.6744  0.8154  0.8654

(Training for 60 iterations made no improvements, then equilibration
kicked the network away from the optimum and training for another 180
got it back to about the same quality.)

4 June 2001

Increased the size of the ebghtl network and trained on 3617 chains.
Now get a somewhat better network (most of the improvement is in bits
saved).
overrep-3617-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded-stride-trained.net
(3737 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9449      0.6779  0.8156  0.8657

7 June 2001

overrep-3617-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded2-stride-trained.net
is better, but I'm not sure by how much, as the .quality file was not
properly written (probably due to the disk filling up).  I'll do an
evaluation and some retraining to see how good it is.

12 June 2001

overrep-3617-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded2-stride-trained.net
(3737 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9473      0.6781  0.8120  0.8675

overrep-3617-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded3-stride-trained.net
(3737 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9482      0.6785  0.8139  0.8697

23 August 2001

I noticed that the NMR chains in the overrep-3617 set were much worse
predicted than the X-ray chains.  There are two possible explanations:

1) the X-ray chains are over-represented, and the network is overtrained
2) the NMR chains have a lot of incorrect 2ry structure assigned by
   STRIDE, because of insufficient data to resolve the structure fully

I favor the second explanation, so I built a new training set from just
the Dunbrack culled X-ray sequences (dunbrack-2752).

The first run with this training set (actually with dunbrack-2751, since
one of the t2k alignments had run out of memory and not been built yet)
improved from

overrep-3617-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded3-stride-trained.net
(3737 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9610      0.6814  0.8091  0.8708

to

dunbrack-2751-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded-stride-trained.net
(3737 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9687      0.6838  0.8125  0.8724

The beta bridges (B) are the hardest thing to predict---when something
is really a B, it is most likely to be predicted as T, C, E, H, G, B (B
is the least likely prediction, and only H has a probability lower than
the background).  G is also hard, with order THCGEB, and H is the
easiest.  E gets a bit confused with C, and C and T get very confused
with each other.

The same network, retrained on the full dunbrack-2752 set, doesn't
change, and the results are almost identical:

dunbrack-2752-IDaa13-7-10-11-10-11-7-7-ebghtl-seeded-stride-trained.net
(3737 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9680      0.6836  0.8124  0.8723

15 Nov 2001 Kevin Karplus

Trying to use the new dist.20comp regularizer and reducing bits saved to
1.0 does not help.  After 120 epochs, the quality for
dunbrack-2752-IDaa10-7-10-11-10-11-7-7-ebghtl-seeded-stride-trained.net
(3737 parameters) is only

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9571      0.6810  0.8082  0.8659

(not bad, but not as good as the IDaa13 network it was retrained from).
It looks like it would take another 1000 epochs to get as good as the
IDaa13 network.

16 Nov 2001 Kevin Karplus

Eliminating the regularizer and just using Henikoff sequence weighting
hurts even more.  After 120 epochs, the quality for
dunbrack-2752-IDaa-7-10-11-10-11-7-7-ebghtl-seeded-stride-trained.net
(3737 parameters) is only

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9484      0.6769  0.8087  0.8705

17 Nov 2001 Kevin Karplus

New best network for EBGHTL:
# dunbrack-2752-IDaa13-7-10-11-10-11-7-9-ebghtl-seeded-stride-trained.net (3821 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9697      0.6834  0.8096  0.8693

The main gains seem to be in G and L.  Further training would most
likely result in further improvements---this does not seem to have
converged yet.
It might be worth trying to remove the H2 hidden unit from the
penultimate layer, as it does not seem to be doing much.

19 Nov 2001 Kevin Karplus

New best network for EBGHTL:
# dunbrack-2752-IDaa13-7-10-11-10-11-6-11-ebghtl-seeded-stride-trained.net (3728 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9718      0.6847  0.8143  0.8712

This seems close to convergence.  To get further improvements, it might
be worth adding a 5th layer, perhaps narrowing the windows on the 2nd
and 3rd layers.

21 Nov 2001 Kevin Karplus

New best network for stride EBGHTL:
# dunbrack-2752-IDaa13-7-10-9-10-9-6-11-6-11-ebghtl-seeded-stride-trained2.net (3810 parameters)

    alphabet  bits_saved  Q_n     SOV(E)  SOV(H)
    EBGHTL    0.9719      0.6855  0.8115  0.8708

This one has a slightly better Q6 and slightly worse SOV score than the
previous best.

24 Nov 2001 Kevin Karplus

I tried removing the pseudocounts from the best network for stride
EBGHTL (it only had pseudocounts on the first layer).  After 150 epochs,
the network had almost, but not quite, recovered from the modification.
It will probably take another 300 epochs to recover fully (if it can).

26 Nov 2001 Kevin Karplus

New best network (slightly better SOV than before, which outweighs
slightly worse bits and Q_n):
# dunbrack-2752-IDaa13-7-10-11-10-9-6-9-6-9-ebghtl-seeded-stride-trained.net (3866 parameters)

    alphabet  bits_saved  Q_n     SOV     SOV(E)  SOV(H)
    EBGHTL    0.9712      0.6852  0.6867  0.8103  0.8718

Note: this number of parameters is misleading, as this network has all 0
pseudocounts and has been centered.

27 Nov 2001 Kevin Karplus

New best network for STRIDE.  SOV slips a little, but bits saved and Q6
are new bests.
# dunbrack-2752-IDaa13-7-10-11-10-9-6-9-6-9-ebghtl-seeded-stride-trained2.net (3366 degrees of freedom)

    alphabet  bits_saved  Q_n     SOV     SOV(E)  SOV(H)
    EBGHTL    0.9733      0.6857  0.6860  0.8089  0.8749

Note: now reporting degrees of freedom rather than number of parameters,
but this network has identical structure to the previous best---just
more training.

30 Nov 2001 Kevin Karplus

New best network for STRIDE.  All measures of quality are up.
# dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-seeded-stride-trained.net (3521 degrees of freedom)

    alphabet  bits_saved  Q_n     SOV     SOV(E)  SOV(H)
    EBGHTL    0.9762      0.6867  0.6880  0.8135  0.8747

1 Dec 2001 Kevin Karplus

Starting some experiments in training neural nets completely from
scratch (no seeding).  My first attempt is a 3-level network
(IDaa13-7-14-7-12-11-ebghtl) slightly larger than my best 5-level
network, but I'll also try training a 5-level network with the same
architecture as the best current network, to see how much benefit is
gained from the seeding and very long training that the best network
has.

3 Dec 2001 Kevin Karplus

After 260 epochs, the network trained from scratch has still not really
converged:
# dunbrack-2752-IDaa13-7-14-7-12-11-ebghtl-stride-trained.net (3740 degrees of freedom)

    alphabet  bits_saved  Q_n     SOV     SOV(E)  SOV(H)
    EBGHTL    0.9269      0.6735  0.6687  0.8047  0.8552

It may be that the learning rate is too slow.  I may have to try again
with faster learning.

I tried training a network from IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-empty
with faster parameters, and already at 42 epochs it is doing better than
dunbrack-2752-IDaa13-7-14-7-12-11-ebghtl-stride-trained.net after 260.
I don't know whether this improvement is due to the structure or the
learning parameters, as both were changed at once.
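[Aside: bits_saved and Q_n come up in every comparison above; a minimal
sketch of how such per-position measures can be computed, assuming
bits_saved is the mean log2 of the predicted probability of the true
class over its background frequency, and Q_n is the fraction of
positions whose most-probable predicted class matches the STRIDE label.
This is an illustration only, not the code that generated the quality
reports.]

    import numpy as np

    def prediction_quality(pred_probs, true_idx, background):
        """pred_probs: (N, 6) predicted probability vectors over EBGHTL,
        true_idx:   (N,) index of the STRIDE label at each position,
        background: (6,) background frequencies of the 6 classes.
        Returns (bits_saved, Q_n) under the assumptions stated above."""
        p_true = pred_probs[np.arange(len(true_idx)), true_idx]
        bits_saved = np.mean(np.log2(p_true / background[true_idx]))
        q_n = np.mean(np.argmax(pred_probs, axis=1) == true_idx)
        return bits_saved, q_n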
6 Dec 2001 Kevin Karplus

After 260 epochs, the
# dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-from-empty.net (3521 degrees of freedom)
network has gotten pretty good---about as good as the best network in
August or September, before I started tweaking the architecture and
doing really long training runs.

    alphabet  bits_saved  Q_n     SOV     SOV(E)  SOV(H)
    EBGHTL    0.9623      0.6846  0.6880  0.8101  0.8782

I should try doing a second training run on this network, perhaps with a
slightly slower learning rate, to see if I can match the best so far.

I should also try a couple of other empty networks---say one with more
hidden units and one with wider windows, but about the same number of
parameters.  Maybe IDaa13-5-14-7-10-9-8-11-ebghtl and
IDaa13-9-8-11-8-11-8-11-8-11-ebghtl.

It would also be interesting to see how jurying several independently
trained networks compares to one network of larger size.  Do the
independently trained networks learn the same things?  Obviously some of
the patterns, such as the middle of an amphipathic helix, will be
learned by almost any network, but do the networks differ on the more
difficult cases?

Assuming that each network has been trained long enough that its
assessment of probabilities is fairly accurate, how should we combine
multiple probability vectors?  Some possibilities (a sketch of both
schemes appears after the 11 Dec table below):

* average the probability vectors.
  advantages:    result is a probability vector.
                 Can use linear algebra to optimize weighting of
                 multiple predictors.
  disadvantages: does not give more weight to more confident
                 predictions.  There is a big difference between being
                 80% sure and 99% sure that something is a helix.
                 Predictions will never be stronger than the strongest
                 single predictor (though which predictor is strongest
                 may vary from position to position, so the overall
                 result could still be better than a single predictor).

* average the log P(x|data)/P(x) scores, add log P(x), exponentiate,
  and rescale to make a probability vector.
  advantages:    large deviations from background frequencies are taken
                 advantage of.
  disadvantages: predictions will never be stronger than the best single
                 predictor.  Figuring out the weighting for different
                 predictors may be a bit messier, because of the
                 rescaling.

11 Dec 2001 Kevin Karplus

I now have several independently trained networks (on the same training
set) with different architectures:

    Epochs   bits_saved  Q6      SOV     "overall"  SOV(E)  SOV(H)

[trained from old networks---lots of training]
dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-seeded-stride-trained
    many     0.9762      0.6867  0.6880  2.0068     0.8135  0.8747

[Same structure as best network, but trained from random start]
dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-from-empty
    260      0.9623      0.6846  0.6880  1.9909     0.8101  0.8782   (faster params)

dunbrack-2752-IDaa13-7-14-7-12-11-ebghtl-stride-trained
    260      0.9269      0.6735  0.6687  1.9347     0.8047  0.8552   (fast params)

[The following 2 are related---the second is trained from the first.]
dunbrack-2752-IDaa13-5-14-7-10-9-8-11-ebghtl-stride-from-empty
    260      0.9634      0.6832  0.6822  1.9877     0.8179  0.8710   (faster params)

dunbrack-2752-IDaa13-5-14-7-10-9-8-11-ebghtl-stride-2 (from previous)
    260+250  0.9696      0.6850  0.6845  1.9968     0.8119  0.8723

It looks like total training time is more important than how the network
is structured, though the total number of parameters may be
important---these networks were deliberately chosen to have roughly the
same number of parameters.
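[Aside: a minimal sketch of the two jury-combination schemes listed
under 6 Dec, using numpy; the probability vectors and background
frequencies below are toy values for illustration, not measured ones.]

    import numpy as np

    def combine_by_averaging(net_probs):
        """Scheme 1: arithmetic mean of the probability vectors."""
        return np.mean(net_probs, axis=0)

    def combine_by_log_odds(net_probs, background):
        """Scheme 2: average the log P(x|data)/P(x) scores, add back
        log P(x), exponentiate, and rescale to a probability vector."""
        log_odds = np.log(np.asarray(net_probs) / background)
        combined = background * np.exp(np.mean(log_odds, axis=0))
        return combined / combined.sum()

    # toy example: two networks, 6-letter EBGHTL alphabet
    background = np.array([0.20, 0.01, 0.04, 0.30, 0.12, 0.33])
    net_probs = [np.array([0.70, 0.01, 0.02, 0.10, 0.07, 0.10]),
                 np.array([0.55, 0.02, 0.03, 0.15, 0.10, 0.15])]
    print(combine_by_averaging(net_probs))
    print(combine_by_log_odds(net_probs, background))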
Perhaps I should go to a "simulated annealing" style of training---after
each epoch, keep the new network if it is better on the cross-training
set than the old one, or with probability dependent on the loss of
quality if it is worse.

13 Dec 2001 Kevin Karplus

Further training of
dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-from-empty
to produce
dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-2
has improved it to be almost as good as
dunbrack-2752-IDaa13-5-14-7-10-9-8-11-ebghtl-stride-2,
but at the current rate of improvement, it looks like it would take
another 1000 epochs to get as good as the best trained network.

    Epochs   bits_saved  Q6      SOV     "overall"  SOV(E)  SOV(H)

[Same structure as best network, but trained from random start]
dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-from-empty
    260      0.9623      0.6846  0.6880  1.9909     0.8101  0.8782   (faster params)

dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-2
    260+270  0.9660      0.6855  0.6887  1.9958     0.8073  0.8786

I have implemented the "simulated annealing" style of training and will
test it soon.  I'd like to set up a train/cross-train/test set first.

From compbio-request@services.cse.ucsc.edu Mon Dec 17 10:41:37 2001
Date: Mon, 17 Dec 2001 10:41:25 -0800
From: Kevin Karplus
To: compbio@soe.ucsc.edu
Subject: secondary structure prediction test set

I have set up a train/cross-train/test split for testing secondary
structure predictors (or other local properties of proteins).

The set consists of 1759 proteins, with 20% or less pairwise residue
identity, and with X-ray structures having resolution <= 3.0 and
R-factor <= 1.0 [taken from Dunbrack's Culled-PDB website].  Dunbrack's
set has been further pruned to eliminate any sequences which are
fragmentary (all fragments less than 20 residues long).  All these
chains have t2k alignments in the pcem/pdb subdirectories.

The set has been randomly partitioned into three equal-sized sets:
    dunbrack-1759-1
    dunbrack-1759-2
    dunbrack-1759-3

To use the set, you can do 3-fold cross-validation in one of two ways:

1) train on 1+2, test on 3
   train on 1+3, test on 2
   train on 2+3, test on 1
   Results should be reported as the overall performance on the test
   set.

2) If you have a "cross training" method, where you train on one set of
   data and select the best performing network (or other trained
   machine) on a different set of data, you should do 6 runs, for the
   six permutations of the sets:
      train on 1, select on 2, test on 3
      train on 1, select on 3, test on 2
      train on 2, select on 1, test on 3
      train on 2, select on 3, test on 1
      train on 3, select on 1, test on 2
      train on 3, select on 2, test on 1
   Again, results should be reported as the average performance on the
   test sets.
--------------------

I have done one run training a neural net for the STRIDE EBGHTL
classification (train on 1, select on 2, test on 3).  After 320 epochs,
I got

data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C    # null bits=0
train       0.9123      0.6762  0.6719  1.9244     0.8110  0.3639  0.2035  0.8512  0.5860  0.4378
crosstrain  0.8148      0.6472  0.6472  1.7856     0.7721  0.3913  0.2193  0.8342  0.5694  0.4239
test        0.8057      0.6452  0.6430  1.7724     0.7780  0.3746  0.2310  0.8348  0.5656  0.4205

There is more overtraining than I had expected, and the neural network
has converged to within 0.0007 of the final result (on the crosstrain
set) within 160 epochs.  There is also more difference between the
crosstrain set and the test set than I expected.  I'll have to do the
other permutations, to make sure that I haven't just picked up a
difference in difficulty between the three sets of chains.
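[Aside: the "objective" column in these quality tables (and the
"overall" column in the 11 Dec table) is consistent with
bits_saved + Q6 + SOV/2.  That is an observation from the reported
numbers, not a definition taken from the training code.  A quick check
against the rows above:]

    def objective(bits_saved, q6, sov):
        # apparent weighted sum behind the "objective"/"overall" columns
        # (inferred from the tables in this log, not from the tools)
        return bits_saved + q6 + sov / 2.0

    print(objective(0.9123, 0.6762, 0.6719))   # ~1.9244 (train)
    print(objective(0.8148, 0.6472, 0.6472))   # 1.7856  (crosstrain)
    print(objective(0.8057, 0.6452, 0.6430))   # 1.7724  (test)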
I'll probably also pick one (or all) of the resulting networks to
retrain on the entire dataset.  The resulting network(s) will, of
course, look like they do better than networks trained with a clean
separation of training and testing data, and should generalize to
previously unseen data at least as well.

20 Dec 2001 Kevin Karplus

The IDaa13-5-14-7-10-9-8-11-ebghtl network structure does better than
the IDaa13-7-10-11-11-9-6-9-6-9-ebghtl network structure, at least on
the dunbrack-1795 set with dunbrack-1795-1 as the training set and
dunbrack-1795-2 as the cross-training set.  I should probably try a
variety of different architectures, allowing the size to vary, before
picking the best one on the cross-training data.

# dunbrack-1795-12-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-stride-from-empty-try2.net (3521 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9123      0.6762  0.6719  1.9244     0.8110  0.3639  0.2035  0.8512  0.5860  0.4378
crosstrain  0.8148      0.6472  0.6472  1.7856     0.7721  0.3913  0.2193  0.8342  0.5694  0.4239
test        0.8057      0.6452  0.6430  1.7724     0.7780  0.3746  0.2310  0.8348  0.5656  0.4205

# dunbrack-1795-12-IDaa13-5-14-7-10-9-8-11-ebghtl-stride-from-empty.net (3382 degrees of freedom)
train       0.9091      0.6717  0.6658  1.9136     0.8163  0.3639  0.2632  0.8428  0.5838  0.4285
crosstrain  0.8322      0.6517  0.6484  1.8081     0.7886  0.3917  0.2603  0.8338  0.5773  0.4156
test        0.8260      0.6506  0.6472  1.8003     0.7917  0.3750  0.2661  0.8364  0.5679  0.4144

21 Dec 2001 Kevin Karplus

Making the windows of uniform width (about the same number of
parameters) doesn't help:

# dunbrack-1795-12-IDaa13-7-14-7-10-7-8-7-ebghtl-stride-from-empty.net (3654 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9087      0.6749  0.6706  1.9189     0.8186  0.3639  0.2152  0.8419  0.5944  0.4328
crosstrain  0.8203      0.6483  0.6446  1.7909     0.7861  0.3913  0.2300  0.8292  0.5768  0.4130
test        0.8172      0.6492  0.6428  1.7878     0.7941  0.3746  0.2422  0.8324  0.5614  0.4199

There seem to be two directions to go---try all windows wide and try all
windows narrow.  I'll try narrow first, since it will train quicker.

22 Dec 2001 Kevin Karplus

Making the windows narrower hurts:

# dunbrack-1795-12-IDaa13-5-14-5-10-5-8-5-ebghtl-stride-from-empty.net (2610 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
crosstrain  0.8162      0.6469  0.6420  1.7842     0.7801  0.3913  0.2399  0.8257  0.5779  0.4396

Note: because of disk problems on /projects/compbio2, the final network
was not written, and the test and final training results are not
available.  Still, it is clear that the crosstraining topped out much
lower here, so an all-narrow network is not a winner (unless it is much
bigger, maybe).

I could try an all-wide network next, or I could try a tapered network
with more hidden units.  I think that increasing the number of hidden
units is more likely to help, so I'll try that first.

23 Dec 2001 Kevin Karplus

Increasing the number of hidden units does not seem to have helped---it
just increased the gap between the training and cross-training sets:

# dunbrack-1795-12-IDaa13-5-16-7-16-9-12-11-ebghtl-stride-from-empty.net (5574 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9214      0.6758  0.6730  1.9337     0.8174  0.3639  0.2385  0.8436  0.5969  0.4431
crosstrain  0.8276      0.6493  0.6492  1.8015     0.7852  0.3913  0.2444  0.8316  0.5861  0.4213
test        0.8170      0.6481  0.6445  1.7874     0.7853  0.3746  0.2499  0.8345  0.5752  0.4163

Let's try a network with about half as many weights per layer.
# dunbrack-1795-12-IDaa13-5-9-7-15-9-8-17-ebghtl-stride-from-empty.net (3387 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9074      0.6735  0.6690  1.9155     0.8143  0.3639  0.2197  0.8451  0.5771  0.4504
crosstrain  0.8344      0.6516  0.6489  1.8105     0.7830  0.3913  0.2277  0.8401  0.5586  0.4311
test        0.8269      0.6505  0.6432  1.7990     0.7876  0.3746  0.2431  0.8329  0.5527  0.4270

This is the best so far on the cross-training set, but the test set
doesn't do as well as the previous best, indicating that we may be in
the region of diminishing returns.  Still, it is unfair to use the test
set to make any decisions---that contaminates the test.  So we'll ignore
the test set result and just concentrate on improving the cross-training
set.

We could try again with only 3 layers, but about the same number of
parameters per layer.

25 Dec 2001

# dunbrack-1795-12-IDaa13-5-9-7-15-11-ebghtl-stride-from-empty.net (2587 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.8932      0.6693  0.6568  1.8909     0.8033  0.3639  0.2285  0.8308  0.5816  0.4402
crosstrain  0.8165      0.6483  0.6468  1.7881     0.7786  0.3913  0.2391  0.8297  0.5797  0.4334
test        0.8079      0.6460  0.6391  1.7735     0.7810  0.3750  0.2389  0.8297  0.5721  0.4283

Hmm: not nearly as good, but I don't know if the problem is having fewer
layers, or just fewer parameters.  Let's try a smaller 4-layer network.

26 Dec 2001

# dunbrack-1795-12-IDaa13-5-7-7-14-9-6-21-ebghtl-stride-from-empty.net (2463 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.8873      0.6656  0.6617  1.8838     0.8004  0.3639  0.2283  0.8403  0.5753  0.4300
crosstrain  0.8051      0.6428  0.6419  1.7689     0.7753  0.3913  0.2357  0.8341  0.5663  0.4130
test        0.8007      0.6417  0.6379  1.7614     0.7831  0.3746  0.2452  0.8293  0.5547  0.4115

This smaller 4-layer network is terrible---worse overtraining than the
three-layer network.  Maybe I should try a bigger 3-layer network:

27 Dec 2001

# dunbrack-1795-12-IDaa13-5-11-7-16-13-ebghtl-stride-from-empty.net (3295 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.8969      0.6704  0.6625  1.8986     0.8065  0.3641  0.2154  0.8390  0.5803  0.4520
crosstrain  0.8271      0.6501  0.6450  1.7997     0.7817  0.3913  0.2272  0.8291  0.5746  0.4384
test        0.8202      0.6495  0.6407  1.7901     0.7807  0.3746  0.2389  0.8297  0.5617  0.4333

This 3-layer network is respectable, but not as good as the 4-layer
network of about the same size.  Perhaps I should try a 5-layer network
of about the same size:

28 Dec 2001

# dunbrack-1795-12-IDaa13-5-7-7-15-9-6-11-11-13-ebghtl-stride-from-empty.net (3396 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9140      0.6716  0.6707  1.9209     0.8211  0.3639  0.2417  0.8496  0.5798  0.4280
crosstrain  0.8261      0.6469  0.6466  1.7962     0.7884  0.3913  0.2399  0.8310  0.5733  0.4118
test        0.8081      0.6448  0.6419  1.7738     0.7861  0.3742  0.2502  0.8337  0.5483  0.4170

WARNING: two files (4mt2 and 1lst, both in the test set) were not
included in this run because their info directories were messed up in
the latest rebuild of the template libraries.  This doesn't affect the
training or cross-training results, though.  Of course, changes in some
of the template alignments may affect accuracy slightly.

29 Dec 2001

This is again respectable, but not as good as the 4-layer one.  The
overtraining (difference between train and crosstrain) is greater, so
making the network larger probably wouldn't help.
Let's try a different approach, making a network with 4 layers and
tapered window sizes, but keeping the number of hidden units roughly
equal, rather than the number of weights.  Let's try 11 hidden units
with 5,7,9,11 windows.

# dunbrack-1795-12-IDaa13-5-11-7-11-9-11-11-ebghtl-stride-from-empty.net (3465 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9173      0.6773  0.6729  1.9311     0.8147  0.3639  0.2486  0.8454  0.5911  0.4419
crosstrain  0.8260      0.6493  0.6470  1.7988     0.7834  0.3913  0.2451  0.8298  0.5754  0.4214
test        0.8153      0.6484  0.6392  1.7833     0.7829  0.3746  0.2487  0.8306  0.5597  0.4206

Again, respectable, but not as good as the best 4-layer network, which
actually has fewer weights.

The cross-training objective function (evaluated on set dunbrack-1795-2)
has ranged from 1.7640 (for IDaa13-5-7-7-14-9-6-21-ebghtl) to 1.8105
(for IDaa13-5-9-7-15-9-8-17-ebghtl).  The three networks with objective
> 1.8 are

    dunbrack-1795-12-IDaa13-5-16-7-16-9-12-11-ebghtl-stride-from-empty  1.8015
    dunbrack-1795-12-IDaa13-5-14-7-10-9-8-11-ebghtl-stride-from-empty   1.8081
    dunbrack-1795-12-IDaa13-5-9-7-15-9-8-17-ebghtl-stride-from-empty    1.8105

All are 4-level networks with windowing 5,7,9,(11 or 17).  The test
result objectives vary from 1.7614 to 1.8003---not exactly matching the
ranking by cross-training results, though the top two for crosstraining
are also the top two for testing.

Let's try one more 4-layer network with roughly equal numbers of
weights, but make the windows 3,5,7,17 instead of 5,7,9,17.

30 Dec 2001

The narrower windows seem to work quite well, with objective function
> 1.8 (third best network, so far).

# dunbrack-1795-12-IDaa13-3-14-5-13-7-10-17-ebghtl-stride-from-empty.net (3367 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9222      0.6762  0.6691  1.9330     0.8109  0.3639  0.2478  0.8514  0.5757  0.4518
crosstrain  0.8313      0.6516  0.6465  1.8062     0.7829  0.3913  0.2553  0.8374  0.5671  0.4296
test        0.8254      0.6497  0.6436  1.7969     0.7833  0.3746  0.2595  0.8388  0.5529  0.4232

I suspect that the last window is too wide though, and we may do better
by narrowing it down to 11, or even 9.  I wonder how we would do with
even narrower windows: 1,3,5,7 or 1,3,5,9.  Perhaps
1-40 3-8 5-22 7-ebghtl?  Hmm---that doesn't make much sense---why recode
a single position into more than the 22 inputs?  How about
3-16 5-13 7-10 11-ebghtl?

It seems that this narrower network does not do as well:

# dunbrack-1795-12-IDaa13-3-16-5-13-7-10-11-ebghtl-stride-from-empty.net (3319 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.8905      0.6677  0.6626  1.8894     0.8141  0.3639  0.2432  0.8436  0.5813  0.4274
crosstrain  0.8269      0.6494  0.6449  1.7988     0.7828  0.3913  0.2578  0.8322  0.5810  0.4131
test        0.8205      0.6487  0.6396  1.7890     0.7881  0.3750  0.2536  0.8293  0.5652  0.4172

Next thing to try---widening the window on the penultimate layer?
Maybe 3-14 5-13 9-10 13-ebghtl?
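[Aside: the architecture names above read as alternating window-size /
hidden-unit counts, with the final window feeding the 6-unit EBGHTL
output layer (so 5-14-7-10-9-8-11-ebghtl = windows 5,7,9,11 with
14,10,8 hidden units).  A small decoder under that assumption---this is
my reading of the naming convention, not taken from the tools that wrote
these files:]

    import re

    def parse_architecture(name):
        """Decode a network name like
        'dunbrack-1795-12-IDaa13-5-15-7-15-9-15-13-ebghtl-stride-from-empty'
        into a list of (window, hidden_units) layer specs, assuming the
        naming convention described above."""
        m = re.search(r'-((?:\d+-)+)ebghtl', name)
        if m is None:
            raise ValueError("no window/unit spec found in " + name)
        nums = [int(x) for x in m.group(1).strip('-').split('-')]
        layers = [(nums[i], nums[i + 1]) for i in range(0, len(nums) - 1, 2)]
        layers.append((nums[-1], 6))    # last window -> 6 EBGHTL outputs
        return layers

    print(parse_architecture(
        "dunbrack-1795-12-IDaa13-5-15-7-15-9-15-13-ebghtl-stride-from-empty"))
    # [(5, 15), (7, 15), (9, 15), (13, 6)]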
1 Jan 2002

# dunbrack-1795-12-IDaa13-3-14-5-13-9-10-13-ebghtl-stride-from-empty.net (3401 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.8945      0.6697  0.6664  1.8974     0.8107  0.3639  0.2327  0.8407  0.5877  0.4371
crosstrain  0.8303      0.6505  0.6473  1.8045     0.7853  0.3913  0.2379  0.8272  0.5830  0.4250
test        0.8149      0.6469  0.6453  1.7844     0.7852  0.3746  0.2456  0.8261  0.5713  0.4193

Very good, but not as good as IDaa13-3-14-5-13-7-10-17-ebghtl, which is
still not as good as the best (IDaa13-5-9-7-15-9-8-17-ebghtl), though
overtraining is less on the network with the narrower last layer.

Hmm, how about narrowing the last layer on the best network, and adding
hidden units to earlier layers?  Maybe 5-10-7-15-9-10-13-ebghtl?

YES!  After only 70 epochs,
dunbrack-1795-12-IDaa13-5-10-7-15-9-10-13-ebghtl-stride-from-empty.net
is already the best network on the cross-training set.

So where next?  Keep the 5,7,9,13 windows and gradually increase the
number of hidden units---maybe IDaa13-5-15-7-15-9-15-13-ebghtl?

2 Jan 2002

# dunbrack-1795-12-IDaa13-5-10-7-15-9-10-13-ebghtl-stride-from-empty.net (3835 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9291      0.6812  0.6775  1.9491     0.8124  0.3639  0.2412  0.8583  0.6023  0.4345
crosstrain  0.8364      0.6518  0.6512  1.8138     0.7827  0.3913  0.2405  0.8374  0.5926  0.4126
test        0.8209      0.6500  0.6472  1.7945     0.7857  0.3746  0.2537  0.8401  0.5781  0.4044

Best so far on crosstrain, but overtraining is getting big, so (as was
predictable) the test set shows no improvement.

Hmm, is that really predictable?  Let's look at the results for the 14
different networks trained for dunbrack-1795-12.

If we do a linear fit on the bits, (cross - test) = m*(train - cross) + b,
we get
    m = 0.0489 +- 0.112
    b = 0.005  +- 0.009
So the data doesn't really support predicting overtraining on the test
set based on the difference between the crosstrain and the training set.

Doing a linear fit on the bits, test = m*cross + b, we get
    m = 0.8085 +- 0.1396
    b = 0.1489 +- 0.115
We would ideally like m=1, b=0, but at least m is fairly large.

Doing the fit on the bits, test = m*train + b, we get
    m = 0.2181 +- 0.166
    b = 0.6183 +- 0.151
with a much smaller slope, so selecting on the crosstraining set does
help on the test set.

Doing a linear fit on the objective, test = m*cross + b, we get
    m = 0.867544 +- 0.1276
    b = 0.225452 +- 0.2294
    rms residual 0.00539263
The fit looks pretty good, but the highest value on the test set occurs
for the third highest value on the crosstraining set.

Doing a linear fit on the objective, test = m*train + b, we get
    m = 0.24079 +- 0.1526
    b = 1.32433 +- 0.2921
    rms residual 0.0108099
with a much lower slope and a poorer fit, so selection on the
crosstraining set seems to help.

I can improve the fit very slightly for the objective with
test = m*(cross - 0.051*train) + b:
    m = 0.908658 +- 0.1318
    b = 0.240243 +- 0.2241
    rms residual 0.00533249
It isn't clear that adding the extra parameter is worth the tiny
reduction in the residual.  Selecting on cross - 0.051*train does not
change the order of the top few networks, so it would have no practical
effect.
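[Aside: the fits above are ordinary least squares over the per-network
summaries.  A sketch of the test = m*cross + b fit using numpy's
polyfit and the (crosstrain, test) objective pairs quoted in this log;
the set of 14 networks reconstructed here may not match the one actually
used, so the coefficients should only come out near the values reported
above.]

    import numpy as np

    # (crosstrain, test) objective values quoted for the
    # dunbrack-1795-12 networks above (the 22 Dec all-narrow network is
    # omitted because its test value was never written).
    cross = np.array([1.7856, 1.8081, 1.7909, 1.8015, 1.8105, 1.7881,
                      1.7689, 1.7997, 1.7962, 1.7988, 1.8062, 1.7988,
                      1.8045, 1.8138])
    test  = np.array([1.7724, 1.8003, 1.7878, 1.7874, 1.7990, 1.7735,
                      1.7614, 1.7901, 1.7738, 1.7833, 1.7969, 1.7890,
                      1.7844, 1.7945])

    # least-squares fit test = m*cross + b
    m, b = np.polyfit(cross, test, 1)
    residuals = test - (m * cross + b)
    print(m, b, np.sqrt(np.mean(residuals ** 2)))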
3 January 2002

New best on cross training set!

# dunbrack-1795-12-IDaa13-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9263      0.6780  0.6725  1.9405     0.8130  0.3639  0.2321  0.8480  0.5864  0.4494
crosstrain  0.8376      0.6522  0.6488  1.8142     0.7832  0.3913  0.2416  0.8398  0.5747  0.4231
test        0.8254      0.6517  0.6456  1.7999     0.7877  0.3746  0.2475  0.8378  0.5571  0.4259

4 Jan 2002

Increasing the windows (and thus the number of parameters) doesn't help:

# dunbrack-1795-12-IDaa13-7-15-9-15-11-15-13-ebghtl-stride-from-empty.net (7331 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9205      0.6759  0.6724  1.9326     0.8120  0.3639  0.2068  0.8443  0.6097  0.4128
crosstrain  0.8249      0.6515  0.6493  1.8010     0.7868  0.3913  0.2211  0.8254  0.6069  0.4011
test        0.8212      0.6519  0.6473  1.7968     0.7891  0.3746  0.2330  0.8330  0.6003  0.3956

We could try increasing the number of hidden units further.

6 Jan 2002

Increasing by 2 on each layer doesn't help---overtraining gets worse:

# dunbrack-1795-12-IDaa13-5-17-7-17-9-17-13-ebghtl-stride-from-empty.net (7217 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9348      0.6805  0.6715  1.9510     0.8195  0.3639  0.2303  0.8447  0.5979  0.4439
crosstrain  0.8284      0.6510  0.6459  1.8023     0.7854  0.3913  0.2377  0.8335  0.5758  0.4162
test        0.8209      0.6502  0.6465  1.7944     0.7923  0.3746  0.2450  0.8368  0.5712  0.4148

I just noticed a naming error in all the tests so far: the "IDaa13"
should be "IDaa14", since they have actually been trying to save 1.4
bits per position, not 1.3.

Let's try using the best architecture so far, but changing the sequence
weighting.

7 Jan 2002

# dunbrack-1795-12-IDaa10-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9269      0.6779  0.6727  1.9411     0.8078  0.3639  0.2432  0.8402  0.5972  0.4498
crosstrain  0.8320      0.6516  0.6511  1.8091     0.7822  0.3913  0.2463  0.8333  0.5834  0.4283
test        0.8296      0.6515  0.6505  1.8063     0.7844  0.3746  0.2592  0.8395  0.5708  0.4284

Using lower sequence weights hurts a little on crosstraining (but
actually helps on the test set, getting a new best there).

8 Jan 2002

Trying intermediate weights gets intermediate results:

# dunbrack-1795-12-IDaa12-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
crosstrain  0.8347      0.6522  0.6473  1.8106     0.7870  0.3913  0.2441  0.8333  0.5824  0.4111

(training and testing set results are not available, because of
file-write problems)

NOTE: I have renamed all the quality report and network files to
correctly have IDaa14 rather than IDaa13 in the names, but the contents
of the files have not been changed, so they still have the incorrect
IDaa13 internally.

Let's try increasing the bit savings more (increasing the weights).

9 Jan 2002

# dunbrack-1795-12-IDaa16-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9268      0.6786  0.6711  1.9409     0.8160  0.3639  0.2595  0.8514  0.5796  0.4258
crosstrain  0.8402      0.6538  0.6541  1.8210     0.7930  0.3913  0.2537  0.8398  0.5756  0.4191
test        0.8315      0.6530  0.6433  1.8061     0.7869  0.3746  0.2631  0.8362  0.5601  0.4109

Increasing the weights to 1.6 bits/position is a new best.  Maybe the
weights should be increased even more---perhaps even using just relative
and not absolute weighting?
Already at 1.6 bits/position we have 1131 sequences clipping so that the
average weight is 1.  We should try three other networks: one that uses
Henikoff weighting with the number of sequences as the total weight
(thin90 alignment), one that uses Henikoff weighting with no regularizer
(thin90 alignment), and one that uses a thin62 alignment.

10 Jan 2002

I tried using Henikoff weighting with total weight equal to the number
of sequences and no regularizer.  For the first 70 epochs, it followed
the other networks with the same architecture quite closely, but on
epoch 70, the "CenterWeights" command made a big change to the network
which it never recovered from.  There is really gross overtraining
happening also, which may be why it couldn't recover on the
cross-training set.

# dunbrack-1795-12-IDaaH-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9530      0.6837  0.6777  1.9756     0.8130  0.3639  0.2404  0.8691  0.5861  0.4302
crosstrain  0.8080      0.6460  0.6378  1.7729     0.7703  0.3913  0.2426  0.8288  0.5696  0.4068
test        0.7950      0.6455  0.6372  1.7591     0.7727  0.3746  0.2452  0.8409  0.5606  0.3973

11 Jan 2002

Using a ReRegularizer after doing the Henikoff weighting seems to be
more robust, and results in a new BEST score on the cross-training set
(already best by epoch 90).

12 Jan 2002

# dunbrack-1795-12-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9442      0.6836  0.6738  1.9648     0.8115  0.3639  0.2769  0.8528  0.5842  0.4503
crosstrain  0.8465      0.6549  0.6545  1.8287     0.7837  0.3913  0.2722  0.8349  0.5700  0.4375
test        0.8330      0.6538  0.6446  1.8091     0.7802  0.3746  0.2736  0.8351  0.5575  0.4333

The next thing to do may be to try different regularizers, trained for
more or less generalization with weight=1, to see if that makes a
difference.  Let's go for less generalization first---say with one of
the mixtures I trained for Rolf Olsen, maybe rolfT1AG.26comp.

13 Jan 2002

# dunbrack-1795-12-IDaaHt-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9402      0.6815  0.6743  1.9589     0.8130  0.3639  0.2590  0.8514  0.5873  0.4415
crosstrain  0.8377      0.6537  0.6494  1.8161     0.7837  0.3913  0.2566  0.8287  0.5764  0.4274
test        0.8205      0.6501  0.6413  1.7913     0.7824  0.3746  0.2648  0.8311  0.5629  0.4256

The rolfT1AG.26comp regularizer seems to have done worse.  Let's try one
that generalizes MORE, like dist.20comp.

14 Jan 2002

# dunbrack-1795-12-IDaaHd-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9342      0.6804  0.6724  1.9508     0.8064  0.3639  0.2743  0.8509  0.5837  0.4454
crosstrain  0.8392      0.6532  0.6525  1.8187     0.7825  0.3913  0.2735  0.8320  0.5678  0.4359
test        0.8255      0.6516  0.6417  1.7979     0.7816  0.3746  0.2745  0.8323  0.5583  0.4247

Nope, dist.20comp is also worse than recode3.20comp, though slightly
better than rolfT1AG.26comp.  I suppose I could check recode4.20comp or
recode5.20comp.
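[Aside: several of these runs hinge on Henikoff sequence weighting; a
minimal sketch of the standard Henikoff & Henikoff position-based
weighting scheme.  This is the textbook version, not necessarily the
exact variant, gap handling, or total-weight rescaling used here.]

    from collections import Counter

    def henikoff_weights(alignment, total_weight=None):
        """alignment: list of equal-length aligned sequences (strings).
        Each column contributes 1/(r*s) to a sequence's weight, where r
        is the number of distinct residue types in the column and s is
        the count of that sequence's residue.  Weights sum to 1 by
        default; passing total_weight=len(alignment) mimics "total
        weight equal to the number of sequences" mentioned above."""
        nseq = len(alignment)
        ncol = len(alignment[0])
        weights = [0.0] * nseq
        for col in range(ncol):
            column = [seq[col] for seq in alignment]
            counts = Counter(column)
            r = len(counts)
            for i, residue in enumerate(column):
                weights[i] += 1.0 / (r * counts[residue])
        total = sum(weights)
        target = total_weight if total_weight is not None else 1.0
        return [w * target / total for w in weights]

    print(henikoff_weights(["ACDE", "ACDE", "ACFE"], total_weight=3))
    # [0.9375, 0.9375, 1.125]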
15 Jan 2002

recode4.20comp doesn't do as well as recode3.20comp:

# dunbrack-1795-12-IDaaHr4-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9225      0.6779  0.6681  1.9345     0.8116  0.3639  0.2593  0.8436  0.5834  0.4326
crosstrain  0.8413      0.6537  0.6489  1.8194     0.7902  0.3913  0.2557  0.8306  0.5676  0.4226
test        0.8312      0.6533  0.6413  1.8052     0.7827  0.3746  0.2687  0.8370  0.5566  0.4187

17 Jan 2002

Dropping the insert-delete information hurts quite a bit:

# dunbrack-1795-12-aaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5735 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9311      0.6782  0.6703  1.9443     0.8111  0.3639  0.2411  0.8586  0.5887  0.4267
crosstrain  0.8262      0.6489  0.6418  1.7961     0.7821  0.3913  0.2401  0.8352  0.5775  0.3991
test        0.8177      0.6499  0.6406  1.7879     0.7807  0.3750  0.2475  0.8416  0.5635  0.3980

19 Jan 2002

Oops, in attempting to test thinning to 62% and no thinning, I
accidentally overwrote the best network.  I'll have to redo
dunbrack-1795-12-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net.

Thinning to 62% is a new best on the cross-training set, with less
overtraining than before.  It doesn't do any better on the test set
though, so the difference is probably not important.

# dunbrack-1795-12-IDaaHr62-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9303      0.6794  0.6693  1.9443     0.8112  0.3639  0.2527  0.8511  0.5846  0.4268
crosstrain  0.8469      0.6562  0.6537  1.8300     0.7843  0.3913  0.2552  0.8344  0.5879  0.4137
test        0.8309      0.6513  0.6427  1.8035     0.7820  0.3746  0.2659  0.8391  0.5638  0.4126

21 Jan 2002

Not thinning is NOT a good idea:

# dunbrack-1795-12-IDaaHrall-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9344      0.6811  0.6775  1.9543     0.8147  0.3639  0.2529  0.8590  0.5885  0.4348
crosstrain  0.8374      0.6521  0.6469  1.8130     0.7844  0.3913  0.2432  0.8273  0.5825  0.4207
test        0.8221      0.6511  0.6448  1.7956     0.7823  0.3746  0.2552  0.8317  0.5712  0.4213

So should we thin to 62%?  50%?  75%?

One minor point---I just found out today that 1fznD is not a good
structure (mis-solved).  It is currently in set dunbrack-1795-2.

22 Jan 2002

After removing the bad 1fznD from set 2, retraining the best thin90
network on set 1 gives a new best on the cross-training set (and on the
test set):

# dunbrack-1795-12-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9372      0.6814  0.6725  1.9549     0.8152  0.3639  0.2603  0.8522  0.5952  0.4207
crosstrain  0.8496      0.6557  0.6523  1.8315     0.7924  0.3920  0.2611  0.8303  0.5857  0.4126
test        0.8338      0.6539  0.6434  1.8093     0.7862  0.3746  0.2655  0.8330  0.5707  0.4120

8 Feb 2002

Training on set 2 and cross-training on set 3:

# dunbrack-1795-23-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9430      0.6794  0.6783  1.9615     0.8095  0.3920  0.2522  0.8621  0.5890  0.4584
crosstrain  0.8268      0.6525  0.6480  1.8033     0.7869  0.3746  0.2513  0.8395  0.5593  0.4355
test        0.8421      0.6584  0.6534  1.8273     0.7922  0.3639  0.2330  0.8273  0.5709  0.4354

Hmm, it looks like set 3 is harder, since the test on set 1 gets a
better score than the cross-train on set 3.
9 Feb 2002

Training on set 3, cross-training on set 1:

# dunbrack-1795-31-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9314      0.6790  0.6715  1.9461     0.8019  0.3750  0.2625  0.8655  0.5899  0.4419
crosstrain  0.8379      0.6579  0.6536  1.8226     0.7888  0.3639  0.2332  0.8237  0.5750  0.4248
test        0.8352      0.6524  0.6473  1.8113     0.7790  0.3920  0.2498  0.8309  0.5736  0.4319

10 Feb 2002

Training on set 2, cross-training on set 1:

# dunbrack-1795-21-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9338      0.6764  0.6744  1.9475     0.8084  0.3920  0.2481  0.8628  0.5885  0.4437
crosstrain  0.8440      0.6583  0.6537  1.8292     0.7933  0.3639  0.2329  0.8338  0.5769  0.4224
test        0.8280      0.6526  0.6470  1.8041     0.7890  0.3746  0.2505  0.8398  0.5650  0.4254

Note that the results on sets 1 and 3 are very similar to (but slightly
better than) when 3 was used as the cross-training set and 1 as the test
set.

11 Feb 2002

Training on set 3, cross-training on set 2:

# dunbrack-1795-32-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9247      0.6774  0.6685  1.9364     0.8008  0.3750  0.2624  0.8621  0.5856  0.4445
crosstrain  0.8354      0.6524  0.6487  1.8122     0.7784  0.3922  0.2496  0.8357  0.5681  0.4367
test        0.8366      0.6568  0.6538  1.8204     0.7859  0.3639  0.2336  0.8253  0.5685  0.4288

12 Feb 2002

Training on set 1, cross-training on set 3:

# dunbrack-1795-13-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9310      0.6790  0.6700  1.9450     0.8143  0.3639  0.2644  0.8446  0.5773  0.4436
crosstrain  0.8338      0.6536  0.6448  1.8098     0.7873  0.3746  0.2679  0.8308  0.5521  0.4384
test        0.8485      0.6541  0.6517  1.8284     0.7917  0.3920  0.2625  0.8303  0.5651  0.4352

Objective values (rows = training set, columns = test set; the diagonal
holds the training-set objectives of the two runs sharing that training
set):

                    test 1          test 2          test 3
    train 1         1.9549,1.9450   1.8284          1.8093
    train 2         1.8273          1.9615,1.9475   1.8041
    train 3         1.8204          1.8113          1.9461,1.9364

Bits saved:

                    test 1          test 2          test 3
    train 1         0.9372,0.9310   0.8485          0.8338
    train 2         0.8421          0.9430,0.9338   0.8280
    train 3         0.8366          0.8352          0.9314,0.9247

13-14 Feb 2002

Training on 2/3 of the data, no cross-training.  The over-training is
expected to be more severe with no early termination from a
cross-training set, but the greater diversity of the training set may
compensate for that.

# dunbrack-1795-1+2-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9962      0.6934  0.6889  2.0340     0.8210  0.3779  0.3148  0.8750  0.5941  0.4630
test        0.8414      0.6585  0.6526  1.8262     0.7875  0.3746  0.3025  0.8466  0.5613  0.4333

Training on the combination of 1 and 2 IS better than training on one
set and cross-training on the other, so I should probably train networks
for 1+3 and 2+3, and make these be the networks for testing
fold-recognition.
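[Aside: the 17 Dec email says six-permutation results should be reported
as the average performance on the test sets; collecting the test rows
quoted above into a tiny aggregation sketch.]

    # test-set results for the six permutation runs quoted above,
    # keyed by (train set, cross-train set); values (bits_saved, objective).
    test_results = {
        (1, 2): (0.8338, 1.8093),   # 22 Jan, tested on set 3
        (1, 3): (0.8485, 1.8284),   # 12 Feb, tested on set 2
        (2, 1): (0.8280, 1.8041),   # 10 Feb, tested on set 3
        (2, 3): (0.8421, 1.8273),   #  8 Feb, tested on set 1
        (3, 1): (0.8352, 1.8113),   #  9 Feb, tested on set 2
        (3, 2): (0.8366, 1.8204),   # 11 Feb, tested on set 1
    }

    mean_bits = sum(b for b, _ in test_results.values()) / len(test_results)
    mean_obj = sum(o for _, o in test_results.values()) / len(test_results)
    print(mean_bits, mean_obj)   # roughly 0.837 bits saved, 1.817 objective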
15 Feb 2002

# dunbrack-1795-2+3-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9707      0.6866  0.6847  1.9996     0.8046  0.3833  0.2917  0.8763  0.5806  0.4827
test        0.8730      0.6677  0.6638  1.8726     0.7846  0.3639  0.2696  0.8418  0.5564  0.4653

# dunbrack-1795-1+3-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9824      0.6914  0.6854  2.0165     0.8166  0.3693  0.3146  0.8700  0.5957  0.4650
test        0.8753      0.6640  0.6617  1.8701     0.7866  0.3920  0.3030  0.8448  0.5794  0.4454

Again, much better results than the train/cross-train splitting of the
training data.

19 Feb 2002

Training on the FULL dunbrack-1795 set (1794 chains) has not quite
converged after 320 epochs, but over-training is less than on the 1+2
set, so I would expect the quality to be at least as good as that
network.

# dunbrack-1795-IDaaHr-5-15-7-15-9-15-13-ebghtl-stride-from-empty.net (5875 degrees of freedom)
data set    Bits_saved  Q6      SOV     objective  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
train       0.9895      0.6918  0.6903  2.0263     0.8176  0.3775  0.3064  0.8755  0.5992  0.4845

The 13 worst chains with this training set are

# chainID  Bits_saved  Q6      SOV     object   count  SOV_E   SOV_B   SOV_G   SOV_H   SOV_T   SOV_C
1f83B      -0.8095     0.2083  0.1806  -0.5109  24     1.0000  1.0000  1.0000  1.0000  0.0000  0.3611
1rypL      -0.7921     0.4198  0.4479  -0.1484  212    0.8277  0.0000  0.0000  1.0000  0.3740  0.3200
1fjgV      -0.2364     0.0833  0.0417  -0.1322  24     1.0000  0.0000  1.0000  0.0000  0.0000  0.2000
1cq4B      -0.4601     0.2609  0.1920  -0.1032  23     0.0000  0.0000  1.0000  1.0000  0.1429  0.3638
1fjhA      -0.7586     0.4746  0.4339  -0.0670  236    0.6667  1.0000  0.0000  0.4754  0.3025  0.3806
1jjuC      -0.2531     0.1899  0.2039  0.0387   79     1.0000  0.0000  0.0000  0.3529  0.2028  0.2626
1ryp2      -0.5921     0.4592  0.4791  0.1067   233    0.8986  1.0000  0.0000  1.0000  0.3824  0.3833
1lpbA      -0.3850     0.3294  0.3333  0.1111   85     0.4015  0.0000  0.0000  0.5833  0.5200  0.0600
1ijvA      -0.3956     0.3611  0.3315  0.1312   36     0.3750  1.0000  1.0000  0.0000  0.5000  0.3815
1i50L      -0.4090     0.3696  0.4826  0.2019   46     1.0000  0.0000  1.0000  1.0000  0.8941  0.2800
1qqp4      -0.1203     0.2391  0.2464  0.2420   46     1.0000  1.0000  1.0000  0.5000  0.3333  0.1333
1f32A      -0.2387     0.4016  0.3757  0.3507   127    0.3029  1.0000  0.1667  0.3953  0.4990  0.4119
1en2A      -0.1560     0.3721  0.2845  0.3583   86     0.1667  0.0000  0.2667  1.0000  0.4236  0.2708

For 1rypL, the problem seems to be that STRIDE sees the long helices as
mixtures of 3-10 helices and turns.  DSSP may have been a better choice
for labeling this structure!

25 Feb 2002

Trained a network on all 5893 T2K alignments.  Several turned out to be
difficult to predict---108 had predicted probabilities worse than
background probabilities.  I have saved these in t2k-hard-stride.ids.

Looking quickly at the stride labelings, it seems that a lot of these
sequences are "turn-rich".  Perhaps the problem is with stride labeling
some helices as turns and 3-10 helices.  It would be useful to see
whether DSSP has the same number of problem sequences, and whether the
sequences are the same ones.

Question: should I retrain the neural net excluding these 108
problematic sequences?  It would do much better on the OTHER sequences,
but even worse on this set.

I can look at the EBGHTL logos for the old network
dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-seeded-stride-trained.net
and see if they are bad.

1a0hA   lots of turns.  Turns predicted, but weakly.
1a1rC   predicted strand on short peptide.  Peptide DOES seem to be in
        sheet with different chain.
        (STRIDE labeling wrong.)
1a1tA   very weak predictions.  Lots of T and G labeling.
1ab3    strongly predicted helices, long runs of T and G.
        Rather ugly NMR structure---probably helices.
1ac0    strands properly predicted, but mislabeled by stride.
1av3    peptide with very weak prediction---misses short beta
        connections.
1awj    prediction not perfect, but strand prediction better than stride
        labeling.
1b29A   theoretical model!  YECH!  Will remove all theoretical models
        from the template library (I thought I had already, but there
        are 49 still there!)
1b4g    spaghetti of incompatible NMR models---not useful for training.
1b8wA   weak prediction, don't know why it is wrong.
1bcg    strong wrong predictions, don't know why.  Structure looks good.
1be3    long helix, but predicted strand!  Don't know why.
1befA   some errors in prediction at N-terminus.  The rest seems to use
        a lot of B labeling where Es are predicted.  Could there be some
        slippage in the multiple alignment?  (Some of the turns are
        offset somewhat from the prediction.)
1bh1    prediction seems better than STRIDE labeling.
...
1qbfA   predicted helices, long T--G--T
...
3itr    another theoretical model!
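[Aside: a small sketch of pulling "hard" chains like the ones above out
of a per-chain quality report automatically, assuming whitespace-
separated rows in the format of the 19 Feb table (chainID, Bits_saved,
Q6, SOV, objective, count, per-class SOVs); the file name in the example
is hypothetical.]

    def hard_chains(quality_file, bits_threshold=0.0):
        """Return (chainID, bits_saved) for chains predicted worse than
        the threshold, sorted worst first.  Assumes whitespace-separated
        rows like the per-chain table above; lines starting with '#' are
        treated as headers."""
        hard = []
        with open(quality_file) as f:
            for line in f:
                fields = line.split()
                if not fields or fields[0].startswith("#"):
                    continue
                chain_id, bits_saved = fields[0], float(fields[1])
                if bits_saved < bits_threshold:
                    hard.append((chain_id, bits_saved))
        return sorted(hard, key=lambda x: x[1])

    # e.g. hard_chains("per-chain.quality") would list chains such as the
    # 13 shown under 19 Feb (hypothetical file name).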