This is the README for /projects/compbio/usr/cbarrett/predict-2nd/testing.

The Makefile in this directory is intended to be run from this
directory, with all of its input and output files coming from the
subdirectories.

Subdirectories exist for each of the different alphabets to be
predicted.  Within each subdirectory are

    quality-reports:    net quality parameters generated during
                        predict-2nd training
    networks:           initial and trained neural-network structures
    run-scripts:        input scripts to predict-2nd

There are also quality-reports, networks, and run-scripts directories
at the top level, but these are remnants of an older directory
organization and should not be used except for minor testing.

The alphabet-specific subdirectories are (as of 30 Nov 2001)

    stride        STRIDE EBGHTL alphabet
    stride-ehl    STRIDE EHL2 alphabet
    dssp          DSSP EBGHSTL alphabet
    ang           ANG alphabet(s)
    a2m           amino-acid distribution from single sequence

The other top-level directories are

    training-data:  files that are automatically included into files in
                    run-scripts.  These files are partitions of the
                    training alignments into training and cross-training
                    sets.
    scripts:        all scripts that are not input to predict-2nd
    params:         parameter sets for learning parameters of predict-2nd
    plots:          all gnuplot-generated PostScript files made from the
                    data in quality-reports.  These are mostly out of
                    date.

#######################################################################
A note on the naming convention used.
#######################################################################

THE FOLLOWING DESCRIPTION IS OUT OF DATE; IT APPLIES ONLY TO VERY OLD
FILES:

The general format is exemplified by how the quality reports are named:

    trainingset-window-outputs[-window-outputs]-inputformat.quality

For example:

phdset-7-3-aa.quality
    This is a one-layer net that uses training-data/phdset.training-data,
    a window size of 7, and 3 outputs; the input layer uses amino-acid
    (aa) probabilities.

phdset-7-10-5-3-comp.quality
    This is a two-layer net that uses training-data/phdset.training-data.
    The first layer uses a window size of 7 and has 10 outputs.
    The second layer uses a window size of 5 and has 3 outputs.
    The input layer uses component probabilities instead of aa
    probabilities.

THE MORE MODERN NAMING CONVENTION (30 Nov 2001):

    trainingset-inputformat-(window-hidden)*-window-outputalphabet

For example:

dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-seeded-stride-trained.net
    training set:  dunbrack-2752
    input:         insert + delete + amino acids, with 1.3 bits/column
                   average savings
    first layer:   window=7, 10 units
    2nd layer:     window=11, 11 units
    3rd layer:     window=9, 6 units
    4th layer:     window=9, 6 units
    5th layer:     window=9, output is the ebghtl alphabet

    Extra info: the starting point was an already somewhat trained net
    ("seeded"), and this network was trained on stride.
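
To make the modern convention concrete, here is a minimal sketch of a
name decoder.  It is illustrative only: parse_net_name is a hypothetical
helper written for this README, not an existing predict-2nd script, and
it assumes the training-set name may itself contain digits (as in
dunbrack-2752) while the window/hidden sizes form the longest run of
purely numeric fields.

    #!/usr/bin/env python
    # Hypothetical decoder for the modern .net naming convention:
    #   trainingset-inputformat-(window-hidden)*-window-outputalphabet
    # (not part of predict-2nd; a sketch for this README only)

    def parse_net_name(name):
        """Decode a .net file name into its parts."""
        stem = name[:-4] if name.endswith(".net") else name
        tokens = stem.split("-")

        # Find the longest run of purely numeric tokens; those are the
        # window/hidden sizes.  (The training set may contain a number,
        # e.g. the 2752 in dunbrack-2752, but that run is shorter.)
        best_start, best_len, run_start = 0, 0, None
        for i, tok in enumerate(tokens + [""]):  # sentinel closes last run
            if tok.isdigit():
                if run_start is None:
                    run_start = i
            elif run_start is not None:
                if i - run_start > best_len:
                    best_start, best_len = run_start, i - run_start
                run_start = None

        numbers = [int(t) for t in tokens[best_start:best_start + best_len]]
        assert best_len % 2 == 1, "(window,hidden) pairs plus a final window"

        head = tokens[:best_start]   # training-set fields + inputformat
        tail = tokens[best_start + best_len:]
        return {
            "trainingset": "-".join(head[:-1]),
            "inputformat": head[-1],
            "layers": [{"window": w, "hidden": h}
                       for w, h in zip(numbers[0::2], numbers[1::2])],
            "output_window": numbers[-1],
            "output_alphabet": tail[0],
            "notes": tail[1:],  # e.g. ['seeded', 'stride', 'trained']
        }

    if __name__ == "__main__":
        print(parse_net_name(
            "dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl"
            "-seeded-stride-trained.net"))
        # -> trainingset dunbrack-2752, inputformat IDaa13,
        #    layers (7,10) (11,11) (9,6) (9,6), output window 9,
        #    output alphabet ebghtl, notes ['seeded','stride','trained']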