This is the README for /projects/compbio/usr/cbarrett/predict-2nd/testing.

The Makefile in this directory is intended to be run from this
directory, with all of its input and output files coming from the
subdirectories.

Subdirectories exist for each of the different alphabets to be
predicted.  Within each subdirectory are

    quality-reports:    net quality parameters generated during
                        predict-2nd training
    networks:           initial and trained neural-network structures
    run-scripts:        input scripts to predict-2nd

There are also quality-reports, networks, and run-scripts directories
at the top level, but these are remnants of an older directory
organization and should not be used except for minor testing.

The alphabet-specific subdirectories are (as of 30 Nov 2001)

    stride        STRIDE EBGHTL alphabet
    stride-ehl    STRIDE EHL2 alphabet
    dssp          DSSP EBGHSTL alphabet
    ang           ANG alphabet(s)
    a2m           amino-acid distribution from single sequence

The other top-level directories are

    training-data:  files that are automatically included into files in
                    run-scripts.  These files are partitions of the
                    training alignments into training and cross-training
                    sets.
    scripts:        all scripts that are not input to predict-2nd
    params:         parameter sets for learning parameters of predict-2nd
    plots:          all gnuplot-generated PostScript files made from the
                    data in quality-reports.  These are mostly out of
                    date.

#######################################################################
A note on the naming convention used.
#######################################################################

THE FOLLOWING DESCRIPTION IS OUT OF DATE; IT APPLIES ONLY TO VERY OLD
FILES:

The general format is exemplified by how the quality reports are named:

    trainingset-window-outputs[-window-outputs]-inputformat.quality

For example:

phdset-7-3-aa.quality
    This is a one-layer net that uses training-data/phdset.training-data,
    a window size of 7, and 3 outputs; the input layer uses amino-acid
    (aa) probabilities.

phdset-7-10-5-3-comp.quality
    This is a two-layer net that uses training-data/phdset.training-data.
    The first layer uses a window size of 7 and has 10 outputs.
    The second layer uses a window size of 5 and has 3 outputs.
    The input layer uses component probabilities instead of aa
    probabilities.

THE MORE MODERN NAMING CONVENTION (30 Nov 2001):

    trainingset-inputformat-(window-hidden)*-window-outputalphabet

For example:

dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl-seeded-stride-trained.net
    training set:  dunbrack-2752
    input:         insert + delete + amino acids, with 1.3 bits/column
                   average savings
    first layer:   window=7, 10 units
    2nd layer:     window=11, 11 units
    3rd layer:     window=9, 6 units
    4th layer:     window=9, 6 units
    5th layer:     window=9, output is the ebghtl alphabet

    Extra info: the starting point was an already somewhat trained net
    ("seeded"), and this network was trained on stride.
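
To make the modern convention concrete, here is a minimal sketch of a
name decoder.  It is illustrative only: parse_net_name is a hypothetical
helper written for this README, not an existing predict-2nd script, and
it assumes the training-set name may itself contain digits (as in
dunbrack-2752) while the window/hidden sizes form the longest run of
purely numeric fields.

    #!/usr/bin/env python
    # Hypothetical decoder for the modern .net naming convention:
    #   trainingset-inputformat-(window-hidden)*-window-outputalphabet
    # (not part of predict-2nd; a sketch for this README only)

    def parse_net_name(name):
        """Decode a .net file name into its parts."""
        stem = name[:-4] if name.endswith(".net") else name
        tokens = stem.split("-")

        # Find the longest run of purely numeric tokens; those are the
        # window/hidden sizes.  (The training set may contain a number,
        # e.g. the 2752 in dunbrack-2752, but that run is shorter.)
        best_start, best_len, run_start = 0, 0, None
        for i, tok in enumerate(tokens + [""]):  # sentinel closes last run
            if tok.isdigit():
                if run_start is None:
                    run_start = i
            elif run_start is not None:
                if i - run_start > best_len:
                    best_start, best_len = run_start, i - run_start
                run_start = None

        numbers = [int(t) for t in tokens[best_start:best_start + best_len]]
        assert best_len % 2 == 1, "(window,hidden) pairs plus a final window"

        head = tokens[:best_start]   # training-set fields + inputformat
        tail = tokens[best_start + best_len:]
        return {
            "trainingset": "-".join(head[:-1]),
            "inputformat": head[-1],
            "layers": [{"window": w, "hidden": h}
                       for w, h in zip(numbers[0::2], numbers[1::2])],
            "output_window": numbers[-1],
            "output_alphabet": tail[0],
            "notes": tail[1:],  # e.g. ['seeded', 'stride', 'trained']
        }

    if __name__ == "__main__":
        print(parse_net_name(
            "dunbrack-2752-IDaa13-7-10-11-11-9-6-9-6-9-ebghtl"
            "-seeded-stride-trained.net"))
        # -> trainingset dunbrack-2752, inputformat IDaa13,
        #    layers (7,10) (11,11) (9,6) (9,6), output window 9,
        #    output alphabet ebghtl, notes ['seeded','stride','trained']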