30 November 2001 Kevin Karplus

The "ang" directory contains networks, scripts, and quality reports
for neural nets attempting to predict the secondary structure defined
by partitioning phi/psi space according to a scheme devised by
Bystroff in the HMMSTR paper.  Each (phi, psi) pair maps to the closest
of the following points:
	  phi		psi	     Bystroff's code > our code
	  {  -61.91 ,  -45.20 },   // H
	  { -109.78 ,   20.88 },   // G
	  {  -70.58 ,  147.22 },   // B
	  { -132.89 ,  142.43 },   // E
	  { -135.03 ,   77.26 },   // d > D
	  {  -85.03 ,   72.26 },   // b > T
	  { -165.00 ,  175.00 },   // e > Y
	  {   55.88 ,   38.62 },   // L
	  {   85.82 ,   -0.03 },   // l > K
	  {   80.00 , -170.00 }    // x > Z
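A minimal C++ sketch of this nearest-point assignment follows. The distance measure is not given above, so the sketch assumes squared Euclidean distance with each angle difference wrapped into [-180, 180). Because phi and psi are periodic, this means psi = 179 and psi = -179 count as 2 degrees apart. The names AngCenter, angle_diff and ang_classify are hypothetical and do not come from the scripts in this directory.

```cpp
// One cluster center in phi/psi space (degrees) and its ANG letter.
struct AngCenter {
    double phi;
    double psi;
    char code;  // "our code" from the table above
};

// The ten centers from the table above, using our (renamed) letters.
constexpr AngCenter kAngCenters[] = {
    { -61.91,  -45.20, 'H'},
    {-109.78,   20.88, 'G'},
    { -70.58,  147.22, 'B'},
    {-132.89,  142.43, 'E'},
    {-135.03,   77.26, 'D'},
    { -85.03,   72.26, 'T'},
    {-165.00,  175.00, 'Y'},
    {  55.88,   38.62, 'L'},
    {  85.82,   -0.03, 'K'},
    {  80.00, -170.00, 'Z'},
};

// Difference a - b in degrees, wrapped into [-180, 180) because dihedral
// angles are periodic. (Assumed metric, not taken from HMMSTR.)
double angle_diff(double a, double b) {
    double d = a - b;
    while (d >= 180.0) d -= 360.0;
    while (d < -180.0) d += 360.0;
    return d;
}

// Return the letter of the center nearest to (phi, psi), using squared
// Euclidean distance on the wrapped angle differences.
char ang_classify(double phi, double psi) {
    char best_code = '?';
    double best_d2 = 0.0;
    bool have_best = false;
    for (const AngCenter& c : kAngCenters) {
        const double dphi = angle_diff(phi, c.phi);
        const double dpsi = angle_diff(psi, c.psi);
        const double d2 = dphi * dphi + dpsi * dpsi;
        if (!have_best || d2 < best_d2) {
            best_d2 = d2;
            best_code = c.code;
            have_best = true;
        }
    }
    return best_code;
}
```

For example, ang_classify(-60.0, -45.0) returns 'H', and a residue at (-170, -178) lands in Y because the wrapped psi difference to Y's center is only 7 degrees.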


There are also some networks for predicting ANG 2ry sequence from the
STRIDE 2ry sequence, and STRIDE from ANG, to see how much information
is in the ANG string that is not in the STRIDE string.
(Actually, as of 30 Nov 2001, that is all we have in the directories.)

We find that using ang as input saves 1.4810 bits when predicting the
stride ebghtl alphabet, but using stride ebghtl as input saves only
0.9720 bits when predicting ang.  These can be compared with the
no-window mutual information (computed by predict-2nd/compare-real) of
0.7900 bits.  The ang alphabet seems to be significantly more
informative than the stride alphabet.
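For reference, the no-window mutual information between two labelings of the same residues can be computed from their joint count table as I(X;Y) = sum over x,y of p(x,y) log2(p(x,y) / (p(x) p(y))). The sketch below assumes this is the quantity compare-real reports. Its actual code is not shown here, and mutual_information_bits is a hypothetical name. Because the formula is symmetric in X and Y, the no-window figure is the same in both directions, unlike the two windowed network results above.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mutual information I(X;Y), in bits, from a table of joint counts:
// counts[i][j] = number of residues labeled i in alphabet X and j in Y.
// Assumes a rectangular, non-negative table.
double mutual_information_bits(const std::vector<std::vector<double>>& counts) {
    const std::size_t nx = counts.size();
    const std::size_t ny = nx > 0 ? counts[0].size() : 0;

    // Marginal counts for each alphabet, plus the grand total.
    std::vector<double> row(nx, 0.0), col(ny, 0.0);
    double total = 0.0;
    for (std::size_t i = 0; i < nx; ++i) {
        for (std::size_t j = 0; j < ny; ++j) {
            row[i] += counts[i][j];
            col[j] += counts[i][j];
            total += counts[i][j];
        }
    }
    if (total <= 0.0) return 0.0;

    // p(x,y) * log2(p(x,y) / (p(x) p(y))) written in counts is
    // (n/N) * log2(n * N / (row * col)). Empty cells contribute 0.
    double mi = 0.0;
    for (std::size_t i = 0; i < nx; ++i) {
        for (std::size_t j = 0; j < ny; ++j) {
            const double n = counts[i][j];
            if (n > 0.0) {
                mi += (n / total) * std::log2(n * total / (row[i] * col[j]));
            }
        }
    }
    return mi;
}
```

A perfectly diagonal 2x2 table with equal counts gives exactly 1 bit, and a table whose rows are proportional (independent labels) gives 0.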

Most of the scripts are still set up to refer to an older subdirectory
organization (in which the subdirectories of testing/ang/ were just
subdirectories of testing/).


We may want to rename the ANG alphabet classes:

From karplus@bray.cse.ucsc.edu  Thu Nov  8 17:03:26 2001
Date: Thu, 8 Nov 2001 17:03:23 -0800
From: Kevin Karplus <karplus@soe.ucsc.edu>
To: jcasper@soe.ucsc.edu, rachelk@soe.ucsc.edu
CC: karplus@soe.ucsc.edu
Subject: renaming the Bystroff angle alphabet


We may want to rename the classes in the Bystroff angle alphabet, so
that they look more like the stride and dssp alphabets.

	best matches in
ANG	DSSP		STRIDE
B	L,B		B,C
D	B,L		B,T,C
E	E,B		E,B
G	T,G		G,T,I,C
H	H,I,G		H,I,G,T
K	T,S		T,C
L	T,S		T,C
T	L,S,B		T,B,C,E
Y	E,B,S		E,B,C
Z	S,L,T		C,T,B

DSSP	best matches in ANG
B	B,D,Y
E	E,Y,D
G	G,H
H	H
I	H,Z
L	B,T,D,Z
S	Z,K,Y,L,T
T	K,L,Z

STRIDE	best matches in ANG
B	D,Y,B,E,Z,T
C	Z,B,L,T,Y,D
E	E,Y,D,T
G	G,H
H	H
I	H,G
T	K,L,G,Z,T

(see ~karplus/dna/predict-2nd/compare-real for the tables these are taken from) 

Based on these observations, the names should probably be:

current			proposed	Bystroff's	Andy	
ANG	frequency	ANG		name		Karplus
H	0.42		H		H		alpha_R	
E	0.17		E		E		beta_S
B	0.17		P		B		beta_P
G	0.08		G		G		delta_R
Y	0.04		Y		e		epsilon' 
T	0.03		N 		b		gamma'
L	0.02		L		L		alpha_L
K	0.02		T		l		delta_L
D	0.02		D		d		zeta
Z	0.01		S		x		epsilon


The reasons for the name changes:
	B->P	This is most often a polyproline helix, strongly
		favoring proline.
	T->N	This region seems to favor aspartic acid and
		asparagine (D and N).  D was already in use.
	K->T	This region seems to contain the turns (DSSP and STRIDE T).
	Z->S	This region seems to contain DSSP's S (bend) residues.
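If the renaming is adopted, existing ANG strings would need converting. Because T->N and K->T chain into each other, doing the renames as successive global substitutions (K to T, then T to N) would turn every old K into N. A per-character lookup applied in a single pass avoids that. A minimal C++ sketch, where rename_ang and rename_ang_string are hypothetical names:

```cpp
#include <string>

// Proposed renaming of a single ANG letter (current -> proposed), from
// the table above. Letters that keep their name pass through unchanged.
char rename_ang(char c) {
    switch (c) {
        case 'B': return 'P';
        case 'T': return 'N';
        case 'K': return 'T';
        case 'Z': return 'S';
        default:  return c;  // H, E, G, Y, L, D (and anything else) unchanged
    }
}

// Rename every letter of an ANG string in one pass, so an old K becomes T
// and is never re-mapped by the T -> N rule.
std::string rename_ang_string(const std::string& s) {
    std::string out = s;
    for (char& c : out) {
        c = rename_ang(c);
    }
    return out;
}
```

For example, rename_ang_string("HBTKZ") returns "HPNTS".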

I think we might also want to add another (lightly populated) point around
(-100, 180) to cover the groove between E and P.  If we do this, we
might want to recluster the points somewhat, since I think that Y
would then move to around (-180, -175), giving up some of its mass to E
and the new point.

I know that renaming the classes is a pain, but I think that we'll want
the best letters we can come up with.  We should also start thinking
up longer names for the different phi-psi states.