30 November 2001 Kevin Karplus

The "ang" directory contains networks, scripts, and quality reports
for neural nets attempting to predict the secondary structure defined
by partitioning phi/psi space according to a scheme devised by
Bystroff in the HMMSTR paper.  Each phi/psi angle maps to the closest
of the following points:

        phi        psi          Bystroff's code > our code
    {  -61.91 ,  -45.20 },  // H
    { -109.78 ,   20.88 },  // G
    {  -70.58 ,  147.22 },  // B
    { -132.89 ,  142.43 },  // E
    { -135.03 ,   77.26 },  // d > D
    {  -85.03 ,   72.26 },  // b > T
    { -165.00 ,  175.00 },  // e > Y
    {   55.88 ,   38.62 },  // L
    {   85.82 ,   -0.03 },  // l > K
    {   80.00 , -170.00 }   // x > Z

There are also some networks for predicting ANG 2ry sequence from the
STRIDE 2ry sequence, and STRIDE from ANG, to see how much information
is in the ANG string that is not in the STRIDE string.  (Actually, as
of 30 Nov 2001, that is all we have in the directories.)

We get that ang as input can save 1.4810 bits when predicting
Stride ebghtl, but stride ebghtl input only saves 0.9720 bits when
predicting ang.  This can be compared with the no-window mutual
information (computed predict-2nd/compare-real) of 0.7900 bits.
The ang alphabet seems to be significantly more informative than the
stride alphabet.

Most of the scripts are still set up to refer to an older subdirectory
organization (in which the subdirectories of testing/ang/ were just
subdirectories of testing/).
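
The mapping from a phi/psi pair to the closest of the ten points listed
near the top of this file can be sketched as follows.  This is only an
illustration, not the code used to build the networks: the text says
"closest" without naming a distance, so the sketch assumes Euclidean
distance in degrees with each angle wrapped at +/-180.  The names
ANG_POINTS, angle_diff, and ang_class are made up for this example.

```python
import math

# Cluster centres (phi, psi) in degrees, keyed by the current one-letter
# ANG code ("our code" in the table of points in this file).
ANG_POINTS = {
    "H": (-61.91, -45.20),
    "G": (-109.78, 20.88),
    "B": (-70.58, 147.22),
    "E": (-132.89, 142.43),
    "D": (-135.03, 77.26),
    "T": (-85.03, 72.26),
    "Y": (-165.00, 175.00),
    "L": (55.88, 38.62),
    "K": (85.82, -0.03),
    "Z": (80.00, -170.00),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def ang_class(phi, psi):
    """Return the ANG letter whose centre is closest to (phi, psi).

    ASSUMPTION: the distance is Euclidean in (phi, psi) with periodic
    wraparound on both angles; the original scheme may differ.
    """
    return min(
        ANG_POINTS,
        key=lambda c: math.hypot(
            angle_diff(phi, ANG_POINTS[c][0]),
            angle_diff(psi, ANG_POINTS[c][1]),
        ),
    )
```

Under this assumed metric a residue at (-170, -178) is assigned to Y,
because psi = -178 is only 7 degrees away from psi = 175 once the wrap
at 180 is taken into account.  A metric without wraparound would treat
those two values as 353 degrees apart.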

We may want to rename the ANG alphabet classes:

From karplus@bray.cse.ucsc.edu Thu Nov  8 17:03:26 2001
Date: Thu, 8 Nov 2001 17:03:23 -0800
From: Kevin Karplus
To: jcasper@soe.ucsc.edu, rachelk@soe.ucsc.edu
CC: karplus@soe.ucsc.edu
Subject: renaming the Bystroff angle alphabet

We may want to rename the classes in the Bystroff angle alphabet, to
look more like stride and dssp.

            best matches in
    ANG     DSSP        STRIDE
    B       L,B         B,C
    D       B,L         B,T,C
    E       E,B         E,B
    G       T,G         G,T,I,C
    H       H,I,G       H,I,G,T
    K       T,S         T,C
    L       T,S         T,C
    T       L,S,B       T,B,C,E
    Y       E,B,S       E,B,C
    Z       S,L,T       C,T,B

    DSSP    best matches in ANG
    B       B,D,Y
    E       E,Y,D
    G       G,H
    H       H
    I       H,Z
    L       B,T,D,Z
    S       Z,K,Y,L,T
    T       K,L,Z

    STRIDE  best matches in ANG
    B       D,Y,B,E,Z,T
    C       Z,B,L,T,Y,D
    E       E,Y,D,T
    G       G,H
    H       H
    I       H,G
    T       K,L,G,Z,T

(see ~karplus/dna/predict-2nd/compare-real for the tables these are
taken from)

Based on these observations the names should probably be

    current             proposed   Bystroff's   Andy
    ANG     frequency   ANG        name         Karplus
    H       0.42        H          H            alpha_R
    E       0.17        E          E            beta_S
    B       0.17        P          B            beta_P
    G       0.08        G          G            delta_R
    Y       0.04        Y          e            epsilon'
    T       0.03        N          b            gamma'
    L       0.02        L          L            alpha_L
    K       0.02        T          l            delta_L
    D       0.02        D          d            zeta
    Z       0.01        S          x            epsilon

The reasons for the name changes:

    B->P    this is most often a poly-proline helix, strongly
            favoring proline.
    T->N    This region seems to favor aspartic acid and asparagine
            (D and N).  D was already in use.
    K->T    This region seems to have the turns.
    Z->S    This region seems to have DSSP's S-turns.

I think we might also want to add another (lightly populated) point
around (-100, 180) to make the groove between E and P.  If we do this,
we might want to recluster the points somewhat, since I think that Y
would then move to around -180,-175, giving up some of its mass to E
and the new point.

I know that renaming the classes is a pain, but I think that we'll
want the best letters we can come up with.  We should also start
thinking up longer names for the different phi-psi states.
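
If the proposed renaming in the message above were adopted, existing
ANG strings would need to be translated.  The sketch below shows one
way to do that; it is a hypothetical helper (the names
CURRENT_TO_PROPOSED and rename_ang are invented here), not part of the
existing scripts.  The point worth noting is that the mapping has to be
applied to all letters at once, since K becomes T while T becomes N; a
sequence of single-letter substitutions would turn old K into N.

```python
# Current ANG letter -> proposed ANG letter, taken from the
# "current ANG" and "proposed ANG" columns of the renaming table.
CURRENT_TO_PROPOSED = {
    "H": "H", "E": "E", "B": "P", "G": "G", "Y": "Y",
    "T": "N", "L": "L", "K": "T", "D": "D", "Z": "S",
}

# str.translate applies every substitution in a single pass, so the
# K->T and T->N renames cannot interfere with each other.
_RENAME_TABLE = str.maketrans(CURRENT_TO_PROPOSED)


def rename_ang(seq):
    """Translate a string of current ANG letters to the proposed letters.

    Raises ValueError on any character that is not a current ANG letter,
    so that already-renamed or non-ANG strings are not silently mangled.
    """
    unknown = set(seq) - set(CURRENT_TO_PROPOSED)
    if unknown:
        raise ValueError("not ANG letters: " + "".join(sorted(unknown)))
    return seq.translate(_RENAME_TABLE)
```

For example, rename_ang("HHKTBZ") gives "HHTNPS".  Because the old and
new alphabets share letters with different meanings (T in particular),
files would need some marker of which alphabet they use; how to record
that is left open here.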