Thu Jul 21 22:05:00 PDT 2005 Kevin Karplus
I had an interesting, but perhaps useless idea.  We might be able to improve
neural nets (at least for design purposes) by training not just on correct
input/output pairs, but also on random input sequences with the background
distribution at the outputs.  This would train the neural net to recognize
protein-like sequences as well as classifying them.  There may be enough
parameters in the neural net to take on this additional role.  Another
possibility is just to train a single neural net to recognize protein-like
sequences (that is, with a constant 1 output for real sequences and a constant
0 output for random inputs).  (See the decoy-training sketch after these
entries.)

Sun Jul 17 14:28:08 PDT 2005 Kevin Karplus
It would be good to have a more general mechanism than FreezeDesign for
constraining the design algorithm: for example, prohibiting certain residues
at all (or specific) positions, or prohibiting the native residue at certain
positions.  (See the constraint-mask sketch after these entries.)

Sun Jul 17 14:29:36 PDT 2005 Kevin Karplus
It would be nice for the design algorithm to be able to start from a multiple
alignment (perhaps of previously generated designs).

Sat Jul 9 08:51:00 PDT 2005 Kevin Karplus
Add optional weights to the different tracks when designing to multiple
networks.  Add the ability to choose the most probable sequence in Design1st.

Sat Jul 9 04:23:25 PDT 2005 Kevin Karplus
It might be interesting to try batch update of the weights, with an
OptimizeOnLine optimization to choose the step size.

Sat Jul 9 04:25:41 PDT 2005 Kevin Karplus
It might be interesting to try defining correct answers not by having a known
right labeling, but by having pairwise (or multiple) alignments of inputs, and
to score the correctness by the co-emission probability or by a symmetrized
cross-entropy (sum over positions of p_i log(q_i) + q_i log(p_i)).  This would
allow learning a labeling at the same time as learning how to predict it.
(See the scoring sketch after these entries.)

8 Feb 2004 Kevin Karplus
It would be good to have the ability to have multiple interface descriptions
at any layer, with arbitrary inputs from earlier layers, so that our neural
net could be a DAG.  This could be useful for generating predictions for
multiple alphabets from some common recoding, for example, or for including
the primary inputs in later layers of the neural net.  I'd like to have a
multiple-alphabet output from a network (made by running several networks in
parallel), so that I can use back-propagation from desired local structure
properties to do protein design.

------------------------------------------------------------

I want to modify InterfaceDescription so that we can have a guide
sequence+profile for input.  These can come from a single a2m file, if we use
the convention that the first sequence is the guide sequence (optionally,
allow providing the name of the guide sequence).
[DONE BY SOL KATZMAN spring 2004]

Mon Jun 13 21:37:30 PDT 2005 Kevin Karplus
Bug in QualityRecord: the way "bits gained" is reported is bogus!  It
currently reports the difference between the cost of the letter under the
prediction and the *average* cost of letters, rather than the background cost
of the particular letter.  (See the bits-gained sketch after these entries.)
[DONE 18 June 2005]

For Design1st, try collecting all predictions (or best predictions) and doing
a second round, starting from a profile based on the predictions.
[DONE 15 June 2005]

Fri May 27 17:49:56 PDT 2005 Kevin Karplus
Add commands to read background probabilities, rather than taking them from
the training set.  This will allow measurement of information gain for
predictions of single proteins.
[Done 18 June 2005, improved 7 July 2005]
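Decoy-training sketch for the Thu Jul 21 entry: real windows are pushed toward
their one-hot structure labels while random decoy windows are pushed toward
the background distribution.  This is only a toy numpy illustration under my
own assumptions (a single softmax layer; names like train_step, background,
and the alphabet sizes are invented), not predict-2nd code.

    import numpy as np

    rng = np.random.default_rng(0)
    N_IN, N_OUT = 20, 11          # e.g. amino-acid inputs, 11-letter local-structure alphabet
    W = 0.01 * rng.standard_normal((N_IN, N_OUT))
    background = np.full(N_OUT, 1.0 / N_OUT)   # would really come from the training set

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def train_step(x, target, lr=0.1):
        """One cross-entropy gradient step; target is one-hot for real data,
        the background distribution for random decoys."""
        global W
        p = softmax(x @ W)
        W -= lr * np.outer(x, p - target)   # gradient of -sum(target * log(p))

    # real window: push toward the observed structure letter
    x_real = rng.random(N_IN)
    onehot = np.zeros(N_OUT)
    onehot[3] = 1.0
    train_step(x_real, onehot)

    # random decoy window: push toward the background distribution
    x_decoy = rng.random(N_IN)
    train_step(x_decoy, background)

The "single recognizer net" variant from the same entry would instead have a
single real-vs-random output trained toward constant 1 or 0.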
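Constraint-mask sketch for the first Sun Jul 17 entry, assuming the
constraints are expressed as per-position sets of prohibited residues.  The
function apply_constraints and the dictionary layout are invented for
illustration; they are not FreezeDesign's actual interface.

    def apply_constraints(probs, position, prohibited, native=None, forbid_native=False):
        """Zero out prohibited residues (and optionally the native one) at this
        position and renormalize.  probs: dict residue -> probability."""
        banned = set(prohibited.get(position, set())) | set(prohibited.get("all", set()))
        if forbid_native and native is not None:
            banned.add(native)
        kept = {aa: p for aa, p in probs.items() if aa not in banned}
        total = sum(kept.values())
        return {aa: p / total for aa, p in kept.items()}

    # e.g. never allow Cys anywhere, and never allow Pro at position 12:
    prohibited = {"all": {"C"}, 12: {"P"}}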
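Scoring sketch for the second Sat Jul 9 entry, written directly from the
formula given there (sum of p_i log(q_i) + q_i log(p_i) over aligned columns).
The function names are invented, and the co-emission-probability alternative
is not shown.

    import math

    def symmetrized_cross_entropy(p, q, eps=1e-12):
        """p, q: probability distributions over the same alphabet.
        Returns sum_i p_i*log(q_i) + q_i*log(p_i); larger (less negative)
        means the two distributions agree better."""
        return sum(pi * math.log(qi + eps) + qi * math.log(pi + eps)
                   for pi, qi in zip(p, q))

    def alignment_score(pred_a, pred_b, aligned_pairs):
        """Score an alignment of two inputs by summing the symmetrized
        cross-entropy of their predicted distributions over aligned columns.
        aligned_pairs: list of (i, j) column pairs."""
        return sum(symmetrized_cross_entropy(pred_a[i], pred_b[j])
                   for i, j in aligned_pairs)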
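Bits-gained sketch for the Mon Jun 13 QualityRecord entry: compare the cost of
the observed letter under the prediction with its cost under the background,
rather than with the average cost of letters.  The function name bits_gained
is invented for illustration; QualityRecord itself is not shown.

    import math

    def bits_gained(pred_prob, background_prob):
        """Bits saved by the prediction for the observed letter:
        -log2(background_prob) - (-log2(pred_prob)) = log2(pred_prob / background_prob)."""
        return math.log2(pred_prob / background_prob)

    # e.g. predicted P=0.40 for the true letter whose background frequency is 0.05:
    # bits_gained(0.40, 0.05) == 3.0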
8 Feb 2004 Kevin Karplus
We need to check that this version of predict-2nd runs correctly after the
minor modifications made for the new C++ compiler.
[DONE]

I want to train neural networks for several alphabets, using the guide+profile
input.  I suspect that the profile could be generalized further than what we
currently use, since we'd have the guide sequence available to characterize
the close homologs.
[mostly DONE, need to redo train-test validation for more alphabets July 2005]

Fri May 27 17:49:56 PDT 2005 Kevin Karplus
Add the ability to have multiple networks read in at once, with a common input
interface, so that back-propagation can be done from multiple local-structure
alphabets.  Note: this is simpler than the more general approach of having
multi-output neural nets.  (See the multi-network sketch at the end of this
file.)
[DONE for the Design1st algorithm, 9 July 2005]
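Multi-network sketch for the last entry above: several networks share one
input interface, back-propagation from each local-structure target produces a
gradient on the shared profile, and the gradients are combined with optional
per-track weights (as wished for in the Sat Jul 9 entry), ending with a
most-probable-sequence readout as in Design1st.  Everything here (ToyNet, the
track names, the squared-error objective, the alphabet sizes) is an invented
stand-in, not the actual predict-2nd networks or algorithm.

    import numpy as np

    rng = np.random.default_rng(1)
    L, N_AA = 50, 20                            # sequence length, amino-acid alphabet
    profile = np.full((L, N_AA), 1.0 / N_AA)    # shared input profile being designed

    class ToyNet:
        """Stand-in for one trained network over one local-structure alphabet."""
        def __init__(self, n_out):
            self.W = 0.01 * rng.standard_normal((N_AA, n_out))
        def grad_wrt_input(self, profile, target):
            # gradient (w.r.t. the input profile) of a squared-error objective
            pred = profile @ self.W
            return 2.0 * (pred - target) @ self.W.T

    nets = {"str2": ToyNet(11), "burial": ToyNet(7)}
    targets = {name: np.zeros((L, net.W.shape[1])) for name, net in nets.items()}
    weights = {"str2": 1.0, "burial": 0.5}      # optional per-track weights

    # one design step: combine weighted gradients from all networks, update the profile
    grad = sum(weights[name] * net.grad_wrt_input(profile, targets[name])
               for name, net in nets.items())
    profile = np.clip(profile - 0.1 * grad, 1e-6, None)
    profile /= profile.sum(axis=1, keepdims=True)   # renormalize each position

    # most-probable-sequence readout, as in Design1st
    AMINO = "ACDEFGHIKLMNPQRSTVWY"
    sequence = "".join(AMINO[i] for i in profile.argmax(axis=1))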