Cost Function Evaluation Overview
John Archie (2007-08-20)

Most of the evaluation process can be done with the cfneval.pl script
provided here; the script is documented:

    % cfneval.pl --man

Evaluating cost functions using the code here is a multi-step process.

First, create the evaluation score files in all of the CASP7 target
directories.  For my anglevector.costfcn file this was done by

    % set casp7=/projects/compbio/experiments/protein-predict/casp7/
    % set anglevectorcfn=/cse/grads/jarchie/projects/anglevector/anglevector.costfcn
    % set targetfile=$casp7/target_list.txt
    % umask 002
    % foreach target (`cat $targetfile`)
    foreach> sed -e "s/TXXXX/$target/g" < $anglevectorcfn > $casp7/$target/anglevector.costfcn
    foreach> end

Next, do all the scoring of decoys using the CASP7 stuff.  One way is

    % cfneval.pl -us "decoys/predictions.evaluate.anglevector.rdb" \
    ? | para-trickle-make -command ' ' -max_jobs 5

[ Thu Aug 23 12:02:21 PDT 2007 Kevin Karplus
  Alternatively, you could use
    para-trickle-make -manyids -se2log -no2letter -modelsdir $casp7 \
        -makefile ./Makefile -target decoys/predictions.evaluate.anglevector.rdb \
        < $targetfile
]

Summary statistics can be generated by cfneval.pl:

    % cfneval.pl -s decoys/predictions.evaluate.anglevector.rdb -f0 > example.rdb

Finally, plot the graphs and analyze the data in R, gnuplot, or some
other program:

    % R --no-save < cfneval_example.R > cfneval_example.log

(Check the R log for summary statistics and the plots/ directory for
plots.)

Tue Aug 21 13:27:31 PDT 2007 Kevin Karplus
    Copied to /projects/compbio/experiments/protein-predict/CostFcnEval

Tue Aug 21 13:39:25 PDT 2007 Kevin Karplus
    Created builtins.costfcn to evaluate all the cost functions that are
    not specific to a particular target.

Tue Aug 21 20:36:27 PDT 2007 John Archie
    Fussed with the method used in cfneval.pl to compute Kendall's tau
    a bit to increase speed.
    My very rough guess is that it will now take about 5 hours to
    complete the hierarchical cost-function tree that I need to build
    in the Fall.

Fri Aug 24 12:53:34 PDT 2007 Kevin Karplus
    One can get a quick summary of the results in the rdb file using

        summ -m < builtins.rdb | sort -nr +7 > builtins.avg

    For the builtin cost fcns, the highest average tau is for
    near_backbone, followed by other burial functions.
    Note: I had to modify summ slightly, as it had used %d instead of
    %g to print the values.

Fri Aug 24 13:19:00 PDT 2007 Kevin Karplus
    I have put targets in the Makefile for evaluating the costfcn,
    building an rdb file of the results by target, and giving the
    average for each costfcn.

Fri Aug 24 14:48:55 PDT 2007 Kevin Karplus
    There is now a %.summarize target, so that

        make -k builtins.summarize

    will make

        builtins-gdt-btr.avg        builtins-gdt-btr.rdb
        builtins-gdt-tau.avg        builtins-gdt-tau.rdb
        builtins-real_cost-btr.avg  builtins-real_cost-btr.rdb
        builtins-real_cost-tau.avg  builtins-real_cost-tau.rdb

    Using the real_cost metric and tau, the best cost-function
    components are

        Min, Avg, Max, Total for hbond_geom_backbone: -0.136, 0.333244, 0.581, 28.659
        Min, Avg, Max, Total for near_backbone:       -0.029, 0.32643,  0.566, 28.073
        Min, Avg, Max, Total for dry12:               -0.053, 0.305628, 0.642, 26.284
        Min, Avg, Max, Total for dry8:                -0.023, 0.303709, 0.569, 26.119

    Using gdt and tau, the best cost-function components are

        Min, Avg, Max, Total for near_backbone:       -0.054, 0.302116, 0.553, 25.982
        Min, Avg, Max, Total for dry12:               -0.064, 0.290791, 0.646, 25.008
        Min, Avg, Max, Total for dry8:                -0.016, 0.286314, 0.543, 24.623
        Min, Avg, Max, Total for way_back:            -0.087, 0.28086,  0.562, 24.154
        Min, Avg, Max, Total for dry6.5:              -0.037, 0.261,    0.546, 22.446
        Min, Avg, Max, Total for hbond_geom_backbone: -0.102, 0.250128, 0.5,   21.511

    It is interesting that hbond_geom_backbone moves up so much in the
    real_cost measure---probably because of the hbond scoring functions
    included in real_cost.
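For reference, the standard way to speed up Kendall's tau from the naive
O(n^2) pairwise comparison is to count inversions with a merge sort
(Knight's algorithm), which is O(n log n).  This is not necessarily what
cfneval.pl does internally; the following is only a minimal Python sketch
of the idea, for tau-a with no tied values:

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a between two equal-length sequences, computed in
    O(n log n) by counting inversions with a merge sort (Knight's
    algorithm).  Assumes no ties in either sequence."""
    n = len(xs)
    # Reorder ys by ascending x; every inversion remaining in this
    # reordered sequence corresponds to one discordant pair.
    order = sorted(range(n), key=lambda i: xs[i])
    ranked = [ys[i] for i in order]

    def count_inversions(a):
        # Merge sort that returns (sorted list, inversion count).
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv = count_inversions(a[:mid])
        right, inv_r = count_inversions(a[mid:])
        inv += inv_r
        merged = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] precedes all remaining left elements:
                # each such pair is an inversion.
                merged.append(right[j])
                j += 1
                inv += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    _, discordant = count_inversions(ranked)
    total = n * (n - 1) // 2          # all pairs
    concordant = total - discordant
    return (concordant - discordant) / total
```

With ~300 decoys per target and hundreds of target/costfcn pairs, the
difference between the quadratic and merge-sort versions is what makes a
multi-hour batch run like the one estimated above plausible.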