Cost Function Evaluation Overview
John Archie (2007-08-20)

Most of the evaluation process can be done with the cfneval.pl script
provided here; the script is documented:

    % cfneval.pl --man

Evaluating cost functions using the code here is a multi-step process.

First, create the evaluation score files in all of the CASP7 target
directories.  For my anglevector.costfcn file this was done by

    % set casp7=/projects/compbio/experiments/protein-predict/casp7/
    % set anglevectorcfn=/cse/grads/jarchie/projects/anglevector/anglevector.costfcn
    % set targetfile=$casp7/target_list.txt
    % umask 002
    % foreach target (`cat $targetfile`)
    foreach> sed -e "s/TXXXX/$target/g" < $anglevectorcfn > $casp7/$target/anglevector.costfcn
    foreach> end

Next, do all the scoring of decoys using the CASP7 stuff.  One way is

    % cfneval.pl -us "decoys/predictions.evaluate.anglevector.rdb" \
    ? | para-trickle-make -command ' ' -max_jobs 5

[ Thu Aug 23 12:02:21 PDT 2007 Kevin Karplus
  Alternatively, you could use
    para-trickle-make -manyids -se2log -no2letter -modelsdir $casp7 \
        -makefile ./Makefile -target decoys/predictions.evaluate.anglevector.rdb \
        < $targetfile
]

Summary statistics can be generated by cfneval.pl:

    % cfneval.pl -s decoys/predictions.evaluate.anglevector.rdb -f0 > example.rdb

Finally, plot the graphs and analyze the data in R, gnuplot, or some
other program:

    % R --no-save < cfneval_example.R > cfneval_example.log

(Check the R log for summary statistics and the plots/ directory for
plots.)

Tue Aug 21 13:27:31 PDT 2007 Kevin Karplus
    Copied to /projects/compbio/experiments/protein-predict/CostFcnEval

Tue Aug 21 13:39:25 PDT 2007 Kevin Karplus
    Created builtins.costfcn to evaluate all the cost functions that are
    not specific to a particular target.

Tue Aug 21 20:36:27 PDT 2007 John Archie
    Fussed with the method used in cfneval.pl to compute Kendall's tau
    a bit to increase speed.
    My very rough guess is that it will now take about 5 hours to
    complete the hierarchical cost-function tree that I need to build
    in the Fall.

Fri Aug 24 12:53:34 PDT 2007 Kevin Karplus
    One can get a quick summary of the results in the rdb file using

        summ -m < builtins.rdb | sort -nr +7 > builtins.avg

    For the builtin cost fcns, the highest average tau is for
    near_backbone, followed by other burial functions.
    Note: I had to modify summ slightly, as it had used %d instead of
    %g to print the values.

Fri Aug 24 13:19:00 PDT 2007 Kevin Karplus
    I have put targets in the Makefile for evaluating the costfcn,
    building an rdb file of the results by target, and giving the
    average for each costfcn.

Fri Aug 24 14:48:55 PDT 2007 Kevin Karplus
    There is now a %.summarize target, so that

        make -k builtins.summarize

    will make

        builtins-gdt-btr.avg        builtins-gdt-btr.rdb
        builtins-gdt-tau.avg        builtins-gdt-tau.rdb
        builtins-real_cost-btr.avg  builtins-real_cost-btr.rdb
        builtins-real_cost-tau.avg  builtins-real_cost-tau.rdb

    Using the real_cost metric and tau, the best cost-function
    components are

        Min, Avg, Max, Total for hbond_geom_backbone: -0.136, 0.333244, 0.581, 28.659
        Min, Avg, Max, Total for near_backbone:       -0.029, 0.32643,  0.566, 28.073
        Min, Avg, Max, Total for dry12:               -0.053, 0.305628, 0.642, 26.284
        Min, Avg, Max, Total for dry8:                -0.023, 0.303709, 0.569, 26.119

    Using gdt and tau, the best cost-function components are

        Min, Avg, Max, Total for near_backbone:       -0.054, 0.302116, 0.553, 25.982
        Min, Avg, Max, Total for dry12:               -0.064, 0.290791, 0.646, 25.008
        Min, Avg, Max, Total for dry8:                -0.016, 0.286314, 0.543, 24.623
        Min, Avg, Max, Total for way_back:            -0.087, 0.28086,  0.562, 24.154
        Min, Avg, Max, Total for dry6.5:              -0.037, 0.261,    0.546, 22.446
        Min, Avg, Max, Total for hbond_geom_backbone: -0.102, 0.250128, 0.5,   21.511

    It is interesting that hbond_geom_backbone moves up so much in the
    real_cost measure---probably because of the hbond scoring functions
    included in real_cost.
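For reference, the standard way to speed up Kendall's tau from the naive
O(n^2) pairwise comparison is to count inversions with a merge sort
(Knight's algorithm), which is O(n log n).  This is not necessarily what
cfneval.pl does internally; the following is only a minimal Python sketch
of the idea, for tau-a with no tied values:

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a between two equal-length sequences, computed in
    O(n log n) by counting inversions with a merge sort (Knight's
    algorithm).  Assumes no ties in either sequence."""
    n = len(xs)
    # Reorder ys by ascending x; every inversion remaining in this
    # reordered sequence corresponds to one discordant pair.
    order = sorted(range(n), key=lambda i: xs[i])
    ranked = [ys[i] for i in order]

    def count_inversions(a):
        # Merge sort that returns (sorted list, inversion count).
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv = count_inversions(a[:mid])
        right, inv_r = count_inversions(a[mid:])
        inv += inv_r
        merged = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] precedes all remaining left elements:
                # each such pair is an inversion.
                merged.append(right[j])
                j += 1
                inv += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    _, discordant = count_inversions(ranked)
    total = n * (n - 1) // 2          # all pairs
    concordant = total - discordant
    return (concordant - discordant) / total
```

With ~300 decoys per target and hundreds of target/costfcn pairs, the
difference between the quadratic and merge-sort versions is what makes a
multi-hour batch run like the one estimated above plausible.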