Cost Function Evaluation Overview
John Archie (2007-08-20)

Most of the evaluation process can be done with the cfneval.pl script
provided here; the script is documented:

    % cfneval.pl --man

Evaluating cost functions using the code here is a multi-step process:

First, create the evaluation score files in all of the CASP7 target
directories.  For my anglevector.costfcn file this was done by

    % set casp7=/projects/compbio/experiments/protein-predict/casp7/
    % set anglevectorcfn=/cse/grads/jarchie/projects/anglevector/anglevector.costfcn
    % set targetfile=$casp7/target_list.txt
    % umask 002
    % foreach target (`cat $targetfile`)
    foreach> sed -e "s/TXXXX/$target/g" < $anglevectorcfn > $casp7/$target/anglevector.costfcn
    foreach> end

Next, do all the scoring of decoys using the CASP7 stuff.  One way is

    % cfneval.pl -us "decoys/predictions.evaluate.anglevector.rdb" \
    ? | para-trickle-make -command ' ' -max_jobs 5

[ Thu Aug 23 12:02:21 PDT 2007 Kevin Karplus
  Alternatively, you could use

    para-trickle-make -manyids -se2log -no2letter -modelsdir $casp7 \
        -makefile ./Makefile -target decoys/predictions.evaluate.anglevector.rdb < $targetfile
]

Summary statistics can be generated by cfneval.pl:

    % cfneval.pl -s decoys/predictions.evaluate.anglevector.rdb -f0 > example.rdb

Finally, plot the graphs and analyze the data in R, gnuplot, or some
other program:

    % R --no-save < cfneval_example.R > cfneval_example.log

(Check the R log for summary statistics and the plots/ directory for
plots.)

Tue Aug 21 13:27:31 PDT 2007 Kevin Karplus
    Copied to /projects/compbio/experiments/protein-predict/CostFcnEval

Tue Aug 21 13:39:25 PDT 2007 Kevin Karplus
    Created builtins.costfcn to evaluate all the cost functions that
    are not specific to a particular target.

Tue Aug 21 20:36:27 PDT 2007 John Archie
    Fussed with the method used in cfneval.pl to compute Kendall's tau
    a bit to increase speed.
    My very rough guess is that it will now take about 5 hours to
    complete the hierarchical cost function tree that I need to build
    in the Fall.

Fri Aug 24 12:53:34 PDT 2007 Kevin Karplus
    One can get a quick summary of the results in the rdb file using

        summ -m < builtins.rdb | sort -nr +7 > builtins.avg

    For the builtin cost fcns, the highest average tau is for
    near_backbone, followed by other burial functions.

    Note: I had to modify summ slightly, as it had used %d instead of
    %g to print the values.

Fri Aug 24 13:19:00 PDT 2007 Kevin Karplus
    I have put targets in the Makefile for evaluating the costfcn,
    building an rdb file of the results by target, and giving the
    average for each costfcn.

Fri Aug 24 14:48:55 PDT 2007 Kevin Karplus
    There is now a %.summarize target, so that

        make -k builtins.summarize

    will make

        builtins-gdt-btr.avg        builtins-gdt-tau.rdb
        builtins-real_cost-tau.avg  builtins-gdt-btr.rdb
        builtins-real_cost-btr.avg  builtins-real_cost-tau.rdb
        builtins-gdt-tau.avg        builtins-real_cost-btr.rdb

    Using real_cost metric and tau, the best cost function components are

        Min, Avg, Max, Total for hbond_geom_backbone: -0.136, 0.333244, 0.581, 28.659
        Min, Avg, Max, Total for near_backbone: -0.029, 0.32643, 0.566, 28.073
        Min, Avg, Max, Total for dry12: -0.053, 0.305628, 0.642, 26.284
        Min, Avg, Max, Total for dry8: -0.023, 0.303709, 0.569, 26.119

    Using gdt and tau, the best cost function components are

        Min, Avg, Max, Total for near_backbone: -0.054, 0.302116, 0.553, 25.982
        Min, Avg, Max, Total for dry12: -0.064, 0.290791, 0.646, 25.008
        Min, Avg, Max, Total for dry8: -0.016, 0.286314, 0.543, 24.623
        Min, Avg, Max, Total for way_back: -0.087, 0.28086, 0.562, 24.154
        Min, Avg, Max, Total for dry6.5: -0.037, 0.261, 0.546, 22.446
        Min, Avg, Max, Total for hbond_geom_backbone: -0.102, 0.250128, 0.5, 21.511

    It is interesting that hbond_geom_backbone moves up so much in the
    real_cost measure, probably because of the hbond scoring functions
    included in real_cost.
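
For readers unfamiliar with the statistic reported above, here is a
minimal Python sketch of Kendall's tau (the tau-a variant, with no tie
correction).  It is not the method used in cfneval.pl, whose code is not
shown in this file; the function name and the example values are
hypothetical, and the sign convention (whether a good cost function
should give positive or negative tau) is left to the caller.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    This is the simple O(n^2) definition, not an optimized version.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two items")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical scores for four decoys of one target.
costs = [1.0, 2.0, 3.0, 4.0]
quality = [10.0, 20.0, 40.0, 30.0]
print(kendall_tau_a(costs, quality))
```

The quadratic loop is fine for a handful of decoys; for large decoy sets
an O(n log n) algorithm would be the usual choice.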
Sun Aug 26 20:15:13 PDT 2007 Kevin Karplus
    WARNING: there seems to be an occasional problem with T0305 on the
    moai cluster:

        # ReadConformPDB reading from PDB file predictions/T0305TS601_3 looking for model 1
        # Found a chain break before 294
        # copying to AlignedFragments data structure
        # naming current conformation T0305TS601_3
        # request to SCWRL produces command: ulimit -t 268 ; scwrl3 -i /var/tmp/to_scwrl_1995065502.pdb -s /var/tmp/to_scwrl_1995065502.seq -o /var/tmp/from_scwrl_1995065502.pdb > /var/tmp/scwrl_1995065502.log
        # Trying to read SCWRLed conformation from /var/tmp/from_scwrl_1995065502.pdb
        undertaker: ScwrlCommands.cc:224: Conformation* SCWRL(Conformation*, std::ostream&): Assertion `ch->atom(a).no_wc_match(new_ch->atom(atom_in_new_ch))' failed.

    Running exactly the same program on cheep does not cause any
    problems.

    When comparing tau or btr numbers, check to make sure that the same
    number of targets is included in both runs (not a problem if the
    computations are from the same run).
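
One way to make the check above is to use the summary lines themselves:
assuming each summarized row is one target, Total divided by Avg
recovers the number of targets that went into the average.  The sketch
below is a hypothetical helper in Python, not part of cfneval.pl or
summ; it assumes lines in the "Min, Avg, Max, Total for NAME: ..."
format shown earlier in this file.

```python
import re

# Matches summary lines such as
#   Min, Avg, Max, Total for near_backbone: -0.029, 0.32643, 0.566, 28.073
# with or without a leading "file.avg:" prefix from grep.
SUMMARY_RE = re.compile(
    r"Min, Avg, Max, Total for (?P<name>\S+):\s*"
    r"(?P<min>[-+.\deE]+),\s*(?P<avg>[-+.\deE]+),\s*"
    r"(?P<max>[-+.\deE]+),\s*(?P<total>[-+.\deE]+)"
)


def target_count(line):
    """Return (name, estimated number of targets), or None if not parseable.

    The estimate is round(Total / Avg); it is unreliable when Avg is
    close to zero, because the printed values are rounded.
    """
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    avg = float(match.group("avg"))
    total = float(match.group("total"))
    if avg == 0.0:
        return None
    return match.group("name"), round(total / avg)


def same_target_count(line_a, line_b):
    """True if both summary lines appear to cover the same number of targets."""
    a = target_count(line_a)
    b = target_count(line_b)
    return a is not None and b is not None and a[1] == b[1]


line_a = "Min, Avg, Max, Total for near_backbone: -0.029, 0.32643, 0.566, 28.073"
line_b = "Min, Avg, Max, Total for hbond_geom_backbone: -0.136, 0.333244, 0.581, 28.659"
print(target_count(line_a), target_count(line_b), same_target_count(line_a, line_b))
```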
Sun Aug 26 22:25:26 PDT 2007 Kevin Karplus
    The best cost functions for choosing high GDT are all neural-net
    predictions:

        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_nb11_04_simple: 0.262, 0.520244, 0.733, 44.741
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_nb11_06_simple: 0.261, 0.515058, 0.732, 44.295
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_nb11_2k_simple: 0.29, 0.51464, 0.735, 44.259
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_nb11_04: 0.272, 0.483512, 0.715, 41.582
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_nb11_06: 0.227, 0.477128, 0.708, 41.033
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_nb11_2k: 0.237, 0.473186, 0.706, 40.694
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_cb14_04_simple: -0.03, 0.447767, 0.716, 38.508
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_alpha06: 0.098, 0.444233, 0.695, 38.204
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_cb14_06_simple: -0.03, 0.443849, 0.725, 38.171
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_alpha04: 0.099, 0.44264, 0.709, 38.067
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_alpha2k: 0.117, 0.436105, 0.65, 37.505
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_cb14_06: -0.014, 0.424, 0.721, 36.464
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_cb14_04: -0.01, 0.421593, 0.736, 36.257
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_cb14_2k_simple: 0.071, 0.418953, 0.724, 36.03
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_cb14_2k: -0.171, 0.412314, 0.697, 35.459
        anglevector-gdt-tau.avg:Min, Avg, Max, Total for pred_pb_mean: 0.093, 0.409151, 0.634, 35.187
        anglevector-gdt-tau.avg:Min, Avg, Max, Total for pred_pb_t04: 0.088, 0.409116, 0.633, 35.184
        anglevector-gdt-tau.avg:Min, Avg, Max, Total for pred_pb_t06: 0.096, 0.408558, 0.636, 35.136
        anglevector-gdt-tau.avg:Min, Avg, Max, Total for pred_pb_t2k: 0.091, 0.406221, 0.632, 34.935
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_CB8-sep9_06_simple: 0.057, 0.397884, 0.697, 34.218
        predburial-gdt-tau.avg:Min, Avg, Max, Total for pred_CB8-sep9_06: 0.056, 0.387663, 0.648, 33.339
        anglevector-gdt-tau.avg:Min, Avg, Max, Total for pred_bys_t06: 0.12, 0.375698, 0.632, 32.31
        anglevector-gdt-tau.avg:Min, Avg, Max, Total for pred_bys_t04: 0.12, 0.375523, 0.629, 32.295
        anglevector-gdt-tau.avg:Min, Avg, Max, Total for pred_bys_mean: 0.121, 0.373802, 0.632, 32.147
        anglevector-gdt-tau.avg:Min, Avg, Max, Total for pred_bys_t2k: 0.121, 0.373233, 0.64, 32.098
        predburial-gdt-tau.avg:Min, Avg, Max, Total for near_backbone: -0.054, 0.302116, 0.553, 25.982
        builtins-gdt-tau.avg:Min, Avg, Max, Total for near_backbone: -0.054, 0.302116, 0.553, 25.982
        builtins-gdt-tau.avg:Min, Avg, Max, Total for dry12: -0.064, 0.290791, 0.646, 25.008
        builtins-gdt-tau.avg:Min, Avg, Max, Total for dry8: -0.016, 0.286314, 0.543, 24.623
        builtins-gdt-tau.avg:Min, Avg, Max, Total for way_back: -0.087, 0.28086, 0.562, 24.154
        builtins-gdt-tau.avg:Min, Avg, Max, Total for dry6.5: -0.037, 0.261, 0.546, 22.446
        builtins-gdt-tau.avg:Min, Avg, Max, Total for hbond_geom_backbone: -0.102, 0.250128, 0.5, 21.511

    For real_cost, the best are again predictions:

        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_nb11_04_simple: 0.295, 0.552698, 0.752, 47.532
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_nb11_06_simple: 0.308, 0.549256, 0.745, 47.236
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_nb11_2k_simple: 0.316, 0.545872, 0.75, 46.945
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_nb11_04: 0.22, 0.5175, 0.727, 44.505
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_nb11_06: 0.259, 0.512442, 0.724, 44.07
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_nb11_2k: 0.203, 0.506477, 0.721, 43.557
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_alpha06: 0.151, 0.499477, 0.69, 42.955
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_alpha04: 0.15, 0.498279, 0.698, 42.852
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_alpha2k: 0.147, 0.486709, 0.671, 41.857
        anglevector-real_cost-tau.avg:Min, Avg, Max, Total for pred_pb_t04: 0.192, 0.475895, 0.67, 40.927
        anglevector-real_cost-tau.avg:Min, Avg, Max, Total for pred_pb_mean: 0.191, 0.475756, 0.672, 40.915
        anglevector-real_cost-tau.avg:Min, Avg, Max, Total for pred_pb_t06: 0.191, 0.475721, 0.674, 40.912
        anglevector-real_cost-tau.avg:Min, Avg, Max, Total for pred_pb_t2k: 0.189, 0.471488, 0.67, 40.548
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_cb14_04_simple: -0.013, 0.460442, 0.726, 39.598
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_cb14_06_simple: -0.013, 0.457674, 0.729, 39.36
        anglevector-real_cost-tau.avg:Min, Avg, Max, Total for pred_bys_t06: 0.096, 0.453384, 0.69, 38.991
        anglevector-real_cost-tau.avg:Min, Avg, Max, Total for pred_bys_t04: 0.097, 0.45307, 0.689, 38.964
        anglevector-real_cost-tau.avg:Min, Avg, Max, Total for pred_bys_mean: 0.094, 0.450965, 0.691, 38.783
        anglevector-real_cost-tau.avg:Min, Avg, Max, Total for pred_bys_t2k: 0.091, 0.449674, 0.7, 38.672
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_cb14_06: -0.012, 0.440907, 0.7, 37.918
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_cb14_04: -0.006, 0.437709, 0.73, 37.643
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_cb14_2k_simple: 0.044, 0.428244, 0.727, 36.829
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_cb14_2k: -0.166, 0.425407, 0.69, 36.585
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_CB8-sep9_06_simple: 0.089, 0.412093, 0.663, 35.44
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_CB8-sep9_06: 0.073, 0.402047, 0.664, 34.576
        builtins-real_cost-tau.avg:Min, Avg, Max, Total for hbond_geom_backbone: -0.136, 0.333244, 0.581, 28.659
        predburial-real_cost-tau.avg:Min, Avg, Max, Total for near_backbone: -0.029, 0.326488, 0.566, 28.078
        builtins-real_cost-tau.avg:Min, Avg, Max, Total for near_backbone: -0.029, 0.32643, 0.566, 28.073
        builtins-real_cost-tau.avg:Min, Avg, Max, Total for dry12: -0.053, 0.305628, 0.642, 26.284
        builtins-real_cost-tau.avg:Min, Avg, Max, Total for dry8: -0.023, 0.303709, 0.569, 26.119
        builtins-real_cost-tau.avg:Min, Avg, Max, Total for way_back: -0.101, 0.296767, 0.607, 25.522
        builtins-real_cost-tau.avg:Min, Avg, Max, Total for alpha: -0.023, 0.288349, 0.583, 24.798
        builtins-real_cost-tau.avg:Min, Avg, Max, Total for alpha_prev: 0.005, 0.28786, 0.586, 24.756
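
A merged ranking like the one above can be rebuilt from the individual
.avg files.  The Python sketch below is only an illustration, assuming
grep-style input lines of the form
"FILE:Min, Avg, Max, Total for NAME: min, avg, max, total"; the function
names are hypothetical and this is not how the listing above was
necessarily produced.

```python
MARKER = "Min, Avg, Max, Total for "


def parse_avg_line(line):
    """Split one summary line into a dict; return None if it does not match."""
    head, sep, values = line.strip().rpartition(": ")
    if not sep or MARKER not in head:
        return None
    source, _, name = head.partition(MARKER)
    try:
        lo, avg, hi, total = (float(v) for v in values.split(","))
    except ValueError:
        return None
    return {
        "source": source.rstrip(":"),  # empty if there was no file prefix
        "name": name,
        "min": lo,
        "avg": avg,
        "max": hi,
        "total": total,
    }


def rank_by_avg(lines):
    """Parse the lines and sort them by average tau, best first."""
    records = [r for r in map(parse_avg_line, lines) if r is not None]
    return sorted(records, key=lambda r: r["avg"], reverse=True)


# Three lines copied from the listings in this file.
example = [
    "builtins-real_cost-tau.avg:Min, Avg, Max, Total for dry12: -0.053, 0.305628, 0.642, 26.284",
    "predburial-real_cost-tau.avg:Min, Avg, Max, Total for pred_nb11_04: 0.22, 0.5175, 0.727, 44.505",
    "anglevector-real_cost-tau.avg:Min, Avg, Max, Total for pred_pb_t04: 0.192, 0.475895, 0.67, 40.927",
]
for record in rank_by_avg(example):
    print(record["source"], record["name"], record["avg"])
```

Sorting on the average rather than the total keeps the ranking
meaningful even if two files cover different numbers of targets.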