This directory contains scripts to perform "Careful Ranking" of several solvers on a set of benchmarks. To be meaningful, each benchmark must be run with the same timeout for all solvers. However, different benchmarks can have different timeouts, and the scripts will still work. To be meaningful, all solvers being compared should avoid producing wrong answers. The only reasonable penalty for a wrong answer is disqualification. The scripts do NOT attempt to see if the answers given are consistent. Generally, the user executes csh files, and if such a file is executed with no command-line parameters, it gives a usage message. The csh file often calls other files in this directory, so the current working directory should be this one. To ensure consistency, each csh file executes "setenv LANG C". This causes programs to treat "." as a decimal point and to sort by bit order so that 'Z' < 'a'. ----------------------------------------------------------------------- Typical step by step to rank n solvers on m instances follows. 0. select-catgy.csh phaseN.txt prefix phaseN.txt is in the format of text results at www.satcompetition.org, e.g., for the 2009 competition. For example, use wget 'http://www.cril.univ-artois.fr/SAT09/phase2.txt' to download phase2 results for all three categories. Or use wget 'http://www.cril.univ-artois.fr/SAT11/results/export.php?idev=46' to download phase2 results for all three categories of the 2011 competition. "prefix" is a unique case-insensitive prefix of one of the categories in phaseN.txt. E.g., phase2.txt has these categories: APPLICATION, CRAFTED, RANDOM. The "prefix" might be "app", "craft", or "rand". The output file is named prefix-phaseN.txt and has no headers and the category field has been deleted. See phase2head.txt and app-phase2head.txt for a short input/output example. 1. parse-ss.csh spreadsheet.txt The outputs are spreadsheet-trim.txt, spreadsheet-matrix.txt, and spreadsheet-solverlist.txt. NOTE: spreadsheet-matrix.txt is generated from spreadsheet-trim.txt and spreadsheet-solverlist.txt is generated from spreadsheet-matrix.txt. Therefore the user may make a copy of parse-ss.csh and revise the beginning of the new script file to produce spreadsheet-trim.txt from ANY input file. This file must be sorted and each (instance solver) pair must be unique. Similarly, data for a different solver can be developed independently in the format of spreadsheet-trim.txt and files can be combined with 'sort' before producing ...-matrix.txt and ...-solverlist.txt. --------------------------------------------------------------------- spreadsheet.txt is in the format of text output of select-catgy.csh in step 0. In particular, it has no category field. Examples: app-phase2.txt.gz gzipped 2009 application phase 2 results app-phase2head.txt lines 1-8 of application phase 2 results app-phase2.txt.gz has 292 benchmarks and 16 solvers. spreadsheet.txt *might* begin with two header lines that are discarded. If present, they are "Instance..." and "_____...". The rest are n*m data lines. spreadsheet-trim.txt contains the data from spreadsheet.txt in the format (instance solver time), with "Inf" in the time field to denote that the instance was not solved. spreadsheet-solverlist.txt associates solver names with numbers 1-n. spreadsheet-matrix.txt has (m+1) lines (or rows) of (n+1) fields (or columns) each. Row 0 is a header in the format "SOLVER" solver_1 ... solver_n. After row 0, column 0 is instance name and columns 1-n are the times for solver_1 ... solver_n. This is the primary input file for analysis. 2. pairs-compare.csh spreadsheet-matrix.txt noise_level spreadsheet-matrix.txt came from step 1. noise_level is the user's choice for the minimum solving time that is considered "significant". The example runs in this directory used 60. The analysis is described in a paper submitted to SAT 2011. The outputs are spreadsheet-matrix-noise_level-report.txt spreadsheet-matrix-noise_level-scores.txt spreadsheet-matrix-noise_level-ranked1.txt spreadsheet-matrix-noise_level-ranked2.txt spreadsheet-matrix-noise_level-report.txt has a keyword for the first field on each line, to make it easy to select the lines of interest. UNITS 1 for seconds, 60 for minutes, etc. NOISE noise_level on command-line. MATCH details of the m-instance contest between a pair of solvers. Field 4 is positive if the first solver (name in field 2) wins, and is the winning margin. Each pair appears in both orders so there are n*n such lines. Field 6 = m. Field 8 = number of ties. Field 7 = margin expressed in standard deviations, negative for a lost match. Fields 9, 10 estimate the probability that the margin would be less extreme if the solvers were actually equally fast or their true relation was opposite what the margin implied (the null hypothesis). That is, if field 9 = 0.10, the interpretation is: If solver_1 were actually as fast as, or faster than, solver_2, a margin as negative as the value in field 4 would occur about 10% of the time. However, if field 9 = 0.90, the interpretation is: If solver_1 were actually as slow as, or slower than, solver_2, a margin less than the value in field 4 would occur about 90% of the time. CKSUM Sum of MATCH margins, should be 0. MYTOT Algebraic sum of winning margins, for each solver. MYDOMINS sum of 2*wins + 1*ties, for each solver. MYWINS sum of wins (ties and losses count 0), for each solver. none of the above: The final section shows two matrices, called DOMIN and STRICT_DOMIN. Each row of DOMIN breaks down the MYDOMINS of a solver; that is, each column is 2 for win, 1 for tie, 0 for loss. However, the diagonal elements are also 0, the solvers do not get a 1 for tying with themselves. Each row of STRICT_DOMIN breaks down the MYWINS of a solver; that is, each column is 1 for win, 0 for tie, 0 for loss. spreadsheet-matrix-noise_level-scores.txt sorts the MYTOT lines, largest first. spreadsheet-matrix-noise_level-ranked1.txt sorts the MYDOMINS lines, largest first. Duplicated values indicate that these solvers are involved in a cycle. Not all solvers in one strongly connected component need to have the same value, but their values will be duplicated. For example, values of 12 and 10 are duplicated in app-phase2-matrix-60-ranked1.txt. Examination of the DOMIN matrix in app-phase2-matrix-60-report.txt shows that the four solvers form a single SCC. If the leading values are 2(n-1), 2(n-2), 2(n-3), ..., then these solvers hold the leading ranks, uniquely. For example, the top eight solvers in app-phase2-matrix-60-ranked1.txt uniquely hold ranks 1-8. The submitted paper proposes a final ranking based on the irreflexive transitive closure of the DOMIN matrix, after converting all 2's to 1's, to make a standard 0-1 adjacency matrix. The ...-ranked1.txt file is designed to assist in doing this operation manually, in most practical cases. Usually, only a few leading solvers need to be ranked unambiguously. spreadsheet-matrix-noise_level-ranked2.txt sorts the MYWINS lines, largest first. If all values are unique, there were no ties. However, not every solver with a duplicated value is necessarily in some cycle. STRICT_DOMIN is computed because there are some voting schemes that use this matrix. The submitted paper does not use STRICT_DOMIN. ---------------------------------------------------------------- The following commands are to analyze effect of reducing time-out. 3. collect-solve-times.csh matrix.txt low_time matrix.txt is normally spreadsheet-matrix.txt from step 1. E.g., app-phase2-matrix.txt low_time is the time-out to stay above. E.g., 900 matrix-times-gt{low_time}.txt contains times > low_time for which some solver solved some benchmark, sorted descending. 4. domin-calc-t-o-range.csh matrix.txt t-o-range.txt ranks noise_level ranks is how many top-ranked solvers to list, for each time-out. E.g., 3 noise_level means the same as in step 2. E.g., 60 matrix.txt is normally spreadsheet-matrix.txt from step 1. t-o-range.txt is normally matrix-times-gt{low_time}.txt from step 3. matrix-range-high-{ranks}-{noise_level}.txt is the output file. There are also numerous files in TmpDir-pid/ E.g. app-phase2-matrix.txt app-phase2-matrix-times-gt900.txt 3 60 The script post-processes the output file (with find-domin-changes.csh) to show only lines where the top rankings changed from the previous line. This goes to stdout, unless redirected. E.g., >& app-phase2-domin-gt900-3-60.cshlog or >& app-phase2-domin-gt900-4-60.cshlog Run find-domin-changes.csh matrix-range-high-{ranks}-{noise_level}.txt to regenerate the stdout output.