This directory contains scripts to perform "Careful Ranking"
of several solvers on a set of benchmarks.

To be meaningful, each benchmark must be run with the same timeout
for all solvers.  However, different benchmarks can have different
timeouts, and the scripts will still work.

To be meaningful, all solvers being compared should avoid producing
wrong answers.  The only reasonable penalty for a wrong answer is
disqualification.  The scripts do NOT attempt to see if the answers
given are consistent.

Generally, the user executes csh files, and if such a file is
executed with no command-line parameters, it gives a usage message.
The csh file often calls other files in this directory, so the
current working directory should be this one.
To ensure consistency, each csh file executes "setenv LANG C".
This causes programs to treat "." as a decimal point and to sort
by bit order so that 'Z' < 'a'.


-----------------------------------------------------------------------

Typical step by step to rank n solvers on m instances follows.

0. select-catgy.csh phaseN.txt prefix

phaseN.txt is in the format of text results at www.satcompetition.org,
e.g., for the 2009 competition.  For example, use

  wget 'http://www.cril.univ-artois.fr/SAT09/phase2.txt'

to download phase2 results for all three categories.

Or use wget 'http://www.cril.univ-artois.fr/SAT11/results/export.php?idev=46'
to download phase2 results for all three categories of the 2011 competition.

"prefix" is a unique case-insensitive prefix of one of the categories
in phaseN.txt.  E.g., phase2.txt has these categories: APPLICATION,
CRAFTED, RANDOM.  The "prefix" might be "app", "craft", or "rand".

The output file is named prefix-phaseN.txt and has no headers and
the category field has been deleted.
See phase2head.txt and app-phase2head.txt for a short input/output example.


1. parse-ss.csh spreadsheet.txt

The outputs are spreadsheet-trim.txt, spreadsheet-matrix.txt, and
spreadsheet-solverlist.txt.

NOTE: spreadsheet-matrix.txt is generated from spreadsheet-trim.txt and
  spreadsheet-solverlist.txt is generated from spreadsheet-matrix.txt.

  Therefore the user may make a copy of parse-ss.csh and revise the beginning
  of the new script file to produce spreadsheet-trim.txt from ANY input file.
  This file must be sorted and each (instance solver) pair must be unique.

  Similarly, data for a different solver can be developed independently
  in the format of spreadsheet-trim.txt and files can be combined with 'sort'
  before producing ...-matrix.txt and ...-solverlist.txt.
  ---------------------------------------------------------------------

spreadsheet.txt is in the format of text output of select-catgy.csh
in step 0.  In particular, it has no category field.  Examples:

    app-phase2.txt.gz       gzipped 2009 application phase 2 results
    app-phase2head.txt      lines 1-8 of application phase 2 results

app-phase2.txt.gz has 292 benchmarks and 16 solvers.
spreadsheet.txt *might* begin with two header lines that are discarded.
If present, they are "Instance..." and "_____...".
The rest are n*m data lines.

spreadsheet-trim.txt contains the data from spreadsheet.txt in the
format (instance solver time), with "Inf" in the time field to
denote that the instance was not solved.

spreadsheet-solverlist.txt associates solver names with numbers 1-n.

spreadsheet-matrix.txt has (m+1) lines (or rows) of
(n+1) fields (or columns) each.

    Row 0 is a header in the format "SOLVER" solver_1 ... solver_n.

    After row 0, column 0 is instance name and columns 1-n are the times
    for solver_1 ... solver_n.

This is the primary input file for analysis.


2. pairs-compare.csh spreadsheet-matrix.txt noise_level

spreadsheet-matrix.txt came from step 1.  noise_level is the user's
choice for the minimum solving time that is considered "significant".
The example runs in this directory used 60.
The analysis is described in a paper submitted to SAT 2011.

The outputs are

    spreadsheet-matrix-noise_level-report.txt
    spreadsheet-matrix-noise_level-scores.txt
    spreadsheet-matrix-noise_level-ranked1.txt
    spreadsheet-matrix-noise_level-ranked2.txt

spreadsheet-matrix-noise_level-report.txt
    has a keyword for the first field on each line, to make it easy
    to select the lines of interest.

    UNITS        1 for seconds, 60 for minutes, etc.
    NOISE        noise_level on command-line.
    MATCH        details of the m-instance contest between a pair
                 of solvers.  Field 4 is positive if the first solver
                 (name in field 2) wins, and is the winning margin.
                 Each pair appears in both orders so there are n*n
                 such lines.  Field 6 = m.  Field 8 = number of ties.
                 Field 7 = margin expressed in standard deviations,
                 negative for a lost match.

                 Fields 9, 10 estimate the probability that the margin
                 would be less extreme if the solvers were actually
                 equally fast or their true relation was opposite
                 what the margin implied (the null hypothesis).
                 That is, if field 9 = 0.10, the interpretation is:
                 If solver_1 were actually as fast as, or faster than,
                 solver_2, a margin as negative as the value in field 4 would
                 occur about 10% of the time.
                 However, if field 9 = 0.90, the interpretation is:
                 If solver_1 were actually as slow as, or slower than,
                 solver_2, a margin less than the value in field 4 would
                 occur about 90% of the time.

    CKSUM        Sum of MATCH margins, should be 0.

    MYTOT        Algebraic sum of winning margins, for each solver.

    MYDOMINS     sum of 2*wins + 1*ties, for each solver.

    MYWINS       sum of wins (ties and losses count 0), for each solver.

    none of the above:  The final section shows two matrices, called
                 DOMIN and STRICT_DOMIN.

                 Each row of DOMIN breaks down the MYDOMINS of a solver;
                 that is, each column is 2 for win, 1 for tie, 0 for loss.
                 However, the diagonal elements are also 0, the solvers
                 do not get a 1 for tying with themselves.

                 Each row of STRICT_DOMIN breaks down the MYWINS of a solver;
                 that is, each column is 1 for win, 0 for tie, 0 for loss.


spreadsheet-matrix-noise_level-scores.txt
    sorts the MYTOT lines, largest first.

spreadsheet-matrix-noise_level-ranked1.txt
    sorts the MYDOMINS lines, largest first.
    Duplicated values indicate that these solvers are involved in a cycle.
    Not all solvers in one strongly connected component
    need to have the same value, but their values will be duplicated.

    For example, values of 12 and 10 are duplicated in
    app-phase2-matrix-60-ranked1.txt.  Examination of the DOMIN matrix
    in app-phase2-matrix-60-report.txt shows that the four solvers form
    a single SCC.

    If the leading values are 2(n-1), 2(n-2), 2(n-3), ..., then these
    solvers hold the leading ranks, uniquely.
    For example, the top eight solvers in
    app-phase2-matrix-60-ranked1.txt uniquely hold ranks 1-8.

    The submitted paper proposes a final ranking based on the irreflexive
    transitive closure of the DOMIN matrix, after converting all 2's to 1's,
    to make a standard 0-1 adjacency matrix.
    The ...-ranked1.txt file is designed to assist in doing this operation
    manually, in most practical cases.  Usually, only a few leading
    solvers need to be ranked unambiguously.

spreadsheet-matrix-noise_level-ranked2.txt
    sorts the MYWINS lines, largest first.
    If all values are unique, there were no ties.
    However, not every solver with a duplicated value is necessarily in
    some cycle.

    STRICT_DOMIN is computed because there are some voting schemes that
    use this matrix.  The submitted paper does not use STRICT_DOMIN.

----------------------------------------------------------------


The following commands are to analyze effect of reducing time-out.

3.  collect-solve-times.csh matrix.txt low_time

matrix.txt is normally spreadsheet-matrix.txt from step 1.
E.g., app-phase2-matrix.txt

low_time is the time-out to stay above.  E.g., 900

matrix-times-gt{low_time}.txt  contains times > low_time for which
some solver solved some benchmark, sorted descending.


4.  domin-calc-t-o-range.csh matrix.txt t-o-range.txt ranks noise_level

ranks is how many top-ranked solvers to list, for each time-out.
E.g., 3

noise_level means the same as in step 2.  E.g., 60

matrix.txt is normally spreadsheet-matrix.txt from step 1.

t-o-range.txt is normally matrix-times-gt{low_time}.txt from step 3.

matrix-range-high-{ranks}-{noise_level}.txt is the output file.
There are also numerous files in TmpDir-pid/

E.g. app-phase2-matrix.txt app-phase2-matrix-times-gt900.txt 3 60

The script post-processes the output file (with find-domin-changes.csh)
to show only lines where the top rankings changed from the previous
line.  This goes to stdout, unless redirected.
E.g., >& app-phase2-domin-gt900-3-60.cshlog
or    >& app-phase2-domin-gt900-4-60.cshlog

Run find-domin-changes.csh matrix-range-high-{ranks}-{noise_level}.txt
to regenerate the stdout output.