http://www.cse.ucsc.edu/~avg/ProofChecker/fileformat_rup.txt VERSION 1.2 Apr 20, 2007. AUTHOR Allen Van Gelder, avg where? at cs.ucsc.edu COPYRIGHT Allen Van Gelder, Apr, 2007. All rights reserved. Permission is granted to reproduce this document for noncommercial research purposes. 1. SPECIFICATION FOR REVERSE UNIT-PROPAGATION PROOF FILES --------------------------------------------------- The file format for a non-resolution proof is the following, where bytes are numbered from 0, and double quotes indicate character sequences that are NOT null-terminated: 0-3 "%RUP" 4-7 format specs, terminated by white space. The following format will be accepted for the SAT07 competition: "D32 " denotes ASCII signed integers separated by white space; e.g. use printf format "%ld\n". The remaining formats are for future development: "L32 " denotes little-endian 32-bit (e.g., Intel); e.g., 258 is written as 02 01 00 00 hex. "B32 " denotes big-endian 32-bit (e.g., Sparc, MIPS); e.g., 258 is written as 00 00 01 02 hex. "A32 " denotes ASCII signed integers separated by white space; e.g. use printf format "%ld\n". "L64 " denotes little-endian 64-bit (e.g., Intel); "B64 " denotes big-endian 64-bit (e.g., Sparc, MIPS); "A64 " denotes ASCII signed 64-bit integers. 8-31 two unsigned ASCII integers, n and m, representing the number of variables and number of clauses in input formula, each followed by white space (must match p-line of input formula, n might only be an upper bound on the number of variables); e.g., use printf format "%11d %11d\n". 32-255 identification, comments, etc., ASCII. It is recommended that this section be newline-terminated, for readability. 256-end non-resolution proof, ASCII (future binary). This section consists of a sequence of unsigned and signed integers, according to the format specs. The sequence is described in detail in the next section. The entire file is limited to 2000 megabytes (i.e., 2097152000 bytes). This is slightly less than than 2 gigabytes to ensure that it fits in a 2-GB file system. The overall length limit might be increased if a program demonstrates that it produces meaningful RUP proofs that exceed this limit. 1.1. DIMACS-LIKE REVERSE UNIT-PROPAGATION PROOF FORMAT The proof is a sequence of integers that represents the concatenation of subsequences called DERIVED CLAUSES, defined below. Let n and m denote the number of variables and number of clauses, as given on the p-line of the input formula. A LITERAL is a positive integer in the range 1-n or the negation of such an integer. A DERIVED CLAUSE in the Dimacs-like format D32 is a subsequence consisting of k literals, followed by the unsigned integer 0. The value of k may be 0, denoting the empty clause. The literals may include duplicates, but NOT contradictory pairs. In all ASCII formats, integers are separated arbitrarily by white space consisting of one or more of these characters: space, tab, or newline. The usual convention is that a newline follows the clause-terminating 0, but this is not a firm requirement. However, there must be a newline at the end of the proof. A common convention is that each clause occupies exactly one line, but this is not even a loose requirement. The 0 marks the end of a clause. Using " %ld" (note the space before the %) with fscanf permits reading one integer without worrying about newlines; check the return code to be sure an item was read. 1.2 LOGICAL CONTENT OF DERIVED CLAUSES The short story is that solvers in the vein of grasp, chaff, berkmin, minisat can simply output their conflict clauses (the final one being the empty clause). The rest of this section goes into details in case there are solvers that use a different approach. The idea is based on E.Goldberg, Y.Novikov. Verification of proofs of unsatisfiability for CNF formulas. Design, Automation and Test in Europe. March 2003, pp. 886--891. http://eigold.tripod.com/papers.html. The method of deriving the derived clause is not important. It might be by resolution or some other means. It is only necessary that the soundness of the derived clauses can be verified by a procedure that we call REVERSE UNIT-PROPAGATION (RUP for short). In the discussion, clauses are delimited by square brackets. Suppose that F is the input formula and D_1, ..., D_r are the derived clauses that have been output, with D_r = [] (the empty clause). The sequence D_1, ..., D_r is an RUP refutation of F if and only if: For each i from 1 through r, this procedure derives []: 1. Negate D_i = [L_{ij}] producing one or more unit clauses, [-L_{ij}]. For example, the negation of [x, y, z] is [-x], [-y], [-z]. (When i = r, there are no unit clauses produced.) 2. Add these unit clauses [-L_{ij}] and D_1 through D_{i-1} to F. (When i = r, there are no unit clauses to add.) 3. Perform unit-clause propagation. 1.2.1 EXAMPLES EXAMPLE 1 Say the input file has these clauses: c example 1 p cnf 4 4 1 -4 -3 0 1 4 0 -1 0 -4 3 0 The proof might be these integers (newlines are for readability): 4 3 0 0 This proof contains two derived clauses, [4 3] and []. Negating [4 3] gives [-4] and [-3], which are added to F. Then unit-clause propagation derives []. Next, add [4 3] (the first derived clause) to F, and again unit-clause propagation derives []. So the proof is verified as sound, although we have no idea why the program output [4 3] as a derived clause. EXAMPLE 2 Another correct proof for Example 1 is: 0 This is because the formula itself can be refuted by unit-clause propagation. EXAMPLE 3 Another correct proof for Example 1 is: 1 -3 0 1 3 0 -3 0 3 0 0 This is the sequence of resolvents in a resolution proof, but no structure is given, so that fact might not be obvious. 1.3. NAVIGATION HINTS The proof format is designed to be traversed forward or backward. A clause of k literals occupies (k+1) integers of storage if the proof is stored in a single array in the same format as the file. The proof checker needs to set up its own data structures for unit-clause propagation. This is beyond the capability of a log-space program. 2. REMARKS ------- The purpose of the RUP proof format is to allow a solver to "retro-fit" some verification output without too much trouble, and to allow for some possibility of derivation rules other than resolution. One drawback is that checking the output is quadratic in its length, by any methods known so far. Unit-clause propagation needs to be repeated for each succeeding derived clause. Another drawback is that the proof checker can report that a clause is not verified, but cannot necessarily explain exactly why. In this case, some clauses might be left over after unit-clause propagation terminates. If the formula is unsatisfiable, ANY clause is logically implied by it. However, as a debugging tool, this system may well provide helpful information. The organizers of the SAT05 solver competition did not have a program to check the RUP proof format, so it was not offered. The organizers of the SAT07 solver competition do have a tool to check the RUP proof format, consisting of some scripts and a few programs. The tool runs on Linux, and needs C and C++ compilers. The programs for the resolution proof checker for the binary and ascii formats accepted in 2005 are also available, and run on Linux. The 2005 formats will be accepted in 2007. All programs will be designed for a unix environment, such as Linux, Solaris, etc., and will be written to work on both big-endian and little-endian machines (without knowing which type the machine is). In some cases, unix features might be used to improve the processing of very large files, such as pipes, mmap, and forks. The design should be such that specialized features are confined to subroutines that can be replaced with less specialized implementations, at the cost of performance and size limitations. Any RUP proof checker will be in the complexity class known as "P-complete" (by reduction from positive unit resolution, by reduction from path systems in Cook's original paper). This is a higher class than the "log space" class of the simplest resolution proof checker. Essentially, a "log space" algorithm has room to store only a fixed number of integers; it cannot use an array of n integers, e.g. Therefore, checking the correctness of a "log space" algorithm is usually fairly easy. On the other hand, a "P-complete" algorithm can use much more complex data structures, and checking its correctness can be quite difficult. In other words, when an RUP proof checker disagrees with a solver, it is far from clear which one is correct. Our rupToRes tool addresses this problem by generating a full resolution proof in the "log space" format, i.e., %RES L32, as used in the 2005 competition, which in turn is checked by our full resolution proof checker, checker3.