http://www.cse.ucsc.edu/~avg/ProofChecker/fileformat_qir.txt VERSION 1.0 Nov 20, 2010. AUTHOR Allen Van Gelder, avg where? at cs.ucsc.edu COPYRIGHT Allen Van Gelder, Nov, 2010. All rights reserved. Permission is granted to reproduce this document for noncommercial research purposes. 1. THE QBF PROOF FORMAT: %QIR -------------------------- Q I R stands for Qbf Input Resolution. This format is both an extension of and a specialization of the "trace" format described for SAT in A. Biere. PicoSAT Essentials. Journal on Satisfiability, Boolean Modeling and Computation (JSAT), 4:75--97, 2008. http://fmv.jku.at/papers/. It is specialized because it requires the clauses used for the linear derivation to be specified in the order they should be used. It is an extension because it adds capabilities for QBF, and allows clauses to be used more than once in a single linear derivation. For ASCII formats, it is best if each QIR derivation record ends with '\n', but the format can be parsed without this assumption. However, comments and the p_line must be terminated with a newline. White space separates tokens and consists of one or more spaces, tabs, and newlines ('\n'). As specified in C, '\n' might be represented externally in various ways, depending on the system. 1.1 STAND-ALONE MODE, PREAMBLE The initial lines may be comments. The first significant line is called the , and is the same as dimacs and qdimacs formats. ::= c ::= p cnf where is an upper bound on variable numbers and is the number of clauses in the input formula. In put clauses numbers are in the range 1 through . From now on we use n and m with these meanings, omitting brackets. The initial section of a QIR proof in stand-alone mode is the quantifier prefix. This is exactly in Qdimacs format, except that any quantifier block can be spread over multiple lines and its terminating 0 can be followed by any white-space. Newlines have no special significance, but are recommended at the end of each quantifier block. The presence of a quantifier block signals that this proof file is in stand-alone mode, even if the p_line is missing, and that all the original clauses will be included immediately after the quantifier prefix. If there is no quantifier prefix and clauses begin, all variables are assumed to be existential, making this a SAT problem. This is consistent with the Qdimacs convention. However, QIR does not permit some variables to appear in the quantifier prefix and others to be omitted. It is all or none. If the p_line is omitted in stand-alone mode, then n will be inferred from the quantifier prefix, and m will be inferred from the last of the specified original clauses, for tools that choose to treat the p_line as optional. Copies of original clauses should NOT occur after the first derived clause. ::= ::= ::= ::= ::= "e" 0 ::= "a" 0 ::= ::= and should alternate. There is no explicit terminator for the quantifier prefix. Following the quantifier prefix, the entire matrix of input clauses should be given as a sequence of in which is . 1.2 QIR GRAMMAR A QIR derivation has this syntax (with the understanding that tokens are separated by whitespace (denoted by where necessary): Single-character tokens: * (asterisk), T, r, t, u, z. Single-word tokens ( is written as 0 in the rules): ::= ::= ::= Production rules: ::= 0 "z" Let the formula have m clauses. For derived clauses, begins at m+1 (or greater) and must be greater than of the previous derived clause. Clauses in the original formula have from 1 to m. ::= "T" ::= ::= ::= ::= normally is a (possibly empty) sequence of positive and negative integers, which integers represent variables. While a tautological clause (containing complementary literals) is permitted to appear, it is an error to use this clause as an operand of resolution. See "T", next. A tautological clause may be an operand of universal reduction. Repeated literals are acceptable, and denote the same clause as written without repeats. may also be the token "T", denoting a tautological clause. This clause may be an operand of resolution, in which case the resolvent is the other operand. Note that 0 follows the "T" anyway. 0 is the number 0, and might be written as 00, 000, etc., but not 0.0 or 0x00, etc. (Variable and clause numbers may also have extra leading 0s.) ::= // meaning "this in an original clause". ::= "*" // meaning "you figure it out" (see below). ::= "t" ::= ::= might consist of the token "t" followed by a of an earlier clause that was in the original formula or derived, and be followed by an empty . This denotes a simple copy of the referenced clause. The reason to include this possibility is to facilitate applying a "restriction" to a proof. Applying a "restriction" might cause a to become invalid. In such a case, a copy operation might cure the problem. The "*" derivation is provided to facilitate proof production in the presence of preprocessors, whose reasoning steps might not be easy to reproduce. A verifier might attempt to confirm that the clause is a sound consequence of the preceding clause. A verifier might accept such a clause as an additional input clause or an hypothesis. A verifier might simply give up and reject the proof. ::= "r" ::= "u" ::= ::= operand_clause_number is less than this derivation's clause_number. ::= may be "u" where is a universal literal that occurs in the "working clause" (to be defined below), to be deleted by universal reduction. The syntax denotes the following derivation in Q-resolution: Initially, "working clause" is the clause referenced by the clause_number following the "t". ("t" stands for "top clause", the beginning of a linear derivation.) Each operation uses the "working clause" as one operand and derives a new clause, which replaces "working clause". The final value of "working clause" is expected to be the derived clause, represented by 0. For a resolution operation, clashing_literal must be existential and should appear in "working clause" and its negation should appear in the clause referenced by operand_clause_number. The new "working clause" is the resolvent. Resolutions involving a "T" operand (that is, operand_clause_number references a "T" clause) are defined elsewhere. For a universal reduction, the universal literal should appear in "working clause" and the new "working clause" is the same except that this literal has been deleted. Errors in the semantics, or interpretation, of QIR derivations are not considered syntax errors. 1.3 EXAMPLE Here is a narrative interpreting an example QIR derivation. Note that "unrd" abbreviates universal reduction. p cnf 100 200 e 3 7 8 9 14 15 32 0 5 -8 -15 -32 z 17 -3 14 32 z 25 -3 8 9 z 35 3 -7 z 345 -7 9 14 0 t 25 r 8 5 r -32 17 u -15 r -3 35 z Clause 345 is [-7 9 14] and is derived as follows. tmp1 = res(8, 25, 5) = [ -3 9 -15 -32 ] tmp2 = res(-32, tmp1, 17) = [ -3 9 14 -15 ] tmp3 = unrd(-15, tmp2) = [ -3 9 14 ] tmp4 = res(-3, tmp3, 35) = [ -7 9 14 ] = Clause 345. 2. COMPLEXITY OF THE QIR PROOF LANGUAGE ------------------------------------ The QIR proof language can be recognized in deterministic log space, if "*" is not used. By "proof language" we mean the set of strings that denote a sound Q-resolution derivation. A QIR derivation record is considered sound if the derived clause is not tautological (except for "T") and its literals are a superset of the literals of the final resolvent denoted in the QIR derivation part of the QIR derivation record (beginning at "t" and ending just before "z"). 3. REMARKS ------- All programs will be designed for a unix environment, such as Linux, Solaris, etc., and will be written to work on both big-endian and little-endian machines (without knowing which type the machine is). In some cases, unix features might be used to improve the processing of very large files, such as pipes, mmap, and forks. The design should be such that specialized features are confined to subroutines that can be replaced with less specialized implementations, at the cost of performance and size limitations.