[Next Group] [Up] [Previous Group]


Constraint Satisfaction, Database Query Evaluation, and Information Integration

Phokion G. Kolaitis (PI)
Computer Science Department
University of California, Santa Cruz

in collaboration with

Moshe Y. Vardi
Department of Computer Science
Rice University

Contact Information

Phokion G. Kolaitis
Computer Science Department
University of California, Santa Cruz
Santa Cruz, CA 95064
Phone: (831) 459-4768
Fax: (831) 459-4829
Email: kolaitis@cs.ucsc.edu
Web: http://www.cs.ucsc.edu/~kolaitis

List of Supported Students

Dean Bailey, Graduate Student Researcher Jonathan Panttaja, Graduate Student Researcher

Project Award Information

Keywords

constraint satisfaction, conjunctive queries, combinatorial games, computational complexity.

Project Summary

The overall goal of this project is to enhance the interaction between artificial intelligence and database systems, and to advance the technology transfer between these two areas. Recent research work has established that there are strong and exact connections between constraint satisfaction and conjunctive query evaluation, two fundamental and ubiquitous problems in artificial intelligence and database systems respectively. Since both these problems are known to be computationally intractable in the worst case, researchers in artificial intelligence and database systems have sought to discover tractable cases of constraint satisfaction and conjunctive query evaluation, as well as to design heuristic algorithms for solving these problems. Thus far, however, these studies have been largely carried out in parallel and with relatively little or no interaction between the two areas. In this project, a unifying game-theoretic framework for identifying and analyzing tractable cases of constraint satisfaction and conjunctive query evaluation will be developed. The scope of this investigation will then be broadened by systematically pursuing tractable cases of extensions of these problems that arise frequently in applications, such as evaluation of conjunctive queries with built-in comparison predicates. Furthermore, an investigation will be carried out to determine whether tractable cases and heuristics for constraint satisfaction can be successfully applied to the problem of information integration from heterogeneous sources, a problem whose solution often involves extensive use of techniques from database query evaluation.

Publications and Products

Project Impact

This grant has partially supported Delbert D. Bailey and Jonathan Panttaja, who are currently fourth-year students in the Ph.D. program of the Computer Science Departent at UC Santa Cruz. Bailey and Panttaja advanced to Ph.D. candidacy during the academic year 2001-02. In addition, Victor Dalmau has participated in this project as a postdoctoral researcher, while funded from different sources. Some of the results obtained during the project were presented in a course on ``Constraint Satisfaction, Complexity, and Logic'' that the PI taught during the First North American Summer School on Logic, Language, and Information (NASSLLI 2002), which took place at Stanford University in June 2002. A complete set of slides from this course is available at www.cs.ucsc.edu/~kolaitis/talks/nasslli.ps. The PI will teach an expanded version of this course at the 15th European Summer School in Logic, Language and Information (ESSLLI 2003), which will take place at the Technical University of Vienna in August 2003. The PI has also presented some of these results in invited talks at the Eigteenth International Joint Conference on Artificial Intelligence (IJCAI 2003) and at the 2003 Annual Meeting of the Association for Symbolic Logic.

Goals, Objectives, and Targeted Actitivities

Constraint Satisfaction, Database Theory and Games

Significant progress has already been made in developing a unifying game-theoretic framework for identifying tractable cases of constraint satisfaction. Specifically, in collaboration with M.Y. Vardi, we have shown that the main consistency concepts used to derive tractability results for constraint satisfaction are intimately related to certain combinatorial pebble games, called the existential k-pebble games, that were originally introduced in the context of Datalog. The crucial insight relating pebble games to constraint satisfaction is that the key concept of strong k-consistency is equivalent to the existence of a winning strategy for the Duplicator player in the existential k-pebble game. Using this insight, we have shown that strong k-consistency can be established if and only if the Duplicator wins the existential k-pebble game. Moreover, whenever strong k-consistency can be established, one method for doing this is to first compute the largest winning strategy for the Duplicator in the existential k-pebble game and then modify the original problem by augmenting it with the constraints expressed by the largest winning strategy. This basic result makes it possible to establish deeper connections between pebble games, consistency properties, and tractability of constraint satisfaction. In particular, we have used existential k-pebble games to introduce the concept of k-locality and to show that it constitutes a new tractable case of constraint satisfaction that properly extends the well known case in which establishing strong k-consistency implies global consistency. In collaboration with V. Dalmau and M.Y. Vardi, we have also systematically investigated the connections between constraint satisfaction problems, structures of bounded treewidth, and definability in logics with a finite number of variables. Specifically, we first showed that constraint satisfaction problems on inputs of treewidth less than k are definable using Datalog programs with at most k variables; this provides a new explanation for the tractability of these classes of problems. After this, we investigated constraint satisfaction on inputs that are homomorphically equivalent to structures of bounded treewidth. We showed that these problems are solvable in polynomial time by establishing that they are actually definable in Datalog; moreover, we obtained a logical characterization of the property ``being homomorphically equivalent to a structure of bounded treewidth" in terms of definability in finite-variable logics. Unfortunately, this expansion of the tractability landscape comes at a price, because we also showed that, for each k>1, determining whether a structure is homomorphically equivalent to a structure of treewidth less than k is an NP-complete problem. In contrast, it is well known that, for each k>1, there is a polynomial-time algorithm for testing whether a given structure is of treewidth less than k. Finally, we obtained a logical characterization of the property ``having bounded treewidth" that sheds light on the complexity-theoretic difference between this property and the property `being homomorphically equivalent to a structure of bounded treewidth". We plan to continue the exploration of the game-theoretic framework for constraint satisfaction and conjunctive-query evaluation. To this effect, we will attempt to establish lower bounds for the computational complexity of determining the winner in the existential k-pebble game, which will yield lower bounds for the computational complexity of establishing strong k-consistency. In a parallel direction, we will use methods and techniques from universal algebra to identify additional tractable cases of constraint satisfaction and conjunctive-query evaluation.

Phase Transitions of Algorithmically Intractable Problems

In collaboration with D.D. Bailey and V. Dalmau, we have also pursued a detailed investigation of phase transitions in algorithmically intractable problems. In particular, we investigated phase transitions for certain counting satisfiability problems that are complete for the class PP of decision problems solvable by polynomial-time probabilistic Turing machines. We have obtained analytical results and have carried out a set of large-scale experiments that provide evidence for the existence of phase transitions for these problems under the fixed clauses-to-variables ratio model. One of our main experimental findings is that, for each of the PP-complete problems considered, the asymptotic probability of solvability undergoes a phase transition at some critical ratio of clauses to variables, but this critical ratio does not always coincide with the ratio at which the average search cost of natural algorithms for this problem peaks. out large-scale experiments and comparing the average-case behavior of different algorithms for these counting satisfiability problems. The overall goal of this investigation is to shed light on the average-case complexity of algorithms for counting the number of solutions of constraint satisfaction problems. We have also investigated phase transitions for the family of bounded satisfiability problems 3SAT(B), recently introduced by Zhang, that ask: given a 3CNF-formula, is there a truth assignment that violates no more than B of its clauses. Zhang's results were experimental and for a fixed number of variables (n=25), and suggested that the locations of the phase transitions for 3SAT(B) are separate and move significantly as B increases. Analysis of these locations was posed as an open question. In collaboration with D.D. Bailey, we established analytically that the phase transitions of all 3SAT(B) problems must occur within a narrow region, regardless of how large the value of the quality bound B is. Moreover, we carried out experiments that revealed that the phase transitions for these problems occur in a remarkable way. Specifically, unlike 3SAT, the probability curves for 3SAT(B) do not have a quasi-common intersection point about which they rotate as they become steeper with increasing n. Instead, they move rapidly to the left towards the narrow region predicted by the analytical results.

Data Exchange

Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. Although closely related to data integration, this problem is different in that source data have to be exported and an instance over the target schema has to be materialized. In collaboration with R. Fagin, R.J. Miller and L. Popa, we embarked on an exploration of several foundational and algorithmic issues related to the semantics of data exchange and to the query answering problem in the context of data exchange. These issues arise because, given a source instance, there may be many target instances that satisfy the constraints of the data exchange problem. The main conceptual contribution to the semantics of the data exchange problem was to give an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions, called universal. We showed that a universal solution has no more and no less data than required for data exchange and that it, in a certain technical sense, represents the entire space of possible exchange solutions. We also identified fairly general, and practical, conditions that guarantee the existence of a universal solution and yield algorithms to compute such a canonical universal solution efficiently. We adopted the notion of certain answers in indefinite databases as the semantics for query answering in data exchange. We investigated the computational complexity of computing the certain answers in this context and also addressed other algorithmic issues that arise in data exchage. In particular, we studied the problem of computing the certain answers of target queries by simply evaluating them on a canonical universal solution, and also explore the boundary of what queries can and cannot be answered this way in a data exchange setting. Note that, given a source instance, there may be many universal solutions. This naturally brings up the question of whether there is a ``best'' universal solution, and hence a best solution for data exchange. In collaboration with R. Fagin and L. Popa, we studied this question by considering the well-known notion of the core of a structure, a notion that was first studied in graph theory, but has also played a role in conjunctive-query processing. The core of a structure is the smallest substructure that is also a homomorphic image of the structure. All universal solutions have the same core (up to isomorphism); we showed that this core is also a universal solution, and hence the smallest universal solution. The uniqueness of the core of a universal solution together with its minimality make the core an ideal solution for data exchange. Furthermore, we showed that the core is the best among all universal solutions for answering unions of conjunctive queries with inequalities. The aforementioned results have been obtained in data exchange settings over relational schemas and with source-to-target constraints specified by tuple generating dependencies. The next stage of this investigation is to study data exchange over XML schemas. In a parallel direction, we plan to investigate the composition of two or more consecutive data exchange settings.

About this document ...

This document was generated using the LaTeX2HTML translator Version 98.1p1 release (March 2nd, 1998)

Copyright © 1993, 1994, 1995, 1996, 1997, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
latex2html -split 0 -ascii_mode rep032.tex.

The translation was initiated by Phokion G. Kolaitis on 2003-10-20


[Next Group] [Up] [Previous Group]
Phokion G. Kolaitis
2003-10-20