[Next Group] [Up] [Previous Group]
Phokion G. Kolaitis (PI)
Computer Science Department
University of California, Santa Cruz
in collaboration with
Moshe Y. Vardi
Department of Computer Science
Rice University
Phokion G. Kolaitis
Computer Science Department
University of California, Santa Cruz
Santa Cruz, CA 95064
Phone: (831) 459-4768
Fax: (831) 459-4829
Email: kolaitis@cs.ucsc.edu
Web: http://www.cs.ucsc.edu/~kolaitis
Dean Bailey, Graduate Student Researcher
Jonathan Panttaja, Graduate Student Researcher
- Award Number: IIS-9907419
- Duration: 10/1/00-9/30/03 (extended to 9/30/04)
- Title: Constraint Satisfaction, Database Query Evaluation,
and Information Integration
constraint satisfaction, conjunctive queries, combinatorial games,
computational complexity.
The overall goal of this project is to enhance the interaction
between artificial intelligence and database systems, and to advance
the technology transfer between these two areas.
Recent research work has established that there are strong and exact
connections between constraint satisfaction and
conjunctive query evaluation, two fundamental and ubiquitous problems
in artificial intelligence and database systems respectively.
Since both these problems are known to be computationally intractable
in the worst case, researchers in artificial intelligence
and database systems have sought to discover tractable cases
of constraint satisfaction and conjunctive query evaluation,
as well as to design heuristic algorithms for solving these problems.
Thus far, however, these studies have
been largely carried out in parallel and with relatively
little or no interaction between the two areas.
In this project, a unifying
game-theoretic framework for identifying and analyzing
tractable cases of constraint satisfaction and conjunctive
query evaluation will be developed.
The scope of this investigation
will then be broadened
by systematically pursuing tractable cases of extensions of these
problems that arise frequently in applications, such as
evaluation of conjunctive queries with built-in comparison
predicates. Furthermore, an investigation will be carried out
to determine whether
tractable cases and heuristics for constraint satisfaction
can be successfully applied to the problem of information
integration from heterogeneous sources, a problem whose
solution often involves extensive use of techniques
from database query evaluation.
- Ph.G. Kolaitis, M.Y. Vardi,
``A game-theoretic approach to constraint satisfaction",
Proceedings of the Seventeenth National
Conference on Artificial Intelligence (AAAI 2000),
AAAI Press/The MIT Press, 2000, pp. 175-181.
- Ph.G. Kolaitis, T. Raffill,
``In search of a phase transition in the AC-matching problem",
Proceedings
of the Seventh International Conference on Principles and
Practice of Constraint Programming (CP '01), LNCS Series,
Springer, Volume 2239, Springer, 2001, pp. 433-450.
- D.D. Bailey, V. Dalmau, Ph.G. Kolaitis
``Phase transitions of PP-complete satisfiability problems'',
Proceedings of the Seventeenth International Joint Conference
on Artificial Intelligence (IJCAI 2001),
pages 183-189.
Preliminary report in:
Proceedings of the Workshop on Theory and
Applications of Satisfiability Testing - SAT 2001, pp. 144-150.
Full version invited to a special issue of Discrete Applied Mathematics.
- D.D. Bailey, V. Dalmau, Ph.G. Kolaitis,
``Comparing phase transitions and peak cost in PP-complete
satisfiability problems'',
Proceedings of the Eighteenth National
Conference on Artificial Intelligence (AAAI 2002), pp. 620-626.
- V. Dalmau, Ph.G. Kolaitis, M.Y. Vardi,
``Constraint satisfaction, bounded treewidth, and
finite-variable logics'',
Proceedings of the Eight International Conference on Principles and
Practice of Constraint Programming (CP 2002), LNCS Series, Volume 2470,
Springer, 2002, pp. 310-326.
- R. Fagin, Ph.G. Kolaitis, R.J. Miller, L. Popa:
``Data Exchange: Semantics and Query Answering'',
Proceedings of
the 9th International Conference on Database Theory (ICDT 2003), LNCS Series,
Volume 2572, Springer, 2003, pp. 207-224. Full version
invited to a special issue of Theoretical Computer Science.
- R. Fagin, Ph.G. Kolaitis, L. Popa:
``Data Exchange: Getting to the core'',
Proceedings of the Twenty-Second ACM Symposium on Principles
of Database Systems (PODS 2003), pp. 90-101.
Full version invited to ACM Transactions on Database Systems.
- Ph.G. Kolaitis:
``Constraint Satisfaction, Databases, and Logic'' (invited paper),
Proceedings of the Eigteenth International Joint Conference
on Artificial Intelligence (IJCAI 2003),
pp. 1587-1595.
- D.D. Bailey and Ph.G. Kolaitis:
``Phase Transitions of Bounded Satisfiability Problems'',
Proceedings of the Eigteenth International Joint Conference
on Artificial Intelligence (IJCAI 2003), pp. 1187-1193.
This grant has partially supported Delbert D. Bailey and
Jonathan Panttaja, who are currently fourth-year students
in the Ph.D. program of the Computer Science Departent
at UC Santa Cruz. Bailey and Panttaja advanced to Ph.D. candidacy
during the academic year 2001-02.
In addition, Victor Dalmau has participated in this
project as a postdoctoral researcher, while funded from different sources.
Some of the results obtained during the project were
presented in a course on ``Constraint Satisfaction, Complexity, and Logic''
that the PI taught during the
First North American Summer School on Logic, Language,
and Information (NASSLLI 2002), which took place
at Stanford University in June 2002. A complete set
of slides from this course is available
at www.cs.ucsc.edu/~kolaitis/talks/nasslli.ps.
The PI will teach an expanded version of this course
at the 15th European
Summer School in Logic, Language and Information (ESSLLI 2003),
which will take place at the Technical University of Vienna in August 2003.
The PI has also presented some of these results in invited talks
at the Eigteenth International Joint Conference
on Artificial Intelligence (IJCAI 2003) and at the 2003 Annual
Meeting of the Association for Symbolic Logic.
Significant progress has already been made in developing
a unifying game-theoretic framework for identifying tractable cases
of constraint satisfaction. Specifically, in collaboration
with M.Y. Vardi, we have shown that
the main
consistency concepts used to derive tractability results for
constraint satisfaction are intimately related to
certain combinatorial pebble games,
called the existential k-pebble games,
that were originally introduced in the
context of Datalog. The crucial insight
relating
pebble games to constraint satisfaction is that the key concept of
strong k-consistency is equivalent to the existence of a winning
strategy for
the Duplicator player in the existential k-pebble game.
Using this insight, we have shown that
strong k-consistency can be
established if and only if the Duplicator wins the existential
k-pebble game. Moreover,
whenever strong k-consistency can be
established, one method for doing this is to first compute the largest
winning strategy for the Duplicator in the existential k-pebble game
and then modify the original problem by augmenting it with the
constraints expressed by the largest winning strategy.
This basic result makes it possible to establish
deeper connections between pebble
games, consistency properties, and tractability of constraint
satisfaction.
In particular,
we have used existential
k-pebble games to introduce the concept
of k-locality and to show that it constitutes a new tractable case of
constraint satisfaction that properly extends the well known
case in which establishing strong k-consistency implies
global consistency.
In collaboration with V. Dalmau and M.Y. Vardi,
we have also systematically investigated the connections between constraint
satisfaction problems, structures of bounded treewidth, and
definability in logics with a finite number of variables.
Specifically, we first showed that constraint satisfaction
problems on inputs of treewidth less than k
are definable using Datalog programs with at most
k variables; this provides a new explanation
for the tractability of these classes of problems.
After this, we investigated constraint satisfaction
on inputs that are homomorphically equivalent
to structures of bounded treewidth.
We showed that these problems are solvable in polynomial time
by establishing that they are actually definable in Datalog;
moreover, we obtained a logical characterization of
the property ``being homomorphically equivalent to a structure
of bounded treewidth" in terms of definability in finite-variable logics.
Unfortunately, this expansion of the tractability landscape comes
at a price, because we also showed that, for each k>1,
determining whether
a structure is homomorphically equivalent to a structure
of treewidth less than k is an NP-complete problem.
In contrast, it is well known that,
for each k>1, there is a polynomial-time algorithm
for testing whether a given structure is of treewidth less than k.
Finally, we obtained a logical characterization of the property ``having bounded
treewidth" that sheds light on
the complexity-theoretic difference between
this property and the property
`being homomorphically equivalent to a structure
of bounded treewidth".
We plan to continue the exploration of the game-theoretic framework
for constraint satisfaction and conjunctive-query evaluation.
To this effect, we will attempt to establish lower bounds
for the computational complexity of determining the winner
in the existential k-pebble game, which will yield lower
bounds for the computational complexity of establishing
strong k-consistency. In a parallel direction, we will
use methods and techniques from universal algebra
to identify additional tractable cases of constraint
satisfaction and conjunctive-query evaluation.
In collaboration with D.D. Bailey and
V. Dalmau, we have also pursued a detailed investigation
of phase transitions in algorithmically intractable problems.
In particular, we investigated
phase transitions for certain counting satisfiability
problems that are complete for the class PP of
decision problems solvable by polynomial-time probabilistic
Turing machines.
We have obtained analytical results and have
carried out a set of large-scale experiments that
provide evidence for the existence of phase transitions
for these problems under the fixed clauses-to-variables ratio model.
One of our main experimental findings is that, for each of the PP-complete
problems considered, the asymptotic probability
of solvability undergoes a phase transition at some critical ratio
of clauses to variables, but this critical ratio does not
always coincide with the ratio at which the average search cost
of natural algorithms for this problem peaks.
out large-scale experiments and comparing the average-case behavior
of different algorithms for these counting satisfiability problems.
The overall goal of this investigation is to shed light on
the average-case complexity of algorithms for counting
the number of solutions of constraint satisfaction problems.
We have also investigated
phase transitions for the family of bounded
satisfiability problems 3SAT(B), recently introduced by Zhang,
that ask: given a 3CNF-formula, is there a truth assignment
that violates no more than B of its clauses.
Zhang's results were experimental and for a fixed
number of variables (n=25), and suggested that the locations
of the phase transitions for 3SAT(B) are separate and move
significantly as B increases. Analysis of these locations
was posed as an open question.
In collaboration with D.D. Bailey,
we established analytically that the phase transitions of all
3SAT(B) problems must occur within a narrow region,
regardless of how large the value of the quality bound B is.
Moreover, we carried out experiments that revealed that the phase
transitions for these problems occur in a remarkable way. Specifically,
unlike 3SAT, the probability curves for 3SAT(B) do not have a
quasi-common intersection point about which they rotate as they become
steeper with increasing n. Instead, they move rapidly to the left
towards the narrow region predicted by the analytical results.
Data exchange is the problem of taking data structured under a source
schema and creating an instance of a target schema that reflects the
source data as accurately as possible. Although closely related
to data integration, this problem is different in that source data
have to be exported and an instance over the target schema has
to be materialized. In collaboration with R. Fagin, R.J. Miller
and L. Popa, we embarked
on an exploration of several foundational and algorithmic issues
related to the semantics of data exchange and to the query answering
problem in the context of data exchange. These issues arise
because, given a source instance, there may be many target
instances that satisfy the constraints of the data exchange problem.
The main conceptual contribution to the semantics of the data exchange
problem was to give an algebraic specification that selects, among
all solutions to the data exchange problem, a special class of solutions,
called universal. We showed that a universal solution has no more
and no less data than required for data exchange and that it, in a certain
technical sense, represents the entire space of possible exchange solutions.
We also identified fairly general, and practical, conditions that
guarantee the existence of a universal solution and yield algorithms
to compute such a canonical universal solution efficiently.
We adopted the notion of certain answers in indefinite databases
as the semantics for query answering in data exchange. We investigated the
computational complexity of computing the certain answers
in this context and also addressed other algorithmic issues
that arise in data exchage. In particular, we studied the problem
of computing the certain answers of target queries by simply
evaluating them on a canonical universal solution, and
also explore the boundary of what queries can and cannot
be answered this way in a data exchange setting.
Note that, given a source instance, there may be many universal solutions.
This naturally brings up the question of whether there is a ``best'' universal
solution, and hence a best solution for data exchange. In collaboration
with R. Fagin and L. Popa,
we studied this question by
considering the well-known notion of the core of a
structure, a notion that was first studied in graph theory, but has also played
a role in conjunctive-query processing. The core of a structure is the smallest
substructure that is also a homomorphic image of the structure. All universal
solutions have the same core (up to isomorphism);
we showed that this core is also a universal solution,
and hence the smallest universal solution. The uniqueness of the core
of a universal solution together with its minimality
make the core an ideal solution for data exchange. Furthermore, we showed that
the core is the best among all universal solutions for answering
unions of conjunctive queries with inequalities.
The aforementioned results have been obtained in data exchange settings
over relational schemas and with source-to-target constraints specified by
tuple generating dependencies.
The next stage of this investigation is to study data exchange
over XML schemas. In a parallel direction, we plan to
investigate the composition of two or more consecutive
data exchange settings.
This document was generated using the
LaTeX2HTML translator Version 98.1p1 release (March 2nd, 1998)
Copyright © 1993, 1994, 1995, 1996, 1997,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html -split 0 -ascii_mode rep032.tex.
The translation was initiated by Phokion G. Kolaitis on 2003-10-20
[Next Group] [Up] [Previous Group]
Phokion G. Kolaitis
2003-10-20