
Debugging and Designing Schema Mappings with Data Examples

Overview

A schema mapping (or mapping) is a high-level, declarative specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema).
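To make the definition concrete, here is a minimal sketch of one toy mapping and its effect on a small source instance. The relation names (Emp, Employee, Dept) and the rule itself are hypothetical and chosen only for illustration; they do not come from SPIDER, MUSE, or Clio. The rule reads "for every Emp(name, dept) there exists some id D such that Employee(name, D) and Dept(D, dept)", and the unknown id is represented by a fresh placeholder value (a labeled null).

```python
from itertools import count


def apply_mapping(emp_rows):
    """Apply the toy rule Emp(name, dept) -> exists D: Employee(name, D), Dept(D, dept).

    Each source row yields one Employee fact and one Dept fact that share
    a freshly generated placeholder (labeled null) for the unknown id D.
    """
    fresh = count(1)
    employee, dept = [], []
    for name, dept_name in emp_rows:
        null = f"N{next(fresh)}"  # placeholder for the existential value D
        employee.append((name, null))
        dept.append((null, dept_name))
    return {"Employee": employee, "Dept": dept}


# Hypothetical source instance under the schema Emp(name, dept).
source = [("Alice", "Sales"), ("Bob", "Sales")]
target = apply_mapping(source)
```

Note that this sketch creates a separate Dept fact for each employee, even when two employees share a department name. Whether that is the intended behavior is exactly the kind of question a designer can answer more easily by looking at example data than by reading the rule.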

In real-life scenarios, specifying a semantically correct schema mapping is a complex task. A number of automated tools have been developed for suggesting potential mappings. However, these tools give mapping designers very limited support for understanding the generated mappings and for designing alternative mappings.

Moreover, mapping designers typically understand their data better than mapping specifications. Thus, they can leverage familiar data examples to illustrate how changes to a mapping specification affect its semantics.

We have developed two systems, SPIDER and MUSE, for helping a mapping designer debug and understand mappings. Both systems help a mapping designer understand and design mappings through data examples.

SPIDER: a Schema mapPIng DEbuggeR

We developed a non-intrusive, data-driven approach for debugging schema mappings that uses (test) data to drive the process of exploring, understanding, and refining a schema mapping. At the core of our approach lies the notion of routes. Routes are a form of provenance: they describe how source and target data are related through the schema mapping. Routes have declarative semantics that are independent of the implementation of the data exchange engine; therefore, our techniques apply to any data exchange system based on schema mappings, and to data integration systems as well.
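The idea of tracing target data back through the mapping can be sketched as follows. This is a deliberately simplified illustration and not the algorithm implemented in SPIDER: it assumes each rule reads a single source relation, considers only one-step derivations, and ignores target constraints. The rule names (m1, m2), the relations, and the helper find_routes are all hypothetical.

```python
def find_routes(target_fact, source, rules):
    """Return every (rule_name, source_fact) pair that produces target_fact.

    `rules` maps a rule name to (source_relation, produce), where `produce`
    turns one source row into the list of target facts the rule asserts.
    """
    routes = []
    for rule_name, (src_rel, produce) in rules.items():
        for row in source.get(src_rel, []):
            if target_fact in produce(row):
                routes.append((rule_name, (src_rel, row)))
    return routes


# Hypothetical rules: m1 copies employees to Person;
# m2 copies managers to Person and records whom they manage.
rules = {
    "m1": ("Emp", lambda r: [("Person", (r[0],))]),
    "m2": ("Mgr", lambda r: [("Person", (r[0],)), ("Manages", (r[0], r[1]))]),
}
source = {
    "Emp": [("Alice", "Sales"), ("Carol", "Sales")],
    "Mgr": [("Carol", "Sales")],
}

alice_routes = find_routes(("Person", ("Alice",)), source, rules)
carol_routes = find_routes(("Person", ("Carol",)), source, rules)
```

Here Person(Alice) has a single explanation (rule m1 applied to an Emp row), while Person(Carol) has two, one through each rule. Seeing every rule and source fact that accounts for a target fact is what lets a designer spot a rule that fires when it should not.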

We have designed polynomial-time algorithms for computing one route and for computing all routes for selected source or target data. The algorithm for all routes produces a complete, polynomial-size representation of the (possibly exponential) set of all routes. We have implemented our algorithms in a prototype system called SPIDER (see main features and demo).

MUSE: Mapping Understanding and deSign by Example

We developed MUSE, a mapping design wizard that uses data examples to interactively assist designers in understanding and refining a schema mapping towards the desired specification. MUSE systematically guides the designer through two important components of mapping design: specifying the desired grouping semantics for sets of data, and choosing among alternative interpretations of semantically ambiguous mappings. For each component, MUSE infers the desired semantics from the designer's actions on a short sequence of small examples. Whenever possible, MUSE draws examples from a familiar database, thus facilitating the design process even further (see poster and demo).
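To illustrate what a choice of grouping semantics means, the sketch below nests the same flat rows under two different grouping keys. It shows only the effect of the choice on a small example, not how MUSE infers the choice from the designer's actions. The attribute names (company, location, project) and the helper group_projects are hypothetical.

```python
from collections import defaultdict


def group_projects(rows, group_by):
    """Nest project names into sets, one set per distinct value of the group_by attributes."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[attr] for attr in group_by)
        groups[key].add(row["project"])
    return dict(groups)


# A small example instance on which the two grouping choices differ.
rows = [
    {"company": "Acme", "location": "NY", "project": "db"},
    {"company": "Acme", "location": "SF", "project": "web"},
]

by_company = group_projects(rows, ["company"])
by_company_and_location = group_projects(rows, ["company", "location"])
```

Grouping by company alone yields one set holding both projects, while grouping by company and location yields two singleton sets. An instance like this one, on which the candidate semantics produce visibly different results, is what allows a designer to pick the intended grouping by inspecting data.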

To demonstrate our techniques, we have implemented SPIDER and MUSE on top of the Clio schema mapping management system from IBM Almaden Research Center.

Acknowledgements

This work is supported in part by NSF CAREER Award IIS-0347065 and NSF grant IIS-0430994.

People

Faculty Graduate Students Undergraduate Students
  • Daniel Pepper
External Collaborators
Publications
Debugging Schema Mappings with Routes [.pdf]
Proceedings of International Conference on Very Large Data Bases (VLDB) 2006
L. Chiticariu   W. Tan
SPIDER: a Schema mapPIng DEbuggeR [.pdf]
Proceedings of International Conference on Very Large Data Bases (VLDB) 2006 - Demonstration Track
B. Alexe   L. Chiticariu   W. Tan
MUSE: Mapping Understanding and deSign by Example [.pdf]
Proceedings of International Conference on Data Engineering (ICDE) 2008
B. Alexe   L. Chiticariu   R. J. Miller   W. Tan
MUSE: a system for understanding and designing mappings [.pdf]
Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD) 2008 - Demonstration Track
B. Alexe   L. Chiticariu   R. J. Miller   D. Pepper   W. Tan
Talks
Debugging Schema Mappings with Routes
Talk given by Laura Chiticariu at VLDB 2006
[.ppt slides]
MUSE: Mapping Understanding and deSign by Example
Talk given by Bogdan Alexe at ICDE 2008
[.pdf slides]