A schema mapping (or mapping) is a high-level, declarative specification that describes how data structured under one schema (the
source schema) is to be transformed into data structured under a different schema (the target schema).
In real-life scenarios, specifying a semantically correct schema
mapping is a complex task. A number of automated tools have been
developed for suggesting potential mappings. However, these tools
give a mapping designer very limited support for understanding the
generated mappings and for designing alternative mappings.
Moreover, mapping designers typically understand their data better than mapping
specifications. Thus, they can leverage familiar data examples to
illustrate how changes to a mapping specification affect its
semantics.
We have developed two systems, SPIDER and MUSE, that help a mapping designer debug, understand, and design mappings through data examples.
SPIDER: a Schema mapPIng DEbuggeR
We developed a non-intrusive, data-driven approach for debugging schema mappings that
employs (test) data to drive the process of exploring, understanding and refining a schema mapping.
At the core of our approach lies the notion of routes.
Routes are a form of provenance: they describe how source and target data
are related through the schema mapping. Routes have declarative semantics,
independent of the implementation of the data exchange engine; therefore, our
techniques apply to any schema-mapping-based data exchange system (and likewise to data integration systems).
We have designed polynomial-time algorithms for computing one route and, respectively, all routes
for selected source or target data. The latter algorithm produces a complete, polynomial-size representation of the
(possibly exponential) set of all routes. We have implemented our algorithms in a prototype system
called SPIDER (see main features
and demo).
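To make the idea of a route concrete, here is a minimal Python sketch. It is not the SPIDER algorithm: it assumes a heavily simplified setting in which every mapping rule has exactly one source atom and one target atom, so a route for a selected target fact is a single rule application together with a variable binding. The names `match` and `find_route`, the rule `m1`, and the relations `Emp` and `Staff` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation name, variable names)
Fact = Tuple[str, Tuple[str, ...]]  # (relation name, constant values)
Binding = Dict[str, str]


def match(atom: Atom, fact: Fact, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `atom` maps onto `fact`; return None on conflict."""
    rel, variables = atom
    fact_rel, values = fact
    if rel != fact_rel or len(variables) != len(values):
        return None
    extended = dict(binding)
    for var, value in zip(variables, values):
        # A variable already bound to a different value means no match.
        if extended.setdefault(var, value) != value:
            return None
    return extended


def find_route(
    rules: Dict[str, Tuple[Atom, Atom]],
    source: List[Fact],
    target_fact: Fact,
) -> Optional[Tuple[str, Binding]]:
    """Return one (rule name, binding) that explains `target_fact`, or None."""
    for name, (lhs, rhs) in rules.items():
        # First bind the rule's target side to the selected target fact.
        partial = match(rhs, target_fact, {})
        if partial is None:
            continue
        # Then look for a source fact that is consistent with that binding.
        for fact in source:
            full = match(lhs, fact, partial)
            if full is not None:
                return name, full
    return None


# Hypothetical mapping m1: Emp(name, dept) -> exists mgr. Staff(name, mgr)
rules = {"m1": (("Emp", ("name", "dept")), ("Staff", ("name", "mgr")))}
source = [("Emp", ("alice", "sales")), ("Emp", ("bob", "hr"))]

route = find_route(rules, source, ("Staff", ("bob", "N2")))
no_route = find_route(rules, source, ("Staff", ("carol", "N3")))
```

Here `route` names the rule and the binding that justify the selected target fact, while `no_route` is `None` because no source fact supports it. Real mappings can have several atoms on each side and routes with many steps, which this sketch deliberately ignores.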
MUSE: Mapping Understanding and deSign by Example
We developed Muse, a mapping design wizard that uses data
examples to interactively assist designers in
understanding and refining a
schema mapping towards the desired specification. Muse systematically
guides the designer on two important components of a mapping
design: the specification of the desired grouping semantics for
sets of data and the choice among alternative interpretations for
semantically ambiguous mappings. For each component, Muse
infers the desired semantics based on the designer's actions on
a short sequence of small examples. Whenever possible, Muse
draws examples from a familiar database, thus facilitating the
design process even further (see poster
and demo).
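The effect of a grouping choice can be illustrated with a small Python sketch. It is not how Muse is implemented; it only shows, on a tiny made-up data set, how the same source rows nest differently in the target depending on which attributes are chosen as the grouping key. The function `group_by` and the attributes `company`, `city`, and `project` are hypothetical.

```python
from collections import defaultdict


def group_by(rows, keys, member):
    """Nest the `member` values of `rows` into sets keyed by the `keys` attributes."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[k] for k in keys)].add(row[member])
    return dict(groups)


# A small, made-up source instance.
rows = [
    {"company": "Acme", "city": "NY", "project": "db"},
    {"company": "Acme", "city": "SF", "project": "web"},
]

# Grouping projects by company alone yields one set with both projects.
by_company = group_by(rows, ["company"], "project")

# Grouping by company and city yields two separate sets.
by_company_city = group_by(rows, ["company", "city"], "project")
```

A designer looking at these two small results can tell which nesting is the intended one, which is the kind of decision an example-driven wizard can ask for instead of requiring the grouping to be written out in the mapping specification.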
To demonstrate our techniques, we have implemented SPIDER and MUSE on
top of the Clio
schema mapping management system from IBM Almaden Research Center.
Acknowledgements: This work is supported in part by NSF CAREER Award IIS-0347065 and NSF grant IIS-0430994.