Project Description
The goal of this research project is to investigate techniques and develop effective tools for tracing the provenance and flow of data in a network of inter-related databases where each database may be constantly evolving. The following three key areas are investigated:
(1) Reasoning about provenance for relational/XML data through schema mappings. The relationships between databases or versions of databases can often be described by high-level abstractions such as schema mappings or constraints. Hence, tracing the provenance or flow of a piece of data involves reasoning about how that data moves through schema mappings.
(2) An annotation management system based on provenance. This system provides an alternative method for computing provenance. It does so "eagerly" by carrying provenance along as annotations when data is transformed.
(3) An archiving toolkit based on key constraints. A set of utilities for efficiently archiving and managing evolving databases in a semantically meaningful way will be developed. The archive will enable the provenance and evolution of a piece of data to be easily traced.
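The "eager" approach in area (2) can be illustrated with a minimal sketch. It is not the project's actual system: the `Annotated` class, the `annotate`, `project`, and `join` helpers, and the sample `Gene` and `Note` tables are all hypothetical names introduced here for illustration. Each cell carries the set of source locations it was copied from, and every transformation forwards those annotations to its output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotated:
    """A cell value plus the source locations it came from (hypothetical type)."""
    value: object
    sources: frozenset


def annotate(table_name, rows):
    """Tag every cell with its own location: (table, row index, column)."""
    return [
        {col: Annotated(val, frozenset({(table_name, i, col)}))
         for col, val in row.items()}
        for i, row in enumerate(rows)
    ]


def project(rows, columns):
    """Projection keeps the selected cells and their annotations unchanged."""
    return [{c: row[c] for c in columns} for row in rows]


def join(left, right, key):
    """Equijoin on `key`; the joined key cell merges annotations from both sides."""
    out = []
    for l in left:
        for r in right:
            if l[key].value == r[key].value:
                merged = dict(l)
                merged.update(r)
                merged[key] = Annotated(l[key].value,
                                        l[key].sources | r[key].sources)
                out.append(merged)
    return out


# Assumed sample data, for illustration only.
genes = annotate("Gene", [{"id": 1, "name": "BRCA1"}, {"id": 2, "name": "TP53"}])
notes = annotate("Note", [{"id": 1, "text": "curated"}])
result = project(join(genes, notes, "id"), ["id", "text"])

for cell_name, cell in result[0].items():
    print(cell_name, cell.value, sorted(cell.sources))
```

Because the annotations travel with the data, the provenance of any output cell can be read directly from the result, without re-examining the query or the source tables afterwards.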
This material is based on work supported by the National Science Foundation under Grant IIS-0347065.
Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Principal Investigator
Postdoctoral Scholars and Students
External Collaborators
- Mauricio Hernandez (IBM Research - Almaden)
- Melanie Herschel (Hasso-Plattner-Institut)
- Paolo Papotti (Università "Roma Tre")
- Lucian Popa (IBM Research - Almaden)
- Renee J. Miller (University of Toronto)
Publications (in reverse chronological order)
EIRENE: Interactive Design and Refinement of Schema Mappings via Data Examples
(with B. Alexe, B. ten Cate, Ph. G. Kolaitis)
Proceedings of the VLDB Endowment (PVLDB) (Demonstration Track) (To appear), 2011.
Designing and Refining Schema Mappings with Data Examples
(with B. Alexe, B. ten Cate, Ph. G. Kolaitis)
ACM SIGMOD International Conference on Management of Data (SIGMOD), 2011.
MapMerge: Correlating Independent Schema Mappings
Bogdan Alexe, Mauricio Hernandez, Lucian Popa, and Wang-Chiew Tan
Proceedings of the VLDB Endowment (PVLDB), vol. 3, pp. 81-92, 2010.
Database Constraints and Homomorphism Dualities
Balder ten Cate, Phokion G. Kolaitis, and Wang Chiew Tan
Proceedings of Principles and Practice of Constraint Programming (CP), pp. 475-490, 2010.
Characterizing Schema Mappings via Data Examples
Bogdan Alexe, Phokion G. Kolaitis, and Wang-Chiew Tan
Proceedings of the Symposium on Principles of Database Systems (PODS), pp. 261-272, 2010
Provenance in Databases: Why, How, and Where
(with J. Cheney, L. Chiticariu)
Foundations and Trends in Databases, 2009.
Artemis: A System for Analyzing Missing Answers
(with M. Herschel, M. A. Hernández)
Proceedings of the VLDB Endowment (PVLDB) (Demonstration Track), 2009.
Comparing and Evaluating Mapping Systems with STMark
(with B. Alexe, Y. Velegrakis)
Proceedings of the VLDB Endowment (PVLDB) (Demonstration Track), 2008.
Data Exchange with Data-Metadata Translations
(with M. A. Hernández, P. Papotti)
Proceedings of the VLDB Endowment (PVLDB), 2008.
Curated Databases
(with P. Buneman, J. Cheney, S. Vansummeren)
ACM Symposium on Principles of Database Systems (PODS), 2008.
Muse: A System for Understanding and Designing Mappings
(with B. Alexe, L. Chiticariu, R. J. Miller, D. Pepper)
ACM SIGMOD International Conference on Management of Data (SIGMOD) (Demonstration Track), 2008
Muse: Mapping Understanding and deSign by Example
(with B. Alexe, L. Chiticariu, R. J. Miller)
International Conference on Data Engineering, 2008
Provenance in Databases: Past, Current, and Future
W. Tan
IEEE Data Engineering Bulletin.
Provenance in Databases
(with P. Buneman)
ACM SIGMOD International Conference on Management of Data (SIGMOD) (Tutorial Track).
Tutorial slides in [ppt] (requires TexPoint for Math symbols to display properly) or [pdf].
Peer Data Exchange
(with A. Fuxman, Ph. G. Kolaitis, R. J. Miller)
ACM Transactions on Database Systems (TODS), Vol. 31, No. 4, pages 1454-1498, 2006.
Special issue for selected papers from the PODS 2005 conference
SPIDER: A Schema MapPIng DEbuggeR
(with B. Alexe, L. Chiticariu)
Very Large Data Bases (VLDB) (Demonstration Track), 2006.
Debugging Schema Mappings with Routes
(with L. Chiticariu)
Very Large Data Bases (VLDB), 2006.
An Annotation Management System for Relational Databases
(with D. Bhagwat, L. Chiticariu, G. Vijayvargiya)
VLDB Journal, Vol. 14, No. 4, 2005
Special issue for selected papers from the VLDB 2004 conference
DBNotes: A Post-It System for Relational Databases [fullversion]
(with L. Chiticariu, G. Vijayvargiya)
ACM SIGMOD International Conference on Management of Data (SIGMOD) (Demonstration Track), 2005
Research Problems in Data Provenance
W. Tan
IEEE Data Engineering Bulletin, vol. 27, no. 4, pages 45-52, 2004
Archiving Scientific Data
(with P. Buneman, S. Khanna, K. Tajima)
ACM Transactions on Database Systems (TODS), vol. 29, No. 1, pages 2-42, 2004
Special issue for selected papers from the SIGMOD/PODS 2002 conference