Approximation Techniques for Semi-Structured
Data
Project Description
The Extensible Markup Language (XML) has rapidly emerged as a de-facto
standard for data exchange and integration over the
Internet, and its increasing popularity has created a real need for
processing the growing volume of available XML data. Within the realm
of XML query processing, XML summarization has become a cost-effective
solution for providing fast, yet accurate approximate computations
over XML data. In short, an XML summary, or synopsis, captures in
limited space the key statistical characteristics of the underlying
data set and thus represents a highly-compressed, approximate
version of the base data.
The goal of this project is the development of novel summarization
techniques in order to support approximate query answering over
semi-structured data sets. The resulting synopses can be used as the
"statistics" component of query optimizers, or used to assist users in
the exploration of large collections of semi-structured data.
As part of our ongoing work on this topic, we have implemented an approximate query answering system termed AQAX. AQAX is based on the XClusters framework and provides fast and accurate approximate results over large XML data stores. Some information on AQAX can be found here.
Research in this project is mainly supported by NSF CAREER grant
0447966 and partly by an IBM Faculty Development Award.
People
External Collaborators
Alumni
- Karl Schnaitter (Now at Aster Data).
- Josh Spiegel (Now at Oracle).
- Veronica Mayorga.
- Doxa Chatzopoulou.
Publications (by year)
2009
-
D. Chatzopoulou and M. Eirinaki and N. Polyzotis, "Query Recommendations for Interactive Database Exploration". In Proceedings of the 21st International Conf. on Statistical and Scientific Database Management, New Orleans, LA, 2009.
-
Jonathan Finger and Neoklis Polyzotis, "Robust and Efficient Algorithms for Rank Join Processing". In Proceedings of ACM SIGMOD, Providence, RI, USA, 2009.
-
Joshua Spiegel and Neoklis Polyzotis, "TuG Synopses for Approximate Query Answering". In ACM Transactions on Database Systems, 34(1), 2009.
-
Veronica Mayorga and Neoklis Polyzotis, "Sketch-based Summarization of Ordered XML Streams". In Proceedings of IEEE ICDE 2009 Shanghai, China, 2009.
-
Karl Schnaitter, Joshua Spiegel, Neoklis Polyzotis, "Depth Estimation for Ranking Query Optimization". In International Journal on Very Large Databases (VLDBJ), 18(2), 2009.
2008
- Serge Abiteboul, Ioana Manolescu, Neoklis Polyzotis, Nicoleta Preda, Chong Sun, "XML processing in DHT networks". In Proceedings of IEEE ICDE 2008, Cancun, Mexico, April 2008.
2007
2006
- Joshua Spiegel, Emmanuel D. Pontikakis, Suratna Budalakoti, Neoklis Polyzotis, "AQAX: A System for Approximate XML Query Answers". In Proceedings of VLDB 2006, Seul, Korea, September 2006. (Slides)
- Neoklis Polyzotis, Minos Garofalakis,
"XSketch Synopses for XML Data Graphs"
,
In ACM Transactions on Database Systems, 31(3), September 2006.
- Joshua Spiegel, Neoklis Polyzotis,
"Tuple Graph Synopses for Relational Data Sets"
,
In 5th Hellenic Database Symposium, Thessaloniki, Greece, September 2006.
- Joshua Spiegel, Neoklis Polyzotis,
"Graph-Based Synopses for Relational Selectivity Estimation"
,
In Proceedings of ACM SIGMOD 2006, Chicago, IL, June 2006. (Slides)
- Neoklis Polyzotis, Minos Garofalakis,
"XCluster Synopses for Structured XML Content"
,
In Proceedings of IEEE ICDE 2006, Georgia, Atlanta, April 2006.
(Slides)
2005
-
Neoklis Polyzotis,
"Selectivity-Based Partitioning: A Divide and Union Technique for Effective Query Processing
,
In Proceedings of ACM CIKM 2005, Brehmen, Germany, October 2005. (Slides)
-
Neoklis Polyzotis and
Minos
Garofalakis,
"Approximate Answers for XML Queries with Range Predicates",
In 4th Hellenic Data Management Symposium, Athens, Greece, August 2005.
2004