Thesis of Tom Sebastian

Evaluation of XPath Queries on XML streams with Networks of Early Nested Word Automata

The challenge that we tackle in this thesis is the problem of how to answer XPath queries on XML streams with low latency, full coverage, high time efficiency, and low memory costs. We first propose to approximate earli- est query answering for navigational XPath queries by compilation to early nested word automata. It turns out that this leads to almost optimal la- tency and memory consumption. Second, we contribute a formal semantics of XPath 3.0. It is obtained by mapping XPath to the new query language λXP that we introduce. We then show how to compile λXP queries to net- works of early nested word automata, and develop streaming algorithms for the latter. Thereby we obtain a streaming algorithm that indeed covers all of XPath 3.0. Third, we develop an algorithm for projecting XML streams with respect to the query defined by an early nested word automaton. Thereby we are able to make our streaming algorithms highly time efficient. We have implemented all our algorithms with the objective to obtain an industrially applicable streaming tool. It turns out that our algorithms outperform all previous approaches in time efficiency, coverage, and latency.

Jury

Directeur de Thèse : Joachim NIEHREN Rapporteurs : Anca MUSCHOLL, Carlo ZANIOLO Examinateurs : Kim NGUYEN

Thesis of the team LINKS defended on 17/06/2016