Thèse de Paul Gallot

Safety of data transformations

This thesis aims at studying models for representing transformations of trees, elaborating algorithms for verification on such models and applying these tools to the transformations performed by scripts on file hierarchies. We especially focus on using techniques and models from the tree transducers literature, as opposed to formalisms based on logical formulas. In the Debian GNU/Linux distribution in particular, Shell scripts are a crucial part of the installation, update and removal of software packages. The Shell scripting language provides access to Unix commands performing changes on file systems in addition to other tools. We model Unix file systems as feature trees and we represent the actions of Unix commands on a file system using a model we call tree pattern transducers. This model uses tree patterns to represent modifications on feature trees, and it uses a system of constraints to represent the domains of tree transformations. We translate Unix commands into this model. We then provide an algorithm for computing the composition of tree pattern transducers. Our final goal is to use this representation of commands to detect configurations of the file system in which a given Shell script can fail. Instead of computing the transducers corresponding to Unix commands and composing them, we opt to only compute the inverse image (we view commands as functions from trees to trees) of the set of all feature trees through Unix commands successively. We examine the pros and cons of this algorithm and we implement it. To represent inverse images we use a variation of the system of constraints designed to model the domains of tree pattern transducers. The implementation is then tested on a corpus of Debian package scripts. To better inform the discussion around this algorithm's complexity, we give a proof that the problem we are solving is NP-hard, even on very restricted sets of scripts. In a more theoretical direction, we use techniques from the field of functionnal programming to shed new light on known models of transducers. We contribute a new class of transducers we call High-Order Deterministic tree Transducers (HODT) which generalizes some known models of tree transducers. HODT are defined similarly to Top-down tree transducers (DTOP), but the output of rules are simply-typed lambda-terms. We show how putting constraints on these terms yields different known classes of transducers: restriction to terms of order 0 yields the class of DTOP, while restriction to terms of order at most 1 yields the class of Macro Tree Transducers (MTT). We give a procedure for computing the composition of two HODT. We show that the order of the composition is the sum of the orders of the composed transducers, which gives an interesting explanation of why DTOP are closed under composition (as HODT of order 0) but not MTT (HODT of order 1). In particular we study the restriction of HODT to linear terms, to which we add an inspection by a regular bottom-up automaton. We show that this model represents the same tree-to-tree functions as other known classes of transducers, notably Transductions defined by Monadic Second-Order logic (MSOT). We then prove a similar result for the restriction to almost linear terms and an extension of MSOT called Monadic Second-Order logic tree Transductions with Sharing of subtrees (MSOTS). We then give a specialized procedure for composition of linear HODT which preserves linearity. This procedure relies on linear logic and coherence spaces. Because the time complexity of this procedure depends a lot on the order of transducers, we give a procedure that allows us to reduce the order of linear and almost linear transducers. As a last part, we prove that the word language MIX defined as the commutative closure of the Dyck language does not belong to the class of EDT0L languages nor the class of non-branching Multiple Context-Free Languages.


M. Sylvain SALVATI Université de Lille Directeur de Thèse M. Pierre-Alain REYNIER Université d'Aix-Marseille Rapporteur M. Sylvain SCHMITZ Institut de recherche en informatique fondamentale, Université de Paris Rapporteur M. Patrick BAILLOT Laboratoire de l'informatique du parallélisme Examinateur M. Sebastian MANETH Université de Bremen Examinateur Mme Sophie TISON Université de Lille Examinatrice M. Aurélien LEMAY Université de Lille Invité

Thèse de l'équipe LINKS soutenue le 16/12/2021