Habilitation thesis of Jean Martinet

Advanced features for image representation: integrating relations, weights, depth, and time

Tremendous amounts of visual data are produced every day, including user-generated images and videos from social media platforms and audiovisual archives. Being able to search such large collections and retrieve documents from them is essential. Our work in computer vision and multimedia information retrieval focuses on visual features for image representation. In particular, within the processing chain that ranges from visual data acquisition with sensors to the user interface that supports interaction with the system, our research addresses the internal representation of visual data in the form of an index that serves as the system's reference for image contents.

In the general context of image representation, the first part describes contributions related to the widely used paradigm of "bags of visual words". We also discuss the general notion of relation, taken at several levels: the low level of visual words, the transversal level aiming for cross-modal annotation, and the high level of semantic objects. Finally, we focus on the definition of weighting models that serve as visual counterparts to popular weighting schemes used for text.

Because persons and their faces have specific characteristics compared to general objects, the second part focuses on dedicated features and methods for person recognition. Two directions are developed to overcome some limitations of static 2D approaches based on face images, with the objective of improving the precision and robustness of recognition systems. One direction integrates depth into facial features, and the other takes advantage of temporal information in video streams. In both cases, dedicated features and strategies are investigated.

Keywords: Computer vision, Multimedia information retrieval, Image representation, Indexing, Visual features, Weighting scheme, Person recognition.

defended on 15/12/2016
