From image coding and representation to robotic vision

Marie Babel

Abstract

This habilitation thesis is first devoted to applications related to image representation and coding. While the image and video coding community has traditionally focused on coding standardization processes, advanced services and functionalities have been designed in particular to match the requirements of content delivery systems. In this sense, the complete transmission chain of encoded images now has to be considered. To characterize the ability of any communication network to ensure end-to-end quality, the notion of Quality of Service (QoS) has been introduced. First defined by the ITU-T as the set of technologies addressing the degree of satisfaction of a user of the service, QoS is now mostly restricted to solutions designed for monitoring and improving network performance parameters. In practice, QoS addresses network quality issues and provides indicators such as jitter, bandwidth and loss rate. However, end users are usually not concerned with purely technical performance; they care about their ability to experience the desired content. An emerging research area therefore focuses on the notion of Quality of Experience (QoE, also abbreviated as QoX), which describes the quality perceived by end users. Within this context, QoE faces the challenge of predicting the behaviour of any end user. When considering encoded images, many technical solutions can considerably enhance the end-user experience, in terms of services and functionalities as well as final image quality. Ensuring the effective transport of data and maintaining security while obtaining the desired end quality remain key issues for video coding and streaming. The first parts of my work are thus to be seen within this joint QoS/QoE context. Building on efficient coding frameworks, additional generic functionalities and services have been proposed, such as scalability, advanced entropy coders, content protection, error resilience and image quality enhancement.
In relation to advanced QoE services, such as Region of Interest definition or object tracking and recognition, we further studied pseudo-semantic representations in detail. First designed for coding purposes, these representations aim at exploiting textural spatial redundancies at the region level. Indeed, research over the past 30 years has provided numerous decorrelation tools that reduce the amount of redundancy across both spatial and temporal dimensions in image sequences. To this day, the classical video compression paradigm locally splits the images into blocks of pixels and processes the temporal axis on a frame-by-frame basis, without any obvious continuity. Despite the very high compression performance of standards such as AVC and the forthcoming HEVC, one may still advocate the use of alternative approaches. Disruptive solutions have also been proposed, and notably offer the ability to process the temporal axis continuously. However, they often rely on complex tools (e.g. wavelets, control grids) whose use is rather delicate in practice. We therefore investigate the viability of alternative representations that embed features of both classical and disruptive approaches. The objective is to exhibit the temporal persistence of the textural information through a time-continuous description. Finally, from this pseudo-semantic level of representation, systems ranging from texture tracking up to object tracking can be designed. From this technical solution, 3D object tracking is a logical outcome, in particular when considering robotic vision issues.
