Description de contenu vidéo : mouvements et élasticité temporelle (Video content description: motion and temporal elasticity)

Abstract : Video recognition has gained in performance in recent years, largely thanks to improvements in deep learning on images. However, the jump in recognition rates on images has not carried over directly to videos. This limitation is most likely due to the added dimension, time, from which a robust description is still hard to extract. Recurrent neural networks introduce temporality, but they have limited memory. State-of-the-art methods for video description usually handle time as a spatial dimension, and combinations of video description methods reach the current best accuracies. However, the temporal dimension has its own elasticity, different from that of the spatial dimensions. Indeed, the temporal dimension of a video can be locally deformed: a partial dilation produces a visual slowdown in the video without changing how it is understood, in contrast with a spatial dilation of an image, which modifies the proportions of the objects shown. We can thus expect to improve video content classification by building a description that is invariant to these speed changes. This thesis focuses on the question of a robust video description that accounts for the elasticity of the temporal dimension, from three different angles. First, we describe the motion content locally and explicitly: singularities are detected in the optical flow, then tracked along the time axis and organized into chains to describe parts of a video. We have used this description on sports content. Second, we extract global and implicit descriptions using tensor decompositions, which make it possible to treat a video as a multi-dimensional data table. The extracted descriptions are evaluated in a classification task. Finally, we study speed normalization methods based on Dynamic Time Warping applied to series, and show that this normalization improves classification rates.
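
To illustrate the idea of invariance to local speed changes mentioned in the abstract, here is a minimal sketch of textbook Dynamic Time Warping on one-dimensional series. It is not taken from the thesis: the function name `dtw_distance`, the absolute-difference local cost, and the toy sequences are assumptions made for illustration only. In a video setting, each element would presumably be a per-frame descriptor rather than a scalar.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW between two 1-D sequences (illustrative, not the thesis's code)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy example: the second series is the first one played at half speed.
signal = [0, 1, 2, 1, 0]
slowed = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
print(dtw_distance(signal, slowed))  # 0.0
```

Because DTW allows one sample to be matched with several consecutive samples of the other series, the slowed-down copy aligns with the original at zero cost, whereas a frame-by-frame comparison would not even be defined for series of different lengths.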

https://tel.archives-ouvertes.fr/tel-02010091
Contributor : Abes Star
Submitted on : Wednesday, February 6, 2019 - 6:28:34 PM
Last modification on : Thursday, February 7, 2019 - 1:23:33 AM

File

2018AZUR4212.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02010091, version 1

Citation

Katy Blanc. Description de contenu vidéo : mouvements et élasticité temporelle. Vision par ordinateur et reconnaissance de formes [cs.CV]. Université Côte d'Azur, 2018. Français. ⟨NNT : 2018AZUR4212⟩. ⟨tel-02010091⟩

Metrics

Record views : 190

File downloads : 53