Abstract: The present paper presents a robust video fingerprinting system for tracking visual content subjected to live recording.
The methodological novelty of the system lies in creating synergies between architectural modules, designed so as to offer: (1) local visual feature representations, invariant with respect to scale, orientation, and affine transformations; (2) scalable global feature representations, invariant with respect to photometric transformations; and (3) time-variant jitter synchronization.
The system is tested on a reference database of 14 hours of cinematographic content and on a query dataset of 28 hours of video covering two use cases: a) computer-generated distortions (Gaussian filtering, sharpening, rotations by 2°, conversion to grayscale, contrast changes, brightness changes, geometric random bending) and b) live camera recording. The former use case resulted in an ideal false alarm rate, a probability of missed detection of 0.02, and an F1 score of 0.97. The applicative novelty, however, lies in solving the latter use case: experimental values of a false alarm rate lower than 0.01, a probability of missed detection of 0.04, and an F1 score of 0.94 were obtained for content live-recorded from theatre and PC screens.
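The reported metrics are related: recall follows directly from the probability of missed detection, and the precision implied by the reported F1 score can be back-calculated. The sketch below is purely illustrative; the implied precision is not a value stated in the paper, only a consistency check of the reported numbers for the live-recording use case.

```python
# Illustrative consistency check of the reported live-recording metrics.
p_md = 0.04          # reported probability of missed detection
f1 = 0.94            # reported F1 score

recall = 1.0 - p_md  # recall = 1 - P_md = 0.96

# F1 = 2*P*R / (P + R)  =>  P = F1*R / (2*R - F1)
precision = f1 * recall / (2.0 * recall - f1)

# Recompute F1 from the implied precision to confirm consistency.
f1_check = 2.0 * precision * recall / (precision + recall)
print(f"recall={recall:.2f}, implied precision={precision:.3f}, F1={f1_check:.2f}")
```

Running this confirms that a recall of 0.96 and an implied precision of about 0.92 reproduce the reported F1 of 0.94.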