Abstract: This paper tackles the problem of scalable video indexing. We propose a new framework that combines spatial and motion patch descriptors. The spatial descriptors, called Sparse Multiscale Patches, are based on a multiscale description of the image. The motion patch descriptors are based on block motion and describe the motion within a Group of Pictures. The distributions of these sets of patches are compared by combining weighted Kullback-Leibler divergences between spatial patches and between motion patches. These divergences are estimated in a non-parametric framework using a k-th nearest neighbor estimator. We evaluate this weighted dissimilarity measure on selected videos from the ICOS-HD ANR project. Experiments show that the spatial part of the measure discriminates between different sequences, while its motion part makes it possible to detect clips within a sequence. Experiments combining the spatial and motion parts of the dissimilarity measure show its robustness to resampling and compression, thus demonstrating the spatial scalability of the method on heterogeneous networks.
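
To make the estimation step concrete, the following is a minimal sketch of a standard k-th nearest neighbor estimator of the Kullback-Leibler divergence between two sets of patch descriptors, followed by a weighted combination of a spatial and a motion term. It is an illustration under stated assumptions, not the exact estimator or weighting used in this work: the function names, the weights `w_spatial` and `w_motion`, and the brute-force neighbor search are all hypothetical choices, and descriptors are assumed to be plain lists of equal-length coordinate tuples with no duplicated points.

```python
import math


def kth_nn_distance(point, samples, k, skip_self=False):
    """Euclidean distance from `point` to its k-th nearest neighbor in `samples`."""
    dists = sorted(math.dist(point, s) for s in samples)
    # If `point` is itself a member of `samples`, index 0 holds the zero
    # self-distance, so the k-th true neighbor sits at index k.
    return dists[k] if skip_self else dists[k - 1]


def knn_kl_divergence(p_samples, q_samples, k=1):
    """Estimate D_KL(P || Q) from samples of P and Q with k-NN distances.

    Uses the classical estimator
        (d / n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1)),
    where rho_k(i) is the k-NN distance of x_i within the P samples and
    nu_k(i) is its k-NN distance to the Q samples.
    Assumes no duplicated points (a zero distance would break the log).
    """
    n, m = len(p_samples), len(q_samples)
    d = len(p_samples[0])
    total = 0.0
    for x in p_samples:
        rho = kth_nn_distance(x, p_samples, k, skip_self=True)
        nu = kth_nn_distance(x, q_samples, k)
        total += math.log(nu / rho)
    return d * total / n + math.log(m / (n - 1))


def weighted_dissimilarity(spatial_a, spatial_b, motion_a, motion_b,
                           w_spatial=0.5, w_motion=0.5, k=1):
    """Hypothetical weighted sum of spatial and motion KL estimates."""
    return (w_spatial * knn_kl_divergence(spatial_a, spatial_b, k)
            + w_motion * knn_kl_divergence(motion_a, motion_b, k))
```

The brute-force search costs a full sort per query point, which is acceptable for a small illustration; for realistic numbers of patches a spatial index (for example a k-d tree) would normally replace it. Note that the Kullback-Leibler divergence is asymmetric, so swapping the two sets generally changes the result.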