Label propagation guided by hierarchy of partitions for superpixel computation, Image Analysis and Processing -ICIAP 2019 -20th International Conference, vol.11752, pp.3-13, 2019. ,
Multimodal fusion for multimedia analysis: a survey, Multimedia systems, vol.16, issue.6, pp.345-379, 2010. ,
Speaker naming in movies, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.2206-2216, 2018. ,
Multimodal understanding for person recognition in video broadcasts, International Conference on Spoken Language Processing (ICSLP), pp.607-611, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01194244
Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs, Proceedings of the 8th International Conference on Spoken Language Processing, pp.333-444, 2004. ,
Deep temporal multimodal fusion for medical procedure monitoring using wearable sensors, IEEE Transactions on Multimedia, vol.20, issue.1, pp.107-118, 2017. ,
Multimodal person discovery in broadcast TV at MediaEval, Working notes of the MediaEval 2016 Workshop, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01690330
Person Instance Graphs for Mono-, Crossand Multi-Modal Person Recognition in Multimedia Data. Application to Speaker Identification in TV Broadcast, International Journal of Multimedia Information Retrieval, 2014. ,
A comparative study using manual and automatic transcriptions for diarization, IEEE Workshop on Automatic Speech Recognition and Understanding, pp.415-419, 2005. ,
Speaker diarization from speech transcripts, International Conference on Spoken Language Processing (ICSLP), pp.1272-1275, 2004. ,
Hierarchical segmentation from a non-increasing edge observation attribute, Pattern Recognition Letters, vol.131, pp.105-112, 2020. ,
Video text recognition using sequential Monte Carlo and error voting methods, Pattern Recognition Letters, vol.26, issue.9, pp.1386-1403, 2005. ,
Hierarchical segmentations with graphs: Quasi-flat zones, minimum spanning trees, and saliency maps, Journal of Mathematical Imaging and Vision, vol.60, issue.4, pp.479-502, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01344727
Tag propagation approaches within speaking face graphs for multimodal person discovery, Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (CBMI), p.15, 2017. ,
Histograms of Oriented Gradients for Human Detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol.1, pp.886-893, 2005. ,
URL : https://hal.archives-ouvertes.fr/inria-00548512
Accurate scale estimation for robust visual tracking, Proceedings of the British Machine Vision Conference, 2014. ,
Decision fusion, 1994. ,
Front-end factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.4, pp.788-798, 2011. ,
Extracting true speaker identities from transcriptions, International Conference on Spoken Language Processing (IC-SLP), pp.2601-2604, 2007. ,
The first official repere evaluation, First Workshop on Speech, Language and Audio for Multimedia, 2013. ,
Analysis of i-vector length normalization in speaker recognition systems, 12th Annual Conference of the International Speech Communication Association, 2011. ,
Comparison of two methods for unsupervised person identification in tv shows, 12th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.1-6, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01433260
Efficient heuristic methods for multimodal fusion and concept fusion in video concept detection, IEEE Transactions on Multimedia, vol.17, issue.4, pp.498-511, 2015. ,
Named faces: putting names to faces, IEEE Intelligent Systems and their Applications, vol.14, issue.5, pp.45-50, 1999. ,
Deep multimodal speaker naming, Proceedings of the 23rd ACM International Conference on Multimedia, pp.1107-1110, 2015. ,
A presentation of the repere challenge, 10th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.1-6, 2012. ,
Fast constrained person identity label propagation in stereo videos using a pruned similarity matrix, Signal Processing: Image Communication, vol.67, pp.199-209, 2018. ,
Multimodal data fusion: An overview of methods, challenges, and prospects, Proceedings of the IEEE, vol.103, issue.9, pp.1449-1477, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01179853
The measurement of observer agreement for categorical data, Biometrics, vol.33, issue.1, pp.159-174, 1977. ,
Towards large scale multimedia indexing: A case study on person discovery in broadcast news, Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (CBMI), p.18, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01551690
Eumssi team at the mediaeval person discovery challenge, Working Notes Proceedings of the MediaEval 2016 Workshop, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01433209
Discriminating joint feature analysis for multimedia data understanding, IEEE Transactions on Multimedia, vol.14, issue.6, pp.1662-1672, 2012. ,
Upc system for the 2016 mediaeval multimodal person discovery in broadcast tv task, Working Notes Proceedings of the MediaEval 2016 Workshop, 2016. ,
Random walks and diffusion on networks, Physics Reports, pp.1-58, 2017. ,
Speaker diarization: About whom the speaker is talking? In: IEEE Odyssey -The Speaker and Language Recognition Workshop, pp.1-6, 2006. ,
Building the component tree in quasi-linear time, IEEE Transactions on Image Processing, vol.15, issue.11, pp.3531-3539, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-00622110
Hcmus team at the multimodal person discovery in broadcast tv task of mediaeval, Working Notes Proceedings of the MediaEval 2016 Workshop, 2016. ,
Tokyo tech at mediaeval 2016 multimodal person discovery in broadcast tv task, Working Notes Proceedings of the MediaEval 2016 Workshop, 2016. ,
Learning and transferring mid-level image representations using convolutional neural networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00911179
Gtm-uvigo system for multimodal person discovery in broadcast tv task at mediaeval, Working Notes Proceedings of the MediaEval 2016 Workshop, 2016. ,
Unsupervised celebrity face naming in web videos, IEEE Transactions on Multimedia, vol.17, issue.6, pp.854-866, 2015. ,
Evaluation of hierarchical watersheds, IEEE Trans. Image Processing, vol.27, issue.4, pp.1676-1688, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01430865
Evaluation of morphological hierarchies for supervised segmentation, Proceedings of the 12th International Symposium on Mathematical Morphology and Its Applications to Signal and Image Processing, pp.39-50, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01142072
Cross-media alignment of names and faces, IEEE Transactions on Multimedia, vol.12, issue.1, pp.13-27, 2010. ,
M-vad names: a dataset for video captioning with naming, Multimedia Tools and Applications, vol.78, issue.10, p.27, 2019. ,
Unsupervised speaker identification in tv broadcast based on written names, IEEE Transactions on Audio, Speech, and Language Processing, vol.23, issue.1, pp.57-68, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01060827
Multimodal person discovery in broadcast TV at mediaeval, Working Notes Proceedings of the MediaEval 2015 Workshop, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01690332
Multimodal person discovery in broadcast tv: lessons learned from mediaeval 2015, Multimedia Tools and Applications, vol.76, issue.21, pp.547-569, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01690581
Naming multi-modal clusters to identify persons in TV broadcast, Multimedia Tools and Applications, vol.75, issue.15, pp.8999-9023, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01230628
Robust tree-structured named entities recognition from speech, International Conference on Acoustics, Speech and Signal Processing, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00830142
Visual instance retrieval with deep convolutional networks, ITE Transactions on Media Technology and Applications, vol.4, issue.3, pp.251-258, 2016. ,
Generating descriptions with grounded and co-referenced people, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4979-4989, 2017. ,
An open-source state of the art toolbox for broadcast news diarization, pp.25-29, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-01433449
Robust face-name graph matching for movie character identification, IEEE Transactions on Multimedia, vol.14, issue.3, pp.586-596, 2012. ,
SSIG and IRISA at Multimodal Person Discovery, Working Notes Proceedings of the MediaEval, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01196171
Name-it: naming and detecting faces in news videos, IEEE MultiMedia, vol.6, issue.1, pp.22-35, 1999. ,
Facenet: A unified embedding for face recognition and clustering, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.815-823, 2015. ,
Automatic discovery of discriminative parts as a quadratic assignment problem, Proceedings of the IEEE International Conference on Computer Vision, pp.1059-1068, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-02370324
, Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations (ICLR, 2015.
Unsupervised discovery of character dictionaries in animation movies, IEEE Transactions on Multimedia, vol.20, issue.3, pp.539-551, 2018. ,
, Particular object retrieval with integral max-pooling of cnn activations. International Conference on Learning Representations (ICLR, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01842218
Who really spoke when? finding speaker turns and identities in broadcast news audio, IEEE ICASSP, vol.1, 2006. ,
Naming people in news videos with label propagation, IEEE Multimedia, vol.18, issue.3, pp.44-55, 2011. ,
A multimodal approach to speaker diarization on tv talk-shows, IEEE Transactions on Multimedia, vol.15, issue.3, pp.509-520, 2013. ,
Weak-labeled active learning with conditional label dependence for multilabel image classification, IEEE Transactions on Multimedia, vol.19, issue.6, pp.1156-1169, 2017. ,
Adaptive learning for celebrity identification with video context, IEEE Transactions on Multimedia, vol.16, issue.5, pp.1473-1485, 2014. ,
Naming every individual in news video monologues, Proceedings of the 12th ACM International Conference on Multimedia, pp.580-587, 2004. ,
Multiple instance learning for labeling faces in broadcasting news video, Proceedings of the 13th ACM International Conference on Multimedia, pp.31-40, 2005. ,
A novel region-based active contour model via local patch similarity measure for image segmentation, Multimedia Tools and Applications, vol.77, issue.18, pp.97-121, 2018. ,
A novel segmentation model for medical images with intensity inhomogeneity based on adaptive perturbation, Multimedia Tools and Applications, vol.78, issue.9, pp.779-790, 2019. ,
A scalable region-based level set method using adaptive bilateral filter for noisy image segmentation, Multimedia Tools and Applications, vol.79, issue.9, pp.5743-5765, 2020. ,
Finding celebrities in billions of web images, IEEE Transactions on Multimedia, vol.14, issue.4, pp.995-1007, 2012. ,
A coupled hidden conditional random field model for simultaneous face clustering and naming in videos, IEEE Transactions on Image Processing, vol.25, issue.12, pp.5780-5792, 2016. ,
Learning with local and global consistency, Advances in neural information processing systems, pp.321-328, 2004. ,
Semi-supervised learning literature survey, vol.2, 2008. ,
Person identity label propagation in stereo videos, IEEE Transactions on Multimedia, vol.16, issue.5, pp.1358-1368, 2014. ,