Neural module networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.39-48, 2016. ,
VQA: Visual Question Answering, ICCV, 2015. ,
Mutan: Multimodal tucker fusion for visual question answering, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-02073637
Analysis of individual differences in multidimensional scaling via an n-way generalization of "eckart-young" decomposition, 1970. ,
Cross-modal retrieval in the cooking context: Learning semantic text-image embeddings, SIGIR, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01931470
Tensor decompositions for signal processing applications: From twoway to multiway component analysis, 2015. ,
Detecting visual relationships with deep relational networks, CVPR, 2017. ,
Decompositions of a higher-order tensor in block terms -part ii: Definitions and uniqueness, SIAM J. Matrix Anal. Appl, vol.30, issue.3, pp.1033-1066, 2008. ,
Multimodal classification for analysing social media, 2017. ,
WILDCAT: weakly supervised learning of deep convnets for image classification, pointwise localization and segmentation, CVPR, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01515640
Finding beans in burgers: Deep semantic-visual embedding with localization, CVPR, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-02171857
, EMNLP, 2016.
Making the v in vqa matter: Elevating the role of image understanding in visual question answering, CVPR, 2017. ,
Visual translation embedding network for visual relation detection, CVPR, 2017. ,
Foundations of the parafac procedure: Models and conditions for an "explanatory, 2001. ,
Multimodal learning and reasoning for visual question answering, Advances in Neural Information Processing Systems, pp.551-562, 2017. ,
Pythia v0.1: The winning entry to the vqa challenge, 2018. ,
An analysis of visual question answering algorithms, The IEEE International Conference on Computer Vision (ICCV), 2017. ,
Hadamard Product for Low-rank Bilinear Pooling, The 5th International Conference on Learning Representations, 2017. ,
Adam: A method for stochastic optimization, ICLR, 2015. ,
Skip-thought vectors, NIPS, 2015. ,
Unifying visual-semantic embeddings with multimodal neural language models, 2015. ,
Visual genome: Connecting language and vision using crowdsourced dense image annotations, International Journal of Computer Vision, vol.123, issue.1, pp.32-73, 2017. ,
Vipcnn: Visual phrase guided convolutional neural network, 2017. ,
Deep variationstructured reinforcement learning for visual relationship and attribute detection, CVPR, 2017. ,
Deeper lstm and normalized cnn visual question answering model, 2015. ,
Deformable part-based fully convolutional network for object detection, BMVC, 2016. ,
Training recurrent answering units with joint loss minimization for vqa, 2016. ,
Tips and tricks for visual question answering: Learnings from the 2017 challenge, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. ,
Some mathematical notes on threemode factor analysis, Psychometrika, vol.31, issue.3, pp.279-311, 1966. ,
Visual relationship detection with internal and external linguistic knowledge distillation, ICCV, 2017. ,
Multi-modal factorized bilinear pooling with co-attention learning for visual question answering, 2017. ,
Beyond bilinear: Generalized multi-modal factorized highorder pooling for visual question answering, 2018. ,
Learning to count objects in natural images for visual question answering, ICLR, 2018. ,