VQA: Visual question answering, Proc. of ICCV, 2015. ,
DOI : 10.1109/iccv.2015.279
URL : http://arxiv.org/pdf/1505.00468
, Layer normalization. Deep Learning Symposium (NIPS), 2016.
Neural machine translation by jointly learning to align and translate, Proc. of ICLR, 2015. ,
Empirical evaluation of gated recurrent neural networks on sequence modeling, Proc. of ICML, 2015. ,
, Visual dialog, 2017.
DOI : 10.1109/cvpr.2017.121
GuessWhat?! Visual object discovery through multi-modal dialogue, Proc. of CVPR, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01549641
Modulating and attending the source image during encoding improves multimodal translation, Visually-Grounded Interaction and Language Workshop (NIPS), 2017. ,
A Learned Representation For Artistic Style, Proc. of ICLR, 2017. ,
Feature-wise transformations, Distill, 2018. ,
DOI : 10.23915/distill.00011
URL : https://hal.archives-ouvertes.fr/hal-01841985
The PASCAL visual object classes (VOC) challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010. ,
DOI : 10.1007/s11263-009-0275-4
URL : http://www.dai.ed.ac.uk/homes/ckiw/postscript/ijcv_voc09.pdf
Multimodal compact bilinear pooling for visual question answering and visual grounding, Proc. of EMNLP, 2016. ,
DOI : 10.18653/v1/d16-1044
URL : https://doi.org/10.18653/v1/d16-1044
Rich feature hierarchies for accurate object detection and semantic segmentation, Proc. of CVPR, 2014. ,
DOI : 10.1109/cvpr.2014.81
URL : http://arxiv.org/pdf/1311.2524
, Neural Turing machines, 2014.
Hybrid computing using a neural network with dynamic external memory, Nature, vol.538, issue.7626, p.471, 2016. ,
DOI : 10.1038/nature20101
Deep residual learning for image recognition, Proc. of CVPR, 2016. ,
DOI : 10.1109/cvpr.2016.90
URL : http://arxiv.org/pdf/1512.03385
Segmentation from natural language expressions, Proc. of ECCV, 2016. ,
DOI : 10.1007/978-3-319-46448-0_7
URL : http://arxiv.org/pdf/1603.06180
Natural language object retrieval, Proc. of CVPR, 2016. ,
DOI : 10.1109/cvpr.2016.493
URL : http://arxiv.org/pdf/1511.04164
Compositional attention networks for machine reasoning, Proc. of ICLR, 2018. ,
Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proc. of ICML, 2015. ,
Revisiting visual question answering baselines, Proc. of ECCV, 2016. ,
CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning, Proc. of CVPR, 2017. ,
Visual question answering: Datasets, algorithms, and future challenges, Computer Vision and Image Understanding, vol.163, pp.3-20, 2017. ,
ReferItGame: Referring to objects in photographs of natural scenes, Proc. of EMNLP, 2014. ,
Hadamard Product for Low-rank Bilinear Pooling, Proc. of ICLR, 2017. ,
Multimodal residual learning for visual QA, Proc. of NIPS, 2016. ,
Adam: A method for stochastic optimization, Proc. of ICLR, 2014. ,
ImageNet classification with deep convolutional neural networks, Proc. of NIPS, 2012. ,
Answerer in questioner's mind for goal-oriented visual dialogue, Visually-Grounded Interaction and Language Workshop (NIPS), 2018. ,
Microsoft COCO: Common objects in context, Proc. of ECCV, 2014. ,
Fully convolutional networks for semantic segmentation, Proc. of CVPR, 2015. ,
Hierarchical question-image co-attention for visual question answering, Proc. of NIPS, 2016. ,
Comprehension-guided referring expressions, Proc. of CVPR, 2017. ,
Effective approaches to attention-based neural machine translation, Proc. of EMNLP, 2015. ,
Ask your neurons: A neural-based approach to answering questions about images, Proc. of ICCV, 2015. ,
ImageCLEF: Experimental Evaluation in Visual Information Retrieval, 2012. ,
Modeling context between objects for referring expression understanding, Proc. of ECCV, 2016. ,
Rectified linear units improve restricted Boltzmann machines, Proc. of ICML, 2010. ,
FiLM: Visual reasoning with a general conditioning layer, Proc. of AAAI, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01648685
Grounding of textual phrases in images by reconstruction, Proc. of ECCV, 2016. ,
Guide me: Interacting with deep networks, Proc. of CVPR, 2018. ,
DOI : 10.1109/cvpr.2018.00892
URL : http://arxiv.org/pdf/1803.11544
ImageNet large scale visual recognition challenge, International Journal of Computer Vision, vol.115, issue.3, pp.211-252, 2015. ,
DOI : 10.1007/s11263-015-0816-y
URL : http://arxiv.org/pdf/1409.0575
End-to-end optimization of goal-driven and visually grounded dialogue systems, Proc. of IJCAI, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01549642
End-to-end memory networks, Proc. of NIPS, 2015. ,
Modulating early visual processing by language, Proc. of NIPS, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01648683
, Memory networks, 2014.
Dynamic memory networks for visual and textual question answering, Proc. of ICML, 2016. ,
Ask, attend and answer: Exploring question-guided spatial attention for visual question answering, Proc. of ECCV, 2016. ,
DOI : 10.1007/978-3-319-46478-7_28
URL : http://arxiv.org/pdf/1511.05234
Show, attend and tell: Neural image caption generation with visual attention, Proc. of ICML, 2015. ,
Efficient video object segmentation via network modulation, Proc. of CVPR, 2018. ,
DOI : 10.1109/cvpr.2018.00680
URL : http://arxiv.org/pdf/1802.01218
MAttNet: Modular attention network for referring expression comprehension, Proc. of CVPR, 2018. ,
DOI : 10.1109/cvpr.2018.00142
URL : http://arxiv.org/pdf/1801.08186
Modeling context in referring expressions, Proc. of ECCV, 2016. ,
DOI : 10.1007/978-3-319-46475-6_5
URL : http://arxiv.org/pdf/1608.00272
A joint speaker-listener-reinforcer model for referring expressions, Proc. of CVPR, 2016. ,
Reasoning about fine-grained attribute phrases using reference games, Visually-Grounded Interaction and Language Workshop (NIPS), 2017. ,
Parallel attention: A unified framework for visual object discovery through dialogs and queries, Proc. of CVPR, 2018. ,