Multimedia retrieval that works, 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pp.63-68, 2018.
Good news, everyone! Context driven entity-aware captioning for news images, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
BERT: Pre-training of deep bidirectional transformers for language understanding, 2018.
Finding structure in time, Cognitive Science, vol.14, issue.2, pp.179-211, 1990.
VSE++: Improving visual-semantic embeddings with hard negatives, 2018.
Bridging by word: Image grounded vocabulary construction for visual captioning, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp.6514-6524, 2019.
Generative adversarial nets, Advances in Neural Information Processing Systems, vol.27, pp.2672-2680, 2014.
Canonical correlation analysis: An overview with application to learning methods, Neural Computation, vol.16, issue.12, pp.2639-2664, 2004.
Long short-term memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
Deep cross-media knowledge transfer, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.8837-8846, 2018.
Few-shot image and sentence matching via gated visual-semantic embedding, Proceedings of the AAAI Conference on Artificial Intelligence, vol.33, pp.8489-8496, 2019.
ACMM: Aligned cross-modal memory for few-shot image and sentence matching, The IEEE International Conference on Computer Vision (ICCV), 2019.
Attribute-guided network for cross-modal zero-shot hashing, IEEE Transactions on Neural Networks and Learning Systems, 2019.
Deep pairwise ranking with multi-label information for cross-modal retrieval, 2019 IEEE International Conference on Multimedia and Expo (ICME), pp.1810-1815, 2019.
Stacked cross attention for image-text matching, The European Conference on Computer Vision (ECCV), 2018.
Person search with natural language description, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Object-driven text-to-image synthesis via adversarial training, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.12174-12182, 2019.
Focus your attention: A bidirectional focal attention network for image-text matching, Proceedings of the 27th ACM International Conference on Multimedia, MM '19, pp.3-11, 2019.
A neighbor-aware approach for image-text matching, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.3970-3974, 2019.
A strong and robust baseline for text-image matching, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp.169-176, 2019.
Cross-media retrieval: State-of-the-art and open issues, Int. J. of Multimedia Intelligence and Security, vol.1, pp.33-52, 2010.
Deep adversarial graph attention convolution network for text-based person search, Proceedings of the 27th ACM International Conference on Multimedia, MM '19, pp.665-673, 2019.
Cross-modal image-text retrieval with multitask learning, Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM '19, pp.2309-2312, 2019.
Matching image and sentence with multi-faceted representations, IEEE Transactions on Circuits and Systems for Video Technology, pp.1-1, 2019.
Recipe1M+: A dataset for learning cross-modal embeddings for cooking recipes and food images, IEEE Trans. Pattern Anal. Mach. Intell., 2019.
Cross-modal music retrieval and applications: An overview of key methodologies, IEEE Signal Processing Magazine, vol.36, issue.1, pp.52-62, 2019.
Dual attention networks for multimodal reasoning and matching, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2156-2164, 2017.
An overview of cross-media retrieval: Concepts, methodologies, benchmarks, and challenges, IEEE Transactions on Circuits and Systems for Video Technology, vol.28, pp.2372-2385, 2018.
Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models, 2015.
MirrorGAN: Learning text-to-image generation by redescription, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1505-1514, 2019.
A new approach to cross-modal multimedia retrieval, Proceedings of the 18th ACM International Conference on Multimedia, MM '10, pp.251-260, 2010.
Learning cross-modal embeddings for cooking recipes and food images, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
Adversarial representation learning for text-to-image matching, Proceedings of the IEEE International Conference on Computer Vision, pp.5814-5824, 2019.
Phoneme recognition using time-delay neural networks, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.37, issue.3, pp.328-339, 1989.
Adversarial cross-modal retrieval, Proceedings of the 25th ACM International Conference on Multimedia, MM '17, pp.154-162, 2017.
Learning cross-modal embeddings with adversarial networks for cooking recipes and food images, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
Learning two-branch neural networks for image-text matching tasks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.41, issue.2, pp.394-407, 2019.
Matching images and text with multi-modal tensor fusion and re-ranking, Proceedings of the 27th ACM International Conference on Multimedia, MM '19, pp.12-20, 2019.
Position focused attention network for image-text matching, Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI'19, pp.3792-3798, 2019.
CAMP: Cross-modal adaptive message passing for text-image retrieval, The IEEE International Conference on Computer Vision (ICCV), 2019.
Learning fragment self-attention embeddings for image-text matching, Proceedings of the 27th ACM International Conference on Multimedia, MM '19, pp.2088-2096, 2019.
AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1316-1324, 2018.
Deep cross-modal projection learning for image-text matching, The European Conference on Computer Vision (ECCV), 2018.
Deep supervised cross-modal retrieval, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
R2GAN: Cross-modal recipe retrieval with generative adversarial network, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.