Structured Prediction with Reinforcement Learning

Abstract : We formalize the problem of Structured Prediction (SP) as a Reinforcement Learning task. We first define a Structured Prediction Markov Decision Process (SP-MDP), an instantiation of Markov Decision Processes for Structured Prediction, and show that learning an optimal policy for this SP-MDP is equivalent to minimizing the empirical loss. This link between the supervised learning formulation of structured prediction and reinforcement learning (RL) allows us to use approximate RL methods for learning the policy. The proposed model makes weak assumptions about both the nature of the Structured Prediction problem and the supervision process. It makes no assumption about the decomposition of loss functions, about data encoding, or about the availability of optimal policies for training. It can therefore handle a wide range of structured prediction problems. Moreover, it scales well and can be applied to complex, large-scale real-world problems. We describe two series of experiments. The first provides an analysis of RL on classical sequence prediction benchmarks and compares our approach with state-of-the-art SP algorithms. The second introduces a tree transformation problem on which most previous models fail; it is a complex instance of the general labeled tree mapping problem. We show that RL exploration is effective and leads to successful results on this challenging task, which confirms that RL can be used for large and complex structured prediction problems.
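To illustrate the equivalence stated in the abstract, the following minimal Python sketch casts sequence labeling as an episodic decision process: a state pairs the input with the labels predicted so far, an action appends one label, and a reward is given only at the end of the episode. This is not the authors' implementation. The state layout, the use of Hamming loss, the terminal-only reward, and every function name here are assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical state layout: (input tokens, labels predicted so far).
State = Tuple[Tuple[str, ...], Tuple[str, ...]]
Policy = Callable[[State], str]


def hamming_loss(predicted: Sequence[str], target: Sequence[str]) -> int:
    """Count positions where the predicted label differs from the target."""
    return sum(p != t for p, t in zip(predicted, target))


def initial_state(x: Sequence[str]) -> State:
    """Start with the full input and an empty partial output."""
    return (tuple(x), ())


def is_terminal(state: State) -> bool:
    """The episode ends once every input token has a label."""
    x, partial = state
    return len(partial) == len(x)


def step(state: State, action: str) -> State:
    """Deterministic transition: append the chosen label."""
    x, partial = state
    return (x, partial + (action,))


def run_episode(policy: Policy, x: Sequence[str], y: Sequence[str]):
    """Roll out the policy; the only reward is minus the loss at the end."""
    state = initial_state(x)
    while not is_terminal(state):
        state = step(state, policy(state))
    prediction = list(state[1])
    reward = -float(hamming_loss(prediction, y))  # assumed reward design
    return prediction, reward


def empirical_loss(policy: Policy, dataset) -> float:
    """Average loss over the dataset, recovered as minus the episode return."""
    total = 0.0
    for x, y in dataset:
        _, reward = run_episode(policy, x, y)
        total += -reward
    return total / len(dataset)


# Toy usage: label capitalized tokens "N" and everything else "O".
def capital_policy(state: State) -> str:
    x, partial = state
    token = x[len(partial)]  # the next token to label
    return "N" if token[0].isupper() else "O"


dataset = [(["Paris", "is", "nice"], ["N", "O", "O"])]
print(run_episode(capital_policy, *dataset[0]))
print(empirical_loss(capital_policy, dataset))
```

Under this assumed reward, the return of an episode is exactly minus the loss of the completed output, so a policy that maximizes expected return over the training inputs also minimizes the empirical loss. The loss is only evaluated on complete outputs, so nothing in the sketch requires it to decompose over individual labels.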
Document type :
Journal articles

https://hal.archives-ouvertes.fr/hal-01172474
Contributor : Lip6 Publications
Submitted on : Tuesday, July 7, 2015 - 2:34:45 PM
Last modification on : Thursday, March 21, 2019 - 2:34:28 PM

Citation

Francis Maes, Ludovic Denoyer, Patrick Gallinari. Structured Prediction with Reinforcement Learning. Machine Learning, Springer Verlag, 2009, 77 (2-3), pp.271-301. ⟨10.1007/s10994-009-5140-8⟩. ⟨hal-01172474⟩