Off-policy Learning with Eligibility Traces: A Survey, Journal of Machine Learning Research, vol.15, pp.289-333, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00921275
Algorithmic Survey of Parametric Value Function Approximation, IEEE Transactions on Neural Networks and Learning Systems, vol.24, issue.6, pp.845-867, 2013.
DOI : 10.1109/TNNLS.2013.2247418
URL : https://hal.archives-ouvertes.fr/hal-00869725
A C++ Template-Based Reinforcement Learning Library: Fitting the Code to the Mathematics, Journal of Machine Learning Research, vol.14, pp.399-402, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00914768
A Comprehensive Reinforcement Learning Framework for Dialogue Management Optimization, IEEE Journal of Selected Topics in Signal Processing, vol.6, issue.8, pp.891-902, 2012.
DOI : 10.1109/JSTSP.2012.2229257
Sample-Efficient Batch Reinforcement Learning for Dialogue Management Optimization, ACM Transactions on Speech and Language Processing, vol.7, issue.3, 2011.
Kalman Temporal Differences, Journal of Artificial Intelligence Research (JAIR), vol.39, pp.483-532, 2010.
DOI : 10.1109/adprl.2009.4927543
URL : https://hal.archives-ouvertes.fr/hal-00858687
From Supervised to Reinforcement Learning: a Kernel-based Bayesian Filtering Framework, International Journal On Advances in Software, vol.2, issue.1, pp.101-116, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00429891
Boosted and Reward-regularized Classification for Apprenticeship Learning, 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01107837
Model-free POMDP optimisation of tutoring systems with echo-state networks, Proceedings of the 14th SIGDial Meeting on Discourse and Dialogue, pp.102-106, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00869773
Around Inverse Reinforcement Learning and Score-based Classification, 1st Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00916936
Learning from Demonstrations: Is It Worth Estimating a Reward Function?, Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2013), pp.17-32, 2013.
DOI : 10.1007/978-3-642-40988-2_2
URL : https://hal.archives-ouvertes.fr/hal-00916938
A Cascaded Supervised Learning Approach to Inverse Reinforcement Learning, Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2013), pp.1-16, 2013.
DOI : 10.1007/978-3-642-40988-2_1
URL : https://hal.archives-ouvertes.fr/hal-00869804
Particle Swarm Optimisation of Spoken Dialogue System Strategies, Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00916935
Laugh-aware virtual agent and its impact on user amusement, International Conference on Autonomous Agents and Multiagent Systems, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00869751
Random Projections: a Remedy for Overfitting Issues in Time Series Prediction with Echo State Networks, IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
Co-adaptation in Spoken Dialogue Systems, International Workshop on Spoken Dialog Systems, 2012.
DOI : 10.1007/978-1-4614-8280-2_31
URL : https://hal.archives-ouvertes.fr/hal-00778752
Inverse Reinforcement Learning through Structured Classification, Advances in Neural Information Processing Systems, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00778624
Behavior Specific User Simulation in Spoken Dialogue Systems, ITG Conference on Speech Communication, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00749421
Approximate Modified Policy Iteration, International Conference on Machine Learning (ICML), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00697169
A Dantzig Selector Approach to Temporal Difference Learning, International Conference on Machine Learning (ICML), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00749480
Filtering of pathological ventricular rhythms during MRI scanning, International Workshop on Biosignal Interpretation, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00749457
Clustering behaviors of Spoken Dialogue Systems users, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.
DOI : 10.1109/ICASSP.2012.6289038
URL : https://hal.archives-ouvertes.fr/hal-00685009
Off-policy Learning in Large-scale POMDP-based Dialogue Systems, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4989-4992, 2012.
DOI : 10.1109/icassp.2012.6289040
URL : https://hal.archives-ouvertes.fr/hal-00684819
Monte-Carlo Swarm Policy Search, Symposium on Swarm Intelligence and Differential Evolution, 2012.
DOI : 10.1007/978-3-642-29353-5_9
URL : https://hal.archives-ouvertes.fr/hal-00695540
Kalman filtering & colored noises: the (autoregressive) moving-average case, IEEE Workshop on Machine Learning Algorithms, Systems and Applications, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00660607
Reducing the dimensionality of the reward space in the Inverse Reinforcement Learning problem, IEEE Workshop on Machine Learning Algorithms, Systems and Applications, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00660612
A Non-parametric Approach to Approximate Dynamic Programming, 2011 10th International Conference on Machine Learning and Applications and Workshops, pp.317-322, 2011.
DOI : 10.1109/ICMLA.2011.19
URL : https://hal.archives-ouvertes.fr/hal-00652438
Optimization of a Tutoring System from a Fixed Set of Data, ISCA workshop on Speech and Language Technology in Education, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652324
Uncertainty management for on-line optimisation of a POMDP-based large-scale spoken dialogue system, Annual Conference of the International Speech Communication Association, pp.1301-1304, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652194
User Simulation in Dialogue Systems using Inverse Reinforcement Learning, Annual Conference of the International Speech Communication Association, pp.1025-1028, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652446
Performance Evaluation for Particle Filters, International Conference on Information Fusion, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652168
Sample Efficient On-line Learning of Optimal Dialogue Policies with Kalman Temporal Differences, International Joint Conference on Artificial Intelligence (IJCAI 2011), pp.1878-1883, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00618252
Dynamic neural field optimization using the unscented Kalman filter, 2011 IEEE Symposium on Computational Intelligence, Cognitive Algorithms, Mind, and Brain (CCMB), 2011.
DOI : 10.1109/CCMB.2011.5952113
URL : https://hal.archives-ouvertes.fr/hal-00618117
Parametric value function approximation: A unified view, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp.9-16, 2011.
DOI : 10.1109/ADPRL.2011.5967355
URL : https://hal.archives-ouvertes.fr/hal-00618112
Managing Uncertainty within the KTD Framework, Workshop on Active Learning and Experimental Design, Journal of Machine Learning Research (Conference and Workshop Proceedings), pp.157-168, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00599636
ℓ1-Penalized Projected Bellman Residual, European Workshop on Reinforcement Learning (EWRL 2011), Lecture Notes in Computer Science (LNCS), 2011.
DOI : 10.1007/978-3-642-29946-9_12
URL : http://hal.inria.fr/docs/00/64/45/07/PDF/gs_ewrl_l1_cr.pdf
Batch, Off-Policy and Model-Free Apprenticeship Learning, European Workshop on Reinforcement Learning, 2011.
DOI : 10.1007/978-3-642-29946-9_28
URL : https://hal.archives-ouvertes.fr/hal-00660623
Recursive Least-Squares Learning with Eligibility Traces, European Workshop on Reinforcement Learning (EWRL 2011), Lecture Notes in Computer Science (LNCS), 2011.
DOI : 10.1007/978-3-642-29946-9_14
URL : https://hal.archives-ouvertes.fr/hal-00644511
Eligibility traces through colored noises, International Congress on Ultra Modern Telecommunications and Control Systems, pp.458-465, 2010.
DOI : 10.1109/ICUMT.2010.5676597
URL : https://hal.archives-ouvertes.fr/hal-00553910
Statistically linearized least-squares temporal differences, International Congress on Ultra Modern Telecommunications and Control Systems, pp.450-457, 2010.
DOI : 10.1109/ICUMT.2010.5676598
URL : https://hal.archives-ouvertes.fr/hal-00554338
Optimizing Spoken Dialogue Management with Fitted Value Iteration, International Conference on Speech Communication and Technologies, pp.86-89, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00553184
Sparse Approximate Dynamic Programming for Dialog Management, SIGDial Conference on Discourse and Dialogue, pp.107-115, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00553180
Statistically linearized recursive least squares, 2010 IEEE International Workshop on Machine Learning for Signal Processing, pp.272-276, 2010.
DOI : 10.1109/MLSP.2010.5589236
URL : https://hal.archives-ouvertes.fr/hal-00553168
Revisiting Natural Actor-Critics with Value Function Approximation, Modeling Decisions for Artificial Intelligence, pp.207-218, 2010.
DOI : 10.1007/11596448_9
URL : https://hal.archives-ouvertes.fr/hal-00554346
Tracking in Reinforcement Learning, International Conference on Neural Information Processing (ENNS best student paper award), pp.502-511, 2009.
Kernelizing Vector Quantization Algorithms, European Symposium on Artificial Neural Networks (ESANN 09), pp.541-546, 2009.
Kalman Temporal Differences: The deterministic case, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pp.185-192, 2009.
DOI : 10.1109/ADPRL.2009.4927543
URL : https://hal.archives-ouvertes.fr/hal-00380870
Bayesian Reward Filtering, Recent Advances in Reinforcement Learning, pp.96-109, 2008.
DOI : 10.1145/1143844.1143955
URL : https://hal.archives-ouvertes.fr/hal-00351282
Online Bayesian kernel regression from nonlinear mapping of observations, 2008 IEEE Workshop on Machine Learning for Signal Processing, pp.309-314, 2008.
DOI : 10.1109/MLSP.2008.4685498
URL : https://hal.archives-ouvertes.fr/hal-00335052
A Sparse Nonlinear Bayesian Online Kernel Regression, 2008 The Second International Conference on Advanced Engineering Computing and Applications in Sciences, pp.199-204, 2008.
DOI : 10.1109/ADVCOMP.2008.7
URL : https://hal.archives-ouvertes.fr/hal-00327081
Classification régularisée par la récompense pour l'Apprentissage par Imitation, Journées Francophones de Planification, Décision et Apprentissage (JFPDA), 2013.
Apprentissage par démonstrations : vaut-il la peine d'estimer une fonction de récompense ?, Journées Francophones de Planification, Décision et Apprentissage (JFPDA), 2013.
Classification structurée pour l'apprentissage par renforcement inverse, Conférence Francophone sur l'Apprentissage Automatique, 2012.
DOI : 10.3166/ria.27.155-169
Optimisation de contrôleurs par essaim de particules, Conférence Francophone sur l'Apprentissage Automatique, 2012.
Apprentissage off-policy appliqué à un système de dialogue basé sur les PDMPO, Congrès francophone sur la Reconnaissance de Formes et l'Intelligence Artificielle, 2012.
Un sélecteur de Dantzig pour l'apprentissage par différences temporelles, Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite des systèmes (JFPDA), 2012.
Approximations de l'algorithme Itérations sur les Politiques Modifié, Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite des systèmes (JFPDA), 2012.
Regroupement non-supervisé d'utilisateurs par leur comportement pour les systèmes de dialogue parlé, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2012.
Apprentissage par renforcement pour la personnalisation d'un logiciel d'enseignement des langues, Conférence sur les Environnements Informatiques pour l'Apprentissage Humain, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652516
Moindres carrés récursifs pour l'évaluation off-policy d'une politique avec traces d'éligibilité, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2011.
Apprentissage par imitation étendu au cas batch, off-policy et sans modèle, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2011.
Gestion de l'incertitude pour l'optimisation en ligne d'un gestionnaire de dialogues parlés à grande échelle basé sur les POMDP, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2011.
Apprentissage par Renforcement Inverse pour la Simulation d'Utilisateurs dans les Systèmes de Dialogue, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2011.
Apprenticeship learning via inverse reinforcement learning, Twenty-first International Conference on Machine Learning (ICML '04), 2004.
DOI : 10.1145/1015330.1015430
URL : http://www.aicml.cs.ualberta.ca/banff04/icml/pages/papers/335.pdf
Database-friendly random projections: Johnson-Lindenstrauss with binary coins, Journal of Computer and System Sciences, vol.66, issue.4, pp.671-687, 2003.
DOI : 10.1016/S0022-0000(03)00025-4
URL : https://doi.org/10.1016/s0022-0000(03)00025-4
On using intelligent computer-assisted language learning in real-life foreign language teaching and learning, ReCALL, pp.4-24, 2011.
Fitted Q-iteration in continuous action-space MDPs, Advances in Neural Information Processing Systems (NIPS), pp.9-16, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00185311
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp.89-129, 2008.
DOI : 10.1007/11776420_42
URL : https://hal.archives-ouvertes.fr/hal-00830201
On the Generation of Markov Decision Processes, Journal of the Operational Research Society, vol.46, issue.3, pp.354-361, 1995.
DOI : 10.1057/jors.1995.50
Optimal control of Markov processes with incomplete state information, Journal of Mathematical Analysis and Applications, vol.10, issue.1, p.174, 1965.
Cost-sensitive multiclass classification risk bounds, International Conference on Machine Learning (ICML), pp.1391-1399, 2013.
Residual Algorithms: Reinforcement Learning with Function Approximation, International Conference on Machine Learning (ICML), pp.30-37, 1995.
Policy iteration based on stochastic factorization, Journal of Artificial Intelligence Research, pp.763-803, 2014.
Infinite-Horizon Gradient-Based Policy Search, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001.
Guess-Averse Loss Functions For Cost-Sensitive Multiclass Boosting, International Conference on Machine Learning (ICML), pp.586-594, 2014.
Dynamic Programming and Optimal Control, Athena Scientific, 1995.
Neuro-Dynamic Programming, Athena Scientific, 1996.
Projected equation methods for approximate solution of large linear systems, Journal of Computational and Applied Mathematics, vol.227, pp.27-50, 2009.
Natural actor-critic algorithms, Automatica, vol.45, issue.11, 2009.
DOI : 10.1016/j.automatica.2009.07.008
URL : https://hal.archives-ouvertes.fr/hal-00840470
Relative entropy inverse reinforcement learning, International Conference on Artificial Intelligence and Statistics (AISTATS), pp.182-189, 2011.
Linear Least-Squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996.
The Dantzig selector: Statistical estimation when p is much larger than n, The Annals of Statistics, vol.35, issue.6, pp.2313-2351, 2007.
DOI : 10.1214/009053606000001523
URL : http://doi.org/10.1214/009053606000001523
Revisiting User Simulation in Dialogue Systems: Do we still need them? Will imitation play the role of simulation?, Thèse de Doctorat en Informatique, 2012.
URL : https://hal.archives-ouvertes.fr/tel-00875229
User Simulation in Dialogue Systems using Inverse Reinforcement Learning, Annual Conference of the International Speech Communication Association, pp.1025-1028, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652446
Behavior Specific User Simulation in Spoken Dialogue Systems, ITG Conference on Speech Communication, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00749421
Clustering behaviors of Spoken Dialogue Systems users, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.
DOI : 10.1109/ICASSP.2012.6289038
URL : https://hal.archives-ouvertes.fr/hal-00685009
Co-adaptation in Spoken Dialogue Systems, International Workshop on Spoken Dialog Systems, 2012.
DOI : 10.1007/978-1-4614-8280-2_31
URL : https://hal.archives-ouvertes.fr/hal-00778752
Regroupement non-supervisé d'utilisateurs par leur comportement pour les systèmes de dialogue parlé, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2012.
Optimizing Spoken Dialogue Management with Fitted Value Iteration, International Conference on Speech Communication and Technologies, pp.86-89, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00553184
Sparse Approximate Dynamic Programming for Dialog Management, SIGDial Conference on Discourse and Dialogue, pp.107-115, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00553180
Apprentissage par Renforcement Inverse pour la Simulation d'Utilisateurs dans les Systèmes de Dialogue, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2011.
A Bayes Net Toolkit for Student Modeling in Intelligent Tutoring Systems, Intelligent Tutoring Systems, pp.104-113, 2006.
DOI : 10.1007/11774303_11
Direct Policy Iteration with Demonstrations, International Joint Conference on Artificial Intelligence (IJCAI), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01237659
Interactive policy learning through confidence-based autonomy, Journal of Artificial Intelligence Research, vol.34, issue.11, 2009.
A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, Discrete Event Dynamic Systems, pp.207-239, 2006.
Performance Evaluation for Particle Filters, International Conference on Information Fusion, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652168
Knowledge tracing: Modeling the acquisition of procedural knowledge, User Modeling and User-Adapted Interaction, pp.253-278, 1994.
Student modeling in the ACT Programming Tutor, Cognitively Diagnostic Assessment, pp.19-41, 1995.
Gestion de l'incertitude pour l'optimisation de systèmes interactifs, Thèse de Doctorat en Informatique, 2013.
Uncertainty management for on-line optimisation of a POMDP-based large-scale spoken dialogue system, Annual Conference of the International Speech Communication Association, pp.1301-1304, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652194
A Comprehensive Reinforcement Learning Framework for Dialogue Management Optimization, IEEE Journal of Selected Topics in Signal Processing, vol.6, issue.8, pp.891-902, 2012.
DOI : 10.1109/JSTSP.2012.2229257
Apprentissage par renforcement pour la personnalisation d'un logiciel d'enseignement des langues, Conférence sur les Environnements Informatiques pour l'Apprentissage Humain, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652516
Gestion de l'incertitude pour l'optimisation en ligne d'un gestionnaire de dialogues parlés à grande échelle basé sur les POMDP, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2011.
Apprentissage off-policy appliqué à un système de dialogue basé sur les PDMPO, Congrès francophone sur la Reconnaissance de Formes et l'Intelligence Artificielle, 2012.
Off-policy Learning in Large-scale POMDP-based Dialogue Systems, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4989-4992, 2012.
DOI : 10.1109/icassp.2012.6289040
URL : https://hal.archives-ouvertes.fr/hal-00684819
Model-free POMDP optimisation of tutoring systems with echo-state networks, Proceedings of the 14th SIGDial Meeting on Discourse and Dialogue, pp.102-106, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00869773
Optimisation par essaims particulaires de stratégies de dialogue, Journées Francophones de Planification, Décision et Apprentissage (JFPDA), 2013.
Particle Swarm Optimisation of Spoken Dialogue System Strategies, Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00916935
Random Projections: a Remedy for Overfitting Issues in Time Series Prediction with Echo State Networks, IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
The linear programming approach to approximate dynamic programming, Operations Research, vol.51, issue.6, pp.850-865, 2003.
Least Angle Regression, Annals of Statistics, vol.32, issue.2, pp.407-499, 2004.
Algorithms and Representations for Reinforcement Learning, 2005.
Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning, International Conference on Machine Learning (ICML), pp.154-161, 2003.
The Kernel Recursive Least-Squares Algorithm, IEEE Transactions on Signal Processing, vol.52, issue.8, pp.2275-2285, 2004.
DOI : 10.1109/TSP.2004.830985
URL : http://www-ee.technion.ac.il/~rmeir/Publications/Engel-Mannor-Meir-IEEE04.pdf
Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.
Regularized policy iteration, Advances in Neural Information Processing Systems (NIPS), pp.441-448, 2009.
Model selection in reinforcement learning, Machine Learning, vol.18, issue.1, pp.299-332, 2011.
DOI : 10.1109/TNN.2007.899161
Error propagation for approximate policy and value iteration, Advances in Neural Information Processing Systems (NIPS), pp.568-576, 2010.
Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes, Journal of Artificial Intelligence Research (JAIR), vol.25, pp.75-118, 2006.
Monte-Carlo Swarm Policy Search, Symposium on Swarm Intelligence and Differential Evolution, 2012.
DOI : 10.1007/978-3-642-29353-5_9
URL : https://hal.archives-ouvertes.fr/hal-00695540
Optimisation de contrôleurs par essaim de particules, Conférence Francophone sur l'Apprentissage Automatique, 2012.
Dynamic neural field optimization using the unscented Kalman filter, 2011 IEEE Symposium on Computational Intelligence, Cognitive Algorithms, Mind, and Brain (CCMB), 2011.
DOI : 10.1109/CCMB.2011.5952113
URL : https://hal.archives-ouvertes.fr/hal-00618117
A C++ Template-Based Reinforcement Learning Library: Fitting the Code to the Mathematics, Journal of Machine Learning Research, vol.14, pp.399-402, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00914768
Analyse des données pour l'analyse, le suivi et le contrôle des dispersions, 2006.
Modélisation de chaînes de production et de leurs interactions, Supélec (M2R Mathématiques), 2006.
Optimisation des chaînes de production dans l'industrie sidérurgique : une approche statistique de l'apprentissage par renforcement, Doctorat en Mathématiques, 2009.
A multiplicative UCB strategy for Gamma rewards, European Workshop on Reinforcement Learning (EWRL), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01258820
Soft-max boosting, Machine Learning, 2015.
DOI : 10.1007/s00365-006-0662-3
URL : https://hal.archives-ouvertes.fr/hal-01258816
Around Inverse Reinforcement Learning and Score-based Classification, 1st Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00916936
Architectures acteur-critique avec approximation de la valeur, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2010.
Eligibility traces through colored noises, International Congress on Ultra Modern Telecommunications and Control Systems, pp.458-465, 2010.
DOI : 10.1109/ICUMT.2010.5676597
URL : https://hal.archives-ouvertes.fr/hal-00553910
Gestion de l'incertitude dans le cadre de l'approximation de la fonction de valeur pour l'apprentissage par renforcement, Conférence francophone sur l'apprentissage automatique, pp.101-112, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00553895
Kalman Temporal Differences, Journal of Artificial Intelligence Research (JAIR), vol.39, pp.483-532, 2010.
DOI : 10.1109/adprl.2009.4927543
URL : https://hal.archives-ouvertes.fr/hal-00858687
Linéarisation statistique pour les différences temporelles par moindres carrés, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2010.
Revisiting Natural Actor-Critics with Value Function Approximation, Modeling Decisions for Artificial Intelligence, pp.207-218, 2010.
DOI : 10.1007/11596448_9
URL : https://hal.archives-ouvertes.fr/hal-00554346
Statistically linearized least-squares temporal differences, International Congress on Ultra Modern Telecommunications and Control Systems, pp.450-457, 2010.
DOI : 10.1109/ICUMT.2010.5676598
URL : https://hal.archives-ouvertes.fr/hal-00554338
Statistically linearized recursive least squares, 2010 IEEE International Workshop on Machine Learning for Signal Processing, pp.272-276, 2010.
DOI : 10.1109/MLSP.2010.5589236
URL : https://hal.archives-ouvertes.fr/hal-00553168
Kalman filtering & colored noises: the (autoregressive) moving-average case, IEEE Workshop on Machine Learning Algorithms, Systems and Applications, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00660607
Managing Uncertainty within the KTD Framework, Workshop on Active Learning and Experimental Design, Journal of Machine Learning Research (Conference and Workshop Proceedings), pp.157-168, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00599636
Parametric value function approximation: A unified view, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp.9-16, 2011.
DOI : 10.1109/ADPRL.2011.5967355
URL : https://hal.archives-ouvertes.fr/hal-00618112
Algorithmic Survey of Parametric Value Function Approximation, IEEE Transactions on Neural Networks and Learning Systems, vol.24, issue.6, pp.845-867, 2013.
DOI : 10.1109/TNNLS.2013.2247418
URL : https://hal.archives-ouvertes.fr/hal-00869725
A Sparse Nonlinear Bayesian Online Kernel Regression, 2008 The Second International Conference on Advanced Engineering Computing and Applications in Sciences, pp.199-204, 2008.
DOI : 10.1109/ADVCOMP.2008.7
URL : https://hal.archives-ouvertes.fr/hal-00327081
Bayesian Reward Filtering, Recent Advances in Reinforcement Learning, pp.96-109, 2008.
DOI : 10.1145/1143844.1143955
URL : https://hal.archives-ouvertes.fr/hal-00351282
Filtrage bayésien de la récompense, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, pp.113-122, 2008.
Online Bayesian kernel regression from nonlinear mapping of observations, 2008 IEEE Workshop on Machine Learning for Signal Processing, pp.309-314, 2008.
DOI : 10.1109/MLSP.2008.4685498
URL : https://hal.archives-ouvertes.fr/hal-00335052
Différences Temporelles de Kalman, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2009.
DOI : 10.3166/ria.24.423-443
Différences Temporelles de Kalman : le cas stochastique, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2009.
DOI : 10.3166/ria.24.423-443
From Supervised to Reinforcement Learning: a Kernel-based Bayesian Filtering Framework, International Journal On Advances in Software, vol.2, issue.1, pp.101-116, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00429891
Kalman Temporal Differences: The deterministic case, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pp.185-192, 2009.
DOI : 10.1109/ADPRL.2009.4927543
URL : https://hal.archives-ouvertes.fr/hal-00380870
Kernelizing Vector Quantization Algorithms, European Symposium on Artificial Neural Networks (ESANN 09), pp.541-546, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00429892
Tracking in Reinforcement Learning, International Conference on Neural Information Processing, pp.502-511, 2009.
DOI : 10.1007/978-3-642-10677-4_57
URL : https://hal.archives-ouvertes.fr/hal-00439316
Astuce du Noyau & Quantification Vectorielle, Colloque sur la Reconnaissance des Formes et l'Intelligence Artificielle (RFIA'10), 2010.
URL : https://hal.archives-ouvertes.fr/hal-00553114
Différences temporelles de Kalman : cas déterministe, Revue d'Intelligence Artificielle, pp.423-442, 2010.
DOI : 10.3166/ria.24.423-443
???1-Penalized Projected Bellman Residual, European Workshop on Reinforcement Learning Lecture Notes in Computer Science (LNCS), 2011. ,
DOI : 10.1007/978-3-642-29946-9_12
URL : http://hal.inria.fr/docs/00/64/45/07/PDF/gs_ewrl_l1_cr.pdf
Moindres carrés récursifs pour l'évaluation offpolicy d'une politique avec traces d'éligibilité, Journées Francophones de Planification , Décision et Apprentissage pour la conduite de systèmes, 2011. ,
Off-policy Learning with Eligibility Traces : A Survey, Journal of Machine Learning Research, vol.15, pp.289-333, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00921275
A Dantzig Selector Approach to Temporal Difference Learning, International Conference on Machine Learning (ICML), 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00749480
Un sélecteur de Dantzig pour l'apprentissage par différences temporelles, Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite des systèmes (JFPDA), 2012. ,
A Non-parametric Approach to Approximate Dynamic Programming, 2011 10th International Conference on Machine Learning and Applications and Workshops, pp.317-322, 2011. ,
DOI : 10.1109/ICMLA.2011.19
URL : https://hal.archives-ouvertes.fr/hal-00652438
Stable Function Approximation in Dynamic Programming, International Conference on Machine Learning (IMCL), 1995. ,
DOI : 10.1016/B978-1-55860-377-6.50040-2
URL : http://www.cs.berkeley.edu/~pabbeel/cs287-fa09/readings/Gordon-1995.pdf
AutoTutor : An intelligent tutoring system with mixed-initiative dialogue, IEEE Transactions on Education, vol.48, issue.4, pp.612-618, 2005. ,
Generalized Boosting Algorithms for Convex Optimization, International Conference on Machine Learning (ICML), pp.1209-1216, 2011. ,
Modelling transition dynamics in MDPs with RKHS embeddings, International Conference on Machine Learning (ICML), pp.535-542, 2012. ,
Evolution Strategies for Direct Policy Search, Parallel Problem Solving from Nature?PPSN X, pp.428-437, 2008. ,
DOI : 10.1007/978-3-540-87700-4_43
URL : http://www.neuroinformatik.ruhr-uni-bochum.de/thbio/members/profil/Heidrich-Meisner/H-MIppsn08.pdf
Regularized Least Squares Temporal Difference Learning with Nested ???2 and ???1 Penalization, European Workshop on Reinforcement Learning (EWRL), 2011. ,
DOI : 10.1007/978-3-642-29946-9_13
Learning attractor landscapes for learning motor primitives, Advances in Neural Information Processing Systems (NIPS), pp.1523-1530, 2002. ,
The " echo state " approach to analyzing and training recurrent neural networks, Fraunhofer Institute for Autonomous Intelligent Systems, 2001. ,
Unscented filtering and nonlinear estimation, Proceedings of the IEEE, pp.401-422, 2004. ,
Planning and acting in partially observable stochastic domains, Artificial Intelligence, vol.101, issue.1-2, pp.99-134, 1998. ,
DOI : 10.1016/S0004-3702(98)00023-X
A Natural Policy Gradient, Neural Information Processing Systems (NIPS), pp.1531-1538, 2001. ,
Approximately optimal approximate reinforcement learning, International Conference on Machine Learning (ICML), pp.267-274, 2002. ,
Bias-Variance Error Bounds for Temporal Difference Updates, Conference on Learning Theory (COLT), 2000. ,
An iterative design methodology for user-friendly natural language office information applications, ACM Transactions on Information Systems (TOIS), vol.2, issue.1, pp.26-41, 1984. ,
Particle swarm optimization, Proceedings of ICNN'95, International Conference on Neural Networks, pp.1942-1948, 1995. ,
DOI : 10.1109/ICNN.1995.488968
Learning from limited demonstrations, Advances in Neural Information Processing Systems (NIPS), pp.2859-2867, 2013. ,
Contributions à l'apprentissage par renforcement inverse, Thèse de Doctorat en Informatique, 2013. ,
DOI : 10.3166/ria.27.155-169
URL : https://hal.archives-ouvertes.fr/tel-01303275
Apprentissage par imitation étendu au cas batch, off-policy et sans modèle, Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2011. ,
Batch, Off-Policy and Model-Free Apprenticeship Learning, European Workshop on Reinforcement Learning (EWRL), Lecture Notes in Computer Science (LNCS), 2011. ,
DOI : 10.1007/978-3-642-29946-9_28
URL : https://hal.archives-ouvertes.fr/hal-00660623
Reducing the dimensionality of the reward space in the Inverse Reinforcement Learning problem, IEEE Workshop on Machine Learning Algorithms, Systems and Applications, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00660612
Inverse Reinforcement Learning through Structured Classification, Advances in Neural Information Processing Systems (NIPS), 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00778624
Classification structurée pour l'apprentissage par renforcement inverse, Conférence Francophone sur l'Apprentissage Automatique, 2012. ,
DOI : 10.3166/ria.27.155-169
A Cascaded Supervised Learning Approach to Inverse Reinforcement Learning, Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2013), pp.1-16, 2013. ,
DOI : 10.1007/978-3-642-40988-2_1
URL : https://hal.archives-ouvertes.fr/hal-00869804
Apprentissage par renforcement inverse en cascadant classification et régression, Journées Francophones de Planification, Décision et Apprentissage (JFPDA), 2013. ,
Classification structurée pour l'apprentissage par renforcement inverse, Revue d'intelligence artificielle, vol.27, issue.2, 2013. ,
DOI : 10.3166/ria.27.155-169
Policy search for motor primitives in robotics, Machine Learning, pp.171-203, 2011. ,
DOI : 10.1007/978-3-319-03194-1_4
URL : http://papers.nips.cc/paper/3545-policy-search-for-motor-primitives-in-robotics.pdf
Intelligent Tutoring Goes To School in the Big City, International Journal of Artificial Intelligence in Education, vol.8, pp.30-43, 1997. ,
The Fixed Points of Off-Policy TD, Neural Information Processing Systems (NIPS), 2011. ,
Regularization and Feature Selection in Least-Squares Temporal Difference Learning, International Conference on Machine Learning, 2009. ,
DOI : 10.1145/1553374.1553442
URL : http://www.cs.mcgill.ca/~icml2009/papers/439.pdf
Least-squares policy iteration, The Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003. ,
Reinforcement Learning as Classification: Leveraging Modern Classifiers, International Conference on Machine Learning (ICML), pp.424-431, 2003. ,
Information state and dialogue management in the TRINDI dialogue move engine toolkit, Natural Language Engineering, vol.6, issue.3&4, pp.323-340, 2000. ,
DOI : 10.1017/S1351324900002539
Analysis of a classification-based policy iteration algorithm, International Conference on Machine Learning (ICML), pp.607-614, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00482065
Multicategory Support Vector Machines, Journal of the American Statistical Association, vol.99, issue.465, pp.67-81, 2004. ,
DOI : 10.1198/016214504000000098
URL : http://www.stat.wisc.edu/~wahba/ftp1/lee.lin.wahba.04.pdf
An ISU dialogue system exhibiting reinforcement learning of dialogue policies, Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations on, EACL '06, pp.119-122, 2006. ,
DOI : 10.3115/1608974.1608986
Data-Driven Methods for Adaptive Spoken Dialogue Systems : Computational Learning for Conversational Interfaces, 2012. ,
DOI : 10.1007/978-1-4614-4803-7
URL : https://hal.archives-ouvertes.fr/hal-00756740
A stochastic model of human-machine interaction for learning dialog strategies, IEEE Transactions on Speech and Audio Processing, vol.8, issue.1, pp.11-23, 2000. ,
Reinforcement learning for dialog management using least-squares policy iteration and fast feature selection, InterSpeech, pp.2475-2478, 2009. ,
Sparse Temporal Difference Learning Using LASSO, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.352-359, 2007. ,
DOI : 10.1109/ADPRL.2007.368210
URL : https://hal.archives-ouvertes.fr/inria-00117075
Convergence Analysis of Kernel-based On-policy Approximate Policy Iteration Algorithms for Markov Decision Processes with Continuous, Multidimensional States and Actions, 2010. ,
Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, Advances in Neural Information Processing Systems (NIPS), pp.1204-1212, 2009. ,
GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, Conference on Artificial General Intelligence (AGI), 2010. ,
Toward Off-Policy Learning Control with Function Approximation, International Conference on Machine Learning (ICML), 2010. ,
Boosting algorithms as gradient descent in function space, Neural Information Processing Systems (NIPS), 1999. ,
Error bounds for approximate policy iteration, International Conference on Machine Learning (ICML), pp.560-567, 2003. ,
Performance Bounds in $L_p$-norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, vol.46, issue.2, pp.541-561, 2007. ,
DOI : 10.1137/040614384
URL : http://hal.archives-ouvertes.fr/docs/00/12/46/85/PDF/avi_siam_final.pdf
Finite-time bounds for fitted value iteration, The Journal of Machine Learning Research (JMLR), vol.9, pp.815-857, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00120882
Inverse Reinforcement Learning in Relational Domains, International Joint Conferences on Artificial Intelligence (IJCAI), 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01154650
Least Squares Policy Evaluation Algorithms with Linear Function Approximation, Discrete Event Dynamic Systems: Theory and Applications, vol.13, pp.79-110, 2003. ,
Training parsers by inverse reinforcement learning, Machine Learning, vol.285, issue.5, pp.303-337, 2009. ,
DOI : 10.1017/CBO9780511546921
URL : https://link.springer.com/content/pdf/10.1007%2Fs10994-009-5110-1.pdf
Policy invariance under reward transformations : Theory and application to reward shaping, International Conference on Machine Learning (ICML), pp.278-287, 1999. ,
Algorithms for inverse reinforcement learning, International Conference on Machine Learning (ICML), pp.663-670, 2000. ,
Laugh-aware virtual agent and its impact on user amusement, International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00869751
Kernel-based reinforcement learning, Machine Learning, vol.49, issue.2/3, pp.161-178, 2002. ,
DOI : 10.1023/A:1017928328829
Filtering of pathological ventricular rhythms during MRI scanning, International Workshop on Biosignal Interpretation, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00749457
Natural Actor-Critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008. ,
DOI : 10.1016/j.neucom.2007.11.026
Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes, International Conference on Machine Learning (ICML), pp.871-878, 2010. ,
A framework for unsupervised learning of dialogue strategies, 2004. ,
Consistent goal-directed user model for realistic man-machine task-oriented spoken dialogue simulation, IEEE International Conference on Multimedia and Expo, pp.425-428, 2006. ,
DOI : 10.1109/icme.2006.262563
URL : http://hal.archives-ouvertes.fr/docs/00/21/59/68/PDF/icme-pietquin.pdf
Optimization of a Tutoring System from a Fixed Set of Data, ISCA workshop on Speech and Language Technology in Education, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00652324
A probabilistic framework for dialog simulation and optimal strategy learning, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.2, pp.589-599, 2006. ,
DOI : 10.1109/TSA.2005.855836
URL : https://hal.archives-ouvertes.fr/hal-00207952
Sample Efficient On-line Learning of Optimal Dialogue Policies with Kalman Temporal Differences, International Joint Conference on Artificial Intelligence (IJCAI 2011), pp.1878-1883, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00618252
A survey on metrics for the evaluation of user simulations, The Knowledge Engineering Review, pp.59-73, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00771654
Apprentissage hors-ligne avec Démonstrations Expertes, Thèse de Doctorat en Informatique, 2014. ,
Apprentissage par démonstrations : vaut-il la peine d'estimer une fonction de récompense ?, Journées Francophones de Planification, Décision et Apprentissage (JFPDA), 2013. ,
Classification régularisée par la récompense pour l'Apprentissage par Imitation, Journées Francophones de Planification, Décision et Apprentissage (JFPDA), 2013. ,
Learning from Demonstrations: Is It Worth Estimating a Reward Function?, Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2013), pp.17-32, 2013. ,
DOI : 10.1007/978-3-642-40988-2_2
URL : https://hal.archives-ouvertes.fr/hal-00916938
Boosted and Reward-regularized Classification for Apprenticeship Learning, 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014), 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01107837
Boosted Bellman Residual Minimization Handling Expert Demonstrations, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 2014. ,
DOI : 10.1007/978-3-662-44851-9_35
URL : https://hal.archives-ouvertes.fr/hal-01060953
Difference of Convex Functions Programming for Reinforcement Learning, Advances in Neural Information Processing Systems (NIPS), 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01104419
Méthode de minimisation du résidu de Bellman boostée qui tient compte des démonstrations expertes, Journées Francophones de Planification, Décision et Apprentissage (JFPDA), 2014. ,
Imitation Learning Applied to Embodied Conversational Agents, Machine Learning and Interactive Systems (MLIS), 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01225816
Predicting when to laugh with structured classification, Annual Conference of the International Speech Communication Association (InterSpeech), 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01104739
Statistical linear estimation with penalized estimators: an application to reinforcement learning, International Conference on Machine Learning (ICML), pp.1535-1542, 2012. ,
Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994. ,
Modified policy iteration algorithms for discounted Markov decision problems, Management Science, vol.24, issue.11, pp.1127-1137, 1978. ,
Bootstrapping reinforcement learning-based dialogue strategies from wizard-of-oz data, 2008. ,
Stochastic Simulation, 1987. ,
DOI : 10.1002/9780470316726
Online Q-Learning using Connectionist Systems, 1994. ,
Learning agents for uncertain environments, Conference on Computational Learning Theory (COLT), pp.101-103, 1998. ,
DOI : 10.1145/279943.279964
URL : http://www.eecs.berkeley.edu/~russell/papers/colt98-uncertainty.pdf
Boosting: Foundations and Algorithms, 2012. ,
A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies, The Knowledge Engineering Review, pp.97-126, 2006. ,
Effects of the user model on simulation-based learning of dialogue strategies, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005., pp.220-225, 2005. ,
DOI : 10.1109/ASRU.2005.1566539
Approximate Policy Iteration Schemes: A Comparison, International Conference on Machine Learning (ICML), pp.1314-1322, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00989982
Approximate Modified Policy Iteration, International Conference on Machine Learning (ICML), 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00697169
Approximations de l'algorithme Itérations sur les Politiques Modifié, Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite des systèmes (JFPDA), 2012. ,
Recursive Least-Squares Learning with Eligibility Traces, Lecture Notes in Computer Science (LNCS), 2011. ,
DOI : 10.1007/978-3-642-29946-9_14
URL : https://hal.archives-ouvertes.fr/hal-00644511
Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 2014. ,
DOI : 10.1007/978-3-662-44845-8_3
URL : https://hal.archives-ouvertes.fr/hal-01091079
Quand l'optimalité locale implique une garantie globale : recherche locale de politique dans un espace convexe et algorithme d'itération sur les politiques conservatif vu comme une montée de gradient, Journées Francophones de Planification, Décision et Apprentissage (JFPDA), 2014. ,
DOI : 10.3166/ria.29.685-704
Approximate Modified Policy Iteration and its Application to the Game of Tetris, Journal of Machine Learning Research, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01091341
On the use of non-stationary policies for stationary infinite-horizon Markov decision processes, Advances in Neural Information Processing Systems (NIPS), pp.1826-1834, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00758809
Markov decision processes in artificial intelligence, 2013. ,
DOI : 10.1002/9781118557426
URL : https://hal.archives-ouvertes.fr/inria-00432735
Optimal State Estimation: Kalman, H-Infinity, and Nonlinear Approaches, 2006. ,
DOI : 10.1002/0470045345
Hilbert space embeddings of conditional distributions with applications to dynamical systems, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.961-968, 2009. ,
DOI : 10.1145/1553374.1553497
Learning to predict by the methods of temporal differences, Machine Learning, pp.9-44, 1988. ,
DOI : 10.3758/BF03205056
Reinforcement Learning: An Introduction, 1998. ,
Fast gradient-descent methods for temporal-difference learning with linear function approximation, International Conference on Machine Learning (ICML), pp.993-1000, 2009. ,
DOI : 10.1145/1553374.1553501
URL : http://webdocs.cs.ualberta.ca/~sutton/papers/gradTD1.pdf
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. arXiv preprint, 2015. ,
Policy Gradient Methods for Reinforcement Learning with Function Approximation, Neural Information Processing Systems (NIPS), pp.1057-1063, 1999. ,
A game-theoretic approach to apprenticeship learning, Advances in Neural Information Processing Systems (NIPS), pp.1449-1456, 2007. ,
A reduction from apprenticeship learning to classification, Advances in Neural Information Processing Systems (NIPS), pp.2253-2261, 2010. ,
Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, pp.1-103, 2010. ,
DOI : 10.2200/S00268ED1V01Y201005AIM009
Reinforcement Learning with Echo State Networks, International Conference on Artificial Neural Networks (ICANN), 2006. ,
DOI : 10.1007/11840817_86
On the Rate of Convergence and Error Bounds for LSTD(λ), International Conference on Machine Learning (ICML), 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01186667
Convex analysis approach to DC programming: Theory, algorithms and applications, Acta Mathematica Vietnamica, vol.22, issue.1, pp.289-355, 1997. ,
The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems, Annals of Operations Research, vol.133, pp.23-46, 2005. ,
Learning structured prediction models, Proceedings of the 22nd international conference on Machine learning , ICML '05, pp.896-903, 2005. ,
DOI : 10.1145/1102351.1102464
Value function approximation in noisy environments using locally smoothed regularized approximate linear programs, Uncertainty in Artificial Intelligence (UAI), 2012. ,
High confidence off-policy evaluation, Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015. ,
Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society, vol.58, issue.1, pp.267-288, 1996. ,
DOI : 10.1111/j.1467-9868.2011.00771.x
An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, pp.674-690, 1997. ,
DOI : 10.1109/9.580874
Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models, 2004. ,
Statistical learning theory, 1998. ,
The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management, Computer Speech & Language, vol.24, issue.2, pp.150-174, 2010. ,
DOI : 10.1016/j.csl.2009.04.001
URL : https://hal.archives-ouvertes.fr/hal-00598186
The Hidden Information State Approach to Dialog Management, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, 2007. ,
DOI : 10.1109/ICASSP.2007.367185
Convergence of least-squares temporal difference methods under general conditions, International Conference on Machine Learning (ICML), 2010. ,
DOI : 10.1137/100807879
Q-Learning Algorithms for Optimal Stopping Based on Least Squares, European Control Conference, 2007. ,
Maximum Entropy Inverse Reinforcement Learning, AAAI Conference on Artificial Intelligence, pp.1433-1438, 2008. ,
Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.67, issue.2, pp.301-320, 2005. ,
DOI : 10.1073/pnas.201162998