INFORMS Journal on Computing
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


INFORMS JOURNAL ON COMPUTING
Vol. 20, No. 2, Spring 2008, pp. 288-301
DOI: 10.1287/ijoc.1070.0240
This Article
Right arrow Full Text (PDF)
Right arrow References
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Kunnumkal, S.
Right arrow Articles by Topaloglu, H.
Right arrow Search for Related Content

Exploiting the Structural Properties of the Underlying Markov Decision Problem in the Q-Learning Algorithm

Sumit Kunnumkal, Huseyin Topaloglu

Indian School of Business, Gachibowli, Hyderabad 500032, India
School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853

sumit_kunnumkal{at}isb.edu
topaloglu{at}orie.cornell.edu

This paper shows how to exploit the structural properties of the underlying Markov decision problem to improve the convergence behavior of the Q-learning algorithm. In particular, we consider infinite-horizon discounted-cost Markov decision problems where there is a natural ordering between the states of the system and the value function is known to be monotone in the state. We propose a new variant of the Q-learning algorithm that ensures that the value function approximations obtained during the intermediate iterations are also monotone in the state. We establish the convergence of the proposed algorithm and experimentally show that it significantly improves the convergence behavior of the standard version of the Q-learning algorithm.

Key words: Markov decision processes; Q-learning; stochastic approximation methods
History: received July 2005; revised May 2007; accepted September 2007.







HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Copyright © 2008 by INFORMS.