|
|
||||||||
Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, New York 11794, USA
This paper presents a new randomized search method called evolutionary random policy search (ERPS) for solving infinite-horizon discounted-cost Markov-decision-process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random, sub-MDP problems based on information obtained from random sampling of the entire action space and local search. Each sub-MDP is then solved approximately by using a variant of the standard policy-improvement technique, where an elite policy is obtained. We show that the sequence of elite policies converges to an optimal policy with probability one. Some numerical studies are carried out to illustrate the algorithm and compare it with existing procedures.
Robert H. Smith School of Business and Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA
Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA
Department of Electrical and Computer Engineering, and Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA
jqhu{at}ams.sunysb.edu
mfu{at}rhsmith.umd.edu
rvahid{at}isr.umd.edu
marcus{at}umd.edu
Key words: dynamic programming; Markov; finite state; analysis of algorithms; programming; nonlinear; queues
History: received April 2004;
revised June 2005;
accepted June 2005.
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |