Author(s):

Brázdil, Tomáš; Brožek, Václav; Chatterjee, Krishnendu; Forejt, Vojtěch; Kučera, Antonín

Article Title: 
Markov decision processes with multiple longrun average objectives

Affiliation 
IST Austria 
Abstract: 
We study Markov decision processes (MDPs) with multiple limitaverage (or meanpayoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with κ limitaverage functions, in the expectation objective the goal is to maximize the expected limitaverage value, and in the satisfaction objective the goal is to maximize the probability of runs such that the limitaverage value stays above a given vector. We show that under the expectation objective, in contrast to the case of one limitaverage function, both randomization and memory are necessary for strategies even for εapproximation, and that finitememory randomized strategies are sufficient for achieving Pareto optimal values. Under the satisfaction objective, in contrast to the case of one limitaverage function, infinite memory is necessary for strategies achieving a specific value (i.e. randomized finitememory strategies are not sufficient), whereas memoryless randomized strategies are sufficient for εapproximation, for all ε > 0. We further prove that the decision problems for both expectation and satisfaction objectives can be solved in polynomial time and the tradeoff curve (Pareto curve) can be εapproximated in time polynomial in the size of the MDP and 1/ε, and exponential in the number of limitaverage functions, for all ε > 0. Our analysis also reveals flaws in previous work for MDPs with multiple meanpayoff functions under the expectation objective, corrects the flaws, and allows us to obtain improved results.

Keywords: 
Formal verification; Markov Decision Processes; Meanpayoff reward; Multiobjective optimisation

Journal Title:

Logical Methods in Computer Science

Volume: 
10

Issue 
1

ISSN:

18605974

Publisher:

International Federation of Computational Logic

Date Published:

20140214

Start Page: 
paper 13

Copyright Statement: 
CCBY

URL: 

DOI: 
10.2168/LMCS10(1:13)2014

Notes: 
T. Brazdil is supported by the Czech Science Foundation, grant No P202/12/P612. K. Chatterjee is supported by the Austrian Science Fund (FWF) Grant No P 23499N23; FWF NFN Grant No S11407N23 (RiSE); ERC Start grant (279307: Graph Games); Microsoft faculty fellows award. V. Forejt is supported by a Royal Society Newton Fellowship and EPSRC project EP/J012564/1.

Open access: 
yes (OA journal) 