Abstract: 
We study the complexity of central controller synthesis problems for finitestate Markov decision processes, where the objective is to optimize both the expected meanpayoff performance of the system and its stability. e argue that the basic theoretical notion of expressing the stability in terms of the variance of the meanpayoff (called global variance in our paper) is not always sufficient, since it ignores possible instabilities on respective runs. For this reason we propose alernative definitions of stability, which we call local and hybrid variance, and which express how rewards on each run deviate from the run's own meanpayoff and from the expected meanpayoff, respectively. We show that a strategy ensuring both the expected meanpayoff and the variance below given bounds requires randomization and memory, under all the above semantics of variance. We then look at the problem of determining whether there is a such a strategy. For the global variance, we show that the problem is in PSPACE, and that the answer can be approximated in pseudopolynomial time. For the hybrid variance, the analogous decision problem is in NP, and a polynomialtime approximating algorithm also exists. For local variance, we show that the decision problem is in NP. Since the overall performance can be traded for stability (and vice versa), we also present algorithms for approximating the associated Pareto curve in all the three cases. Finally, we study a special case of the decision problems, where we require a given expected meanpayoff together with zero variance. Here we show that the problems can be all solved in polynomial time.
