TY - JOUR

T1 - On the complexity of partially observed Markov decision processes

AU - Burago, Dima

AU - De Rougemont, Michel

AU - Slissenko, Anatol

N1 - Funding Information:
* C’orrespondence address: UniversitC Paris-12, Bit.P3, Informatque. hl, Ave. du GCnCral de Gaulle, Y40 IO CrPtcil, France. E-mail: sliss@litp.ibp.fr. The research of this author was partially supported by DRET contract 91/1061.

PY - 1996/5/5

Y1 - 1996/5/5

N2 - In the paper we consider the complexity of constructing optimal policies (strategies) for some type of partially observed Markov decision processes. This particular case of the classical problem deals with finite stationary processes, and can be represented as constructing optimal strategies to reach target vertices from a starting vertex in a graph with colored vertices and probabilis-tic deviations from an edge chosen to follow. The colors of the visited vertices is the only information available to a strategy. The complexity of Markov decision in the case of perfect information (bijective coloring of vertices) is known and briefly surveyed at the beginning of the paper. For the unobservable case (all the colors are equal) we give an improvement of the result of Papadimitriou and Tsitsiklis, namely we show that the problem of constructing even a very weak approximation to an optimal strategy is NP-hard. Our main results concern the case of a fixed bound on the multiplicity of coloring, that is a case of partially observed processes where some upper bound on the unobservability is supposed. We show that the problem of finding an optimal strategy is still NP-hard, but polytime approximations are possible. Some relations of our results to the Max-Word Problem are also indicated.

AB - In the paper we consider the complexity of constructing optimal policies (strategies) for some type of partially observed Markov decision processes. This particular case of the classical problem deals with finite stationary processes, and can be represented as constructing optimal strategies to reach target vertices from a starting vertex in a graph with colored vertices and probabilis-tic deviations from an edge chosen to follow. The colors of the visited vertices is the only information available to a strategy. The complexity of Markov decision in the case of perfect information (bijective coloring of vertices) is known and briefly surveyed at the beginning of the paper. For the unobservable case (all the colors are equal) we give an improvement of the result of Papadimitriou and Tsitsiklis, namely we show that the problem of constructing even a very weak approximation to an optimal strategy is NP-hard. Our main results concern the case of a fixed bound on the multiplicity of coloring, that is a case of partially observed processes where some upper bound on the unobservability is supposed. We show that the problem of finding an optimal strategy is still NP-hard, but polytime approximations are possible. Some relations of our results to the Max-Word Problem are also indicated.

UR - http://www.scopus.com/inward/record.url?scp=0030570119&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0030570119&partnerID=8YFLogxK

U2 - 10.1016/0304-3975(95)00158-1

DO - 10.1016/0304-3975(95)00158-1

M3 - Article

AN - SCOPUS:0030570119

VL - 157

SP - 161

EP - 183

JO - Theoretical Computer Science

JF - Theoretical Computer Science

SN - 0304-3975

IS - 2

ER -