Abstract

The Partially Observable Markov Decision Process (POMDP) is a fundamental model for probabilistic planning in stochastic domains. More recent extensions, the constrained POMDP and the chance-constrained POMDP, additionally allow constraints to be specified on aspects of the policy alongside the objective function. Despite their expressive power, these models assume that all actions take a fixed duration, which limits their applicability to real-world planning problems. In this work, we propose a unified model for durative POMDPs and their constrained extensions. First, we convert these extensions into an Integer Linear Programming (ILP) formulation, which can be solved with existing solvers from the ILP literature. Second, we provide a heuristic search approach that efficiently prunes the search space, guided by the solutions of successive partial ILP programs. Third, we give a theoretical analysis of the problem: unlike unconstrained short-horizon POMDPs, whose constant-depth policies can be computed in polynomial time, the constrained extensions are NP-hard even with a planning horizon of two and non-negative rewards. To alleviate this, we propose a Fully Polynomial Time Approximation Scheme (FPTAS) that computes (near-)optimal deterministic policies in polynomial time, with an approximation ratio that is among the best achievable in theory. Finally, evaluation results show that our approach empirically outperforms the state-of-the-art fixed-horizon chance-constrained POMDP solver.

Majid Khonji (2023). “Approximability and Efficient Algorithms for Constrained Fixed-Horizon POMDPs with Durative Actions.” Artificial Intelligence (AIJ).
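
To make the ILP encoding step concrete, here is a minimal sketch of how a deterministic policy for a tiny horizon-2 constrained POMDP can be selected by an integer program. This is an illustration only, not the paper's formulation (which additionally handles durative actions and chance constraints): the two-state model data, the static-state simplification, and the use of the open-source PuLP library with its bundled CBC solver are all assumptions made for the example.

```python
# Toy horizon-2 constrained POMDP encoded as an ILP (illustrative sketch).
from itertools import product
import pulp

STATES = [0, 1]              # hidden states
ACTS = ["A", "B"]            # actions
OBS = ["o1", "o2"]           # observations
belief = {0: 0.6, 1: 0.4}    # initial belief over hidden states

# Illustrative model data: rewards R[s][a], costs C[s][a], and observation
# probabilities Z[s][a][o]; the hidden state is held fixed across the two
# steps to keep the example small.
R = {0: {"A": 1.0, "B": 4.0}, 1: {"A": 3.0, "B": 0.0}}
C = {0: {"A": 0.0, "B": 2.0}, 1: {"A": 1.0, "B": 3.0}}
Z = {0: {"A": {"o1": 0.8, "o2": 0.2}, "B": {"o1": 0.5, "o2": 0.5}},
     1: {"A": {"o1": 0.3, "o2": 0.7}, "B": {"o1": 0.5, "o2": 0.5}}}
BUDGET = 2.0                 # upper bound on total expected cost

prob = pulp.LpProblem("constrained_pomdp_horizon2", pulp.LpMaximize)

# x[a1] = 1 iff a1 is the first action; y[(a1, o, a2)] = 1 iff the policy
# takes a2 after taking a1 and observing o.
x = {a1: pulp.LpVariable(f"x_{a1}", cat="Binary") for a1 in ACTS}
y = {(a1, o, a2): pulp.LpVariable(f"y_{a1}_{o}_{a2}", cat="Binary")
     for a1, o, a2 in product(ACTS, OBS, ACTS)}

# Policy consistency: exactly one first action; every observation branch of
# the chosen first action selects exactly one second action.
prob += pulp.lpSum(x.values()) == 1
for a1, o in product(ACTS, OBS):
    prob += pulp.lpSum(y[(a1, o, a2)] for a2 in ACTS) == x[a1]

def step1(M, a1):            # expected step-1 value of first action a1
    return sum(belief[s] * M[s][a1] for s in STATES)

def step2(M, a1, o, a2):     # expected step-2 value along branch (a1, o, a2)
    return sum(belief[s] * Z[s][a1][o] * M[s][a2] for s in STATES)

reward = (pulp.lpSum(step1(R, a1) * x[a1] for a1 in ACTS)
          + pulp.lpSum(step2(R, *k) * v for k, v in y.items()))
cost = (pulp.lpSum(step1(C, a1) * x[a1] for a1 in ACTS)
        + pulp.lpSum(step2(C, *k) * v for k, v in y.items()))

prob += reward               # objective: maximize expected total reward
prob += cost <= BUDGET       # constraint: expected total cost within budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("first action:", [a1 for a1 in ACTS if x[a1].value() == 1])
print("branches (a1, o, a2):", [k for k, v in y.items() if v.value() == 1])
```

The key modeling device is linearity: binary indicators select one action per reachable observation branch, consistency constraints tie second-step choices to the chosen first action, and both the expected-reward objective and the expected-cost constraint are linear in these indicators, so an off-the-shelf ILP solver applies directly.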