Optimal Models with Maximizing the Probability of First Achieving Target Value in the Preceding Stages

Publisher:
Zhongguo Kexue Zazhishe
Publication Type:
Journal Article
Citation:
Science In China Series A, 2003, 46 (3), pp. 396 - 414
Issue Date:
2003-01
Abstract:
Decision makers often need a guarantee that performance targets are met with sufficiently high probability. Such problems can be modelled as a discrete-time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability that the total discounted reward exceeds a target value within the preceding stages. We show that this formulation cannot be captured by earlier models with standard criteria. We establish properties of the objective functions, optimal value functions and optimal policies, and give an algorithm for computing optimal policies in the finite-horizon case. For this stochastic stopping model, we prove that an optimal deterministic stationary policy exists and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of an ε-optimal policy for finite state spaces. We illustrate the theory with an example on the reliability of satellite systems. Finally, we extend these results to more general cases.
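The finite-horizon algorithm mentioned in the abstract can be sketched as backward induction over an augmented state (current state, remaining target, stages left). The sketch below is not the paper's algorithm verbatim; it assumes finite state and action sets, deterministic integer rewards r(s, a), and discount factor 1 so the remaining target stays integral — all illustrative simplifications.

```python
from functools import lru_cache

def target_probability_solver(actions, P, r):
    """Backward induction for the probability criterion: maximize the
    probability that cumulative reward reaches a target within a horizon.
    P[s][a] maps next states to transition probabilities; r[s][a] is an
    integer one-step reward (illustrative simplification: no discounting).
    """

    @lru_cache(maxsize=None)
    def V(s, x, n):
        # V(s, x, n): best probability of accumulating at least x more
        # reward from state s with n stages remaining.
        if x <= 0:        # target already achieved
            return 1.0
        if n == 0:        # out of stages, target not met
            return 0.0
        return max(
            sum(p * V(s2, x - r[s][a], n - 1) for s2, p in P[s][a].items())
            for a in actions
        )

    def policy(s, x, n):
        # Greedy action attaining the optimal continuation probability.
        return max(actions,
                   key=lambda a: sum(p * V(s2, x - r[s][a], n - 1)
                                     for s2, p in P[s][a].items()))

    return V, policy


# Hypothetical toy instance: in state "L" the safe action earns 1 and
# stays in "L"; the risky action earns 0 but moves to "H" with
# probability 0.5, where every action earns 2. Target 4 within 3 stages
# can only be reached via the risky action.
P = {"L": {"safe": {"L": 1.0}, "risky": {"H": 0.5, "L": 0.5}},
     "H": {"safe": {"H": 1.0}, "risky": {"H": 1.0}}}
r = {"L": {"safe": 1, "risky": 0}, "H": {"safe": 2, "risky": 2}}
V, policy = target_probability_solver(["safe", "risky"], P, r)
```

In this instance V("L", 4, 3) evaluates to 0.5: the safe action alone yields at most 3, so the optimal policy tries the risky action first and succeeds exactly when the first transition lands in "H". This also illustrates the stopping structure noted in the abstract: once the remaining target x drops to 0 or below, the value is 1 regardless of further actions.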