Abstract:
We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by the recurrent equations $x_{t+1}=F(x_t,a_t,\xi_t)$, $t=0,1,\ldots$, with i.i.d. $\Re^k$-valued random vectors $\xi_t$ whose density $\rho$ is unknown. Assuming that the $\xi_t$ are observable, we propose a statistical estimation procedure for $\rho$ that allows us to prove the discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs.
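The following Python sketch illustrates the setting described above: a controlled process driven by i.i.d. noise with an unknown density $\rho$, where the noise realizations are observed and used to build a nonparametric density estimate. The linear dynamics F, the fixed placeholder policy, and the Gaussian-kernel estimator are illustrative assumptions only; they are not the model or the estimator of the paper, which treats general Borel state and action spaces.

# Illustrative sketch (assumed one-dimensional linear dynamics and a
# Gaussian-kernel density estimate; not the paper's construction).
import numpy as np

rng = np.random.default_rng(0)

def F(x, a, xi):
    # Hypothetical dynamics x_{t+1} = F(x_t, a_t, xi_t), used only for illustration.
    return 0.5 * x + a + xi

def policy(x):
    # Placeholder (non-adaptive) control law steering the state toward zero.
    return -0.25 * x

# Simulate the process.  The true noise density rho is unknown to the
# controller, but the realized xi_t are observed, as assumed in the abstract.
T = 2000
x = 0.0
observed_xi = np.empty(T)
for t in range(T):
    a = policy(x)
    xi = rng.normal(0.0, 1.0)     # "unknown" density rho (standard normal here)
    observed_xi[t] = xi           # observability of xi_t
    x = F(x, a, xi)

# Kernel estimate rho_T of rho from xi_0, ..., xi_{T-1}
# (Silverman's rule-of-thumb bandwidth).
h = 1.06 * observed_xi.std() * T ** (-1 / 5)

def rho_hat(z):
    u = (np.asarray(z) - observed_xi[:, None]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=0) / (T * h * np.sqrt(2 * np.pi))

grid = np.linspace(-4.0, 4.0, 9)
print(np.round(rho_hat(grid), 3))  # estimated density values on a grid

In an adaptive policy of the kind the abstract refers to, the estimate rho_hat would be recomputed as observations accumulate and plugged into the control optimization at each stage; the sketch only shows the estimation step.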
AMS: 90C;