000 01935nam a22002417a 4500
008 230803b |||||||| |||| 00| 0 eng d
041 _aeng
082 _a600
_bNAM
100 _aSaxena, Naman
245 _aAverage reward actor-critic with deterministic policy search
260 _aBangalore :
_bIISc,
_c2023.
300 _aviii, 143 p.
_bcol. ill. ;
_c29.1 x 20.5 cm
_ee-Thesis
_g3.477 MB
500 _aIncludes bibliographical references and index
502 _aMTech (Res); 2023; Computer Science and Automation
520 _aThe average reward criterion is relatively less studied, as most existing works in the reinforcement learning literature consider the discounted reward criterion. A few recent works present on-policy average reward actor-critic algorithms, but the off-policy average reward actor-critic setting remains relatively unexplored. In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. Using these theorems, we also present an Average Reward Off-Policy Deep Deterministic Policy Gradient (ARO-DDPG) algorithm. We first establish asymptotic convergence using an ODE-based analysis. Subsequently, we provide a finite-time analysis of the resulting stochastic approximation scheme with a linear function approximator and obtain an $\epsilon$-optimal stationary policy with a sample complexity of $\Omega(\epsilon^{-2.5})$. We compare the average reward performance of the proposed ARO-DDPG algorithm against state-of-the-art on-policy average reward actor-critic algorithms on MuJoCo-based environments and observe better empirical performance.
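520 _bFor context, the on-policy deterministic policy gradient theorem for the average reward criterion that the abstract refers to has, in its standard form, roughly the following shape. This is a hedged sketch under the usual ergodicity and smoothness assumptions, not quoted from the thesis; the symbols $d^{\mu_\theta}$ (stationary state distribution) and $Q_{\mu_\theta}$ (differential action-value function) are introduced here purely for illustration.
% Sketch (not from the record): average reward objective for a deterministic
% policy \mu_\theta and the corresponding on-policy deterministic policy gradient.
\[
  J(\theta) = \lim_{T \to \infty} \frac{1}{T}\,
    \mathbb{E}\!\left[\sum_{t=0}^{T-1} r\bigl(s_t, \mu_\theta(s_t)\bigr)\right],
  \qquad
  \nabla_\theta J(\theta)
    = \mathbb{E}_{s \sim d^{\mu_\theta}}\!\left[
        \nabla_\theta \mu_\theta(s)\,
        \nabla_a Q_{\mu_\theta}(s, a)\big|_{a = \mu_\theta(s)}
      \right].
\]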
650 _aReinforcement Learning
650 _aActor-Critic Algorithm
650 _aStochastic Approximation
700 _aKolathaya, Shishir N Y
_eadvisor
700 _aBhatnagar, Shalabh
_eadvisor
856 _uhttps://etd.iisc.ac.in/handle/2005/6175
942 _cT
999 _c429608
_d429608