# Stochastic optimization and its application in reinforcement learning

Material type: BookLanguage: en Publication details: Bengaluru : Indian institute of science, 2022.Description: ix, 72p. e-Thesis ill. ; 29.1 cm * 20.5 cm 514.3KbDissertation: MTech (Res); 2022; Computer science and automationSubject(s): DDC classification:- 600 AKA

Item type | Current library | Call number | Status | Date due | Barcode |
---|---|---|---|---|---|

E-BOOKS | JRD Tata Memorial Library | 600 AKA (Browse shelf(Opens below)) | Available | ET00100 |

Include bibliogrphical references and index

MTech (Res); 2022; Computer science and automation

Numerous engineering fields, such as transportation systems, manufacturing, communication networks, healthcare, and finance, frequently encounter problems requiring optimization in the presence of uncertainty. Simulation-based optimization is a workable substitute for accurate analytical solutions because of the numerous input variables and the need for a system model. Smoothed functional (SF) algorithms belong to the class of simultaneous perturbation methods that have been found useful for stochastic optimization problems, particularly in high-dimensional parameter spaces. SF methods update the gradient of the objective using function measurements involving parameters that are perturbed simultaneously along all component directions. \cite{katkul} originally developed the SF gradient procedure. This results in the objective function getting smoothed because of the convolution. The objective function smoothing that results from the convolution with a smoothing density function can help the algorithm converge to a global minimum or a point close to it. First, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples and only the latter are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the $\delta$ sphere. We analyze the bias and variance of the proposed gradient estimator. Our algorithm is found to be particularly useful in the case when the objective function is non-convex and the parameter dimension is high. From an asymptotic convergence analysis, we establish that our algorithm converges almost surely to the set of stationary points of the objective function and obtains the asymptotic convergence rate. We also show that our algorithm avoids unstable equilibria, implying convergence to local minima. Further, we perform a non-asymptotic convergence analysis of our algorithm. In particular, we establish here a non-asymptotic bound for finding an $\epsilon$-stationary point of the non-convex objective function. Finally, we demonstrate numerically through simulations that our algorithm outperforms GSF, SPSA, and RDSA by a significant margin over a few non-convex settings, and we further validate its performance over convex (noisy) objectives. Next, we consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. We propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function using sample trajectories. The first algorithm requires an exact solution of the cubic regularized problem in each iteration, while the second algorithm employs an efficient gradient descent-based approximation to the cubic regularized problem. We establish convergence of our proposed algorithms to a second-order stationary point (SOSP) of the value function, which results in the avoidance of traps in the form of saddle points. In particular, the sample complexity of our algorithms towards finding an $\epsilon$-SOSP is $\mathcal{O}(\epsilon^{-3.5})$, and this is a significant improvement over the previous state-of-the-art sample complexity of $\mathcal{O}(\epsilon^{-4.5})$.

There are no comments on this title.