Barrier function inspired reward shaping in reinforcement learning (Record no. 432354)

MARC details
000 -LEADER
fixed length control field 02467nam a2200217 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 240719b |||||||| |||| 00| 0 eng d
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 006.31
Item number RAN
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Ranjan, Abhishek.
245 ## - TITLE STATEMENT
Title Barrier function inspired reward shaping in reinforcement learning
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc Bangalore :
Name of publisher, distributor, etc Indian Institute of Science,
Date of publication, distribution, etc 2024.
300 ## - PHYSICAL DESCRIPTION
Extent xi, 56 p. :
Other physical details col. ill.
Accompanying material e-Thesis.
Size of unit 20.64 MB.
500 ## - GENERAL NOTE
General note Includes bibliography.
502 ## - DISSERTATION NOTE
Dissertation note MSc(Res);2024;Computer Science and Automation.
520 ## - SUMMARY, ETC.
Summary, etc Reinforcement Learning (RL) has progressed from simple control tasks to complex real-world challenges with large state spaces. During the initial iterations of training, most RL algorithms perform a significant number of random exploratory steps, which limits their practicality in the real world, since such exploration can lead to potentially dangerous behaviour. Safe exploration is therefore a critical issue when applying RL algorithms in the real world. Although RL excels at solving these challenging problems, the time required for convergence during training remains a significant limitation. Various techniques have been proposed to mitigate this issue, and reward shaping has emerged as a popular solution. However, most existing reward-shaping methods rely on value functions, which can pose scalability challenges as the environment's complexity grows. Our research proposes a novel framework for reward shaping inspired by Barrier Functions, which is safety-oriented, intuitive, and easy to implement for any environment or task. To evaluate the effectiveness of our proposed reward formulations, we present our results on a challenging Safe Reinforcement Learning benchmark, the OpenAI Safety Gym. We have conducted experiments on various environments, including CartPole, Half-Cheetah, Ant, and Humanoid. Our results demonstrate that our method leads to 1.4-2.8 times faster convergence and as little as 50-60% of the actuation effort compared to the vanilla reward. Moreover, our formulation has a theoretical basis for safety, which is crucial for real-world applications. In a sim-to-real experiment with the Go1 robot, we demonstrated better control and dynamics of the robot with our reward framework. (A generic sketch of barrier-based reward shaping follows the record below.)
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Reinforcement learning
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Robotics
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Barrier function
700 ## - ADDED ENTRY--PERSONAL NAME
Personal name Kolathaya, Shishir N. Y.
Relator term advised by
856 ## - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier https://etd.iisc.ac.in/handle/2005/6558
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Koha item type Thesis

No items available.
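The 520 summary above describes the barrier-function-inspired shaping only at a high level, so the following is a generic, hedged sketch rather than the thesis's actual formulation. It assumes a barrier function h(s) >= 0 that is positive inside the safe set and zero on its boundary (here a CartPole-style pole-angle limit), and adds a log-barrier shaping term to the base reward; THETA_MAX, the gain k, and the log-barrier form are illustrative assumptions, not values from the thesis.

import numpy as np

# Illustrative safety limit: the pole angle must stay within +/- THETA_MAX rad.
THETA_MAX = 0.2

def barrier(theta):
    # h(s) = THETA_MAX^2 - theta^2: positive inside the safe set,
    # zero on its boundary (a standard barrier-function construction).
    return THETA_MAX**2 - theta**2

def shaped_reward(base_reward, theta, k=0.1, eps=1e-6):
    # Add a log-barrier term that is 0 at the safest state (theta = 0) and
    # tends to minus infinity as the state nears the boundary; k scales it.
    h = max(barrier(theta), eps)  # clip so the log stays finite
    return base_reward + k * np.log(h / THETA_MAX**2)

# The shaped reward decays smoothly as the pole nears the safety limit.
for theta in (0.0, 0.10, 0.19):
    print(f"theta={theta:.2f}  shaped reward={shaped_reward(1.0, theta):.3f}")

Because the penalty grows without bound near the constraint boundary, a return-maximizing policy is pushed away from unsafe states during exploration, which matches the safety intuition the abstract attributes to barrier functions.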
