Barrier function inspired reward shaping in reinforcement learning (Record no. 432354)
Field / subfield | Value |
---|---|
000 - LEADER | |
fixed length control field | 02467nam a2200217 4500 |
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION | |
fixed length control field | 240719b |||||||| |||| 00| 0 eng d |
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER | |
Classification number | 006.31 |
Item number | RAN |
100 ## - MAIN ENTRY--PERSONAL NAME | |
Personal name | Ranjan, Abhishek. |
245 ## - TITLE STATEMENT | |
Title | Barrier function inspired reward shaping in reinforcement learning |
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT) | |
Place of publication, distribution, etc | Bangalore : |
Name of publisher, distributor, etc | Indian Institute of Science, |
Date of publication, distribution, etc | 2024. |
300 ## - PHYSICAL DESCRIPTION | |
Extent | xi, 56 p. : |
Other physical details | col. ill. |
Accompanying material | e-Thesis. |
Size of unit | 20.64 MB. |
500 ## - GENERAL NOTE | |
General note | Includes bibliography. |
502 ## - DISSERTATION NOTE | |
Dissertation note | MSc(Res);2024;Computer Science and Automation. |
520 ## - SUMMARY, ETC. | |
Summary, etc | Reinforcement Learning (RL) has progressed from simple control tasks to complex real-world challenges with large state spaces. During the initial iterations of training, most RL algorithms take a significant number of random exploratory steps, which limits their practicality in the real world, since such exploration can lead to potentially dangerous behaviour. Safe exploration is therefore a critical issue when applying RL algorithms to real-world problems. Although RL excels at solving these challenging problems, the time required for convergence during training remains a significant limitation. Various techniques have been proposed to mitigate this issue, and reward shaping has emerged as a popular solution. However, most existing reward-shaping methods rely on value functions, which can pose scalability challenges as the environment's complexity grows. Our research proposes a novel framework for reward shaping inspired by barrier functions, which is safety-oriented, intuitive, and easy to implement for any environment or task. To evaluate the effectiveness of our proposed reward formulations, we present results on a challenging safe-RL benchmark, the OpenAI Safety Gym. We have conducted experiments on various environments, including CartPole, Half-Cheetah, Ant, and Humanoid. Our results demonstrate that our method achieves 1.4-2.8 times faster convergence and as little as 50-60% of the actuation effort of the vanilla reward. Moreover, our formulation has a theoretical basis for safety, which is crucial for real-world applications. In a sim-to-real experiment with the Go1 robot, our reward framework yielded better control and dynamic behaviour of the robot. (A generic illustrative sketch of barrier-style reward shaping follows this record.) |
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical term or geographic name as entry element | Reinforcement learning |
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical term or geographic name as entry element | Robotics |
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM | |
Topical term or geographic name as entry element | Barrier function |
700 ## - ADDED ENTRY--PERSONAL NAME | |
Personal name | Advised by Kolathaya, Shishir N Y. |
856 ## - ELECTRONIC LOCATION AND ACCESS | |
Uniform Resource Identifier | https://etd.iisc.ac.in/handle/2005/6558 |
942 ## - ADDED ENTRY ELEMENTS (KOHA) | |
Koha item type | Thesis |
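The abstract above names barrier-function-inspired reward shaping without reproducing the thesis's formulation. The sketch below is a generic, minimal illustration of the idea only, not the author's method: it assumes a Gymnasium environment, a user-supplied safety margin `h(s)` that is positive on the safe set and shrinks to zero at its boundary, and a hypothetical log-barrier weight `beta`.

```python
# Generic sketch of barrier-function-inspired reward shaping.
# Assumptions (not from the thesis): a Gymnasium env, a user-supplied
# safety margin h(s) > 0 on the safe set (h -> 0 at its boundary), and
# a log-barrier penalty with weight beta.

import math

import gymnasium as gym


class BarrierShapedReward(gym.Wrapper):
    """Adds a log-barrier term that penalizes states near the unsafe set."""

    def __init__(self, env, margin_fn, beta=0.1, eps=1e-6):
        super().__init__(env)
        self.margin_fn = margin_fn  # h: observation -> float, positive when safe
        self.beta = beta            # shaping weight (illustrative choice)
        self.eps = eps              # keeps log() finite at the boundary

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        h = max(float(self.margin_fn(obs)), self.eps)
        # log(h) is near 0 deep inside the safe set and -> -inf at its
        # boundary, so approaching unsafe states is increasingly penalized.
        shaped = reward + self.beta * math.log(h)
        return obs, shaped, terminated, truncated, info


if __name__ == "__main__":
    # Illustrative use on CartPole: the margin is the distance of the pole
    # angle (obs[2]) from the +/-0.2095 rad termination threshold.
    env = gym.make("CartPole-v1")
    env = BarrierShapedReward(env, lambda obs: 0.2095 - abs(obs[2]))
    obs, _ = env.reset(seed=0)
    obs, shaped_reward, terminated, truncated, _ = env.step(env.action_space.sample())
    print(shaped_reward)
```

The log barrier leaves the reward almost unchanged deep inside the safe set and grows without bound as the state nears the unsafe region, matching the safety-oriented intent described in the abstract.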