Proximal Policy Optimization (PPO) is one of the most popular reinforcement learning algorithms, and it works across a variety of domains, from robotic control to Atari games to chip design.
In this video, we dive deep into 8 implementation details for continuous action spaces, building on the PPO implementation from our first video (https://youtu.be/MEt6rrxH8W4). Rough code sketches of these changes are included at the end of this description.
---
Source code: https://github.com/vwxyzjn/ppo-implementation-details
Related blog post: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
Background music: Flutes Will Chill — https://artlist.io/song/48722/flutes-will-chill
Homework solution: https://wandb.ai/cleanrl/cleanrl.benchmark/runs/34pstq7f/code?workspace=user-costa-huang
---
0:00 Introduction
0:41 Setup
1:30 1. Continuous actions via normal distributions
2:46 2. State-independent log standard deviation
3:50 3. Independent action components
4:37 Note on MultiDiscrete action space
5:36 Match hyperparameters
6:14 Environment preprocessing
6:33 4. Action clipped to the valid range
7:02 5. Observation normalization
7:54 6. Observation clipping
8:10 7. Reward normalization
9:00 8. Reward clipping
9:29 Experiment results
10:49 Related work
11:10 Summary of code change
11:58 Homework
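---
For reference, a minimal sketch of details 1-3 (continuous actions via a normal distribution, state-independent log standard deviation, and independent action components). The layer sizes and initialization here are simplified assumptions; see the source code above for the exact agent.

import torch
import torch.nn as nn
from torch.distributions.normal import Normal

class Agent(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.actor_mean = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, act_dim),
        )
        # Detail 2: log std is a learned parameter, independent of the state.
        self.actor_logstd = nn.Parameter(torch.zeros(1, act_dim))

    def get_action(self, obs, action=None):
        mean = self.actor_mean(obs)                        # Detail 1: one normal distribution per action dimension
        std = torch.exp(self.actor_logstd.expand_as(mean))
        dist = Normal(mean, std)
        if action is None:
            action = dist.sample()
        # Detail 3: action components are treated as independent, so the joint
        # log probability (and entropy) is the sum over action dimensions.
        return action, dist.log_prob(action).sum(1), dist.entropy().sum(1)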
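And a sketch of the environment preprocessing (details 4-8) using gym wrappers, along the lines of the repo's ppo_continuous_action.py. Wrapper names and signatures may differ across gym/gymnasium versions, so treat this as an assumption and check the source code above for the exact setup.

import gym
import numpy as np

def make_env(env_id, gamma=0.99):
    env = gym.make(env_id)
    env = gym.wrappers.ClipAction(env)                    # 4. clip actions to the valid range
    env = gym.wrappers.NormalizeObservation(env)          # 5. running observation normalization
    env = gym.wrappers.TransformObservation(env, lambda obs: np.clip(obs, -10, 10))  # 6. observation clipping
    env = gym.wrappers.NormalizeReward(env, gamma=gamma)  # 7. reward scaling by a running estimate of return variance
    env = gym.wrappers.TransformReward(env, lambda r: np.clip(r, -10, 10))           # 8. reward clipping
    return env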