In this Chapter:
- Environment dynamics
- Stochastic processes under the Markov assumption
- Stochastic processes under the stationarity assumption
- Policy Iteration
- Value Iteration
- Modified Policy Iteration
Aim of this chapter:
- Understand the formal problem of finite Markov decision processes, discuss the associative aspect of choosing different actions in different situations, and understand Dynamic Programming with the Policy Iteration and Value Iteration algorithms through examples.
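As a preview of the algorithms covered in this chapter, the following is a minimal Python sketch of Value Iteration on a toy two-state MDP. The states, actions, transition table, rewards, discount factor, and threshold are all illustrative assumptions, not the examples from the slides.

```python
# Minimal Value Iteration sketch on a toy 2-state MDP.
# All numbers below (gamma, theta, transitions, rewards) are assumed for illustration.

gamma = 0.9      # discount factor (assumed)
theta = 1e-6     # convergence threshold (assumed)

states = ["s0", "s1"]
actions = ["a0", "a1"]

# P[s][a] = list of (prob, next_state, reward) triples
P = {
    "s0": {"a0": [(1.0, "s0", 0.0)],
           "a1": [(0.8, "s1", 5.0), (0.2, "s0", 0.0)]},
    "s1": {"a0": [(1.0, "s0", 1.0)],
           "a1": [(1.0, "s1", 2.0)]},
}

V = {s: 0.0 for s in states}
while True:
    delta = 0.0
    for s in states:
        # Bellman optimality backup: max over actions of the expected return
        best = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in actions
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:
        break

# Greedy policy extracted from the converged value function
policy = {
    s: max(actions,
           key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
    for s in states
}
print(V, policy)
```

Policy Iteration differs in that it alternates full policy evaluation with greedy policy improvement, and Modified Policy Iteration interpolates between the two by truncating the evaluation step.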
**Update to Slides 27 and 29 (minutes 47:00 and 48:00): 'Q-value' should be replaced with 'state-value' in 'we calculate the Q-value using the Bellman equation'.**
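As a brief reminder of the distinction behind that correction, the standard Bellman equations for the state-value function $v_\pi$ and the action-value (Q-value) function $q_\pi$ are:

```latex
% State-value: expected return when following \pi from state s
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_\pi(s')\bigr]

% Action-value (Q-value): expected return after taking a in s, then following \pi
q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a)\,\Bigl[r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a')\Bigr]
```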
**Update to the examples: the update equation will be of the form $V_\pi(s) = \sum_{s' \in S} p(s', r \mid s, \pi(s))\,[r + \gamma\, V_\pi(s')]$ (there are different forms of the Bellman equation); the summation over rewards is replaced by a summation over next states.**
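To make the corrected update concrete, here is a minimal Python sketch of iterative policy evaluation using that form, where the sum runs over next states and each transition's reward is determined by the transition itself. The three-state MDP, the fixed policy, and all constants are assumptions for illustration only.

```python
# Iterative policy evaluation using the corrected update:
#   V_pi(s) = sum over s' of p(s' | s, pi(s)) * [r + gamma * V_pi(s')]
# The 3-state MDP, policy, and constants below are illustrative assumptions.

gamma = 0.9
theta = 1e-8

states = ["s0", "s1", "s2"]
pi = {"s0": "right", "s1": "right", "s2": "stay"}   # fixed deterministic policy

# T[(s, a)] = list of (next_state, prob, reward) -- reward tied to the transition
T = {
    ("s0", "right"): [("s1", 0.9, 0.0), ("s0", 0.1, 0.0)],
    ("s1", "right"): [("s2", 0.9, 1.0), ("s1", 0.1, 0.0)],
    ("s2", "stay"):  [("s2", 1.0, 0.0)],
}

V = {s: 0.0 for s in states}
while True:
    delta = 0.0
    for s in states:
        # Sum over next states s', as in the corrected update equation
        new_v = sum(p * (r + gamma * V[s2]) for s2, p, r in T[(s, pi[s])])
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < theta:
        break

print({s: round(v, 4) for s, v in V.items()})
```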