In this Chapter:
- Environment dynamics
- Stochastic processes under the Markov assumption
- Stochastic processes under the stationarity assumption
- Policy Iteration
- Value Iteration
- Modified Policy Iteration
Aim of this chapter:
- Understand the formal problem of finite Markov decision processes, discuss the associative aspect of choosing different actions in different situations, and understand Dynamic Programming with the Policy Iteration and Value Iteration algorithms through examples.
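As a preview of the algorithms covered in this chapter, the following is a minimal Python sketch of Value Iteration on a toy two-state MDP. The states, actions, transition table, rewards, discount factor, and threshold are all illustrative assumptions, not the examples from the slides.

```python
# Minimal Value Iteration sketch on a toy 2-state MDP.
# All numbers below (gamma, theta, transitions, rewards) are assumed for illustration.

gamma = 0.9      # discount factor (assumed)
theta = 1e-6     # convergence threshold (assumed)

states = ["s0", "s1"]
actions = ["a0", "a1"]

# P[s][a] = list of (prob, next_state, reward) triples
P = {
    "s0": {"a0": [(1.0, "s0", 0.0)],
           "a1": [(0.8, "s1", 5.0), (0.2, "s0", 0.0)]},
    "s1": {"a0": [(1.0, "s0", 1.0)],
           "a1": [(1.0, "s1", 2.0)]},
}

V = {s: 0.0 for s in states}
while True:
    delta = 0.0
    for s in states:
        # Bellman optimality backup: max over actions of the expected return
        best = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in actions
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:
        break

# Greedy policy extracted from the converged value function
policy = {
    s: max(actions,
           key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
    for s in states
}
print(V, policy)
```

Policy Iteration differs in that it alternates full policy evaluation with greedy policy improvement, and Modified Policy Iteration interpolates between the two by truncating the evaluation step.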
**Update to Slides 27 and 29 (minutes 47:00 and 48:00): 'Q-value' should be replaced with 'state-value' in 'we calculate the Q-value using the Bellman equation'.**
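As a brief reminder of the distinction behind that correction, the standard Bellman equations for the state-value function $v_\pi$ and the action-value (Q-value) function $q_\pi$ are:

```latex
% State-value: expected return when following \pi from state s
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_\pi(s')\bigr]

% Action-value (Q-value): expected return after taking a in s, then following \pi
q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a)\,\Bigl[r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a')\Bigr]
```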
**Update to the examples: the update equation will be of the form $V_\pi(s) = \sum_{s' \in S} p(s', r \mid s, \pi(s))\,[r + \gamma\, V_\pi(s')]$ (there are different forms of the Bellman equation); the summation over rewards is replaced by a summation over next states.**
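To make the corrected update concrete, here is a minimal Python sketch of iterative policy evaluation using that form, where the sum runs over next states and each transition's reward is determined by the transition itself. The three-state MDP, the fixed policy, and all constants are assumptions for illustration only.

```python
# Iterative policy evaluation using the corrected update:
#   V_pi(s) = sum over s' of p(s' | s, pi(s)) * [r + gamma * V_pi(s')]
# The 3-state MDP, policy, and constants below are illustrative assumptions.

gamma = 0.9
theta = 1e-8

states = ["s0", "s1", "s2"]
pi = {"s0": "right", "s1": "right", "s2": "stay"}   # fixed deterministic policy

# T[(s, a)] = list of (next_state, prob, reward) -- reward tied to the transition
T = {
    ("s0", "right"): [("s1", 0.9, 0.0), ("s0", 0.1, 0.0)],
    ("s1", "right"): [("s2", 0.9, 1.0), ("s1", 0.1, 0.0)],
    ("s2", "stay"):  [("s2", 1.0, 0.0)],
}

V = {s: 0.0 for s in states}
while True:
    delta = 0.0
    for s in states:
        # Sum over next states s', as in the corrected update equation
        new_v = sum(p * (r + gamma * V[s2]) for s2, p, r in T[(s, pi[s])])
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < theta:
        break

print({s: round(v, 4) for s, v in V.items()})
```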