RLR 13 - Policy-based Methods

This session is dedicated to the policy-based methods of RL. These methods are well suited to continuous action domains, where the set of actions is impractical to represent in a value-based approach. Here the policy is parameterized, and the objective is a function of those parameters. The goal is to find the parameter values at which the objective function is at its maximum. The gradient of the objective function with respect to the parameters indicates the direction in which to change the parameters so that the objective function increases.
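
To make the gradient-ascent idea concrete, here is a minimal sketch in Python. It is not taken from the session. It assumes a one-step problem with a single continuous action, a Gaussian policy whose mean `theta` is the only parameter, and a made-up reward that peaks at an action of 2.0. The names `reward` and `train`, and all default values, are hypothetical. The gradient is estimated with the score-function (REINFORCE-style) estimator, using the batch-average reward as a baseline.

```python
import random


def reward(action, target=2.0):
    # Hypothetical reward for illustration: highest when the action equals `target`.
    return -(action - target) ** 2


def train(steps=2000, batch=32, lr=0.02, sigma=0.5, seed=0):
    """Gradient ascent on the mean `theta` of a Gaussian policy N(theta, sigma^2)."""
    rng = random.Random(seed)
    theta = 0.0  # initial policy parameter (an arbitrary starting point)
    for _ in range(steps):
        # Sample continuous actions from the current policy.
        actions = [rng.gauss(theta, sigma) for _ in range(batch)]
        rewards = [reward(a) for a in actions]
        # Batch-average reward used as a baseline to reduce variance.
        baseline = sum(rewards) / batch
        # Score-function estimate of dJ/dtheta:
        # d/dtheta log N(a; theta, sigma^2) = (a - theta) / sigma^2
        grad = sum(
            (r - baseline) * (a - theta) / sigma ** 2
            for a, r in zip(actions, rewards)
        ) / batch
        # Move the parameter in the direction that increases the objective.
        theta += lr * grad
    return theta


if __name__ == "__main__":
    print(f"learned policy mean: {train():.3f}")
```

Under these assumptions the learned mean should settle near 2.0, the action with the highest reward. The same update rule carries over to policies with many parameters, where `theta` becomes a vector and the gradient is taken with respect to each component.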