Bilevel Optimization: Stochastic Algorithms and Applications in Inverse Reinforcement Learning
Bilevel optimization is a class of optimization problems with two levels of nested subproblems. It can be used to model applications arising in areas such as signal processing, machine learning, and game theory. In this talk, we will first discuss several recent works, including a few of our own, that develop efficient stochastic algorithms for this class of problems. Together, these works provide a set of useful tools that practitioners can customize for different application domains.

In the second part of the talk, we will dive deeper into a recent application of bilevel optimization: the inverse reinforcement learning (IRL) problem. IRL aims to recover the structure of an agent's reward (and sometimes the environment dynamics) underlying the actions observed in a fixed, finite set of demonstrations from an expert agent. The ability to learn accurate models of expertise (i.e., reward models) from observational data is valuable in safety-sensitive domains such as clinical decision making and autonomous driving. In this work, we propose a new formulation of the reward estimation task that takes a bilevel optimization form. We then propose a set of efficient algorithms to solve the resulting IRL problem and provide statistical and computational guarantees for the associated reward estimator. Finally, we demonstrate that the proposed algorithm outperforms state-of-the-art IRL and imitation learning benchmarks by a large margin on continuous control tasks in MuJoCo and on several datasets from the D4RL benchmark.
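For readers new to the setting, a generic bilevel problem can be written as follows; here F and G denote the upper- and lower-level objectives and x, y the corresponding decision variables, all generic placeholders rather than notation from the talk:

\[
\min_{x \in \mathcal{X}} \; F\bigl(x, y^{*}(x)\bigr)
\quad \text{s.t.} \quad
y^{*}(x) \in \operatorname*{arg\,min}_{y \in \mathcal{Y}} \; G(x, y).
\]

The nesting is what makes this class hard: the upper-level objective depends on x both directly and through the lower-level solution map y^{*}(x), so stochastic algorithms must estimate, or carefully avoid, derivatives of that map from noisy samples.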
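To make the connection to IRL concrete, one illustrative way the reward estimation task can take a bilevel form (a sketch for exposition; the exact objectives in the talk may differ) is to fit reward parameters \theta by maximum likelihood over the expert demonstrations \mathcal{D} at the upper level, while the lower level computes the policy that is optimal for the current reward r_{\theta}:

\[
\max_{\theta} \; \mathbb{E}_{(s,a) \sim \mathcal{D}} \bigl[\log \pi_{\theta}^{*}(a \mid s)\bigr]
\quad \text{s.t.} \quad
\pi_{\theta}^{*} \in \operatorname*{arg\,max}_{\pi} \; \mathbb{E}_{\pi}\Bigl[\textstyle\sum_{t \ge 0} \gamma^{t} \bigl(r_{\theta}(s_t, a_t) + \mathcal{H}(\pi(\cdot \mid s_t))\bigr)\Bigr],
\]

where \gamma is the discount factor and \mathcal{H} is an entropy regularizer that keeps the lower-level optimal policy stochastic and well defined, so that the upper-level log-likelihood is meaningful.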