Glossary
This is a community-created glossary. Contributions are welcome!
Deep Q-Learning: A value-based deep reinforcement learning algorithm that uses a deep neural network to approximate the Q-values of the actions available in a given state. The goal of Deep Q-Learning is to find the optimal policy that maximizes the expected cumulative reward by learning these action-values.
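As a rough illustration (a minimal sketch, not the course's exact implementation; the state dimension, number of actions, and layer sizes below are arbitrary assumptions), a Q-network is simply a network that takes a state and outputs one Q-value per action:

```python
import torch
import torch.nn as nn

# Minimal Q-network sketch: maps a state vector to one Q-value per discrete action.
class QNetwork(nn.Module):
    def __init__(self, state_dim=4, n_actions=2):  # hypothetical sizes
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)  # shape: (batch, n_actions)

q_net = QNetwork()
state = torch.randn(1, 4)            # a single made-up state
q_values = q_net(state)              # Q(s, a) for every action a
greedy_action = q_values.argmax(-1)  # acting greedily w.r.t. the Q-values
```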
Value-based methods: Reinforcement Learning methods that estimate a value function as an intermediate step towards finding an optimal policy.
Policy-based methods: Reinforcement Learning methods that directly learn to approximate the optimal policy without learning a value function. In practice they output a probability distribution over actions.
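For example (a tiny sketch with made-up numbers for three discrete actions), the policy's output is a set of probabilities that sum to 1, and the agent samples its action from that distribution:

```python
import torch

# Hypothetical action probabilities a policy network might output for one state.
action_probs = torch.tensor([0.2, 0.7, 0.1])                    # sums to 1
action = torch.distributions.Categorical(action_probs).sample()  # stochastic choice
```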
The benefits of using policy-gradient methods over value-based methods include:
- simplicity of integration: no need to store action values;
- ability to learn a stochastic policy: the agent explores the state space without always taking the same trajectory, and avoids the problem of perceptual aliasing;
- effectiveness in high-dimensional and continuous action spaces; and
- improved convergence properties.
Policy Gradient: A subset of policy-based methods where the objective is to maximize the performance of a parameterized policy using gradient ascent. The goal of policy-gradient methods is to control the probability distribution of actions by tuning the policy such that good actions (those that maximize the return) are sampled more frequently in the future.
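In its simplest (REINFORCE-style) form, the gradient of the objective $J(\theta)$ can be estimated as:

$$\nabla_\theta J(\theta) \approx \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)$$

where $\pi_\theta(a_t \mid s_t)$ is the probability of taking action $a_t$ in state $s_t$ under the parameterized policy and $R(\tau)$ is the return of the trajectory. Taking a gradient ascent step on $J(\theta)$ increases the probability of actions that led to high returns.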
Monte Carlo REINFORCE: A policy-gradient algorithm that uses the estimated return from an entire episode to update the policy parameters.
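A minimal sketch of what one such update could look like in PyTorch (the environment interaction is replaced by made-up placeholder tensors, and the network sizes and hyperparameters are arbitrary assumptions, not the course's exact code):

```python
import torch
import torch.nn as nn

# Policy that outputs an action distribution for each state (hypothetical sizes).
class PolicyNetwork(nn.Module):
    def __init__(self, state_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

policy = PolicyNetwork()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Pretend we played one full episode and stored its data (made-up placeholders).
states = torch.randn(10, 4)   # states visited during the episode
dist = policy(states)         # action distributions for those states
actions = dist.sample()       # actions the agent took
rewards = torch.rand(10)      # rewards received at each step

# Monte Carlo return G_t: discounted sum of rewards from step t to the end of the episode.
gamma, G, returns = 0.99, 0.0, []
for r in reversed(rewards.tolist()):
    G = r + gamma * G
    returns.insert(0, G)
returns = torch.tensor(returns)

# Policy-gradient loss: -log pi(a_t|s_t) * G_t; minimizing it is gradient ascent on J(theta).
loss = -(dist.log_prob(actions) * returns).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```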
If you want to improve the course, you can open a Pull Request.
This glossary was made possible thanks to: