Glossary

This is a community-created glossary. Contributions are welcome!

  • Deep Q-Learning: A value-based deep reinforcement learning algorithm that uses a deep neural network to approximate the Q-values of actions in a given state. The goal of Deep Q-Learning is to find the optimal policy that maximizes the expected cumulative reward by learning these action-values.
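
    A minimal sketch of what "approximating Q-values with a deep neural network" can look like, assuming a PyTorch setup; `QNetwork`, `state_dim`, and `n_actions` are illustrative names, not part of the course code:

    ```python
    import torch
    import torch.nn as nn

    # Maps a state vector to one Q-value per discrete action.
    class QNetwork(nn.Module):
        def __init__(self, state_dim: int, n_actions: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 64),
                nn.ReLU(),
                nn.Linear(64, n_actions),  # one Q-value per action
            )

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            return self.net(state)

    q_net = QNetwork(state_dim=4, n_actions=2)
    state = torch.rand(1, 4)                    # dummy state
    greedy_action = q_net(state).argmax(dim=1)  # action with the highest Q-value
    ```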

  • Value-based methods: Reinforcement Learning methods that estimate a value function as an intermediate step towards finding an optimal policy.
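
    To illustrate the "intermediate step", here is a sketch of how a policy can be derived from estimated action-values with an epsilon-greedy rule; `epsilon_greedy` and the example Q-values are hypothetical:

    ```python
    import random

    def epsilon_greedy(q_values, epsilon=0.1):
        """Derive a policy from estimated action-values: explore with
        probability epsilon, otherwise exploit the current estimates."""
        if random.random() < epsilon:
            return random.randrange(len(q_values))                   # explore
        return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

    action = epsilon_greedy([0.2, 0.8, 0.5])  # usually returns action 1
    ```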

  • Policy-based methods: Reinforcement Learning methods that directly learn to approximate the optimal policy without learning a value function. In practice, they output a probability distribution over actions (see the sketch after the list below).

    The benefits of using policy-based methods over value-based methods include:

    • simplicity of integration: no need to store action values;
    • ability to learn a stochastic policy: the agent explores the state space without always taking the same trajectory, and avoids the problem of perceptual aliasing;
    • effectiveness in high-dimensional and continuous action spaces; and
    • improved convergence properties.
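
    A minimal sketch of a policy network that outputs a probability distribution over actions, assuming PyTorch; `PolicyNetwork`, `state_dim`, and `n_actions` are illustrative names:

    ```python
    import torch
    import torch.nn as nn

    class PolicyNetwork(nn.Module):
        def __init__(self, state_dim: int, n_actions: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 64),
                nn.ReLU(),
                nn.Linear(64, n_actions),
            )

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            # Softmax turns logits into a probability distribution over actions
            return torch.softmax(self.net(state), dim=-1)

    policy = PolicyNetwork(state_dim=4, n_actions=2)
    probs = policy(torch.rand(1, 4))
    action = torch.distributions.Categorical(probs).sample()  # stochastic policy
    ```
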
  • Policy Gradient: A subset of policy-based methods where the objective is to maximize the performance of a parameterized policy using gradient ascent. The goal of policy gradient is to control the probability distribution of actions by tuning the policy such that good actions (those that maximize the return) are sampled more frequently in the future.
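
    In common notation (a sketch using standard symbols, not course-specific ones), where J(θ) is the expected return of policy π_θ and R(τ) is the return of a sampled trajectory τ, the update is:

    ```latex
    % Policy-gradient estimator: expectation over trajectories sampled from \pi_\theta
    \nabla_\theta J(\theta)
      = \mathbb{E}_{\tau \sim \pi_\theta}
        \left[ \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, R(\tau) \right]

    % One gradient-ascent step with learning rate \alpha
    \theta \leftarrow \theta + \alpha \, \nabla_\theta J(\theta)
    ```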

  • Monte Carlo Reinforce: A policy-gradient algorithm that uses the return of an entire episode (a Monte Carlo estimate) to update the policy parameters.
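
    A minimal sketch of one Reinforce update, assuming a Gymnasium-style `env` (whose `step` returns a 5-tuple), a `policy` like the sketch above, and a user-supplied `optimizer`; all names are illustrative:

    ```python
    import torch

    def reinforce_episode(env, policy, optimizer, gamma=0.99):
        # Collect log-probabilities and rewards for one full episode
        log_probs, rewards = [], []
        state, _ = env.reset()
        done = False
        while not done:
            probs = policy(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
            dist = torch.distributions.Categorical(probs)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            state, reward, terminated, truncated, _ = env.step(action.item())
            rewards.append(reward)
            done = terminated or truncated

        # Compute the discounted return G_t for every timestep of the episode
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)

        # Gradient ascent on expected return == gradient descent on the negative
        loss = -(torch.cat(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    ```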

If you want to improve the course, you can open a Pull Request.
