---
library_name: stable-baselines3
tags:
- MountainCar-v0
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: MountainCar-v0
      type: MountainCar-v0
    metrics:
    - type: mean_reward
      value: -116.20 +/- 1.83
      name: mean_reward
      verified: false
---

# **PPO** Agent playing **MountainCar-v0**

This is a trained model of a **PPO** agent playing **MountainCar-v0** using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).

# Model Details

- Model Name: ppo-MountainCar-v0
- Model Type: Proximal Policy Optimization (PPO)
- Policy Architecture: Multilayer perceptron (`MlpPolicy`)
- Environment: MountainCar-v0
- Training Data: The model was trained in three consecutive training sessions (2,000,000 timesteps in total):
  - First training session: Total timesteps = 1,000,000
  - Second training session: Total timesteps = 500,000
  - Third training session: Total timesteps = 500,000

# Model Parameters

- `n_steps`: 2048
- `batch_size`: 64
- `n_epochs`: 8
- `gamma`: 0.999
- `gae_lambda`: 0.95
- `ent_coef`: 0.01
- `max_grad_norm`: 0.5
- `verbose`: 1 (training progress is logged)
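
The parameters and training schedule listed above could be passed to stable-baselines3 roughly as follows. This is a minimal sketch, not the original training script. It assumes that every setting not listed in this card (learning rate, clip range, number of environments, seed) was left at the library default, and that the second and third sessions continued from the same model rather than starting over. The function names `train` and `evaluate`, the default `save_path`, and the choice of 10 evaluation episodes are illustrative and not taken from the card.

```python
# Hyperparameters copied from the "Model Parameters" list; anything not listed
# there (learning rate, clip range, ...) is left at the library default.
PPO_KWARGS = {
    "n_steps": 2048,
    "batch_size": 64,
    "n_epochs": 8,
    "gamma": 0.999,
    "gae_lambda": 0.95,
    "ent_coef": 0.01,
    "max_grad_norm": 0.5,
    "verbose": 1,
}

# Timesteps of the three consecutive training sessions.
SESSION_TIMESTEPS = [1_000_000, 500_000, 500_000]

ENV_ID = "MountainCar-v0"


def train(save_path="ppo-MountainCar-v0"):
    """Train PPO over the three sessions and save the result (assumed workflow)."""
    # Imported lazily so the configuration above can be inspected without
    # stable-baselines3 installed.
    from stable_baselines3 import PPO

    model = PPO("MlpPolicy", ENV_ID, **PPO_KWARGS)
    for steps in SESSION_TIMESTEPS:
        # Assumption: each session continues from the same model and keeps
        # the running timestep counter instead of resetting it.
        model.learn(total_timesteps=steps, reset_num_timesteps=False)
    model.save(save_path)  # hypothetical file name
    return model


def evaluate(model, n_eval_episodes=10):
    """Return (mean_reward, std_reward) over a few deterministic episodes."""
    from stable_baselines3.common.evaluation import evaluate_policy

    return evaluate_policy(
        model,
        model.get_env(),
        n_eval_episodes=n_eval_episodes,
        deterministic=True,
    )
```

Calling `model = train()` followed by `evaluate(model)` would run all 2,000,000 timesteps and then report a mean and standard deviation of the episode reward. Because the seed and the unlisted settings are unknown, such a run would not necessarily reproduce the reported mean reward of -116.20 +/- 1.83.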