Francesco-A committed
Commit 2a24662 • 1 Parent(s): b316424
Update README.md
![thumbnail.png](https://cdn-uploads.huggingface.co/production/uploads/6493577a357b252af725bf67/wO94FZwazqho096MpER93.png)

README.md CHANGED
@@ -5,31 +5,95 @@ tags:
(Previous revision: the same tags, heading, and "Resume the training" section; this update removes several blank lines and a bare link to https://huggingface.co/learn/deep-rl-course/unit5/introduction, and adds the sections below.)
tags:
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-Pyramids
license: apache-2.0
---

# **ppo** Agent playing **Pyramids**
This is a trained model of a **ppo** agent playing **Pyramids**
using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).

## Watch the Agent play
You can watch the agent play directly in your browser:

1. Go to https://huggingface.co/spaces/unity/ML-Agents-Pyramids
2. Find the model_id: Francesco-A/ppo-Pyramids-v1
3. Select the .nn or .onnx file
4. Click on "Watch the agent play"
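
If you prefer to fetch the trained checkpoint locally instead of using the Space, a minimal sketch is shown below. It assumes the huggingface_hub CLI is installed and that the checkpoint file is named Pyramids.onnx; check the repository's file list for the actual name.

```bash
# Sketch: download the trained checkpoint from the Hub.
# Assumes the huggingface_hub CLI is available and the file is named Pyramids.onnx (check the repo files).
pip install -U "huggingface_hub[cli]"
huggingface-cli download Francesco-A/ppo-Pyramids-v1 Pyramids.onnx --local-dir ./model
```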

### Resume the training
```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```
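
As a concrete, illustrative example: if the configuration below is saved as ./config/Pyramids.yaml and the original run was started with --run-id=Pyramids-v1 (both names are assumptions, not taken from this repository), the resume command becomes:

```bash
# Resume from the latest checkpoint stored under results/Pyramids-v1 (paths are illustrative).
mlagents-learn ./config/Pyramids.yaml --run-id=Pyramids-v1 --resume
```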

### Training hyperparameters
```yaml
behaviors:
  Pyramids:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      beta: 0.01
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 512
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      rnd:
        gamma: 0.99
        strength: 0.01
        network_settings:
          hidden_units: 64
          num_layers: 3
        learning_rate: 0.0001
    keep_checkpoints: 5
    max_steps: 1000000
    time_horizon: 128
    summary_freq: 30000
```
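
To reproduce training from scratch with this configuration, the usual ML-Agents workflow is to save it as a YAML file and pass it to mlagents-learn together with a Pyramids environment executable. The config path, executable path, and run id below are illustrative, not taken from this repository:

```bash
# Start a fresh run with the configuration above (config and environment paths are illustrative).
mlagents-learn ./config/Pyramids.yaml \
  --env=./training-envs-executables/linux/Pyramids/Pyramids \
  --run-id=Pyramids-v1 \
  --no-graphics
```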

## Training details

| Step   | Time Elapsed | Mean Reward | Std of Reward | Status   |
|--------|--------------|-------------|---------------|----------|
| 30000  | 59.481 s     | -1.000      | 0.000         | Training |
| 60000  | 118.648 s    | -0.798      | 0.661         | Training |
| 90000  | 180.684 s    | -0.701      | 0.808         | Training |
| 120000 | 240.734 s    | -0.931      | 0.373         | Training |
| 150000 | 300.978 s    | -0.851      | 0.588         | Training |
| 180000 | 360.137 s    | -0.934      | 0.361         | Training |
| 210000 | 424.326 s    | -1.000      | 0.000         | Training |
| 240000 | 484.774 s    | -0.849      | 0.595         | Training |
| 270000 | 546.089 s    | -0.377      | 1.029         | Training |
| 300000 | 614.797 s    | -0.735      | 0.689         | Training |
| 330000 | 684.241 s    | -0.926      | 0.405         | Training |
| 360000 | 745.790 s    | -0.819      | 0.676         | Training |
| 390000 | 812.573 s    | -0.715      | 0.755         | Training |
| 420000 | 877.836 s    | -0.781      | 0.683         | Training |
| 450000 | 944.423 s    | -0.220      | 1.114         | Training |
| 480000 | 1010.918 s   | -0.484      | 0.962         | Training |
| 510000 | 1074.058 s   | -0.003      | 1.162         | Training |
| 540000 | 1138.848 s   | -0.021      | 1.222         | Training |
| 570000 | 1204.326 s   | 0.384       | 1.231         | Training |
| 600000 | 1276.488 s   | 0.690       | 1.174         | Training |
| 630000 | 1345.297 s   | 0.943       | 1.058         | Training |
| 660000 | 1412.791 s   | 1.014       | 1.043         | Training |
| 690000 | 1482.712 s   | 0.927       | 1.054         | Training |
| 720000 | 1548.726 s   | 0.900       | 1.128         | Training |
| 750000 | 1618.284 s   | 1.379       | 0.701         | Training |
| 780000 | 1692.080 s   | 1.567       | 0.359         | Training |
| 810000 | 1762.159 s   | 1.475       | 0.567         | Training |
| 840000 | 1832.166 s   | 1.438       | 0.648         | Training |
| 870000 | 1907.191 s   | 1.534       | 0.536         | Training |
| 900000 | 1977.521 s   | 1.552       | 0.478         | Training |
| 930000 | 2051.259 s   | 1.458       | 0.633         | Training |
| 960000 | 2126.498 s   | 1.545       | 0.586         | Training |
| 990000 | 2198.591 s   | 1.565       | 0.591         | Training |