Joy28 committed
Commit d626768
1 Parent(s): 5835a5a

End of training

Files changed (3)
  1. all_results.json +8 -0
  2. test_results.json +8 -0
  3. trainer_state.json +4278 -0
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 99.0,
+   "eval_accuracy": 0.7685185185185185,
+   "eval_loss": 0.7077056765556335,
+   "eval_runtime": 166.0464,
+   "eval_samples_per_second": 1.301,
+   "eval_steps_per_second": 0.163
+ }
test_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 99.0,
+   "eval_accuracy": 0.7685185185185185,
+   "eval_loss": 0.7077056765556335,
+   "eval_runtime": 166.0464,
+   "eval_samples_per_second": 1.301,
+   "eval_steps_per_second": 0.163
+ }
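The reported throughput and accuracy figures can be cross-checked against each other. The sketch below is an illustration and not part of the commit: it assumes that eval_samples_per_second and eval_steps_per_second are simply counts divided by eval_runtime, and that accuracy is correct predictions divided by samples. The helper name estimate_counts is hypothetical, and because the throughput values are rounded to three decimals the results are estimates.

```python
import json

# Values copied from test_results.json in this commit.
test_results = json.loads("""
{
  "epoch": 99.0,
  "eval_accuracy": 0.7685185185185185,
  "eval_loss": 0.7077056765556335,
  "eval_runtime": 166.0464,
  "eval_samples_per_second": 1.301,
  "eval_steps_per_second": 0.163
}
""")


def estimate_counts(metrics):
    """Estimate (samples, batches, correct predictions) from reported metrics.

    Hypothetical helper: assumes throughput = count / runtime and
    accuracy = correct / samples, then rounds to the nearest integer.
    """
    runtime = metrics["eval_runtime"]
    n_samples = round(metrics["eval_samples_per_second"] * runtime)
    n_batches = round(metrics["eval_steps_per_second"] * runtime)
    n_correct = round(metrics["eval_accuracy"] * n_samples)
    return n_samples, n_batches, n_correct


n_samples, n_batches, n_correct = estimate_counts(test_results)
print(f"samples={n_samples} batches={n_batches} correct={n_correct}")
```

Under these assumptions the figures are consistent with about 216 test samples evaluated in 27 batches, of which 166 were classified correctly.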
trainer_state.json ADDED
@@ -0,0 +1,4278 @@
+ {
+   "best_metric": 0.815668202764977,
+   "best_model_checkpoint": "videomae-base-finetuned-subset-100epochs/checkpoint-2352",
+   "epoch": 99.00108108108108,
+   "eval_steps": 500,
+   "global_step": 5550,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.0,
+       "learning_rate": 9.00900900900901e-07,
+       "loss": 1.7041,
+       "step": 10
+     },
+     {
+       "epoch": 0.0,
+       "learning_rate": 1.801801801801802e-06,
+       "loss": 1.6707,
+       "step": 20
+     },
+     {
+       "epoch": 0.01,
+       "learning_rate": 2.702702702702703e-06,
+       "loss": 1.588,
+       "step": 30
+     },
+     {
+       "epoch": 0.01,
+       "learning_rate": 3.603603603603604e-06,
+       "loss": 1.6605,
+       "step": 40
+     },
+     {
+       "epoch": 0.01,
+       "learning_rate": 4.504504504504505e-06,
+       "loss": 1.6657,
+       "step": 50
+     },
+     {
+       "epoch": 0.01,
+       "eval_accuracy": 0.22580645161290322,
+       "eval_loss": 1.6248142719268799,
+       "eval_runtime": 177.2019,
+       "eval_samples_per_second": 1.225,
+       "eval_steps_per_second": 0.158,
+       "step": 56
+     },
+     {
+       "epoch": 1.0,
+       "learning_rate": 5.405405405405406e-06,
+       "loss": 1.6433,
+       "step": 60
+     },
+     {
+       "epoch": 1.0,
+       "learning_rate": 6.306306306306306e-06,
+       "loss": 1.6323,
+       "step": 70
+     },
+     {
+       "epoch": 1.0,
+       "learning_rate": 7.207207207207208e-06,
+       "loss": 1.6449,
+       "step": 80
+     },
+     {
+       "epoch": 1.01,
+       "learning_rate": 8.108108108108109e-06,
+       "loss": 1.5961,
+       "step": 90
+     },
+     {
+       "epoch": 1.01,
+       "learning_rate": 9.00900900900901e-06,
+       "loss": 1.6435,
+       "step": 100
+     },
+     {
+       "epoch": 1.01,
+       "learning_rate": 9.90990990990991e-06,
+       "loss": 1.6109,
+       "step": 110
+     },
+     {
+       "epoch": 1.01,
+       "eval_accuracy": 0.391705069124424,
+       "eval_loss": 1.5600569248199463,
+       "eval_runtime": 170.9412,
+       "eval_samples_per_second": 1.269,
+       "eval_steps_per_second": 0.164,
+       "step": 112
+     },
+     {
+       "epoch": 2.0,
+       "learning_rate": 1.0810810810810812e-05,
+       "loss": 1.5845,
+       "step": 120
+     },
+     {
+       "epoch": 2.0,
+       "learning_rate": 1.1711711711711713e-05,
+       "loss": 1.6078,
+       "step": 130
+     },
+     {
+       "epoch": 2.01,
+       "learning_rate": 1.2612612612612611e-05,
+       "loss": 1.6136,
+       "step": 140
+     },
+     {
+       "epoch": 2.01,
+       "learning_rate": 1.3513513513513515e-05,
+       "loss": 1.5919,
+       "step": 150
+     },
+     {
+       "epoch": 2.01,
+       "learning_rate": 1.4414414414414416e-05,
+       "loss": 1.5669,
+       "step": 160
+     },
+     {
+       "epoch": 2.01,
+       "eval_accuracy": 0.37327188940092165,
+       "eval_loss": 1.5562978982925415,
+       "eval_runtime": 172.5259,
+       "eval_samples_per_second": 1.258,
+       "eval_steps_per_second": 0.162,
+       "step": 168
+     },
+     {
+       "epoch": 3.0,
+       "learning_rate": 1.5315315315315316e-05,
+       "loss": 1.5883,
+       "step": 170
+     },
+     {
+       "epoch": 3.0,
+       "learning_rate": 1.6216216216216218e-05,
+       "loss": 1.5374,
+       "step": 180
+     },
+     {
+       "epoch": 3.0,
+       "learning_rate": 1.7117117117117117e-05,
+       "loss": 1.5314,
+       "step": 190
+     },
+     {
+       "epoch": 3.01,
+       "learning_rate": 1.801801801801802e-05,
+       "loss": 1.4725,
+       "step": 200
+     },
+     {
+       "epoch": 3.01,
+       "learning_rate": 1.891891891891892e-05,
+       "loss": 1.4064,
+       "step": 210
+     },
+     {
+       "epoch": 3.01,
+       "learning_rate": 1.981981981981982e-05,
+       "loss": 1.45,
+       "step": 220
+     },
+     {
+       "epoch": 3.01,
+       "eval_accuracy": 0.5990783410138248,
+       "eval_loss": 1.0987794399261475,
+       "eval_runtime": 176.949,
+       "eval_samples_per_second": 1.226,
+       "eval_steps_per_second": 0.158,
+       "step": 224
+     },
+     {
+       "epoch": 4.0,
+       "learning_rate": 2.0720720720720722e-05,
+       "loss": 1.4079,
+       "step": 230
+     },
+     {
+       "epoch": 4.0,
+       "learning_rate": 2.1621621621621624e-05,
+       "loss": 1.2881,
+       "step": 240
+     },
+     {
+       "epoch": 4.0,
+       "learning_rate": 2.2522522522522523e-05,
+       "loss": 1.2991,
+       "step": 250
+     },
+     {
+       "epoch": 4.01,
+       "learning_rate": 2.3423423423423425e-05,
+       "loss": 1.2036,
+       "step": 260
+     },
+     {
+       "epoch": 4.01,
+       "learning_rate": 2.4324324324324327e-05,
+       "loss": 1.1179,
+       "step": 270
+     },
+     {
+       "epoch": 4.01,
+       "learning_rate": 2.5225225225225222e-05,
+       "loss": 1.1208,
+       "step": 280
+     },
+     {
+       "epoch": 4.01,
+       "eval_accuracy": 0.5714285714285714,
+       "eval_loss": 1.2278647422790527,
+       "eval_runtime": 175.1133,
+       "eval_samples_per_second": 1.239,
+       "eval_steps_per_second": 0.16,
+       "step": 280
+     },
+     {
+       "epoch": 5.0,
+       "learning_rate": 2.6126126126126128e-05,
+       "loss": 1.2803,
+       "step": 290
+     },
+     {
+       "epoch": 5.0,
+       "learning_rate": 2.702702702702703e-05,
+       "loss": 1.2979,
+       "step": 300
+     },
+     {
+       "epoch": 5.01,
+       "learning_rate": 2.7927927927927926e-05,
+       "loss": 1.2177,
+       "step": 310
+     },
+     {
+       "epoch": 5.01,
+       "learning_rate": 2.882882882882883e-05,
+       "loss": 1.1382,
+       "step": 320
+     },
+     {
+       "epoch": 5.01,
+       "learning_rate": 2.9729729729729733e-05,
+       "loss": 1.1588,
+       "step": 330
+     },
+     {
+       "epoch": 5.01,
+       "eval_accuracy": 0.7096774193548387,
+       "eval_loss": 0.8424109816551208,
+       "eval_runtime": 164.0794,
+       "eval_samples_per_second": 1.323,
+       "eval_steps_per_second": 0.171,
+       "step": 336
+     },
+     {
+       "epoch": 6.0,
+       "learning_rate": 3.063063063063063e-05,
+       "loss": 1.2975,
+       "step": 340
+     },
+     {
+       "epoch": 6.0,
+       "learning_rate": 3.153153153153153e-05,
+       "loss": 1.1493,
+       "step": 350
+     },
+     {
+       "epoch": 6.0,
+       "learning_rate": 3.2432432432432436e-05,
+       "loss": 1.0285,
+       "step": 360
+     },
+     {
+       "epoch": 6.01,
+       "learning_rate": 3.3333333333333335e-05,
+       "loss": 0.9137,
+       "step": 370
+     },
+     {
+       "epoch": 6.01,
+       "learning_rate": 3.4234234234234234e-05,
+       "loss": 0.9867,
+       "step": 380
+     },
+     {
+       "epoch": 6.01,
+       "learning_rate": 3.513513513513514e-05,
+       "loss": 1.0834,
+       "step": 390
+     },
+     {
+       "epoch": 6.01,
+       "eval_accuracy": 0.5345622119815668,
+       "eval_loss": 1.103454351425171,
+       "eval_runtime": 178.4343,
+       "eval_samples_per_second": 1.216,
+       "eval_steps_per_second": 0.157,
+       "step": 392
+     },
+     {
+       "epoch": 7.0,
+       "learning_rate": 3.603603603603604e-05,
+       "loss": 1.0298,
+       "step": 400
+     },
+     {
+       "epoch": 7.0,
+       "learning_rate": 3.693693693693694e-05,
+       "loss": 1.0116,
+       "step": 410
+     },
+     {
+       "epoch": 7.01,
+       "learning_rate": 3.783783783783784e-05,
+       "loss": 0.9508,
+       "step": 420
+     },
+     {
+       "epoch": 7.01,
+       "learning_rate": 3.873873873873874e-05,
+       "loss": 1.2421,
+       "step": 430
+     },
+     {
+       "epoch": 7.01,
+       "learning_rate": 3.963963963963964e-05,
+       "loss": 1.2194,
+       "step": 440
+     },
+     {
+       "epoch": 7.01,
+       "eval_accuracy": 0.4838709677419355,
+       "eval_loss": 1.0748963356018066,
+       "eval_runtime": 175.7888,
+       "eval_samples_per_second": 1.234,
+       "eval_steps_per_second": 0.159,
+       "step": 448
+     },
+     {
+       "epoch": 8.0,
+       "learning_rate": 4.0540540540540545e-05,
+       "loss": 1.1793,
+       "step": 450
+     },
+     {
+       "epoch": 8.0,
+       "learning_rate": 4.1441441441441444e-05,
+       "loss": 1.0139,
+       "step": 460
+     },
+     {
+       "epoch": 8.0,
+       "learning_rate": 4.234234234234234e-05,
+       "loss": 1.0531,
+       "step": 470
+     },
+     {
+       "epoch": 8.01,
+       "learning_rate": 4.324324324324325e-05,
+       "loss": 1.098,
+       "step": 480
+     },
+     {
+       "epoch": 8.01,
+       "learning_rate": 4.414414414414415e-05,
+       "loss": 0.8256,
+       "step": 490
+     },
+     {
+       "epoch": 8.01,
+       "learning_rate": 4.5045045045045046e-05,
+       "loss": 0.8462,
+       "step": 500
+     },
+     {
+       "epoch": 8.01,
+       "eval_accuracy": 0.6405529953917051,
+       "eval_loss": 0.8755273222923279,
+       "eval_runtime": 175.5514,
+       "eval_samples_per_second": 1.236,
+       "eval_steps_per_second": 0.159,
+       "step": 504
+     },
+     {
+       "epoch": 9.0,
+       "learning_rate": 4.594594594594595e-05,
+       "loss": 1.0328,
+       "step": 510
+     },
+     {
+       "epoch": 9.0,
+       "learning_rate": 4.684684684684685e-05,
+       "loss": 1.1518,
+       "step": 520
+     },
+     {
+       "epoch": 9.0,
+       "learning_rate": 4.774774774774775e-05,
+       "loss": 1.0959,
+       "step": 530
+     },
+     {
+       "epoch": 9.01,
+       "learning_rate": 4.8648648648648654e-05,
+       "loss": 0.9388,
+       "step": 540
+     },
+     {
+       "epoch": 9.01,
+       "learning_rate": 4.954954954954955e-05,
+       "loss": 0.937,
+       "step": 550
+     },
+     {
+       "epoch": 9.01,
+       "learning_rate": 4.994994994994995e-05,
+       "loss": 1.058,
+       "step": 560
+     },
+     {
+       "epoch": 9.01,
+       "eval_accuracy": 0.6497695852534562,
+       "eval_loss": 0.9025103449821472,
+       "eval_runtime": 177.1283,
+       "eval_samples_per_second": 1.225,
+       "eval_steps_per_second": 0.158,
+       "step": 560
+     },
+     {
+       "epoch": 10.0,
+       "learning_rate": 4.984984984984985e-05,
+       "loss": 0.8948,
+       "step": 570
+     },
+     {
+       "epoch": 10.0,
+       "learning_rate": 4.974974974974975e-05,
+       "loss": 0.9187,
+       "step": 580
+     },
+     {
+       "epoch": 10.01,
+       "learning_rate": 4.964964964964965e-05,
+       "loss": 0.8642,
+       "step": 590
+     },
+     {
+       "epoch": 10.01,
+       "learning_rate": 4.954954954954955e-05,
+       "loss": 0.8694,
+       "step": 600
+     },
+     {
+       "epoch": 10.01,
+       "learning_rate": 4.944944944944945e-05,
+       "loss": 1.0163,
+       "step": 610
+     },
+     {
+       "epoch": 10.01,
+       "eval_accuracy": 0.4838709677419355,
+       "eval_loss": 1.2587543725967407,
+       "eval_runtime": 175.3167,
+       "eval_samples_per_second": 1.238,
+       "eval_steps_per_second": 0.16,
+       "step": 616
+     },
+     {
+       "epoch": 11.0,
+       "learning_rate": 4.9349349349349347e-05,
+       "loss": 1.0813,
+       "step": 620
+     },
+     {
+       "epoch": 11.0,
+       "learning_rate": 4.9249249249249253e-05,
+       "loss": 0.8843,
+       "step": 630
+     },
+     {
+       "epoch": 11.0,
+       "learning_rate": 4.9149149149149154e-05,
+       "loss": 0.9094,
+       "step": 640
+     },
+     {
+       "epoch": 11.01,
+       "learning_rate": 4.9049049049049054e-05,
+       "loss": 0.9094,
+       "step": 650
+     },
+     {
+       "epoch": 11.01,
+       "learning_rate": 4.8948948948948954e-05,
+       "loss": 0.7727,
+       "step": 660
+     },
+     {
+       "epoch": 11.01,
+       "learning_rate": 4.884884884884885e-05,
+       "loss": 1.0639,
+       "step": 670
+     },
+     {
+       "epoch": 11.01,
+       "eval_accuracy": 0.6359447004608295,
+       "eval_loss": 0.8927530646324158,
+       "eval_runtime": 173.189,
+       "eval_samples_per_second": 1.253,
+       "eval_steps_per_second": 0.162,
+       "step": 672
+     },
+     {
+       "epoch": 12.0,
+       "learning_rate": 4.8748748748748754e-05,
+       "loss": 1.0281,
+       "step": 680
+     },
+     {
+       "epoch": 12.0,
+       "learning_rate": 4.8648648648648654e-05,
+       "loss": 0.8524,
+       "step": 690
+     },
+     {
+       "epoch": 12.01,
+       "learning_rate": 4.854854854854855e-05,
+       "loss": 0.963,
+       "step": 700
+     },
+     {
+       "epoch": 12.01,
+       "learning_rate": 4.8448448448448455e-05,
+       "loss": 0.823,
+       "step": 710
+     },
+     {
+       "epoch": 12.01,
+       "learning_rate": 4.834834834834835e-05,
+       "loss": 0.9317,
+       "step": 720
+     },
+     {
+       "epoch": 12.01,
+       "eval_accuracy": 0.6221198156682027,
+       "eval_loss": 0.8824858069419861,
+       "eval_runtime": 171.7569,
+       "eval_samples_per_second": 1.263,
+       "eval_steps_per_second": 0.163,
+       "step": 728
+     },
+     {
+       "epoch": 13.0,
+       "learning_rate": 4.824824824824825e-05,
+       "loss": 0.8926,
+       "step": 730
+     },
+     {
+       "epoch": 13.0,
+       "learning_rate": 4.814814814814815e-05,
+       "loss": 0.8433,
+       "step": 740
+     },
+     {
+       "epoch": 13.0,
+       "learning_rate": 4.804804804804805e-05,
+       "loss": 1.0372,
+       "step": 750
+     },
+     {
+       "epoch": 13.01,
+       "learning_rate": 4.7947947947947955e-05,
+       "loss": 1.0262,
+       "step": 760
+     },
+     {
+       "epoch": 13.01,
+       "learning_rate": 4.784784784784785e-05,
+       "loss": 0.8883,
+       "step": 770
+     },
+     {
+       "epoch": 13.01,
+       "learning_rate": 4.774774774774775e-05,
+       "loss": 0.9038,
+       "step": 780
+     },
+     {
+       "epoch": 13.01,
+       "eval_accuracy": 0.5622119815668203,
+       "eval_loss": 0.8764956593513489,
+       "eval_runtime": 172.944,
+       "eval_samples_per_second": 1.255,
+       "eval_steps_per_second": 0.162,
+       "step": 784
+     },
+     {
+       "epoch": 14.0,
+       "learning_rate": 4.764764764764765e-05,
+       "loss": 0.9196,
+       "step": 790
+     },
+     {
+       "epoch": 14.0,
+       "learning_rate": 4.754754754754755e-05,
+       "loss": 1.0407,
+       "step": 800
+     },
+     {
+       "epoch": 14.0,
+       "learning_rate": 4.744744744744745e-05,
+       "loss": 0.9383,
+       "step": 810
+     },
+     {
+       "epoch": 14.01,
+       "learning_rate": 4.734734734734735e-05,
+       "loss": 0.8965,
+       "step": 820
+     },
+     {
+       "epoch": 14.01,
+       "learning_rate": 4.724724724724725e-05,
+       "loss": 0.9297,
+       "step": 830
+     },
+     {
+       "epoch": 14.01,
+       "learning_rate": 4.714714714714715e-05,
+       "loss": 0.9155,
+       "step": 840
+     },
+     {
+       "epoch": 14.01,
+       "eval_accuracy": 0.7004608294930875,
+       "eval_loss": 0.8431310653686523,
+       "eval_runtime": 172.9883,
+       "eval_samples_per_second": 1.254,
+       "eval_steps_per_second": 0.162,
+       "step": 840
+     },
+     {
+       "epoch": 15.0,
+       "learning_rate": 4.704704704704705e-05,
+       "loss": 0.9725,
+       "step": 850
+     },
+     {
+       "epoch": 15.0,
+       "learning_rate": 4.694694694694695e-05,
+       "loss": 0.8736,
+       "step": 860
+     },
+     {
+       "epoch": 15.01,
+       "learning_rate": 4.684684684684685e-05,
+       "loss": 0.8519,
+       "step": 870
+     },
+     {
+       "epoch": 15.01,
+       "learning_rate": 4.674674674674675e-05,
+       "loss": 0.7844,
+       "step": 880
+     },
+     {
+       "epoch": 15.01,
+       "learning_rate": 4.6646646646646644e-05,
+       "loss": 1.0731,
+       "step": 890
+     },
+     {
+       "epoch": 15.01,
+       "eval_accuracy": 0.7004608294930875,
+       "eval_loss": 0.8174832463264465,
+       "eval_runtime": 171.3687,
+       "eval_samples_per_second": 1.266,
+       "eval_steps_per_second": 0.163,
+       "step": 896
+     },
+     {
+       "epoch": 16.0,
+       "learning_rate": 4.654654654654655e-05,
+       "loss": 0.7801,
+       "step": 900
+     },
+     {
+       "epoch": 16.0,
+       "learning_rate": 4.644644644644645e-05,
+       "loss": 0.7508,
+       "step": 910
+     },
+     {
+       "epoch": 16.0,
+       "learning_rate": 4.634634634634635e-05,
+       "loss": 0.8745,
+       "step": 920
+     },
+     {
+       "epoch": 16.01,
+       "learning_rate": 4.624624624624625e-05,
+       "loss": 0.786,
+       "step": 930
+     },
+     {
+       "epoch": 16.01,
+       "learning_rate": 4.6146146146146144e-05,
+       "loss": 0.8919,
+       "step": 940
+     },
+     {
+       "epoch": 16.01,
+       "learning_rate": 4.604604604604605e-05,
+       "loss": 0.6864,
+       "step": 950
+     },
+     {
+       "epoch": 16.01,
+       "eval_accuracy": 0.5852534562211982,
+       "eval_loss": 1.059116005897522,
+       "eval_runtime": 172.8249,
+       "eval_samples_per_second": 1.256,
+       "eval_steps_per_second": 0.162,
+       "step": 952
+     },
+     {
+       "epoch": 17.0,
+       "learning_rate": 4.594594594594595e-05,
+       "loss": 0.8259,
+       "step": 960
+     },
+     {
+       "epoch": 17.0,
+       "learning_rate": 4.5845845845845845e-05,
+       "loss": 0.7213,
+       "step": 970
+     },
+     {
+       "epoch": 17.01,
+       "learning_rate": 4.574574574574575e-05,
+       "loss": 0.6611,
+       "step": 980
+     },
+     {
+       "epoch": 17.01,
+       "learning_rate": 4.5645645645645645e-05,
+       "loss": 0.8739,
+       "step": 990
+     },
+     {
+       "epoch": 17.01,
+       "learning_rate": 4.5545545545545545e-05,
+       "loss": 0.9537,
+       "step": 1000
+     },
+     {
+       "epoch": 17.01,
+       "eval_accuracy": 0.6221198156682027,
+       "eval_loss": 0.9702848792076111,
+       "eval_runtime": 172.8825,
+       "eval_samples_per_second": 1.255,
+       "eval_steps_per_second": 0.162,
+       "step": 1008
+     },
+     {
+       "epoch": 18.0,
+       "learning_rate": 4.544544544544545e-05,
+       "loss": 0.9157,
+       "step": 1010
+     },
+     {
+       "epoch": 18.0,
+       "learning_rate": 4.5345345345345345e-05,
+       "loss": 0.93,
+       "step": 1020
+     },
+     {
+       "epoch": 18.0,
+       "learning_rate": 4.524524524524525e-05,
+       "loss": 0.7541,
+       "step": 1030
+     },
+     {
+       "epoch": 18.01,
+       "learning_rate": 4.5145145145145146e-05,
+       "loss": 0.9246,
+       "step": 1040
+     },
+     {
+       "epoch": 18.01,
+       "learning_rate": 4.5045045045045046e-05,
+       "loss": 0.8477,
+       "step": 1050
+     },
+     {
+       "epoch": 18.01,
+       "learning_rate": 4.4944944944944946e-05,
+       "loss": 0.7499,
+       "step": 1060
+     },
+     {
+       "epoch": 18.01,
+       "eval_accuracy": 0.5806451612903226,
+       "eval_loss": 0.8370996117591858,
+       "eval_runtime": 164.6146,
+       "eval_samples_per_second": 1.318,
+       "eval_steps_per_second": 0.17,
+       "step": 1064
+     },
+     {
+       "epoch": 19.0,
+       "learning_rate": 4.4844844844844846e-05,
+       "loss": 0.6531,
+       "step": 1070
+     },
+     {
+       "epoch": 19.0,
+       "learning_rate": 4.4744744744744746e-05,
+       "loss": 1.0249,
+       "step": 1080
+     },
+     {
+       "epoch": 19.0,
+       "learning_rate": 4.4644644644644646e-05,
+       "loss": 0.8424,
+       "step": 1090
+     },
+     {
+       "epoch": 19.01,
+       "learning_rate": 4.4544544544544546e-05,
+       "loss": 0.6561,
+       "step": 1100
+     },
+     {
+       "epoch": 19.01,
+       "learning_rate": 4.4444444444444447e-05,
+       "loss": 0.6909,
+       "step": 1110
+     },
+     {
+       "epoch": 19.01,
+       "learning_rate": 4.434434434434435e-05,
+       "loss": 0.7142,
+       "step": 1120
+     },
+     {
+       "epoch": 19.01,
+       "eval_accuracy": 0.663594470046083,
+       "eval_loss": 0.9132143259048462,
+       "eval_runtime": 179.8294,
+       "eval_samples_per_second": 1.207,
+       "eval_steps_per_second": 0.156,
+       "step": 1120
+     },
+     {
+       "epoch": 20.0,
+       "learning_rate": 4.424424424424425e-05,
+       "loss": 0.7644,
+       "step": 1130
+     },
+     {
+       "epoch": 20.0,
+       "learning_rate": 4.414414414414415e-05,
+       "loss": 0.7901,
+       "step": 1140
+     },
+     {
+       "epoch": 20.01,
+       "learning_rate": 4.404404404404405e-05,
+       "loss": 0.751,
+       "step": 1150
+     },
+     {
+       "epoch": 20.01,
+       "learning_rate": 4.394394394394394e-05,
+       "loss": 0.9573,
+       "step": 1160
+     },
+     {
+       "epoch": 20.01,
+       "learning_rate": 4.384384384384385e-05,
+       "loss": 0.675,
+       "step": 1170
+     },
+     {
+       "epoch": 20.01,
+       "eval_accuracy": 0.6728110599078341,
+       "eval_loss": 0.759711503982544,
+       "eval_runtime": 176.3584,
+       "eval_samples_per_second": 1.23,
+       "eval_steps_per_second": 0.159,
+       "step": 1176
+     },
+     {
+       "epoch": 21.0,
+       "learning_rate": 4.374374374374375e-05,
+       "loss": 0.6662,
+       "step": 1180
+     },
+     {
+       "epoch": 21.0,
+       "learning_rate": 4.364364364364365e-05,
+       "loss": 0.6619,
+       "step": 1190
+     },
+     {
+       "epoch": 21.0,
+       "learning_rate": 4.354354354354355e-05,
+       "loss": 0.7366,
+       "step": 1200
+     },
+     {
+       "epoch": 21.01,
+       "learning_rate": 4.344344344344344e-05,
+       "loss": 0.7486,
+       "step": 1210
+     },
+     {
+       "epoch": 21.01,
+       "learning_rate": 4.334334334334335e-05,
+       "loss": 0.8793,
+       "step": 1220
+     },
+     {
+       "epoch": 21.01,
+       "learning_rate": 4.324324324324325e-05,
+       "loss": 0.604,
+       "step": 1230
+     },
+     {
+       "epoch": 21.01,
+       "eval_accuracy": 0.5714285714285714,
+       "eval_loss": 1.2003726959228516,
+       "eval_runtime": 174.3155,
+       "eval_samples_per_second": 1.245,
+       "eval_steps_per_second": 0.161,
+       "step": 1232
+     },
+     {
+       "epoch": 22.0,
+       "learning_rate": 4.314314314314314e-05,
+       "loss": 0.7404,
+       "step": 1240
+     },
+     {
+       "epoch": 22.0,
+       "learning_rate": 4.304304304304305e-05,
+       "loss": 0.631,
+       "step": 1250
+     },
+     {
+       "epoch": 22.01,
+       "learning_rate": 4.294294294294294e-05,
+       "loss": 0.7146,
+       "step": 1260
+     },
+     {
+       "epoch": 22.01,
+       "learning_rate": 4.284284284284284e-05,
+       "loss": 0.7736,
+       "step": 1270
+     },
+     {
+       "epoch": 22.01,
+       "learning_rate": 4.274274274274275e-05,
+       "loss": 0.7738,
+       "step": 1280
+     },
+     {
+       "epoch": 22.01,
+       "eval_accuracy": 0.5668202764976958,
+       "eval_loss": 1.0632843971252441,
+       "eval_runtime": 173.5303,
+       "eval_samples_per_second": 1.251,
+       "eval_steps_per_second": 0.161,
+       "step": 1288
+     },
+     {
+       "epoch": 23.0,
+       "learning_rate": 4.264264264264264e-05,
+       "loss": 0.6972,
+       "step": 1290
+     },
+     {
+       "epoch": 23.0,
+       "learning_rate": 4.254254254254255e-05,
+       "loss": 0.7115,
+       "step": 1300
+     },
+     {
+       "epoch": 23.0,
+       "learning_rate": 4.244244244244244e-05,
+       "loss": 0.5288,
+       "step": 1310
+     },
+     {
+       "epoch": 23.01,
+       "learning_rate": 4.234234234234234e-05,
+       "loss": 0.7268,
+       "step": 1320
+     },
+     {
+       "epoch": 23.01,
+       "learning_rate": 4.224224224224225e-05,
+       "loss": 0.5663,
+       "step": 1330
+     },
+     {
+       "epoch": 23.01,
+       "learning_rate": 4.214214214214214e-05,
+       "loss": 0.7651,
+       "step": 1340
+     },
+     {
+       "epoch": 23.01,
+       "eval_accuracy": 0.6820276497695853,
+       "eval_loss": 0.6864995360374451,
+       "eval_runtime": 171.8309,
+       "eval_samples_per_second": 1.263,
+       "eval_steps_per_second": 0.163,
+       "step": 1344
+     },
+     {
+       "epoch": 24.0,
+       "learning_rate": 4.204204204204204e-05,
+       "loss": 0.5517,
+       "step": 1350
+     },
+     {
+       "epoch": 24.0,
+       "learning_rate": 4.194194194194194e-05,
+       "loss": 0.6645,
+       "step": 1360
+     },
+     {
+       "epoch": 24.0,
+       "learning_rate": 4.1841841841841843e-05,
+       "loss": 0.5702,
+       "step": 1370
+     },
+     {
+       "epoch": 24.01,
+       "learning_rate": 4.1741741741741744e-05,
+       "loss": 0.8262,
+       "step": 1380
+     },
+     {
+       "epoch": 24.01,
+       "learning_rate": 4.1641641641641644e-05,
+       "loss": 0.9137,
+       "step": 1390
+     },
+     {
+       "epoch": 24.01,
+       "learning_rate": 4.1541541541541544e-05,
+       "loss": 0.6292,
+       "step": 1400
+     },
+     {
+       "epoch": 24.01,
+       "eval_accuracy": 0.6912442396313364,
+       "eval_loss": 0.7606768012046814,
+       "eval_runtime": 169.2507,
+       "eval_samples_per_second": 1.282,
+       "eval_steps_per_second": 0.165,
+       "step": 1400
+     },
+     {
+       "epoch": 25.0,
+       "learning_rate": 4.1441441441441444e-05,
+       "loss": 0.6136,
+       "step": 1410
+     },
+     {
+       "epoch": 25.0,
+       "learning_rate": 4.1341341341341344e-05,
+       "loss": 0.5996,
+       "step": 1420
+     },
+     {
+       "epoch": 25.01,
+       "learning_rate": 4.124124124124124e-05,
+       "loss": 0.7239,
+       "step": 1430
+     },
+     {
+       "epoch": 25.01,
+       "learning_rate": 4.1141141141141144e-05,
+       "loss": 0.5766,
+       "step": 1440
+     },
+     {
+       "epoch": 25.01,
+       "learning_rate": 4.1041041041041045e-05,
+       "loss": 0.7387,
+       "step": 1450
+     },
+     {
+       "epoch": 25.01,
1108
+ "eval_accuracy": 0.5345622119815668,
1109
+ "eval_loss": 1.303768277168274,
1110
+ "eval_runtime": 164.0294,
1111
+ "eval_samples_per_second": 1.323,
1112
+ "eval_steps_per_second": 0.171,
1113
+ "step": 1456
1114
+ },
1115
+ {
1116
+ "epoch": 26.0,
1117
+ "learning_rate": 4.0940940940940945e-05,
1118
+ "loss": 0.7438,
1119
+ "step": 1460
1120
+ },
1121
+ {
1122
+ "epoch": 26.0,
1123
+ "learning_rate": 4.0840840840840845e-05,
1124
+ "loss": 0.7224,
1125
+ "step": 1470
1126
+ },
1127
+ {
1128
+ "epoch": 26.0,
1129
+ "learning_rate": 4.074074074074074e-05,
1130
+ "loss": 0.9601,
1131
+ "step": 1480
1132
+ },
1133
+ {
1134
+ "epoch": 26.01,
1135
+ "learning_rate": 4.0640640640640645e-05,
1136
+ "loss": 0.6726,
1137
+ "step": 1490
1138
+ },
1139
+ {
1140
+ "epoch": 26.01,
1141
+ "learning_rate": 4.0540540540540545e-05,
1142
+ "loss": 0.7734,
1143
+ "step": 1500
1144
+ },
1145
+ {
1146
+ "epoch": 26.01,
1147
+ "learning_rate": 4.044044044044044e-05,
1148
+ "loss": 0.7038,
1149
+ "step": 1510
1150
+ },
1151
+ {
1152
+ "epoch": 26.01,
1153
+ "eval_accuracy": 0.5529953917050692,
1154
+ "eval_loss": 1.2832341194152832,
1155
+ "eval_runtime": 163.9119,
1156
+ "eval_samples_per_second": 1.324,
1157
+ "eval_steps_per_second": 0.171,
1158
+ "step": 1512
1159
+ },
1160
+ {
1161
+ "epoch": 27.0,
1162
+ "learning_rate": 4.0340340340340346e-05,
1163
+ "loss": 0.4438,
1164
+ "step": 1520
1165
+ },
1166
+ {
1167
+ "epoch": 27.0,
1168
+ "learning_rate": 4.024024024024024e-05,
1169
+ "loss": 0.5555,
1170
+ "step": 1530
1171
+ },
1172
+ {
1173
+ "epoch": 27.01,
1174
+ "learning_rate": 4.014014014014014e-05,
1175
+ "loss": 0.9817,
1176
+ "step": 1540
1177
+ },
1178
+ {
1179
+ "epoch": 27.01,
1180
+ "learning_rate": 4.0040040040040046e-05,
1181
+ "loss": 0.8025,
1182
+ "step": 1550
1183
+ },
1184
+ {
1185
+ "epoch": 27.01,
1186
+ "learning_rate": 3.993993993993994e-05,
1187
+ "loss": 0.7565,
1188
+ "step": 1560
1189
+ },
1190
+ {
1191
+ "epoch": 27.01,
1192
+ "eval_accuracy": 0.7004608294930875,
1193
+ "eval_loss": 0.8127643465995789,
1194
+ "eval_runtime": 165.3212,
1195
+ "eval_samples_per_second": 1.313,
1196
+ "eval_steps_per_second": 0.169,
1197
+ "step": 1568
1198
+ },
1199
+ {
1200
+ "epoch": 28.0,
1201
+ "learning_rate": 3.9839839839839846e-05,
1202
+ "loss": 0.6489,
1203
+ "step": 1570
1204
+ },
1205
+ {
1206
+ "epoch": 28.0,
1207
+ "learning_rate": 3.973973973973974e-05,
1208
+ "loss": 0.6275,
1209
+ "step": 1580
1210
+ },
1211
+ {
1212
+ "epoch": 28.0,
1213
+ "learning_rate": 3.963963963963964e-05,
1214
+ "loss": 0.7778,
1215
+ "step": 1590
1216
+ },
1217
+ {
1218
+ "epoch": 28.01,
1219
+ "learning_rate": 3.953953953953955e-05,
1220
+ "loss": 0.7334,
1221
+ "step": 1600
1222
+ },
1223
+ {
1224
+ "epoch": 28.01,
1225
+ "learning_rate": 3.943943943943944e-05,
1226
+ "loss": 0.839,
1227
+ "step": 1610
1228
+ },
1229
+ {
1230
+ "epoch": 28.01,
1231
+ "learning_rate": 3.933933933933934e-05,
1232
+ "loss": 0.6516,
1233
+ "step": 1620
1234
+ },
1235
+ {
1236
+ "epoch": 28.01,
1237
+ "eval_accuracy": 0.5391705069124424,
1238
+ "eval_loss": 1.0892592668533325,
1239
+ "eval_runtime": 164.7722,
1240
+ "eval_samples_per_second": 1.317,
1241
+ "eval_steps_per_second": 0.17,
1242
+ "step": 1624
1243
+ },
1244
+ {
1245
+ "epoch": 29.0,
1246
+ "learning_rate": 3.923923923923924e-05,
1247
+ "loss": 0.7291,
1248
+ "step": 1630
1249
+ },
1250
+ {
1251
+ "epoch": 29.0,
1252
+ "learning_rate": 3.913913913913914e-05,
1253
+ "loss": 0.5831,
1254
+ "step": 1640
1255
+ },
1256
+ {
1257
+ "epoch": 29.0,
1258
+ "learning_rate": 3.903903903903904e-05,
1259
+ "loss": 0.7305,
1260
+ "step": 1650
1261
+ },
1262
+ {
1263
+ "epoch": 29.01,
1264
+ "learning_rate": 3.893893893893894e-05,
1265
+ "loss": 0.6436,
1266
+ "step": 1660
1267
+ },
1268
+ {
1269
+ "epoch": 29.01,
1270
+ "learning_rate": 3.883883883883884e-05,
1271
+ "loss": 0.7121,
1272
+ "step": 1670
1273
+ },
1274
+ {
1275
+ "epoch": 29.01,
1276
+ "learning_rate": 3.873873873873874e-05,
1277
+ "loss": 0.7074,
1278
+ "step": 1680
1279
+ },
1280
+ {
1281
+ "epoch": 29.01,
1282
+ "eval_accuracy": 0.5990783410138248,
1283
+ "eval_loss": 1.08941650390625,
1284
+ "eval_runtime": 163.3245,
1285
+ "eval_samples_per_second": 1.329,
1286
+ "eval_steps_per_second": 0.171,
1287
+ "step": 1680
1288
+ },
1289
+ {
1290
+ "epoch": 30.0,
1291
+ "learning_rate": 3.863863863863864e-05,
1292
+ "loss": 0.6485,
1293
+ "step": 1690
1294
+ },
1295
+ {
1296
+ "epoch": 30.0,
1297
+ "learning_rate": 3.8538538538538534e-05,
1298
+ "loss": 0.672,
1299
+ "step": 1700
1300
+ },
1301
+ {
1302
+ "epoch": 30.01,
1303
+ "learning_rate": 3.843843843843844e-05,
1304
+ "loss": 0.6947,
1305
+ "step": 1710
1306
+ },
1307
+ {
1308
+ "epoch": 30.01,
1309
+ "learning_rate": 3.833833833833834e-05,
1310
+ "loss": 0.7492,
1311
+ "step": 1720
1312
+ },
1313
+ {
1314
+ "epoch": 30.01,
1315
+ "learning_rate": 3.823823823823824e-05,
1316
+ "loss": 0.4902,
1317
+ "step": 1730
1318
+ },
1319
+ {
1320
+ "epoch": 30.01,
1321
+ "eval_accuracy": 0.5622119815668203,
1322
+ "eval_loss": 1.0694791078567505,
1323
+ "eval_runtime": 166.2845,
1324
+ "eval_samples_per_second": 1.305,
1325
+ "eval_steps_per_second": 0.168,
1326
+ "step": 1736
1327
+ },
1328
+ {
1329
+ "epoch": 31.0,
1330
+ "learning_rate": 3.813813813813814e-05,
1331
+ "loss": 0.6269,
1332
+ "step": 1740
1333
+ },
1334
+ {
1335
+ "epoch": 31.0,
1336
+ "learning_rate": 3.8038038038038035e-05,
1337
+ "loss": 0.871,
1338
+ "step": 1750
1339
+ },
1340
+ {
1341
+ "epoch": 31.0,
1342
+ "learning_rate": 3.793793793793794e-05,
1343
+ "loss": 0.4449,
1344
+ "step": 1760
1345
+ },
1346
+ {
1347
+ "epoch": 31.01,
1348
+ "learning_rate": 3.783783783783784e-05,
1349
+ "loss": 0.8107,
1350
+ "step": 1770
1351
+ },
1352
+ {
1353
+ "epoch": 31.01,
1354
+ "learning_rate": 3.7737737737737736e-05,
1355
+ "loss": 0.7082,
1356
+ "step": 1780
1357
+ },
1358
+ {
1359
+ "epoch": 31.01,
1360
+ "learning_rate": 3.763763763763764e-05,
1361
+ "loss": 0.4563,
1362
+ "step": 1790
1363
+ },
1364
+ {
1365
+ "epoch": 31.01,
1366
+ "eval_accuracy": 0.5299539170506913,
1367
+ "eval_loss": 1.2921912670135498,
1368
+ "eval_runtime": 164.2241,
1369
+ "eval_samples_per_second": 1.321,
1370
+ "eval_steps_per_second": 0.17,
1371
+ "step": 1792
1372
+ },
1373
+ {
1374
+ "epoch": 32.0,
1375
+ "learning_rate": 3.7537537537537536e-05,
1376
+ "loss": 0.6189,
1377
+ "step": 1800
1378
+ },
1379
+ {
1380
+ "epoch": 32.0,
1381
+ "learning_rate": 3.7437437437437436e-05,
1382
+ "loss": 0.689,
1383
+ "step": 1810
1384
+ },
1385
+ {
1386
+ "epoch": 32.01,
1387
+ "learning_rate": 3.733733733733734e-05,
1388
+ "loss": 0.6324,
1389
+ "step": 1820
1390
+ },
1391
+ {
1392
+ "epoch": 32.01,
1393
+ "learning_rate": 3.7237237237237236e-05,
1394
+ "loss": 0.6537,
1395
+ "step": 1830
1396
+ },
1397
+ {
1398
+ "epoch": 32.01,
1399
+ "learning_rate": 3.713713713713714e-05,
1400
+ "loss": 0.7543,
1401
+ "step": 1840
1402
+ },
1403
+ {
1404
+ "epoch": 32.01,
1405
+ "eval_accuracy": 0.6820276497695853,
1406
+ "eval_loss": 0.8960239887237549,
1407
+ "eval_runtime": 174.6812,
1408
+ "eval_samples_per_second": 1.242,
1409
+ "eval_steps_per_second": 0.16,
1410
+ "step": 1848
1411
+ },
1412
+ {
1413
+ "epoch": 33.0,
1414
+ "learning_rate": 3.7037037037037037e-05,
1415
+ "loss": 0.8928,
1416
+ "step": 1850
1417
+ },
1418
+ {
1419
+ "epoch": 33.0,
1420
+ "learning_rate": 3.693693693693694e-05,
1421
+ "loss": 0.6618,
1422
+ "step": 1860
1423
+ },
1424
+ {
1425
+ "epoch": 33.0,
1426
+ "learning_rate": 3.6836836836836844e-05,
1427
+ "loss": 0.6464,
1428
+ "step": 1870
1429
+ },
1430
+ {
1431
+ "epoch": 33.01,
1432
+ "learning_rate": 3.673673673673674e-05,
1433
+ "loss": 0.5765,
1434
+ "step": 1880
1435
+ },
1436
+ {
1437
+ "epoch": 33.01,
1438
+ "learning_rate": 3.663663663663664e-05,
1439
+ "loss": 0.5994,
1440
+ "step": 1890
1441
+ },
1442
+ {
1443
+ "epoch": 33.01,
1444
+ "learning_rate": 3.653653653653654e-05,
1445
+ "loss": 0.7467,
1446
+ "step": 1900
1447
+ },
1448
+ {
1449
+ "epoch": 33.01,
1450
+ "eval_accuracy": 0.7465437788018433,
1451
+ "eval_loss": 0.7861269116401672,
1452
+ "eval_runtime": 174.8536,
1453
+ "eval_samples_per_second": 1.241,
1454
+ "eval_steps_per_second": 0.16,
1455
+ "step": 1904
1456
+ },
1457
+ {
1458
+ "epoch": 34.0,
1459
+ "learning_rate": 3.643643643643644e-05,
1460
+ "loss": 0.5162,
1461
+ "step": 1910
1462
+ },
1463
+ {
1464
+ "epoch": 34.0,
1465
+ "learning_rate": 3.633633633633634e-05,
1466
+ "loss": 0.5848,
1467
+ "step": 1920
1468
+ },
1469
+ {
1470
+ "epoch": 34.0,
1471
+ "learning_rate": 3.623623623623624e-05,
1472
+ "loss": 0.5114,
1473
+ "step": 1930
1474
+ },
1475
+ {
1476
+ "epoch": 34.01,
1477
+ "learning_rate": 3.613613613613614e-05,
1478
+ "loss": 0.5392,
1479
+ "step": 1940
1480
+ },
1481
+ {
1482
+ "epoch": 34.01,
1483
+ "learning_rate": 3.603603603603604e-05,
1484
+ "loss": 0.5633,
1485
+ "step": 1950
1486
+ },
1487
+ {
1488
+ "epoch": 34.01,
1489
+ "learning_rate": 3.593593593593594e-05,
1490
+ "loss": 0.6459,
1491
+ "step": 1960
1492
+ },
1493
+ {
1494
+ "epoch": 34.01,
1495
+ "eval_accuracy": 0.5622119815668203,
1496
+ "eval_loss": 1.283542275428772,
1497
+ "eval_runtime": 176.7914,
1498
+ "eval_samples_per_second": 1.227,
1499
+ "eval_steps_per_second": 0.158,
1500
+ "step": 1960
1501
+ },
1502
+ {
1503
+ "epoch": 35.0,
1504
+ "learning_rate": 3.583583583583583e-05,
1505
+ "loss": 0.533,
1506
+ "step": 1970
1507
+ },
1508
+ {
1509
+ "epoch": 35.0,
1510
+ "learning_rate": 3.573573573573574e-05,
1511
+ "loss": 0.7035,
1512
+ "step": 1980
1513
+ },
1514
+ {
1515
+ "epoch": 35.01,
1516
+ "learning_rate": 3.563563563563564e-05,
1517
+ "loss": 0.6015,
1518
+ "step": 1990
1519
+ },
1520
+ {
1521
+ "epoch": 35.01,
1522
+ "learning_rate": 3.553553553553554e-05,
1523
+ "loss": 0.7309,
1524
+ "step": 2000
1525
+ },
1526
+ {
1527
+ "epoch": 35.01,
1528
+ "learning_rate": 3.543543543543544e-05,
1529
+ "loss": 0.7296,
1530
+ "step": 2010
1531
+ },
1532
+ {
1533
+ "epoch": 35.01,
1534
+ "eval_accuracy": 0.5806451612903226,
1535
+ "eval_loss": 1.0303059816360474,
1536
+ "eval_runtime": 175.2838,
1537
+ "eval_samples_per_second": 1.238,
1538
+ "eval_steps_per_second": 0.16,
1539
+ "step": 2016
1540
+ },
1541
+ {
1542
+ "epoch": 36.0,
1543
+ "learning_rate": 3.533533533533533e-05,
1544
+ "loss": 0.6262,
1545
+ "step": 2020
1546
+ },
1547
+ {
1548
+ "epoch": 36.0,
1549
+ "learning_rate": 3.523523523523524e-05,
1550
+ "loss": 0.6559,
1551
+ "step": 2030
1552
+ },
1553
+ {
1554
+ "epoch": 36.0,
1555
+ "learning_rate": 3.513513513513514e-05,
1556
+ "loss": 0.5336,
1557
+ "step": 2040
1558
+ },
1559
+ {
1560
+ "epoch": 36.01,
1561
+ "learning_rate": 3.503503503503503e-05,
1562
+ "loss": 0.6212,
1563
+ "step": 2050
1564
+ },
1565
+ {
1566
+ "epoch": 36.01,
1567
+ "learning_rate": 3.493493493493494e-05,
1568
+ "loss": 0.8769,
1569
+ "step": 2060
1570
+ },
1571
+ {
1572
+ "epoch": 36.01,
1573
+ "learning_rate": 3.483483483483483e-05,
1574
+ "loss": 0.5,
1575
+ "step": 2070
1576
+ },
1577
+ {
1578
+ "epoch": 36.01,
1579
+ "eval_accuracy": 0.6129032258064516,
1580
+ "eval_loss": 0.8923883438110352,
1581
+ "eval_runtime": 172.7445,
1582
+ "eval_samples_per_second": 1.256,
1583
+ "eval_steps_per_second": 0.162,
1584
+ "step": 2072
1585
+ },
1586
+ {
1587
+ "epoch": 37.0,
1588
+ "learning_rate": 3.473473473473473e-05,
1589
+ "loss": 0.3772,
1590
+ "step": 2080
1591
+ },
1592
+ {
1593
+ "epoch": 37.0,
1594
+ "learning_rate": 3.463463463463464e-05,
1595
+ "loss": 0.5472,
1596
+ "step": 2090
1597
+ },
1598
+ {
1599
+ "epoch": 37.01,
1600
+ "learning_rate": 3.453453453453453e-05,
1601
+ "loss": 0.3361,
1602
+ "step": 2100
1603
+ },
1604
+ {
1605
+ "epoch": 37.01,
1606
+ "learning_rate": 3.443443443443444e-05,
1607
+ "loss": 0.8774,
1608
+ "step": 2110
1609
+ },
1610
+ {
1611
+ "epoch": 37.01,
1612
+ "learning_rate": 3.4334334334334334e-05,
1613
+ "loss": 0.5181,
1614
+ "step": 2120
1615
+ },
1616
+ {
1617
+ "epoch": 37.01,
1618
+ "eval_accuracy": 0.7235023041474654,
1619
+ "eval_loss": 0.8768814206123352,
1620
+ "eval_runtime": 169.9168,
1621
+ "eval_samples_per_second": 1.277,
1622
+ "eval_steps_per_second": 0.165,
1623
+ "step": 2128
1624
+ },
1625
+ {
1626
+ "epoch": 38.0,
1627
+ "learning_rate": 3.4234234234234234e-05,
1628
+ "loss": 0.667,
1629
+ "step": 2130
1630
+ },
1631
+ {
1632
+ "epoch": 38.0,
1633
+ "learning_rate": 3.413413413413414e-05,
1634
+ "loss": 0.5336,
1635
+ "step": 2140
1636
+ },
1637
+ {
1638
+ "epoch": 38.0,
1639
+ "learning_rate": 3.4034034034034034e-05,
1640
+ "loss": 0.7924,
1641
+ "step": 2150
1642
+ },
1643
+ {
1644
+ "epoch": 38.01,
1645
+ "learning_rate": 3.3933933933933934e-05,
1646
+ "loss": 0.6568,
1647
+ "step": 2160
1648
+ },
1649
+ {
1650
+ "epoch": 38.01,
1651
+ "learning_rate": 3.3833833833833834e-05,
1652
+ "loss": 0.666,
1653
+ "step": 2170
1654
+ },
1655
+ {
1656
+ "epoch": 38.01,
1657
+ "learning_rate": 3.3733733733733734e-05,
1658
+ "loss": 0.5225,
1659
+ "step": 2180
1660
+ },
1661
+ {
1662
+ "epoch": 38.01,
1663
+ "eval_accuracy": 0.7511520737327189,
1664
+ "eval_loss": 0.7287940382957458,
1665
+ "eval_runtime": 165.8928,
1666
+ "eval_samples_per_second": 1.308,
1667
+ "eval_steps_per_second": 0.169,
1668
+ "step": 2184
1669
+ },
1670
+ {
1671
+ "epoch": 39.0,
1672
+ "learning_rate": 3.3633633633633635e-05,
1673
+ "loss": 0.6175,
1674
+ "step": 2190
1675
+ },
1676
+ {
1677
+ "epoch": 39.0,
1678
+ "learning_rate": 3.3533533533533535e-05,
1679
+ "loss": 0.4297,
1680
+ "step": 2200
1681
+ },
1682
+ {
1683
+ "epoch": 39.0,
1684
+ "learning_rate": 3.3433433433433435e-05,
1685
+ "loss": 0.4564,
1686
+ "step": 2210
1687
+ },
1688
+ {
1689
+ "epoch": 39.01,
1690
+ "learning_rate": 3.3333333333333335e-05,
1691
+ "loss": 0.7067,
1692
+ "step": 2220
1693
+ },
1694
+ {
1695
+ "epoch": 39.01,
1696
+ "learning_rate": 3.3233233233233235e-05,
1697
+ "loss": 0.7556,
1698
+ "step": 2230
1699
+ },
1700
+ {
1701
+ "epoch": 39.01,
1702
+ "learning_rate": 3.3133133133133135e-05,
1703
+ "loss": 0.5617,
1704
+ "step": 2240
1705
+ },
1706
+ {
1707
+ "epoch": 39.01,
1708
+ "eval_accuracy": 0.7926267281105991,
1709
+ "eval_loss": 0.6330269575119019,
1710
+ "eval_runtime": 164.5501,
1711
+ "eval_samples_per_second": 1.319,
1712
+ "eval_steps_per_second": 0.17,
1713
+ "step": 2240
1714
+ },
1715
+ {
1716
+ "epoch": 40.0,
1717
+ "learning_rate": 3.3033033033033035e-05,
1718
+ "loss": 0.4753,
1719
+ "step": 2250
1720
+ },
1721
+ {
1722
+ "epoch": 40.0,
1723
+ "learning_rate": 3.2932932932932935e-05,
1724
+ "loss": 0.5791,
1725
+ "step": 2260
1726
+ },
1727
+ {
1728
+ "epoch": 40.01,
1729
+ "learning_rate": 3.2832832832832836e-05,
1730
+ "loss": 0.4268,
1731
+ "step": 2270
1732
+ },
1733
+ {
1734
+ "epoch": 40.01,
1735
+ "learning_rate": 3.2732732732732736e-05,
1736
+ "loss": 0.5892,
1737
+ "step": 2280
1738
+ },
1739
+ {
1740
+ "epoch": 40.01,
1741
+ "learning_rate": 3.263263263263263e-05,
1742
+ "loss": 0.677,
1743
+ "step": 2290
1744
+ },
1745
+ {
1746
+ "epoch": 40.01,
1747
+ "eval_accuracy": 0.7419354838709677,
1748
+ "eval_loss": 0.7732899785041809,
1749
+ "eval_runtime": 164.8066,
1750
+ "eval_samples_per_second": 1.317,
1751
+ "eval_steps_per_second": 0.17,
1752
+ "step": 2296
1753
+ },
1754
+ {
1755
+ "epoch": 41.0,
1756
+ "learning_rate": 3.2532532532532536e-05,
1757
+ "loss": 0.6502,
1758
+ "step": 2300
1759
+ },
1760
+ {
1761
+ "epoch": 41.0,
1762
+ "learning_rate": 3.2432432432432436e-05,
1763
+ "loss": 0.5512,
1764
+ "step": 2310
1765
+ },
1766
+ {
1767
+ "epoch": 41.0,
1768
+ "learning_rate": 3.233233233233233e-05,
1769
+ "loss": 0.6401,
1770
+ "step": 2320
1771
+ },
1772
+ {
1773
+ "epoch": 41.01,
1774
+ "learning_rate": 3.2232232232232236e-05,
1775
+ "loss": 0.3983,
1776
+ "step": 2330
1777
+ },
1778
+ {
1779
+ "epoch": 41.01,
1780
+ "learning_rate": 3.213213213213213e-05,
1781
+ "loss": 0.7081,
1782
+ "step": 2340
1783
+ },
1784
+ {
1785
+ "epoch": 41.01,
1786
+ "learning_rate": 3.203203203203203e-05,
1787
+ "loss": 0.6891,
1788
+ "step": 2350
1789
+ },
1790
+ {
1791
+ "epoch": 41.01,
1792
+ "eval_accuracy": 0.815668202764977,
1793
+ "eval_loss": 0.7462968826293945,
1794
+ "eval_runtime": 163.9014,
1795
+ "eval_samples_per_second": 1.324,
1796
+ "eval_steps_per_second": 0.171,
1797
+ "step": 2352
1798
+ },
1799
+ {
1800
+ "epoch": 42.0,
1801
+ "learning_rate": 3.193193193193194e-05,
1802
+ "loss": 0.3909,
1803
+ "step": 2360
1804
+ },
1805
+ {
1806
+ "epoch": 42.0,
1807
+ "learning_rate": 3.183183183183183e-05,
1808
+ "loss": 0.641,
1809
+ "step": 2370
1810
+ },
1811
+ {
1812
+ "epoch": 42.01,
1813
+ "learning_rate": 3.173173173173174e-05,
1814
+ "loss": 0.5696,
1815
+ "step": 2380
1816
+ },
1817
+ {
1818
+ "epoch": 42.01,
1819
+ "learning_rate": 3.163163163163163e-05,
1820
+ "loss": 0.419,
1821
+ "step": 2390
1822
+ },
1823
+ {
1824
+ "epoch": 42.01,
1825
+ "learning_rate": 3.153153153153153e-05,
1826
+ "loss": 0.6662,
1827
+ "step": 2400
1828
+ },
1829
+ {
1830
+ "epoch": 42.01,
1831
+ "eval_accuracy": 0.7235023041474654,
1832
+ "eval_loss": 0.9304085969924927,
1833
+ "eval_runtime": 165.0958,
1834
+ "eval_samples_per_second": 1.314,
1835
+ "eval_steps_per_second": 0.17,
1836
+ "step": 2408
1837
+ },
1838
+ {
1839
+ "epoch": 43.0,
1840
+ "learning_rate": 3.143143143143144e-05,
1841
+ "loss": 0.5243,
1842
+ "step": 2410
1843
+ },
1844
+ {
1845
+ "epoch": 43.0,
1846
+ "learning_rate": 3.133133133133133e-05,
1847
+ "loss": 0.4339,
1848
+ "step": 2420
1849
+ },
1850
+ {
1851
+ "epoch": 43.0,
1852
+ "learning_rate": 3.123123123123123e-05,
1853
+ "loss": 0.4849,
1854
+ "step": 2430
1855
+ },
1856
+ {
1857
+ "epoch": 43.01,
1858
+ "learning_rate": 3.113113113113113e-05,
1859
+ "loss": 0.85,
1860
+ "step": 2440
1861
+ },
1862
+ {
1863
+ "epoch": 43.01,
1864
+ "learning_rate": 3.103103103103103e-05,
1865
+ "loss": 0.4044,
1866
+ "step": 2450
1867
+ },
1868
+ {
1869
+ "epoch": 43.01,
1870
+ "learning_rate": 3.093093093093093e-05,
1871
+ "loss": 0.4602,
1872
+ "step": 2460
1873
+ },
1874
+ {
1875
+ "epoch": 43.01,
1876
+ "eval_accuracy": 0.5207373271889401,
1877
+ "eval_loss": 1.5115013122558594,
1878
+ "eval_runtime": 163.1033,
1879
+ "eval_samples_per_second": 1.33,
1880
+ "eval_steps_per_second": 0.172,
1881
+ "step": 2464
1882
+ },
1883
+ {
1884
+ "epoch": 44.0,
1885
+ "learning_rate": 3.083083083083083e-05,
1886
+ "loss": 0.5138,
1887
+ "step": 2470
1888
+ },
1889
+ {
1890
+ "epoch": 44.0,
1891
+ "learning_rate": 3.073073073073073e-05,
1892
+ "loss": 0.3286,
1893
+ "step": 2480
1894
+ },
1895
+ {
1896
+ "epoch": 44.0,
1897
+ "learning_rate": 3.063063063063063e-05,
1898
+ "loss": 0.4418,
1899
+ "step": 2490
1900
+ },
1901
+ {
1902
+ "epoch": 44.01,
1903
+ "learning_rate": 3.053053053053053e-05,
1904
+ "loss": 0.4038,
1905
+ "step": 2500
1906
+ },
1907
+ {
1908
+ "epoch": 44.01,
1909
+ "learning_rate": 3.0430430430430436e-05,
1910
+ "loss": 0.7055,
1911
+ "step": 2510
1912
+ },
1913
+ {
1914
+ "epoch": 44.01,
1915
+ "learning_rate": 3.0330330330330332e-05,
1916
+ "loss": 0.581,
1917
+ "step": 2520
1918
+ },
1919
+ {
1920
+ "epoch": 44.01,
1921
+ "eval_accuracy": 0.6175115207373272,
1922
+ "eval_loss": 1.229552984237671,
1923
+ "eval_runtime": 179.9416,
1924
+ "eval_samples_per_second": 1.206,
1925
+ "eval_steps_per_second": 0.156,
1926
+ "step": 2520
1927
+ },
1928
+ {
1929
+ "epoch": 45.0,
1930
+ "learning_rate": 3.0230230230230232e-05,
1931
+ "loss": 0.5623,
1932
+ "step": 2530
1933
+ },
1934
+ {
1935
+ "epoch": 45.0,
1936
+ "learning_rate": 3.013013013013013e-05,
1937
+ "loss": 0.6803,
1938
+ "step": 2540
1939
+ },
1940
+ {
1941
+ "epoch": 45.01,
1942
+ "learning_rate": 3.0030030030030033e-05,
1943
+ "loss": 0.6378,
1944
+ "step": 2550
1945
+ },
1946
+ {
1947
+ "epoch": 45.01,
1948
+ "learning_rate": 2.9929929929929933e-05,
1949
+ "loss": 0.6153,
1950
+ "step": 2560
1951
+ },
1952
+ {
1953
+ "epoch": 45.01,
1954
+ "learning_rate": 2.982982982982983e-05,
1955
+ "loss": 0.5418,
1956
+ "step": 2570
1957
+ },
1958
+ {
1959
+ "epoch": 45.01,
1960
+ "eval_accuracy": 0.6221198156682027,
1961
+ "eval_loss": 1.0069782733917236,
1962
+ "eval_runtime": 171.9564,
1963
+ "eval_samples_per_second": 1.262,
1964
+ "eval_steps_per_second": 0.163,
1965
+ "step": 2576
1966
+ },
1967
+ {
1968
+ "epoch": 46.0,
1969
+ "learning_rate": 2.9729729729729733e-05,
1970
+ "loss": 0.489,
1971
+ "step": 2580
1972
+ },
1973
+ {
1974
+ "epoch": 46.0,
1975
+ "learning_rate": 2.962962962962963e-05,
1976
+ "loss": 0.4833,
1977
+ "step": 2590
1978
+ },
1979
+ {
1980
+ "epoch": 46.0,
1981
+ "learning_rate": 2.952952952952953e-05,
1982
+ "loss": 0.8491,
1983
+ "step": 2600
1984
+ },
1985
+ {
1986
+ "epoch": 46.01,
1987
+ "learning_rate": 2.9429429429429427e-05,
1988
+ "loss": 0.4682,
1989
+ "step": 2610
1990
+ },
1991
+ {
1992
+ "epoch": 46.01,
1993
+ "learning_rate": 2.932932932932933e-05,
1994
+ "loss": 0.6626,
1995
+ "step": 2620
1996
+ },
1997
+ {
1998
+ "epoch": 46.01,
1999
+ "learning_rate": 2.9229229229229234e-05,
2000
+ "loss": 0.5199,
2001
+ "step": 2630
2002
+ },
2003
+ {
2004
+ "epoch": 46.01,
2005
+ "eval_accuracy": 0.6082949308755761,
2006
+ "eval_loss": 1.1344462633132935,
2007
+ "eval_runtime": 173.0896,
2008
+ "eval_samples_per_second": 1.254,
2009
+ "eval_steps_per_second": 0.162,
2010
+ "step": 2632
2011
+ },
2012
+ {
2013
+ "epoch": 47.0,
2014
+ "learning_rate": 2.912912912912913e-05,
2015
+ "loss": 0.6701,
2016
+ "step": 2640
2017
+ },
2018
+ {
2019
+ "epoch": 47.0,
2020
+ "learning_rate": 2.902902902902903e-05,
2021
+ "loss": 0.504,
2022
+ "step": 2650
2023
+ },
2024
+ {
2025
+ "epoch": 47.01,
2026
+ "learning_rate": 2.8928928928928928e-05,
2027
+ "loss": 0.5013,
2028
+ "step": 2660
2029
+ },
2030
+ {
2031
+ "epoch": 47.01,
2032
+ "learning_rate": 2.882882882882883e-05,
2033
+ "loss": 0.5753,
2034
+ "step": 2670
2035
+ },
2036
+ {
2037
+ "epoch": 47.01,
2038
+ "learning_rate": 2.872872872872873e-05,
2039
+ "loss": 0.6876,
2040
+ "step": 2680
2041
+ },
2042
+ {
2043
+ "epoch": 47.01,
2044
+ "eval_accuracy": 0.576036866359447,
2045
+ "eval_loss": 0.9799597859382629,
2046
+ "eval_runtime": 175.2983,
2047
+ "eval_samples_per_second": 1.238,
2048
+ "eval_steps_per_second": 0.16,
2049
+ "step": 2688
2050
+ },
2051
+ {
2052
+ "epoch": 48.0,
2053
+ "learning_rate": 2.8628628628628628e-05,
2054
+ "loss": 0.5597,
2055
+ "step": 2690
2056
+ },
2057
+ {
2058
+ "epoch": 48.0,
2059
+ "learning_rate": 2.852852852852853e-05,
2060
+ "loss": 0.579,
2061
+ "step": 2700
2062
+ },
2063
+ {
2064
+ "epoch": 48.0,
2065
+ "learning_rate": 2.8428428428428428e-05,
2066
+ "loss": 0.5215,
2067
+ "step": 2710
2068
+ },
2069
+ {
2070
+ "epoch": 48.01,
2071
+ "learning_rate": 2.832832832832833e-05,
2072
+ "loss": 0.4575,
2073
+ "step": 2720
2074
+ },
2075
+ {
2076
+ "epoch": 48.01,
2077
+ "learning_rate": 2.8228228228228232e-05,
2078
+ "loss": 0.4567,
2079
+ "step": 2730
2080
+ },
2081
+ {
2082
+ "epoch": 48.01,
2083
+ "learning_rate": 2.812812812812813e-05,
2084
+ "loss": 0.5165,
2085
+ "step": 2740
2086
+ },
2087
+ {
2088
+ "epoch": 48.01,
2089
+ "eval_accuracy": 0.5069124423963134,
2090
+ "eval_loss": 1.3708863258361816,
2091
+ "eval_runtime": 174.555,
2092
+ "eval_samples_per_second": 1.243,
2093
+ "eval_steps_per_second": 0.16,
2094
+ "step": 2744
2095
+ },
2096
+ {
2097
+ "epoch": 49.0,
2098
+ "learning_rate": 2.8028028028028032e-05,
2099
+ "loss": 0.4317,
2100
+ "step": 2750
2101
+ },
2102
+ {
2103
+ "epoch": 49.0,
2104
+ "learning_rate": 2.7927927927927926e-05,
2105
+ "loss": 0.5479,
2106
+ "step": 2760
2107
+ },
2108
+ {
2109
+ "epoch": 49.0,
2110
+ "learning_rate": 2.782782782782783e-05,
2111
+ "loss": 0.4649,
2112
+ "step": 2770
2113
+ },
2114
+ {
2115
+ "epoch": 49.01,
2116
+ "learning_rate": 2.7727727727727733e-05,
2117
+ "loss": 0.5208,
2118
+ "step": 2780
2119
+ },
2120
+ {
2121
+ "epoch": 49.01,
2122
+ "learning_rate": 2.762762762762763e-05,
2123
+ "loss": 0.5801,
2124
+ "step": 2790
2125
+ },
2126
+ {
2127
+ "epoch": 49.01,
2128
+ "learning_rate": 2.752752752752753e-05,
2129
+ "loss": 0.5727,
2130
+ "step": 2800
2131
+ },
2132
+ {
2133
+ "epoch": 49.01,
2134
+ "eval_accuracy": 0.6866359447004609,
2135
+ "eval_loss": 0.9960273504257202,
2136
+ "eval_runtime": 169.912,
2137
+ "eval_samples_per_second": 1.277,
2138
+ "eval_steps_per_second": 0.165,
2139
+ "step": 2800
2140
+ },
2141
+ {
2142
+ "epoch": 50.0,
2143
+ "learning_rate": 2.7427427427427426e-05,
2144
+ "loss": 0.5112,
2145
+ "step": 2810
2146
+ },
2147
+ {
2148
+ "epoch": 50.0,
2149
+ "learning_rate": 2.732732732732733e-05,
2150
+ "loss": 0.37,
2151
+ "step": 2820
2152
+ },
2153
+ {
2154
+ "epoch": 50.01,
2155
+ "learning_rate": 2.722722722722723e-05,
2156
+ "loss": 0.6214,
2157
+ "step": 2830
2158
+ },
2159
+ {
2160
+ "epoch": 50.01,
2161
+ "learning_rate": 2.7127127127127127e-05,
2162
+ "loss": 0.375,
2163
+ "step": 2840
2164
+ },
2165
+ {
2166
+ "epoch": 50.01,
2167
+ "learning_rate": 2.702702702702703e-05,
2168
+ "loss": 0.3698,
2169
+ "step": 2850
2170
+ },
2171
+ {
2172
+ "epoch": 50.01,
2173
+ "eval_accuracy": 0.5483870967741935,
2174
+ "eval_loss": 1.2246321439743042,
2175
+ "eval_runtime": 164.0861,
2176
+ "eval_samples_per_second": 1.322,
2177
+ "eval_steps_per_second": 0.171,
2178
+ "step": 2856
2179
+ },
2180
+ {
2181
+ "epoch": 51.0,
2182
+ "learning_rate": 2.6926926926926927e-05,
2183
+ "loss": 0.6204,
2184
+ "step": 2860
2185
+ },
2186
+ {
2187
+ "epoch": 51.0,
2188
+ "learning_rate": 2.6826826826826827e-05,
2189
+ "loss": 0.4234,
2190
+ "step": 2870
2191
+ },
2192
+ {
2193
+ "epoch": 51.0,
2194
+ "learning_rate": 2.672672672672673e-05,
2195
+ "loss": 0.4375,
2196
+ "step": 2880
2197
+ },
2198
+ {
2199
+ "epoch": 51.01,
2200
+ "learning_rate": 2.6626626626626627e-05,
2201
+ "loss": 0.3495,
2202
+ "step": 2890
2203
+ },
2204
+ {
2205
+ "epoch": 51.01,
2206
+ "learning_rate": 2.652652652652653e-05,
2207
+ "loss": 0.4522,
2208
+ "step": 2900
2209
+ },
2210
+ {
2211
+ "epoch": 51.01,
2212
+ "learning_rate": 2.6426426426426428e-05,
2213
+ "loss": 0.5836,
2214
+ "step": 2910
2215
+ },
2216
+ {
2217
+ "epoch": 51.01,
2218
+ "eval_accuracy": 0.6866359447004609,
2219
+ "eval_loss": 0.9892141222953796,
2220
+ "eval_runtime": 166.0199,
2221
+ "eval_samples_per_second": 1.307,
2222
+ "eval_steps_per_second": 0.169,
2223
+ "step": 2912
2224
+ },
2225
+ {
2226
+ "epoch": 52.0,
2227
+ "learning_rate": 2.6326326326326328e-05,
2228
+ "loss": 0.4078,
2229
+ "step": 2920
2230
+ },
2231
+ {
2232
+ "epoch": 52.0,
2233
+ "learning_rate": 2.6226226226226224e-05,
2234
+ "loss": 0.5069,
2235
+ "step": 2930
2236
+ },
2237
+ {
2238
+ "epoch": 52.01,
2239
+ "learning_rate": 2.6126126126126128e-05,
2240
+ "loss": 0.4839,
2241
+ "step": 2940
2242
+ },
2243
+ {
2244
+ "epoch": 52.01,
2245
+ "learning_rate": 2.6026026026026028e-05,
2246
+ "loss": 0.418,
2247
+ "step": 2950
2248
+ },
2249
+ {
2250
+ "epoch": 52.01,
2251
+ "learning_rate": 2.5925925925925925e-05,
2252
+ "loss": 0.6017,
2253
+ "step": 2960
2254
+ },
2255
+ {
2256
+ "epoch": 52.01,
2257
+ "eval_accuracy": 0.6589861751152074,
2258
+ "eval_loss": 0.9387974143028259,
2259
+ "eval_runtime": 163.3812,
2260
+ "eval_samples_per_second": 1.328,
2261
+ "eval_steps_per_second": 0.171,
2262
+ "step": 2968
2263
+ },
2264
+ {
2265
+ "epoch": 53.0,
2266
+ "learning_rate": 2.582582582582583e-05,
2267
+ "loss": 0.5042,
2268
+ "step": 2970
2269
+ },
2270
+ {
2271
+ "epoch": 53.0,
2272
+ "learning_rate": 2.5725725725725725e-05,
2273
+ "loss": 0.7511,
2274
+ "step": 2980
2275
+ },
2276
+ {
2277
+ "epoch": 53.0,
2278
+ "learning_rate": 2.5625625625625625e-05,
2279
+ "loss": 0.5523,
2280
+ "step": 2990
2281
+ },
2282
+ {
2283
+ "epoch": 53.01,
2284
+ "learning_rate": 2.552552552552553e-05,
2285
+ "loss": 0.3819,
2286
+ "step": 3000
2287
+ },
2288
+ {
2289
+ "epoch": 53.01,
2290
+ "learning_rate": 2.5425425425425426e-05,
2291
+ "loss": 0.5585,
2292
+ "step": 3010
2293
+ },
2294
+ {
2295
+ "epoch": 53.01,
2296
+ "learning_rate": 2.532532532532533e-05,
2297
+ "loss": 0.4851,
2298
+ "step": 3020
2299
+ },
2300
+ {
2301
+ "epoch": 53.01,
2302
+ "eval_accuracy": 0.6589861751152074,
2303
+ "eval_loss": 1.1415479183197021,
2304
+ "eval_runtime": 166.6314,
2305
+ "eval_samples_per_second": 1.302,
2306
+ "eval_steps_per_second": 0.168,
2307
+ "step": 3024
2308
+ },
2309
+ {
2310
+ "epoch": 54.0,
2311
+ "learning_rate": 2.5225225225225222e-05,
2312
+ "loss": 0.564,
2313
+ "step": 3030
2314
+ },
2315
+ {
2316
+ "epoch": 54.0,
2317
+ "learning_rate": 2.5125125125125126e-05,
2318
+ "loss": 0.4419,
2319
+ "step": 3040
2320
+ },
2321
+ {
2322
+ "epoch": 54.0,
2323
+ "learning_rate": 2.502502502502503e-05,
2324
+ "loss": 0.3716,
2325
+ "step": 3050
2326
+ },
2327
+ {
2328
+ "epoch": 54.01,
2329
+ "learning_rate": 2.4924924924924926e-05,
2330
+ "loss": 0.6022,
2331
+ "step": 3060
2332
+ },
2333
+ {
2334
+ "epoch": 54.01,
2335
+ "learning_rate": 2.4824824824824826e-05,
2336
+ "loss": 0.5008,
2337
+ "step": 3070
2338
+ },
2339
+ {
2340
+ "epoch": 54.01,
2341
+ "learning_rate": 2.4724724724724727e-05,
2342
+ "loss": 0.3038,
2343
+ "step": 3080
2344
+ },
2345
+ {
2346
+ "epoch": 54.01,
2347
+ "eval_accuracy": 0.695852534562212,
2348
+ "eval_loss": 0.9412721991539001,
2349
+ "eval_runtime": 163.6724,
2350
+ "eval_samples_per_second": 1.326,
2351
+ "eval_steps_per_second": 0.171,
2352
+ "step": 3080
2353
+ },
2354
+ {
2355
+ "epoch": 55.0,
2356
+ "learning_rate": 2.4624624624624627e-05,
2357
+ "loss": 0.5041,
2358
+ "step": 3090
2359
+ },
2360
+ {
2361
+ "epoch": 55.0,
2362
+ "learning_rate": 2.4524524524524527e-05,
2363
+ "loss": 0.4752,
2364
+ "step": 3100
2365
+ },
2366
+ {
2367
+ "epoch": 55.01,
2368
+ "learning_rate": 2.4424424424424424e-05,
2369
+ "loss": 0.4335,
2370
+ "step": 3110
2371
+ },
2372
+ {
2373
+ "epoch": 55.01,
2374
+ "learning_rate": 2.4324324324324327e-05,
2375
+ "loss": 0.5474,
2376
+ "step": 3120
2377
+ },
2378
+ {
2379
+ "epoch": 55.01,
2380
+ "learning_rate": 2.4224224224224227e-05,
2381
+ "loss": 0.6075,
2382
+ "step": 3130
2383
+ },
2384
+ {
2385
+ "epoch": 55.01,
2386
+ "eval_accuracy": 0.6129032258064516,
2387
+ "eval_loss": 1.0466532707214355,
2388
+ "eval_runtime": 164.2606,
2389
+ "eval_samples_per_second": 1.321,
2390
+ "eval_steps_per_second": 0.17,
2391
+ "step": 3136
2392
+ },
2393
+ {
2394
+ "epoch": 56.0,
2395
+ "learning_rate": 2.4124124124124124e-05,
2396
+ "loss": 0.4485,
2397
+ "step": 3140
2398
+ },
2399
+ {
2400
+ "epoch": 56.0,
2401
+ "learning_rate": 2.4024024024024024e-05,
2402
+ "loss": 0.5252,
2403
+ "step": 3150
2404
+ },
2405
+ {
2406
+ "epoch": 56.0,
2407
+ "learning_rate": 2.3923923923923924e-05,
2408
+ "loss": 0.6121,
2409
+ "step": 3160
2410
+ },
2411
+ {
2412
+ "epoch": 56.01,
2413
+ "learning_rate": 2.3823823823823824e-05,
2414
+ "loss": 0.3327,
2415
+ "step": 3170
2416
+ },
2417
+ {
2418
+ "epoch": 56.01,
2419
+ "learning_rate": 2.3723723723723725e-05,
2420
+ "loss": 0.4494,
2421
+ "step": 3180
2422
+ },
2423
+ {
2424
+ "epoch": 56.01,
2425
+ "learning_rate": 2.3623623623623625e-05,
2426
+ "loss": 0.4474,
2427
+ "step": 3190
2428
+ },
2429
+ {
2430
+ "epoch": 56.01,
2431
+ "eval_accuracy": 0.6866359447004609,
2432
+ "eval_loss": 0.8436079025268555,
2433
+ "eval_runtime": 167.7622,
2434
+ "eval_samples_per_second": 1.293,
2435
+ "eval_steps_per_second": 0.167,
2436
+ "step": 3192
2437
+ },
2438
+ {
2439
+ "epoch": 57.0,
2440
+ "learning_rate": 2.3523523523523525e-05,
2441
+ "loss": 0.3922,
2442
+ "step": 3200
2443
+ },
2444
+ {
2445
+ "epoch": 57.0,
2446
+ "learning_rate": 2.3423423423423425e-05,
2447
+ "loss": 0.5241,
2448
+ "step": 3210
2449
+ },
2450
+ {
2451
+ "epoch": 57.01,
2452
+ "learning_rate": 2.3323323323323322e-05,
2453
+ "loss": 0.3707,
2454
+ "step": 3220
2455
+ },
2456
+ {
2457
+ "epoch": 57.01,
2458
+ "learning_rate": 2.3223223223223225e-05,
2459
+ "loss": 0.6444,
2460
+ "step": 3230
2461
+ },
2462
+ {
2463
+ "epoch": 57.01,
2464
+ "learning_rate": 2.3123123123123125e-05,
2465
+ "loss": 0.3711,
2466
+ "step": 3240
2467
+ },
2468
+ {
2469
+ "epoch": 57.01,
2470
+ "eval_accuracy": 0.6774193548387096,
2471
+ "eval_loss": 0.8994067311286926,
2472
+ "eval_runtime": 173.6457,
2473
+ "eval_samples_per_second": 1.25,
2474
+ "eval_steps_per_second": 0.161,
2475
+ "step": 3248
2476
+ },
2477
+ {
2478
+ "epoch": 58.0,
2479
+ "learning_rate": 2.3023023023023026e-05,
2480
+ "loss": 0.4191,
2481
+ "step": 3250
2482
+ },
2483
+ {
2484
+ "epoch": 58.0,
2485
+ "learning_rate": 2.2922922922922922e-05,
2486
+ "loss": 0.4935,
2487
+ "step": 3260
2488
+ },
2489
+ {
2490
+ "epoch": 58.0,
2491
+ "learning_rate": 2.2822822822822822e-05,
2492
+ "loss": 0.56,
2493
+ "step": 3270
2494
+ },
2495
+ {
2496
+ "epoch": 58.01,
2497
+ "learning_rate": 2.2722722722722726e-05,
2498
+ "loss": 0.283,
2499
+ "step": 3280
2500
+ },
2501
+ {
2502
+ "epoch": 58.01,
2503
+ "learning_rate": 2.2622622622622626e-05,
2504
+ "loss": 0.4957,
2505
+ "step": 3290
2506
+ },
2507
+ {
2508
+ "epoch": 58.01,
2509
+ "learning_rate": 2.2522522522522523e-05,
2510
+ "loss": 0.5279,
2511
+ "step": 3300
2512
+ },
2513
+ {
2514
+ "epoch": 58.01,
2515
+ "eval_accuracy": 0.7188940092165899,
2516
+ "eval_loss": 0.885903537273407,
2517
+ "eval_runtime": 176.7148,
2518
+ "eval_samples_per_second": 1.228,
2519
+ "eval_steps_per_second": 0.158,
2520
+ "step": 3304
2521
+ },
2522
+ {
2523
+ "epoch": 59.0,
2524
+ "learning_rate": 2.2422422422422423e-05,
2525
+ "loss": 0.4729,
2526
+ "step": 3310
2527
+ },
2528
+ {
2529
+ "epoch": 59.0,
2530
+ "learning_rate": 2.2322322322322323e-05,
2531
+ "loss": 0.3717,
2532
+ "step": 3320
2533
+ },
2534
+ {
2535
+ "epoch": 59.0,
2536
+ "learning_rate": 2.2222222222222223e-05,
2537
+ "loss": 0.538,
2538
+ "step": 3330
2539
+ },
2540
+ {
2541
+ "epoch": 59.01,
2542
+ "learning_rate": 2.2122122122122123e-05,
2543
+ "loss": 0.3848,
2544
+ "step": 3340
2545
+ },
2546
+ {
2547
+ "epoch": 59.01,
2548
+ "learning_rate": 2.2022022022022024e-05,
2549
+ "loss": 0.5991,
2550
+ "step": 3350
2551
+ },
2552
+ {
2553
+ "epoch": 59.01,
2554
+ "learning_rate": 2.1921921921921924e-05,
2555
+ "loss": 0.6032,
2556
+ "step": 3360
2557
+ },
2558
+ {
2559
+ "epoch": 59.01,
2560
+ "eval_accuracy": 0.6497695852534562,
2561
+ "eval_loss": 1.293091058731079,
2562
+ "eval_runtime": 170.7811,
2563
+ "eval_samples_per_second": 1.271,
2564
+ "eval_steps_per_second": 0.164,
2565
+ "step": 3360
2566
+ },
2567
+ {
2568
+ "epoch": 60.0,
2569
+ "learning_rate": 2.1821821821821824e-05,
2570
+ "loss": 0.419,
2571
+ "step": 3370
2572
+ },
2573
+ {
2574
+ "epoch": 60.0,
2575
+ "learning_rate": 2.172172172172172e-05,
2576
+ "loss": 0.3919,
2577
+ "step": 3380
2578
+ },
2579
+ {
2580
+ "epoch": 60.01,
2581
+ "learning_rate": 2.1621621621621624e-05,
2582
+ "loss": 0.38,
2583
+ "step": 3390
2584
+ },
2585
+ {
2586
+ "epoch": 60.01,
2587
+ "learning_rate": 2.1521521521521524e-05,
2588
+ "loss": 0.469,
2589
+ "step": 3400
2590
+ },
2591
+ {
2592
+ "epoch": 60.01,
2593
+ "learning_rate": 2.142142142142142e-05,
2594
+ "loss": 0.3282,
2595
+ "step": 3410
2596
+ },
2597
+ {
2598
+ "epoch": 60.01,
2599
+ "eval_accuracy": 0.7142857142857143,
2600
+ "eval_loss": 0.9435374140739441,
2601
+ "eval_runtime": 175.2331,
2602
+ "eval_samples_per_second": 1.238,
2603
+ "eval_steps_per_second": 0.16,
2604
+ "step": 3416
2605
+ },
2606
+ {
2607
+ "epoch": 61.0,
2608
+ "learning_rate": 2.132132132132132e-05,
2609
+ "loss": 0.4183,
2610
+ "step": 3420
2611
+ },
2612
+ {
2613
+ "epoch": 61.0,
2614
+ "learning_rate": 2.122122122122122e-05,
2615
+ "loss": 0.5195,
2616
+ "step": 3430
2617
+ },
2618
+ {
2619
+ "epoch": 61.0,
2620
+ "learning_rate": 2.1121121121121125e-05,
2621
+ "loss": 0.5613,
2622
+ "step": 3440
2623
+ },
2624
+ {
2625
+ "epoch": 61.01,
2626
+ "learning_rate": 2.102102102102102e-05,
2627
+ "loss": 0.5026,
2628
+ "step": 3450
2629
+ },
2630
+ {
2631
+ "epoch": 61.01,
2632
+ "learning_rate": 2.0920920920920922e-05,
2633
+ "loss": 0.4604,
2634
+ "step": 3460
2635
+ },
2636
+ {
2637
+ "epoch": 61.01,
2638
+ "learning_rate": 2.0820820820820822e-05,
2639
+ "loss": 0.3506,
2640
+ "step": 3470
2641
+ },
2642
+ {
2643
+ "epoch": 61.01,
2644
+ "eval_accuracy": 0.6728110599078341,
2645
+ "eval_loss": 1.0970582962036133,
2646
+ "eval_runtime": 167.732,
2647
+ "eval_samples_per_second": 1.294,
2648
+ "eval_steps_per_second": 0.167,
2649
+ "step": 3472
2650
+ },
2651
+ {
2652
+ "epoch": 62.0,
2653
+ "learning_rate": 2.0720720720720722e-05,
2654
+ "loss": 0.3312,
2655
+ "step": 3480
2656
+ },
2657
+ {
2658
+ "epoch": 62.0,
2659
+ "learning_rate": 2.062062062062062e-05,
2660
+ "loss": 0.2965,
2661
+ "step": 3490
2662
+ },
2663
+ {
2664
+ "epoch": 62.01,
2665
+ "learning_rate": 2.0520520520520522e-05,
2666
+ "loss": 0.396,
2667
+ "step": 3500
2668
+ },
2669
+ {
2670
+ "epoch": 62.01,
2671
+ "learning_rate": 2.0420420420420422e-05,
2672
+ "loss": 0.5294,
2673
+ "step": 3510
2674
+ },
2675
+ {
2676
+ "epoch": 62.01,
2677
+ "learning_rate": 2.0320320320320323e-05,
2678
+ "loss": 0.3169,
2679
+ "step": 3520
2680
+ },
2681
+ {
2682
+ "epoch": 62.01,
2683
+ "eval_accuracy": 0.7511520737327189,
2684
+ "eval_loss": 0.910149335861206,
2685
+ "eval_runtime": 165.7118,
2686
+ "eval_samples_per_second": 1.31,
2687
+ "eval_steps_per_second": 0.169,
2688
+ "step": 3528
2689
+ },
2690
+ {
2691
+ "epoch": 63.0,
2692
+ "learning_rate": 2.022022022022022e-05,
2693
+ "loss": 0.5156,
2694
+ "step": 3530
2695
+ },
2696
+ {
2697
+ "epoch": 63.0,
2698
+ "learning_rate": 2.012012012012012e-05,
2699
+ "loss": 0.4815,
2700
+ "step": 3540
2701
+ },
2702
+ {
2703
+ "epoch": 63.0,
2704
+ "learning_rate": 2.0020020020020023e-05,
2705
+ "loss": 0.3874,
2706
+ "step": 3550
2707
+ },
2708
+ {
2709
+ "epoch": 63.01,
2710
+ "learning_rate": 1.9919919919919923e-05,
2711
+ "loss": 0.5157,
2712
+ "step": 3560
2713
+ },
2714
+ {
2715
+ "epoch": 63.01,
2716
+ "learning_rate": 1.981981981981982e-05,
2717
+ "loss": 0.5023,
2718
+ "step": 3570
2719
+ },
2720
+ {
2721
+ "epoch": 63.01,
2722
+ "learning_rate": 1.971971971971972e-05,
2723
+ "loss": 0.438,
2724
+ "step": 3580
2725
+ },
2726
+ {
2727
+ "epoch": 63.01,
2728
+ "eval_accuracy": 0.6359447004608295,
2729
+ "eval_loss": 1.4072048664093018,
2730
+ "eval_runtime": 163.2516,
2731
+ "eval_samples_per_second": 1.329,
2732
+ "eval_steps_per_second": 0.172,
2733
+ "step": 3584
2734
+ },
2735
+ {
2736
+ "epoch": 64.0,
2737
+ "learning_rate": 1.961961961961962e-05,
2738
+ "loss": 0.4035,
2739
+ "step": 3590
2740
+ },
2741
+ {
2742
+ "epoch": 64.0,
2743
+ "learning_rate": 1.951951951951952e-05,
2744
+ "loss": 0.2573,
2745
+ "step": 3600
2746
+ },
2747
+ {
2748
+ "epoch": 64.0,
2749
+ "learning_rate": 1.941941941941942e-05,
2750
+ "loss": 0.3321,
2751
+ "step": 3610
2752
+ },
2753
+ {
2754
+ "epoch": 64.01,
2755
+ "learning_rate": 1.931931931931932e-05,
2756
+ "loss": 0.6128,
2757
+ "step": 3620
2758
+ },
2759
+ {
2760
+ "epoch": 64.01,
2761
+ "learning_rate": 1.921921921921922e-05,
2762
+ "loss": 0.5059,
2763
+ "step": 3630
2764
+ },
2765
+ {
2766
+ "epoch": 64.01,
2767
+ "learning_rate": 1.911911911911912e-05,
2768
+ "loss": 0.5208,
2769
+ "step": 3640
2770
+ },
2771
+ {
2772
+ "epoch": 64.01,
2773
+ "eval_accuracy": 0.6543778801843319,
2774
+ "eval_loss": 1.2648274898529053,
2775
+ "eval_runtime": 163.0954,
2776
+ "eval_samples_per_second": 1.331,
2777
+ "eval_steps_per_second": 0.172,
2778
+ "step": 3640
2779
+ },
2780
+ {
2781
+ "epoch": 65.0,
2782
+ "learning_rate": 1.9019019019019018e-05,
2783
+ "loss": 0.3918,
2784
+ "step": 3650
2785
+ },
2786
+ {
2787
+ "epoch": 65.0,
2788
+ "learning_rate": 1.891891891891892e-05,
2789
+ "loss": 0.407,
2790
+ "step": 3660
2791
+ },
2792
+ {
2793
+ "epoch": 65.01,
2794
+ "learning_rate": 1.881881881881882e-05,
2795
+ "loss": 0.3251,
2796
+ "step": 3670
2797
+ },
2798
+ {
2799
+ "epoch": 65.01,
2800
+ "learning_rate": 1.8718718718718718e-05,
2801
+ "loss": 0.4584,
2802
+ "step": 3680
2803
+ },
2804
+ {
2805
+ "epoch": 65.01,
2806
+ "learning_rate": 1.8618618618618618e-05,
2807
+ "loss": 0.4563,
2808
+ "step": 3690
2809
+ },
2810
+ {
2811
+ "epoch": 65.01,
2812
+ "eval_accuracy": 0.6497695852534562,
2813
+ "eval_loss": 1.1162357330322266,
2814
+ "eval_runtime": 163.7463,
2815
+ "eval_samples_per_second": 1.325,
2816
+ "eval_steps_per_second": 0.171,
2817
+ "step": 3696
2818
+ },
2819
+ {
2820
+ "epoch": 66.0,
2821
+ "learning_rate": 1.8518518518518518e-05,
2822
+ "loss": 0.2968,
2823
+ "step": 3700
2824
+ },
2825
+ {
2826
+ "epoch": 66.0,
2827
+ "learning_rate": 1.8418418418418422e-05,
2828
+ "loss": 0.6683,
2829
+ "step": 3710
2830
+ },
2831
+ {
2832
+ "epoch": 66.0,
2833
+ "learning_rate": 1.831831831831832e-05,
2834
+ "loss": 0.4281,
2835
+ "step": 3720
2836
+ },
2837
+ {
2838
+ "epoch": 66.01,
2839
+ "learning_rate": 1.821821821821822e-05,
2840
+ "loss": 0.2642,
2841
+ "step": 3730
2842
+ },
2843
+ {
2844
+ "epoch": 66.01,
2845
+ "learning_rate": 1.811811811811812e-05,
2846
+ "loss": 0.4064,
2847
+ "step": 3740
2848
+ },
2849
+ {
2850
+ "epoch": 66.01,
2851
+ "learning_rate": 1.801801801801802e-05,
2852
+ "loss": 0.6693,
2853
+ "step": 3750
2854
+ },
2855
+ {
2856
+ "epoch": 66.01,
2857
+ "eval_accuracy": 0.5576036866359447,
2858
+ "eval_loss": 1.8557851314544678,
2859
+ "eval_runtime": 162.9605,
2860
+ "eval_samples_per_second": 1.332,
2861
+ "eval_steps_per_second": 0.172,
2862
+ "step": 3752
2863
+ },
2864
+ {
2865
+ "epoch": 67.0,
2866
+ "learning_rate": 1.7917917917917916e-05,
2867
+ "loss": 0.3636,
2868
+ "step": 3760
2869
+ },
2870
+ {
2871
+ "epoch": 67.0,
2872
+ "learning_rate": 1.781781781781782e-05,
2873
+ "loss": 0.3325,
2874
+ "step": 3770
2875
+ },
2876
+ {
2877
+ "epoch": 67.01,
2878
+ "learning_rate": 1.771771771771772e-05,
2879
+ "loss": 0.5104,
2880
+ "step": 3780
2881
+ },
2882
+ {
2883
+ "epoch": 67.01,
2884
+ "learning_rate": 1.761761761761762e-05,
2885
+ "loss": 0.3475,
2886
+ "step": 3790
2887
+ },
2888
+ {
2889
+ "epoch": 67.01,
2890
+ "learning_rate": 1.7517517517517516e-05,
2891
+ "loss": 0.5599,
2892
+ "step": 3800
2893
+ },
2894
+ {
2895
+ "epoch": 67.01,
2896
+ "eval_accuracy": 0.5391705069124424,
2897
+ "eval_loss": 1.6573548316955566,
2898
+ "eval_runtime": 163.7298,
2899
+ "eval_samples_per_second": 1.325,
2900
+ "eval_steps_per_second": 0.171,
2901
+ "step": 3808
2902
+ },
2903
+ {
2904
+ "epoch": 68.0,
2905
+ "learning_rate": 1.7417417417417416e-05,
2906
+ "loss": 0.385,
2907
+ "step": 3810
2908
+ },
2909
+ {
2910
+ "epoch": 68.0,
2911
+ "learning_rate": 1.731731731731732e-05,
2912
+ "loss": 0.3942,
2913
+ "step": 3820
2914
+ },
2915
+ {
2916
+ "epoch": 68.0,
2917
+ "learning_rate": 1.721721721721722e-05,
2918
+ "loss": 0.4411,
2919
+ "step": 3830
2920
+ },
2921
+ {
2922
+ "epoch": 68.01,
2923
+ "learning_rate": 1.7117117117117117e-05,
2924
+ "loss": 0.5076,
2925
+ "step": 3840
2926
+ },
2927
+ {
2928
+ "epoch": 68.01,
2929
+ "learning_rate": 1.7017017017017017e-05,
2930
+ "loss": 0.49,
2931
+ "step": 3850
2932
+ },
2933
+ {
2934
+ "epoch": 68.01,
2935
+ "learning_rate": 1.6916916916916917e-05,
2936
+ "loss": 0.4751,
2937
+ "step": 3860
2938
+ },
2939
+ {
2940
+ "epoch": 68.01,
2941
+ "eval_accuracy": 0.6129032258064516,
2942
+ "eval_loss": 1.188300609588623,
2943
+ "eval_runtime": 164.9656,
2944
+ "eval_samples_per_second": 1.315,
2945
+ "eval_steps_per_second": 0.17,
2946
+ "step": 3864
2947
+ },
2948
+ {
2949
+ "epoch": 69.0,
2950
+ "learning_rate": 1.6816816816816817e-05,
2951
+ "loss": 0.4068,
2952
+ "step": 3870
2953
+ },
2954
+ {
2955
+ "epoch": 69.0,
2956
+ "learning_rate": 1.6716716716716717e-05,
2957
+ "loss": 0.3716,
2958
+ "step": 3880
2959
+ },
2960
+ {
2961
+ "epoch": 69.0,
2962
+ "learning_rate": 1.6616616616616618e-05,
2963
+ "loss": 0.5685,
2964
+ "step": 3890
2965
+ },
2966
+ {
2967
+ "epoch": 69.01,
2968
+ "learning_rate": 1.6516516516516518e-05,
2969
+ "loss": 0.4182,
2970
+ "step": 3900
2971
+ },
2972
+ {
2973
+ "epoch": 69.01,
2974
+ "learning_rate": 1.6416416416416418e-05,
2975
+ "loss": 0.542,
2976
+ "step": 3910
2977
+ },
2978
+ {
2979
+ "epoch": 69.01,
2980
+ "learning_rate": 1.6316316316316315e-05,
2981
+ "loss": 0.6489,
2982
+ "step": 3920
2983
+ },
2984
+ {
2985
+ "epoch": 69.01,
2986
+ "eval_accuracy": 0.6129032258064516,
2987
+ "eval_loss": 1.2733248472213745,
2988
+ "eval_runtime": 172.0622,
2989
+ "eval_samples_per_second": 1.261,
2990
+ "eval_steps_per_second": 0.163,
2991
+ "step": 3920
2992
+ },
2993
+ {
2994
+ "epoch": 70.0,
2995
+ "learning_rate": 1.6216216216216218e-05,
2996
+ "loss": 0.3482,
2997
+ "step": 3930
2998
+ },
2999
+ {
3000
+ "epoch": 70.0,
3001
+ "learning_rate": 1.6116116116116118e-05,
3002
+ "loss": 0.4185,
3003
+ "step": 3940
3004
+ },
3005
+ {
3006
+ "epoch": 70.01,
3007
+ "learning_rate": 1.6016016016016015e-05,
3008
+ "loss": 0.3264,
3009
+ "step": 3950
3010
+ },
3011
+ {
3012
+ "epoch": 70.01,
3013
+ "learning_rate": 1.5915915915915915e-05,
3014
+ "loss": 0.3664,
3015
+ "step": 3960
3016
+ },
3017
+ {
3018
+ "epoch": 70.01,
3019
+ "learning_rate": 1.5815815815815815e-05,
3020
+ "loss": 0.4229,
3021
+ "step": 3970
3022
+ },
3023
+ {
3024
+ "epoch": 70.01,
3025
+ "eval_accuracy": 0.6682027649769585,
3026
+ "eval_loss": 1.0993900299072266,
3027
+ "eval_runtime": 169.5904,
3028
+ "eval_samples_per_second": 1.28,
3029
+ "eval_steps_per_second": 0.165,
3030
+ "step": 3976
3031
+ },
3032
+ {
3033
+ "epoch": 71.0,
3034
+ "learning_rate": 1.571571571571572e-05,
3035
+ "loss": 0.2035,
3036
+ "step": 3980
3037
+ },
3038
+ {
3039
+ "epoch": 71.0,
3040
+ "learning_rate": 1.5615615615615616e-05,
3041
+ "loss": 0.4886,
3042
+ "step": 3990
3043
+ },
3044
+ {
3045
+ "epoch": 71.0,
3046
+ "learning_rate": 1.5515515515515516e-05,
3047
+ "loss": 0.4031,
3048
+ "step": 4000
3049
+ },
3050
+ {
3051
+ "epoch": 71.01,
3052
+ "learning_rate": 1.5415415415415416e-05,
3053
+ "loss": 0.3078,
3054
+ "step": 4010
3055
+ },
3056
+ {
3057
+ "epoch": 71.01,
3058
+ "learning_rate": 1.5315315315315316e-05,
3059
+ "loss": 0.5506,
3060
+ "step": 4020
3061
+ },
3062
+ {
3063
+ "epoch": 71.01,
3064
+ "learning_rate": 1.5215215215215218e-05,
3065
+ "loss": 0.4194,
3066
+ "step": 4030
3067
+ },
3068
+ {
3069
+ "epoch": 71.01,
3070
+ "eval_accuracy": 0.6175115207373272,
3071
+ "eval_loss": 1.1464142799377441,
3072
+ "eval_runtime": 176.0399,
3073
+ "eval_samples_per_second": 1.233,
3074
+ "eval_steps_per_second": 0.159,
3075
+ "step": 4032
3076
+ },
3077
+ {
3078
+ "epoch": 72.0,
3079
+ "learning_rate": 1.5115115115115116e-05,
3080
+ "loss": 0.3864,
3081
+ "step": 4040
3082
+ },
3083
+ {
3084
+ "epoch": 72.0,
3085
+ "learning_rate": 1.5015015015015016e-05,
3086
+ "loss": 0.5229,
3087
+ "step": 4050
3088
+ },
3089
+ {
3090
+ "epoch": 72.01,
3091
+ "learning_rate": 1.4914914914914915e-05,
3092
+ "loss": 0.4735,
3093
+ "step": 4060
3094
+ },
3095
+ {
3096
+ "epoch": 72.01,
3097
+ "learning_rate": 1.4814814814814815e-05,
3098
+ "loss": 0.2941,
3099
+ "step": 4070
3100
+ },
3101
+ {
3102
+ "epoch": 72.01,
3103
+ "learning_rate": 1.4714714714714713e-05,
3104
+ "loss": 0.2121,
3105
+ "step": 4080
3106
+ },
3107
+ {
3108
+ "epoch": 72.01,
3109
+ "eval_accuracy": 0.6175115207373272,
3110
+ "eval_loss": 1.179811716079712,
3111
+ "eval_runtime": 174.9502,
3112
+ "eval_samples_per_second": 1.24,
3113
+ "eval_steps_per_second": 0.16,
3114
+ "step": 4088
3115
+ },
3116
+ {
3117
+ "epoch": 73.0,
3118
+ "learning_rate": 1.4614614614614617e-05,
3119
+ "loss": 0.3443,
3120
+ "step": 4090
3121
+ },
3122
+ {
3123
+ "epoch": 73.0,
3124
+ "learning_rate": 1.4514514514514515e-05,
3125
+ "loss": 0.4008,
3126
+ "step": 4100
3127
+ },
3128
+ {
3129
+ "epoch": 73.0,
3130
+ "learning_rate": 1.4414414414414416e-05,
3131
+ "loss": 0.4189,
3132
+ "step": 4110
3133
+ },
3134
+ {
3135
+ "epoch": 73.01,
3136
+ "learning_rate": 1.4314314314314314e-05,
3137
+ "loss": 0.5514,
3138
+ "step": 4120
3139
+ },
3140
+ {
3141
+ "epoch": 73.01,
3142
+ "learning_rate": 1.4214214214214214e-05,
3143
+ "loss": 0.3123,
3144
+ "step": 4130
3145
+ },
3146
+ {
3147
+ "epoch": 73.01,
3148
+ "learning_rate": 1.4114114114114116e-05,
3149
+ "loss": 0.4106,
3150
+ "step": 4140
3151
+ },
3152
+ {
3153
+ "epoch": 73.01,
3154
+ "eval_accuracy": 0.5806451612903226,
3155
+ "eval_loss": 1.329359531402588,
3156
+ "eval_runtime": 169.6509,
3157
+ "eval_samples_per_second": 1.279,
3158
+ "eval_steps_per_second": 0.165,
3159
+ "step": 4144
3160
+ },
3161
+ {
3162
+ "epoch": 74.0,
3163
+ "learning_rate": 1.4014014014014016e-05,
3164
+ "loss": 0.2142,
3165
+ "step": 4150
3166
+ },
3167
+ {
3168
+ "epoch": 74.0,
3169
+ "learning_rate": 1.3913913913913915e-05,
3170
+ "loss": 0.3547,
3171
+ "step": 4160
3172
+ },
3173
+ {
3174
+ "epoch": 74.0,
3175
+ "learning_rate": 1.3813813813813815e-05,
3176
+ "loss": 0.2571,
3177
+ "step": 4170
3178
+ },
3179
+ {
3180
+ "epoch": 74.01,
3181
+ "learning_rate": 1.3713713713713713e-05,
3182
+ "loss": 0.4924,
3183
+ "step": 4180
3184
+ },
3185
+ {
3186
+ "epoch": 74.01,
3187
+ "learning_rate": 1.3613613613613615e-05,
3188
+ "loss": 0.2901,
3189
+ "step": 4190
3190
+ },
3191
+ {
3192
+ "epoch": 74.01,
3193
+ "learning_rate": 1.3513513513513515e-05,
3194
+ "loss": 0.3962,
3195
+ "step": 4200
3196
+ },
3197
+ {
3198
+ "epoch": 74.01,
3199
+ "eval_accuracy": 0.6359447004608295,
3200
+ "eval_loss": 1.4209370613098145,
3201
+ "eval_runtime": 169.3625,
3202
+ "eval_samples_per_second": 1.281,
3203
+ "eval_steps_per_second": 0.165,
3204
+ "step": 4200
3205
+ },
3206
+ {
3207
+ "epoch": 75.0,
3208
+ "learning_rate": 1.3413413413413414e-05,
3209
+ "loss": 0.3685,
3210
+ "step": 4210
3211
+ },
3212
+ {
3213
+ "epoch": 75.0,
3214
+ "learning_rate": 1.3313313313313314e-05,
3215
+ "loss": 0.2953,
3216
+ "step": 4220
3217
+ },
3218
+ {
3219
+ "epoch": 75.01,
3220
+ "learning_rate": 1.3213213213213214e-05,
3221
+ "loss": 0.3889,
3222
+ "step": 4230
3223
+ },
3224
+ {
3225
+ "epoch": 75.01,
3226
+ "learning_rate": 1.3113113113113112e-05,
3227
+ "loss": 0.4059,
3228
+ "step": 4240
3229
+ },
3230
+ {
3231
+ "epoch": 75.01,
3232
+ "learning_rate": 1.3013013013013014e-05,
3233
+ "loss": 0.2963,
3234
+ "step": 4250
3235
+ },
3236
+ {
3237
+ "epoch": 75.01,
3238
+ "eval_accuracy": 0.5944700460829493,
3239
+ "eval_loss": 1.5015925168991089,
3240
+ "eval_runtime": 162.4547,
3241
+ "eval_samples_per_second": 1.336,
3242
+ "eval_steps_per_second": 0.172,
3243
+ "step": 4256
3244
+ },
3245
+ {
3246
+ "epoch": 76.0,
3247
+ "learning_rate": 1.2912912912912914e-05,
3248
+ "loss": 0.2218,
3249
+ "step": 4260
3250
+ },
3251
+ {
3252
+ "epoch": 76.0,
3253
+ "learning_rate": 1.2812812812812813e-05,
3254
+ "loss": 0.4315,
3255
+ "step": 4270
3256
+ },
3257
+ {
3258
+ "epoch": 76.0,
3259
+ "learning_rate": 1.2712712712712713e-05,
3260
+ "loss": 0.4317,
3261
+ "step": 4280
3262
+ },
3263
+ {
3264
+ "epoch": 76.01,
3265
+ "learning_rate": 1.2612612612612611e-05,
3266
+ "loss": 0.4733,
3267
+ "step": 4290
3268
+ },
3269
+ {
3270
+ "epoch": 76.01,
3271
+ "learning_rate": 1.2512512512512515e-05,
3272
+ "loss": 0.4233,
3273
+ "step": 4300
3274
+ },
3275
+ {
3276
+ "epoch": 76.01,
3277
+ "learning_rate": 1.2412412412412413e-05,
3278
+ "loss": 0.5436,
3279
+ "step": 4310
3280
+ },
3281
+ {
3282
+ "epoch": 76.01,
3283
+ "eval_accuracy": 0.5483870967741935,
3284
+ "eval_loss": 1.5647104978561401,
3285
+ "eval_runtime": 164.1354,
3286
+ "eval_samples_per_second": 1.322,
3287
+ "eval_steps_per_second": 0.171,
3288
+ "step": 4312
3289
+ },
3290
+ {
3291
+ "epoch": 77.0,
3292
+ "learning_rate": 1.2312312312312313e-05,
3293
+ "loss": 0.3,
3294
+ "step": 4320
3295
+ },
3296
+ {
3297
+ "epoch": 77.0,
3298
+ "learning_rate": 1.2212212212212212e-05,
3299
+ "loss": 0.3433,
3300
+ "step": 4330
3301
+ },
3302
+ {
3303
+ "epoch": 77.01,
3304
+ "learning_rate": 1.2112112112112114e-05,
3305
+ "loss": 0.2487,
3306
+ "step": 4340
3307
+ },
3308
+ {
3309
+ "epoch": 77.01,
3310
+ "learning_rate": 1.2012012012012012e-05,
3311
+ "loss": 0.4711,
3312
+ "step": 4350
3313
+ },
3314
+ {
3315
+ "epoch": 77.01,
3316
+ "learning_rate": 1.1911911911911912e-05,
3317
+ "loss": 0.4115,
3318
+ "step": 4360
3319
+ },
3320
+ {
3321
+ "epoch": 77.01,
3322
+ "eval_accuracy": 0.6036866359447005,
3323
+ "eval_loss": 1.4308525323867798,
3324
+ "eval_runtime": 160.3268,
3325
+ "eval_samples_per_second": 1.353,
3326
+ "eval_steps_per_second": 0.175,
3327
+ "step": 4368
3328
+ },
3329
+ {
3330
+ "epoch": 78.0,
3331
+ "learning_rate": 1.1811811811811812e-05,
3332
+ "loss": 0.278,
3333
+ "step": 4370
3334
+ },
3335
+ {
3336
+ "epoch": 78.0,
3337
+ "learning_rate": 1.1711711711711713e-05,
3338
+ "loss": 0.3355,
3339
+ "step": 4380
3340
+ },
3341
+ {
3342
+ "epoch": 78.0,
3343
+ "learning_rate": 1.1611611611611613e-05,
3344
+ "loss": 0.3502,
3345
+ "step": 4390
3346
+ },
3347
+ {
3348
+ "epoch": 78.01,
3349
+ "learning_rate": 1.1511511511511513e-05,
3350
+ "loss": 0.3967,
3351
+ "step": 4400
3352
+ },
3353
+ {
3354
+ "epoch": 78.01,
3355
+ "learning_rate": 1.1411411411411411e-05,
3356
+ "loss": 0.3376,
3357
+ "step": 4410
3358
+ },
3359
+ {
3360
+ "epoch": 78.01,
3361
+ "learning_rate": 1.1311311311311313e-05,
3362
+ "loss": 0.1635,
3363
+ "step": 4420
3364
+ },
3365
+ {
3366
+ "epoch": 78.01,
3367
+ "eval_accuracy": 0.6451612903225806,
3368
+ "eval_loss": 1.3660060167312622,
3369
+ "eval_runtime": 163.3833,
3370
+ "eval_samples_per_second": 1.328,
3371
+ "eval_steps_per_second": 0.171,
3372
+ "step": 4424
3373
+ },
3374
+ {
3375
+ "epoch": 79.0,
3376
+ "learning_rate": 1.1211211211211212e-05,
3377
+ "loss": 0.3182,
3378
+ "step": 4430
3379
+ },
3380
+ {
3381
+ "epoch": 79.0,
3382
+ "learning_rate": 1.1111111111111112e-05,
3383
+ "loss": 0.3935,
3384
+ "step": 4440
3385
+ },
3386
+ {
3387
+ "epoch": 79.0,
3388
+ "learning_rate": 1.1011011011011012e-05,
3389
+ "loss": 0.3312,
3390
+ "step": 4450
3391
+ },
3392
+ {
3393
+ "epoch": 79.01,
3394
+ "learning_rate": 1.0910910910910912e-05,
3395
+ "loss": 0.4817,
3396
+ "step": 4460
3397
+ },
3398
+ {
3399
+ "epoch": 79.01,
3400
+ "learning_rate": 1.0810810810810812e-05,
3401
+ "loss": 0.3807,
3402
+ "step": 4470
3403
+ },
3404
+ {
3405
+ "epoch": 79.01,
3406
+ "learning_rate": 1.071071071071071e-05,
3407
+ "loss": 0.2931,
3408
+ "step": 4480
3409
+ },
3410
+ {
3411
+ "epoch": 79.01,
3412
+ "eval_accuracy": 0.6497695852534562,
3413
+ "eval_loss": 1.3298723697662354,
3414
+ "eval_runtime": 163.0047,
3415
+ "eval_samples_per_second": 1.331,
3416
+ "eval_steps_per_second": 0.172,
3417
+ "step": 4480
3418
+ },
3419
+ {
3420
+ "epoch": 80.0,
3421
+ "learning_rate": 1.061061061061061e-05,
3422
+ "loss": 0.2781,
3423
+ "step": 4490
3424
+ },
3425
+ {
3426
+ "epoch": 80.0,
3427
+ "learning_rate": 1.051051051051051e-05,
3428
+ "loss": 0.258,
3429
+ "step": 4500
3430
+ },
3431
+ {
3432
+ "epoch": 80.01,
3433
+ "learning_rate": 1.0410410410410411e-05,
3434
+ "loss": 0.4798,
3435
+ "step": 4510
3436
+ },
3437
+ {
3438
+ "epoch": 80.01,
3439
+ "learning_rate": 1.031031031031031e-05,
3440
+ "loss": 0.4413,
3441
+ "step": 4520
3442
+ },
3443
+ {
3444
+ "epoch": 80.01,
3445
+ "learning_rate": 1.0210210210210211e-05,
3446
+ "loss": 0.5154,
3447
+ "step": 4530
3448
+ },
3449
+ {
3450
+ "epoch": 80.01,
3451
+ "eval_accuracy": 0.5806451612903226,
3452
+ "eval_loss": 1.6550365686416626,
3453
+ "eval_runtime": 166.6749,
3454
+ "eval_samples_per_second": 1.302,
3455
+ "eval_steps_per_second": 0.168,
3456
+ "step": 4536
3457
+ },
3458
+ {
3459
+ "epoch": 81.0,
3460
+ "learning_rate": 1.011011011011011e-05,
3461
+ "loss": 0.5175,
3462
+ "step": 4540
3463
+ },
3464
+ {
3465
+ "epoch": 81.0,
3466
+ "learning_rate": 1.0010010010010011e-05,
3467
+ "loss": 0.3942,
3468
+ "step": 4550
3469
+ },
3470
+ {
3471
+ "epoch": 81.0,
3472
+ "learning_rate": 9.90990990990991e-06,
3473
+ "loss": 0.2373,
3474
+ "step": 4560
3475
+ },
3476
+ {
3477
+ "epoch": 81.01,
3478
+ "learning_rate": 9.80980980980981e-06,
3479
+ "loss": 0.3044,
3480
+ "step": 4570
3481
+ },
3482
+ {
3483
+ "epoch": 81.01,
3484
+ "learning_rate": 9.70970970970971e-06,
3485
+ "loss": 0.229,
3486
+ "step": 4580
3487
+ },
3488
+ {
3489
+ "epoch": 81.01,
3490
+ "learning_rate": 9.60960960960961e-06,
3491
+ "loss": 0.2993,
3492
+ "step": 4590
3493
+ },
3494
+ {
3495
+ "epoch": 81.01,
3496
+ "eval_accuracy": 0.5990783410138248,
3497
+ "eval_loss": 1.6520466804504395,
3498
+ "eval_runtime": 169.6637,
3499
+ "eval_samples_per_second": 1.279,
3500
+ "eval_steps_per_second": 0.165,
3501
+ "step": 4592
3502
+ },
3503
+ {
3504
+ "epoch": 82.0,
3505
+ "learning_rate": 9.509509509509509e-06,
3506
+ "loss": 0.4553,
3507
+ "step": 4600
3508
+ },
3509
+ {
3510
+ "epoch": 82.0,
3511
+ "learning_rate": 9.40940940940941e-06,
3512
+ "loss": 0.338,
3513
+ "step": 4610
3514
+ },
3515
+ {
3516
+ "epoch": 82.01,
3517
+ "learning_rate": 9.309309309309309e-06,
3518
+ "loss": 0.2887,
3519
+ "step": 4620
3520
+ },
3521
+ {
3522
+ "epoch": 82.01,
3523
+ "learning_rate": 9.209209209209211e-06,
3524
+ "loss": 0.2972,
3525
+ "step": 4630
3526
+ },
3527
+ {
3528
+ "epoch": 82.01,
3529
+ "learning_rate": 9.10910910910911e-06,
3530
+ "loss": 0.4391,
3531
+ "step": 4640
3532
+ },
3533
+ {
3534
+ "epoch": 82.01,
3535
+ "eval_accuracy": 0.6405529953917051,
3536
+ "eval_loss": 1.3823057413101196,
3537
+ "eval_runtime": 174.0025,
3538
+ "eval_samples_per_second": 1.247,
3539
+ "eval_steps_per_second": 0.161,
3540
+ "step": 4648
3541
+ },
3542
+ {
3543
+ "epoch": 83.0,
3544
+ "learning_rate": 9.00900900900901e-06,
3545
+ "loss": 0.5297,
3546
+ "step": 4650
3547
+ },
3548
+ {
3549
+ "epoch": 83.0,
3550
+ "learning_rate": 8.90890890890891e-06,
3551
+ "loss": 0.4287,
3552
+ "step": 4660
3553
+ },
3554
+ {
3555
+ "epoch": 83.0,
3556
+ "learning_rate": 8.80880880880881e-06,
3557
+ "loss": 0.3487,
3558
+ "step": 4670
3559
+ },
3560
+ {
3561
+ "epoch": 83.01,
3562
+ "learning_rate": 8.708708708708708e-06,
3563
+ "loss": 0.3783,
3564
+ "step": 4680
3565
+ },
3566
+ {
3567
+ "epoch": 83.01,
3568
+ "learning_rate": 8.60860860860861e-06,
3569
+ "loss": 0.4584,
3570
+ "step": 4690
3571
+ },
3572
+ {
3573
+ "epoch": 83.01,
3574
+ "learning_rate": 8.508508508508508e-06,
3575
+ "loss": 0.485,
3576
+ "step": 4700
3577
+ },
3578
+ {
3579
+ "epoch": 83.01,
3580
+ "eval_accuracy": 0.6036866359447005,
3581
+ "eval_loss": 1.4859918355941772,
3582
+ "eval_runtime": 169.7438,
3583
+ "eval_samples_per_second": 1.278,
3584
+ "eval_steps_per_second": 0.165,
3585
+ "step": 4704
3586
+ },
3587
+ {
3588
+ "epoch": 84.0,
3589
+ "learning_rate": 8.408408408408409e-06,
3590
+ "loss": 0.3618,
3591
+ "step": 4710
3592
+ },
3593
+ {
3594
+ "epoch": 84.0,
3595
+ "learning_rate": 8.308308308308309e-06,
3596
+ "loss": 0.3709,
3597
+ "step": 4720
3598
+ },
3599
+ {
3600
+ "epoch": 84.0,
3601
+ "learning_rate": 8.208208208208209e-06,
3602
+ "loss": 0.3175,
3603
+ "step": 4730
3604
+ },
3605
+ {
3606
+ "epoch": 84.01,
3607
+ "learning_rate": 8.108108108108109e-06,
3608
+ "loss": 0.305,
3609
+ "step": 4740
3610
+ },
3611
+ {
3612
+ "epoch": 84.01,
3613
+ "learning_rate": 8.008008008008007e-06,
3614
+ "loss": 0.4123,
3615
+ "step": 4750
3616
+ },
3617
+ {
3618
+ "epoch": 84.01,
3619
+ "learning_rate": 7.907907907907908e-06,
3620
+ "loss": 0.3313,
3621
+ "step": 4760
3622
+ },
3623
+ {
3624
+ "epoch": 84.01,
3625
+ "eval_accuracy": 0.6175115207373272,
3626
+ "eval_loss": 1.3875089883804321,
3627
+ "eval_runtime": 173.3931,
3628
+ "eval_samples_per_second": 1.251,
3629
+ "eval_steps_per_second": 0.161,
3630
+ "step": 4760
3631
+ },
3632
+ {
3633
+ "epoch": 85.0,
3634
+ "learning_rate": 7.807807807807808e-06,
3635
+ "loss": 0.3653,
3636
+ "step": 4770
3637
+ },
3638
+ {
3639
+ "epoch": 85.0,
3640
+ "learning_rate": 7.707707707707708e-06,
3641
+ "loss": 0.3763,
3642
+ "step": 4780
3643
+ },
3644
+ {
3645
+ "epoch": 85.01,
3646
+ "learning_rate": 7.607607607607609e-06,
3647
+ "loss": 0.5182,
3648
+ "step": 4790
3649
+ },
3650
+ {
3651
+ "epoch": 85.01,
3652
+ "learning_rate": 7.507507507507508e-06,
3653
+ "loss": 0.552,
3654
+ "step": 4800
3655
+ },
3656
+ {
3657
+ "epoch": 85.01,
3658
+ "learning_rate": 7.4074074074074075e-06,
3659
+ "loss": 0.4194,
3660
+ "step": 4810
3661
+ },
3662
+ {
3663
+ "epoch": 85.01,
3664
+ "eval_accuracy": 0.5898617511520737,
3665
+ "eval_loss": 1.4334131479263306,
3666
+ "eval_runtime": 172.5224,
3667
+ "eval_samples_per_second": 1.258,
3668
+ "eval_steps_per_second": 0.162,
3669
+ "step": 4816
3670
+ },
3671
+ {
3672
+ "epoch": 86.0,
3673
+ "learning_rate": 7.3073073073073085e-06,
3674
+ "loss": 0.3314,
3675
+ "step": 4820
3676
+ },
3677
+ {
3678
+ "epoch": 86.0,
3679
+ "learning_rate": 7.207207207207208e-06,
3680
+ "loss": 0.288,
3681
+ "step": 4830
3682
+ },
3683
+ {
3684
+ "epoch": 86.0,
3685
+ "learning_rate": 7.107107107107107e-06,
3686
+ "loss": 0.3408,
3687
+ "step": 4840
3688
+ },
3689
+ {
3690
+ "epoch": 86.01,
3691
+ "learning_rate": 7.007007007007008e-06,
3692
+ "loss": 0.3747,
3693
+ "step": 4850
3694
+ },
3695
+ {
3696
+ "epoch": 86.01,
3697
+ "learning_rate": 6.906906906906907e-06,
3698
+ "loss": 0.2595,
3699
+ "step": 4860
3700
+ },
3701
+ {
3702
+ "epoch": 86.01,
3703
+ "learning_rate": 6.8068068068068075e-06,
3704
+ "loss": 0.4515,
3705
+ "step": 4870
3706
+ },
3707
+ {
3708
+ "epoch": 86.01,
3709
+ "eval_accuracy": 0.5990783410138248,
3710
+ "eval_loss": 1.6488864421844482,
3711
+ "eval_runtime": 168.4269,
3712
+ "eval_samples_per_second": 1.288,
3713
+ "eval_steps_per_second": 0.166,
3714
+ "step": 4872
3715
+ },
3716
+ {
+ "epoch": 87.0,
+ "learning_rate": 6.706706706706707e-06,
+ "loss": 0.4239,
+ "step": 4880
+ },
+ {
+ "epoch": 87.0,
+ "learning_rate": 6.606606606606607e-06,
+ "loss": 0.2477,
+ "step": 4890
+ },
+ {
+ "epoch": 87.01,
+ "learning_rate": 6.506506506506507e-06,
+ "loss": 0.3937,
+ "step": 4900
+ },
+ {
+ "epoch": 87.01,
+ "learning_rate": 6.406406406406406e-06,
+ "loss": 0.4013,
+ "step": 4910
+ },
+ {
+ "epoch": 87.01,
+ "learning_rate": 6.306306306306306e-06,
+ "loss": 0.3283,
+ "step": 4920
+ },
+ {
+ "epoch": 87.01,
+ "eval_accuracy": 0.6082949308755761,
+ "eval_loss": 1.4548600912094116,
+ "eval_runtime": 165.7438,
+ "eval_samples_per_second": 1.309,
+ "eval_steps_per_second": 0.169,
+ "step": 4928
+ },
+ {
+ "epoch": 88.0,
+ "learning_rate": 6.206206206206207e-06,
+ "loss": 0.3104,
+ "step": 4930
+ },
+ {
+ "epoch": 88.0,
+ "learning_rate": 6.106106106106106e-06,
+ "loss": 0.3391,
+ "step": 4940
+ },
+ {
+ "epoch": 88.0,
+ "learning_rate": 6.006006006006006e-06,
+ "loss": 0.3999,
+ "step": 4950
+ },
+ {
+ "epoch": 88.01,
+ "learning_rate": 5.905905905905906e-06,
+ "loss": 0.3572,
+ "step": 4960
+ },
+ {
+ "epoch": 88.01,
+ "learning_rate": 5.805805805805806e-06,
+ "loss": 0.4948,
+ "step": 4970
+ },
+ {
+ "epoch": 88.01,
+ "learning_rate": 5.705705705705706e-06,
+ "loss": 0.1914,
+ "step": 4980
+ },
+ {
+ "epoch": 88.01,
+ "eval_accuracy": 0.6267281105990783,
+ "eval_loss": 1.3415180444717407,
+ "eval_runtime": 160.989,
+ "eval_samples_per_second": 1.348,
+ "eval_steps_per_second": 0.174,
+ "step": 4984
+ },
+ {
+ "epoch": 89.0,
+ "learning_rate": 5.605605605605606e-06,
+ "loss": 0.5032,
+ "step": 4990
+ },
+ {
+ "epoch": 89.0,
+ "learning_rate": 5.505505505505506e-06,
+ "loss": 0.2588,
+ "step": 5000
+ },
+ {
+ "epoch": 89.0,
+ "learning_rate": 5.405405405405406e-06,
+ "loss": 0.3718,
+ "step": 5010
+ },
+ {
+ "epoch": 89.01,
+ "learning_rate": 5.305305305305305e-06,
+ "loss": 0.2464,
+ "step": 5020
+ },
+ {
+ "epoch": 89.01,
+ "learning_rate": 5.2052052052052055e-06,
+ "loss": 0.1357,
+ "step": 5030
+ },
+ {
+ "epoch": 89.01,
+ "learning_rate": 5.105105105105106e-06,
+ "loss": 0.2142,
+ "step": 5040
+ },
+ {
+ "epoch": 89.01,
+ "eval_accuracy": 0.6267281105990783,
+ "eval_loss": 1.642616629600525,
+ "eval_runtime": 164.3339,
+ "eval_samples_per_second": 1.32,
+ "eval_steps_per_second": 0.17,
+ "step": 5040
+ },
+ {
+ "epoch": 90.0,
+ "learning_rate": 5.005005005005006e-06,
+ "loss": 0.3905,
+ "step": 5050
+ },
+ {
+ "epoch": 90.0,
+ "learning_rate": 4.904904904904905e-06,
+ "loss": 0.2482,
+ "step": 5060
+ },
+ {
+ "epoch": 90.01,
+ "learning_rate": 4.804804804804805e-06,
+ "loss": 0.3922,
+ "step": 5070
+ },
+ {
+ "epoch": 90.01,
+ "learning_rate": 4.704704704704705e-06,
+ "loss": 0.2728,
+ "step": 5080
+ },
+ {
+ "epoch": 90.01,
+ "learning_rate": 4.6046046046046055e-06,
+ "loss": 0.3121,
+ "step": 5090
+ },
+ {
+ "epoch": 90.01,
+ "eval_accuracy": 0.6036866359447005,
+ "eval_loss": 1.699904441833496,
+ "eval_runtime": 160.7204,
+ "eval_samples_per_second": 1.35,
+ "eval_steps_per_second": 0.174,
+ "step": 5096
+ },
+ {
+ "epoch": 91.0,
+ "learning_rate": 4.504504504504505e-06,
+ "loss": 0.4005,
+ "step": 5100
+ },
+ {
+ "epoch": 91.0,
+ "learning_rate": 4.404404404404405e-06,
+ "loss": 0.3077,
+ "step": 5110
+ },
+ {
+ "epoch": 91.0,
+ "learning_rate": 4.304304304304305e-06,
+ "loss": 0.3005,
+ "step": 5120
+ },
+ {
+ "epoch": 91.01,
+ "learning_rate": 4.204204204204204e-06,
+ "loss": 0.357,
+ "step": 5130
+ },
+ {
+ "epoch": 91.01,
+ "learning_rate": 4.1041041041041045e-06,
+ "loss": 0.2725,
+ "step": 5140
+ },
+ {
+ "epoch": 91.01,
+ "learning_rate": 4.004004004004004e-06,
+ "loss": 0.367,
+ "step": 5150
+ },
+ {
+ "epoch": 91.01,
+ "eval_accuracy": 0.6082949308755761,
+ "eval_loss": 1.4683284759521484,
+ "eval_runtime": 162.3117,
+ "eval_samples_per_second": 1.337,
+ "eval_steps_per_second": 0.173,
+ "step": 5152
+ },
+ {
+ "epoch": 92.0,
+ "learning_rate": 3.903903903903904e-06,
+ "loss": 0.2292,
+ "step": 5160
+ },
+ {
+ "epoch": 92.0,
+ "learning_rate": 3.8038038038038044e-06,
+ "loss": 0.4217,
+ "step": 5170
+ },
+ {
+ "epoch": 92.01,
+ "learning_rate": 3.7037037037037037e-06,
+ "loss": 0.473,
+ "step": 5180
+ },
+ {
+ "epoch": 92.01,
+ "learning_rate": 3.603603603603604e-06,
+ "loss": 0.2874,
+ "step": 5190
+ },
+ {
+ "epoch": 92.01,
+ "learning_rate": 3.503503503503504e-06,
+ "loss": 0.178,
+ "step": 5200
+ },
+ {
+ "epoch": 92.01,
+ "eval_accuracy": 0.6267281105990783,
+ "eval_loss": 1.4664562940597534,
+ "eval_runtime": 166.2787,
+ "eval_samples_per_second": 1.305,
+ "eval_steps_per_second": 0.168,
+ "step": 5208
+ },
+ {
+ "epoch": 93.0,
+ "learning_rate": 3.4034034034034037e-06,
+ "loss": 0.2857,
+ "step": 5210
+ },
+ {
+ "epoch": 93.0,
+ "learning_rate": 3.3033033033033035e-06,
+ "loss": 0.4529,
+ "step": 5220
+ },
+ {
+ "epoch": 93.0,
+ "learning_rate": 3.203203203203203e-06,
+ "loss": 0.3539,
+ "step": 5230
+ },
+ {
+ "epoch": 93.01,
+ "learning_rate": 3.1031031031031033e-06,
+ "loss": 0.508,
+ "step": 5240
+ },
+ {
+ "epoch": 93.01,
+ "learning_rate": 3.003003003003003e-06,
+ "loss": 0.3495,
+ "step": 5250
+ },
+ {
+ "epoch": 93.01,
+ "learning_rate": 2.902902902902903e-06,
+ "loss": 0.3972,
+ "step": 5260
+ },
+ {
+ "epoch": 93.01,
+ "eval_accuracy": 0.6451612903225806,
+ "eval_loss": 1.3464295864105225,
+ "eval_runtime": 177.3754,
+ "eval_samples_per_second": 1.223,
+ "eval_steps_per_second": 0.158,
+ "step": 5264
+ },
+ {
+ "epoch": 94.0,
+ "learning_rate": 2.802802802802803e-06,
+ "loss": 0.239,
+ "step": 5270
+ },
+ {
+ "epoch": 94.0,
+ "learning_rate": 2.702702702702703e-06,
+ "loss": 0.3102,
+ "step": 5280
+ },
+ {
+ "epoch": 94.0,
+ "learning_rate": 2.6026026026026027e-06,
+ "loss": 0.2617,
+ "step": 5290
+ },
+ {
+ "epoch": 94.01,
+ "learning_rate": 2.502502502502503e-06,
+ "loss": 0.2582,
+ "step": 5300
+ },
+ {
+ "epoch": 94.01,
+ "learning_rate": 2.4024024024024026e-06,
+ "loss": 0.3483,
+ "step": 5310
+ },
+ {
+ "epoch": 94.01,
+ "learning_rate": 2.3023023023023027e-06,
+ "loss": 0.224,
+ "step": 5320
+ },
+ {
+ "epoch": 94.01,
+ "eval_accuracy": 0.6175115207373272,
+ "eval_loss": 1.5009006261825562,
+ "eval_runtime": 173.0007,
+ "eval_samples_per_second": 1.254,
+ "eval_steps_per_second": 0.162,
+ "step": 5320
+ },
+ {
+ "epoch": 95.0,
+ "learning_rate": 2.2022022022022024e-06,
+ "loss": 0.2503,
+ "step": 5330
+ },
+ {
+ "epoch": 95.0,
+ "learning_rate": 2.102102102102102e-06,
+ "loss": 0.4842,
+ "step": 5340
+ },
+ {
+ "epoch": 95.01,
+ "learning_rate": 2.002002002002002e-06,
+ "loss": 0.2556,
+ "step": 5350
+ },
+ {
+ "epoch": 95.01,
+ "learning_rate": 1.9019019019019022e-06,
+ "loss": 0.3167,
+ "step": 5360
+ },
+ {
+ "epoch": 95.01,
+ "learning_rate": 1.801801801801802e-06,
+ "loss": 0.1848,
+ "step": 5370
+ },
+ {
+ "epoch": 95.01,
+ "eval_accuracy": 0.6129032258064516,
+ "eval_loss": 1.5068354606628418,
+ "eval_runtime": 170.4309,
+ "eval_samples_per_second": 1.273,
+ "eval_steps_per_second": 0.164,
+ "step": 5376
+ },
+ {
+ "epoch": 96.0,
+ "learning_rate": 1.7017017017017019e-06,
+ "loss": 0.3195,
+ "step": 5380
+ },
+ {
+ "epoch": 96.0,
+ "learning_rate": 1.6016016016016016e-06,
+ "loss": 0.3012,
+ "step": 5390
+ },
+ {
+ "epoch": 96.0,
+ "learning_rate": 1.5015015015015015e-06,
+ "loss": 0.3382,
+ "step": 5400
+ },
+ {
+ "epoch": 96.01,
+ "learning_rate": 1.4014014014014014e-06,
+ "loss": 0.232,
+ "step": 5410
+ },
+ {
+ "epoch": 96.01,
+ "learning_rate": 1.3013013013013014e-06,
+ "loss": 0.29,
+ "step": 5420
+ },
+ {
+ "epoch": 96.01,
+ "learning_rate": 1.2012012012012013e-06,
+ "loss": 0.2776,
+ "step": 5430
+ },
+ {
+ "epoch": 96.01,
+ "eval_accuracy": 0.6175115207373272,
+ "eval_loss": 1.538283348083496,
+ "eval_runtime": 170.4441,
+ "eval_samples_per_second": 1.273,
+ "eval_steps_per_second": 0.164,
+ "step": 5432
+ },
+ {
+ "epoch": 97.0,
+ "learning_rate": 1.1011011011011012e-06,
+ "loss": 0.31,
+ "step": 5440
+ },
+ {
+ "epoch": 97.0,
+ "learning_rate": 1.001001001001001e-06,
+ "loss": 0.3648,
+ "step": 5450
+ },
+ {
+ "epoch": 97.01,
+ "learning_rate": 9.00900900900901e-07,
+ "loss": 0.3484,
+ "step": 5460
+ },
+ {
+ "epoch": 97.01,
+ "learning_rate": 8.008008008008008e-07,
+ "loss": 0.2007,
+ "step": 5470
+ },
+ {
+ "epoch": 97.01,
+ "learning_rate": 7.007007007007007e-07,
+ "loss": 0.3506,
+ "step": 5480
+ },
+ {
+ "epoch": 97.01,
+ "eval_accuracy": 0.6129032258064516,
+ "eval_loss": 1.5355850458145142,
+ "eval_runtime": 174.6844,
+ "eval_samples_per_second": 1.242,
+ "eval_steps_per_second": 0.16,
+ "step": 5488
+ },
+ {
+ "epoch": 98.0,
+ "learning_rate": 6.006006006006006e-07,
+ "loss": 0.4143,
+ "step": 5490
+ },
+ {
+ "epoch": 98.0,
+ "learning_rate": 5.005005005005005e-07,
+ "loss": 0.3519,
+ "step": 5500
+ },
+ {
+ "epoch": 98.0,
+ "learning_rate": 4.004004004004004e-07,
+ "loss": 0.2268,
+ "step": 5510
+ },
+ {
+ "epoch": 98.01,
+ "learning_rate": 3.003003003003003e-07,
+ "loss": 0.1346,
+ "step": 5520
+ },
+ {
+ "epoch": 98.01,
+ "learning_rate": 2.002002002002002e-07,
+ "loss": 0.2987,
+ "step": 5530
+ },
+ {
+ "epoch": 98.01,
+ "learning_rate": 1.001001001001001e-07,
+ "loss": 0.401,
+ "step": 5540
+ },
+ {
+ "epoch": 98.01,
+ "eval_accuracy": 0.6175115207373272,
+ "eval_loss": 1.5504214763641357,
+ "eval_runtime": 168.626,
+ "eval_samples_per_second": 1.287,
+ "eval_steps_per_second": 0.166,
+ "step": 5544
+ },
+ {
+ "epoch": 99.0,
+ "learning_rate": 0.0,
+ "loss": 0.3466,
+ "step": 5550
+ },
+ {
+ "epoch": 99.0,
+ "eval_accuracy": 0.6175115207373272,
+ "eval_loss": 1.5504581928253174,
+ "eval_runtime": 169.0024,
+ "eval_samples_per_second": 1.284,
+ "eval_steps_per_second": 0.166,
+ "step": 5550
+ },
+ {
+ "epoch": 99.0,
+ "step": 5550,
+ "total_flos": 5.520338427328414e+19,
+ "train_loss": 0.6062898898339486,
+ "train_runtime": 58981.2354,
+ "train_samples_per_second": 0.753,
+ "train_steps_per_second": 0.094
+ },
+ {
+ "epoch": 99.0,
+ "eval_accuracy": 0.7685185185185185,
+ "eval_loss": 0.7077057361602783,
+ "eval_runtime": 174.9386,
+ "eval_samples_per_second": 1.235,
+ "eval_steps_per_second": 0.154,
+ "step": 5550
+ },
+ {
+ "epoch": 99.0,
+ "eval_accuracy": 0.7685185185185185,
+ "eval_loss": 0.7077056765556335,
+ "eval_runtime": 166.0464,
+ "eval_samples_per_second": 1.301,
+ "eval_steps_per_second": 0.163,
+ "step": 5550
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 5550,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 9223372036854775807,
+ "save_steps": 500,
+ "total_flos": 5.520338427328414e+19,
+ "train_batch_size": 8,
+ "trial_name": null,
+ "trial_params": null
+ }