Jeremiah Zhou committed
Commit 2d09fc6
1 parent: 180bfeb

End of training

all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.9152609448429384,
+ "eval_combined_score": 0.9009873932600381,
+ "eval_f1": 0.8867138416771377,
+ "eval_loss": 0.44352227449417114,
+ "eval_runtime": 83.9681,
+ "eval_samples": 40430,
+ "eval_samples_per_second": 481.492,
+ "eval_steps_per_second": 60.19,
+ "train_loss": 0.1477880122620598,
+ "train_runtime": 22868.2253,
+ "train_samples": 363846,
+ "train_samples_per_second": 159.105,
+ "train_steps_per_second": 9.944
+ }
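The throughput fields are derived quantities, so you can cross-check them against the raw counts. The sketch below makes four assumptions:

- each *_per_second value is a count divided by the wall-clock runtime, rounded to three decimals;
- training throughput counts every sample once per epoch;
- the effective batch size was 16 for training;
- the batch size was 8 for evaluation.

The two batch sizes are inferred from the step rates. They are not recorded in these files.

```python
import math

# Values copied from all_results.json in this commit.
TRAIN_SAMPLES = 363846
TRAIN_RUNTIME_S = 22868.2253
EPOCHS = 10
EVAL_SAMPLES = 40430
EVAL_RUNTIME_S = 83.9681

# Assumed (inferred from the step rates, not recorded in this commit):
# effective train batch size 16, eval batch size 8, no gradient accumulation.
TRAIN_BATCH = 16
EVAL_BATCH = 8


def throughput(num_samples: int, batch_size: int, runtime_s: float, epochs: int = 1):
    """Return (total steps, samples/s, steps/s), rates rounded to 3 decimals."""
    steps = math.ceil(num_samples / batch_size) * epochs  # last batch may be partial
    samples_per_s = round(num_samples * epochs / runtime_s, 3)
    steps_per_s = round(steps / runtime_s, 3)
    return steps, samples_per_s, steps_per_s


train = throughput(TRAIN_SAMPLES, TRAIN_BATCH, TRAIN_RUNTIME_S, EPOCHS)
evaluation = throughput(EVAL_SAMPLES, EVAL_BATCH, EVAL_RUNTIME_S)
print(train)       # (227410, 159.105, 9.944)
print(evaluation)  # (5054, 481.492, 60.19)
```

Under the same assumption, one epoch is ceil(363846 / 16) = 22,741 optimizer steps. This matches the step numbers of the per-epoch evaluations in trainer_state.json (22741, 45482, 68223, 90964). It also places the best checkpoint, checkpoint-159187, at the end of epoch 7 (7 × 22,741 = 159,187).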
eval_results.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.9152609448429384,
+ "eval_combined_score": 0.9009873932600381,
+ "eval_f1": 0.8867138416771377,
+ "eval_loss": 0.44352227449417114,
+ "eval_runtime": 83.9681,
+ "eval_samples": 40430,
+ "eval_samples_per_second": 481.492,
+ "eval_steps_per_second": 60.19
+ }
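eval_combined_score equals the arithmetic mean of eval_accuracy and eval_f1 to full precision. The Transformers run_glue.py example script adds such a combined_score whenever a GLUE task reports more than one metric. That this run used that script is an assumption, suggested only by the metric names. The arithmetic itself can be checked directly:

```python
# Metric values copied from eval_results.json in this commit.
eval_accuracy = 0.9152609448429384
eval_f1 = 0.8867138416771377
reported_combined = 0.9009873932600381

# Assumption: combined_score is the unweighted mean of the task's metrics
# (accuracy and F1 for QQP), mirroring the run_glue.py example.
metrics = [eval_accuracy, eval_f1]
combined = sum(metrics) / len(metrics)

print(f"{combined:.16f}")
```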
runs/Jun15_16-32-40_pikachu/events.out.tfevents.1655305279.pikachu.277660.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5bf022ea76197e88d5b66fb861abd10606dceb39fd3964080a0539a7231f3906
+ size 475
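The TensorBoard event file under runs/ was committed through Git LFS, so the diff shows only its three-line pointer. The pointer holds the spec version, the SHA-256 of the real content, and its size in bytes (475). Below is a minimal sketch of reading such a pointer and checking a downloaded blob against it. The function names are illustrative and are not part of any Git LFS tooling.

```python
import hashlib

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5bf022ea76197e88d5b66fb861abd10606dceb39fd3964080a0539a7231f3906
size 475
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into its fields."""
    fields = {}
    for line in text.splitlines():
        if not line:
            continue
        key, sep, value = line.partition(" ")
        if not sep:
            raise ValueError(f"malformed pointer line: {line!r}")
        fields[key] = value
    for required in ("version", "oid", "size"):
        if required not in fields:
            raise ValueError(f"missing {required!r}: not an LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def matches_pointer(data: bytes, info: dict) -> bool:
    """True if a downloaded blob has the pointer's size and SHA-256 digest."""
    return len(data) == info["size_bytes"] and hashlib.sha256(data).hexdigest() == info["digest"]


info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size_bytes"])  # sha256 475
```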
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 10.0,
+ "train_loss": 0.1477880122620598,
+ "train_runtime": 22868.2253,
+ "train_samples": 363846,
+ "train_samples_per_second": 159.105,
+ "train_steps_per_second": 9.944
+ }
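The learning_rate values in the trainer_state.json log_history that follows do two things. They rise linearly over roughly the first 6% of the 227,410 steps, and then fall linearly toward zero. The sketch below reproduces the logged values under two assumptions: a peak learning rate of 2e-5, and a warmup of ceil(0.06 × 227,410) = 13,645 steps. Both numbers are inferred by fitting the log, not read from any config in this commit.

```python
import math

# Inferred from the logged values, not recorded in this commit:
# peak learning rate 2e-5 and a warmup ratio of 0.06.
PEAK_LR = 2e-5
STEPS_PER_EPOCH = 22741
TOTAL_STEPS = 10 * STEPS_PER_EPOCH            # 227410, the logged global_step
WARMUP_STEPS = math.ceil(TOTAL_STEPS * 0.06)  # 13645


def lr_at(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step / max(1, WARMUP_STEPS))
    remaining = (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)


# A few (step, learning_rate) pairs copied from log_history.
LOGGED = {
    500: 7.328691828508612e-07,     # early warmup
    13500: 1.9787467936973253e-05,  # last logged warmup step
    14000: 1.9966785956541065e-05,  # first logged decay step
    98500: 1.2060908006455689e-05,  # mid-decay
}

for step, logged_lr in LOGGED.items():
    print(step, f"{lr_at(step):.6e}", f"{logged_lr:.6e}")
```

If the run used the Hugging Face Trainer, this matches its default linear schedule (get_linear_schedule_with_warmup), where warmup_ratio=0.06 would yield exactly 13,645 warmup steps. Treat that configuration as a reconstruction, not a record.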
trainer_state.json ADDED
@@ -0,0 +1,2859 @@
+ {
+ "best_metric": 0.9152609448429384,
+ "best_model_checkpoint": "./fine-tune/roberta-base/qqp/checkpoint-159187",
+ "epoch": 10.0,
+ "global_step": 227410,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.02,
+ "learning_rate": 7.328691828508612e-07,
+ "loss": 0.6927,
+ "step": 500
+ },
+ {
+ "epoch": 0.04,
+ "learning_rate": 1.4657383657017225e-06,
+ "loss": 0.5606,
+ "step": 1000
+ },
+ {
+ "epoch": 0.07,
+ "learning_rate": 2.1986075485525834e-06,
+ "loss": 0.4399,
+ "step": 1500
+ },
+ {
+ "epoch": 0.09,
+ "learning_rate": 2.931476731403445e-06,
+ "loss": 0.4018,
+ "step": 2000
+ },
+ {
+ "epoch": 0.11,
+ "learning_rate": 3.6643459142543057e-06,
+ "loss": 0.3965,
+ "step": 2500
+ },
+ {
+ "epoch": 0.13,
+ "learning_rate": 4.397215097105167e-06,
+ "loss": 0.3687,
+ "step": 3000
+ },
+ {
+ "epoch": 0.15,
+ "learning_rate": 5.130084279956028e-06,
+ "loss": 0.3724,
+ "step": 3500
+ },
+ {
+ "epoch": 0.18,
+ "learning_rate": 5.86295346280689e-06,
+ "loss": 0.3498,
+ "step": 4000
+ },
+ {
+ "epoch": 0.2,
+ "learning_rate": 6.59582264565775e-06,
+ "loss": 0.366,
+ "step": 4500
+ },
+ {
+ "epoch": 0.22,
+ "learning_rate": 7.328691828508611e-06,
+ "loss": 0.3437,
+ "step": 5000
+ },
+ {
+ "epoch": 0.24,
+ "learning_rate": 8.061561011359473e-06,
+ "loss": 0.3419,
+ "step": 5500
+ },
+ {
+ "epoch": 0.26,
+ "learning_rate": 8.794430194210334e-06,
+ "loss": 0.3382,
+ "step": 6000
+ },
+ {
+ "epoch": 0.29,
+ "learning_rate": 9.527299377061196e-06,
+ "loss": 0.3371,
+ "step": 6500
+ },
+ {
+ "epoch": 0.31,
+ "learning_rate": 1.0260168559912056e-05,
+ "loss": 0.341,
+ "step": 7000
+ },
+ {
+ "epoch": 0.33,
+ "learning_rate": 1.0993037742762918e-05,
+ "loss": 0.3278,
+ "step": 7500
+ },
+ {
+ "epoch": 0.35,
+ "learning_rate": 1.172590692561378e-05,
+ "loss": 0.3338,
+ "step": 8000
+ },
+ {
+ "epoch": 0.37,
+ "learning_rate": 1.245877610846464e-05,
+ "loss": 0.3308,
+ "step": 8500
+ },
+ {
+ "epoch": 0.4,
+ "learning_rate": 1.31916452913155e-05,
+ "loss": 0.3375,
+ "step": 9000
+ },
+ {
+ "epoch": 0.42,
+ "learning_rate": 1.3924514474166362e-05,
+ "loss": 0.3292,
+ "step": 9500
+ },
+ {
+ "epoch": 0.44,
+ "learning_rate": 1.4657383657017223e-05,
+ "loss": 0.3196,
+ "step": 10000
+ },
+ {
+ "epoch": 0.46,
+ "learning_rate": 1.5390252839868085e-05,
+ "loss": 0.3253,
+ "step": 10500
+ },
+ {
+ "epoch": 0.48,
+ "learning_rate": 1.6123122022718947e-05,
+ "loss": 0.3102,
+ "step": 11000
+ },
+ {
+ "epoch": 0.51,
+ "learning_rate": 1.6855991205569805e-05,
+ "loss": 0.3211,
+ "step": 11500
+ },
+ {
+ "epoch": 0.53,
+ "learning_rate": 1.7588860388420667e-05,
+ "loss": 0.3095,
+ "step": 12000
+ },
+ {
+ "epoch": 0.55,
+ "learning_rate": 1.832172957127153e-05,
+ "loss": 0.2994,
+ "step": 12500
+ },
+ {
+ "epoch": 0.57,
+ "learning_rate": 1.905459875412239e-05,
+ "loss": 0.3151,
+ "step": 13000
+ },
+ {
+ "epoch": 0.59,
+ "learning_rate": 1.9787467936973253e-05,
+ "loss": 0.3009,
+ "step": 13500
+ },
+ {
+ "epoch": 0.62,
+ "learning_rate": 1.9966785956541065e-05,
+ "loss": 0.3044,
+ "step": 14000
+ },
+ {
+ "epoch": 0.64,
+ "learning_rate": 1.992000561364115e-05,
+ "loss": 0.3051,
+ "step": 14500
+ },
+ {
+ "epoch": 0.66,
+ "learning_rate": 1.9873225270741237e-05,
+ "loss": 0.3045,
+ "step": 15000
+ },
+ {
+ "epoch": 0.68,
+ "learning_rate": 1.982644492784132e-05,
+ "loss": 0.315,
+ "step": 15500
+ },
+ {
+ "epoch": 0.7,
+ "learning_rate": 1.977966458494141e-05,
+ "loss": 0.3165,
+ "step": 16000
+ },
+ {
+ "epoch": 0.73,
+ "learning_rate": 1.9732884242041496e-05,
+ "loss": 0.2932,
+ "step": 16500
+ },
+ {
+ "epoch": 0.75,
+ "learning_rate": 1.9686103899141583e-05,
+ "loss": 0.3014,
+ "step": 17000
+ },
+ {
+ "epoch": 0.77,
+ "learning_rate": 1.963932355624167e-05,
+ "loss": 0.3052,
+ "step": 17500
+ },
+ {
+ "epoch": 0.79,
+ "learning_rate": 1.9592543213341755e-05,
+ "loss": 0.293,
+ "step": 18000
+ },
+ {
+ "epoch": 0.81,
+ "learning_rate": 1.9545762870441842e-05,
+ "loss": 0.29,
+ "step": 18500
+ },
+ {
+ "epoch": 0.84,
+ "learning_rate": 1.9498982527541926e-05,
+ "loss": 0.2814,
+ "step": 19000
+ },
+ {
+ "epoch": 0.86,
+ "learning_rate": 1.9452202184642014e-05,
+ "loss": 0.2884,
+ "step": 19500
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 1.94054218417421e-05,
+ "loss": 0.2901,
+ "step": 20000
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 1.935864149884219e-05,
+ "loss": 0.2833,
+ "step": 20500
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 1.9311861155942276e-05,
+ "loss": 0.2877,
+ "step": 21000
+ },
+ {
+ "epoch": 0.95,
+ "learning_rate": 1.926508081304236e-05,
+ "loss": 0.2935,
+ "step": 21500
+ },
+ {
+ "epoch": 0.97,
+ "learning_rate": 1.9218300470142448e-05,
+ "loss": 0.2783,
+ "step": 22000
+ },
+ {
+ "epoch": 0.99,
+ "learning_rate": 1.9171520127242532e-05,
+ "loss": 0.2751,
+ "step": 22500
+ },
+ {
+ "epoch": 1.0,
+ "eval_accuracy": 0.8904773682908731,
+ "eval_combined_score": 0.8708585188561804,
+ "eval_f1": 0.8512396694214877,
+ "eval_loss": 0.3056511878967285,
+ "eval_runtime": 72.5914,
+ "eval_samples_per_second": 556.953,
+ "eval_steps_per_second": 69.623,
+ "step": 22741
+ },
+ {
+ "epoch": 1.01,
+ "learning_rate": 1.912473978434262e-05,
+ "loss": 0.2699,
+ "step": 23000
+ },
+ {
+ "epoch": 1.03,
+ "learning_rate": 1.9077959441442707e-05,
+ "loss": 0.2617,
+ "step": 23500
+ },
+ {
+ "epoch": 1.06,
+ "learning_rate": 1.9031179098542794e-05,
+ "loss": 0.246,
+ "step": 24000
+ },
+ {
+ "epoch": 1.08,
+ "learning_rate": 1.898439875564288e-05,
+ "loss": 0.2603,
+ "step": 24500
+ },
+ {
+ "epoch": 1.1,
+ "learning_rate": 1.8937618412742966e-05,
+ "loss": 0.2488,
+ "step": 25000
+ },
+ {
+ "epoch": 1.12,
+ "learning_rate": 1.8890838069843053e-05,
+ "loss": 0.2394,
+ "step": 25500
+ },
+ {
+ "epoch": 1.14,
+ "learning_rate": 1.884405772694314e-05,
+ "loss": 0.2558,
+ "step": 26000
+ },
+ {
+ "epoch": 1.17,
+ "learning_rate": 1.8797277384043228e-05,
+ "loss": 0.2536,
+ "step": 26500
+ },
+ {
+ "epoch": 1.19,
+ "learning_rate": 1.8750497041143315e-05,
+ "loss": 0.2459,
+ "step": 27000
+ },
+ {
+ "epoch": 1.21,
+ "learning_rate": 1.87037166982434e-05,
+ "loss": 0.2451,
+ "step": 27500
+ },
+ {
+ "epoch": 1.23,
+ "learning_rate": 1.8656936355343487e-05,
+ "loss": 0.2547,
+ "step": 28000
+ },
+ {
+ "epoch": 1.25,
+ "learning_rate": 1.861015601244357e-05,
+ "loss": 0.2571,
+ "step": 28500
+ },
+ {
+ "epoch": 1.28,
+ "learning_rate": 1.856337566954366e-05,
+ "loss": 0.2622,
+ "step": 29000
+ },
+ {
+ "epoch": 1.3,
+ "learning_rate": 1.8516595326643746e-05,
+ "loss": 0.2642,
+ "step": 29500
+ },
+ {
+ "epoch": 1.32,
+ "learning_rate": 1.8469814983743833e-05,
+ "loss": 0.2524,
+ "step": 30000
+ },
+ {
+ "epoch": 1.34,
+ "learning_rate": 1.842303464084392e-05,
+ "loss": 0.2604,
+ "step": 30500
+ },
+ {
+ "epoch": 1.36,
+ "learning_rate": 1.8376254297944005e-05,
+ "loss": 0.2476,
+ "step": 31000
+ },
+ {
+ "epoch": 1.39,
+ "learning_rate": 1.8329473955044092e-05,
+ "loss": 0.2494,
+ "step": 31500
+ },
+ {
+ "epoch": 1.41,
+ "learning_rate": 1.8282693612144176e-05,
+ "loss": 0.2513,
+ "step": 32000
+ },
+ {
+ "epoch": 1.43,
+ "learning_rate": 1.8235913269244264e-05,
+ "loss": 0.254,
+ "step": 32500
+ },
+ {
+ "epoch": 1.45,
+ "learning_rate": 1.818913292634435e-05,
+ "loss": 0.2505,
+ "step": 33000
+ },
+ {
+ "epoch": 1.47,
+ "learning_rate": 1.814235258344444e-05,
+ "loss": 0.249,
+ "step": 33500
+ },
+ {
+ "epoch": 1.5,
+ "learning_rate": 1.8095572240544526e-05,
+ "loss": 0.2469,
+ "step": 34000
+ },
+ {
+ "epoch": 1.52,
+ "learning_rate": 1.804879189764461e-05,
+ "loss": 0.2538,
+ "step": 34500
+ },
+ {
+ "epoch": 1.54,
+ "learning_rate": 1.8002011554744698e-05,
+ "loss": 0.24,
+ "step": 35000
+ },
+ {
+ "epoch": 1.56,
+ "learning_rate": 1.7955231211844785e-05,
+ "loss": 0.2488,
+ "step": 35500
+ },
+ {
+ "epoch": 1.58,
+ "learning_rate": 1.7908450868944873e-05,
+ "loss": 0.2552,
+ "step": 36000
+ },
+ {
+ "epoch": 1.61,
+ "learning_rate": 1.7861670526044957e-05,
+ "loss": 0.2485,
+ "step": 36500
+ },
+ {
+ "epoch": 1.63,
+ "learning_rate": 1.7814890183145044e-05,
+ "loss": 0.2518,
+ "step": 37000
+ },
+ {
+ "epoch": 1.65,
+ "learning_rate": 1.776810984024513e-05,
+ "loss": 0.2316,
+ "step": 37500
+ },
+ {
+ "epoch": 1.67,
+ "learning_rate": 1.7721329497345216e-05,
+ "loss": 0.2382,
+ "step": 38000
+ },
+ {
+ "epoch": 1.69,
+ "learning_rate": 1.7674549154445303e-05,
+ "loss": 0.2435,
+ "step": 38500
+ },
+ {
+ "epoch": 1.71,
+ "learning_rate": 1.762776881154539e-05,
+ "loss": 0.2359,
+ "step": 39000
+ },
+ {
+ "epoch": 1.74,
+ "learning_rate": 1.7580988468645478e-05,
+ "loss": 0.2439,
+ "step": 39500
+ },
+ {
+ "epoch": 1.76,
+ "learning_rate": 1.7534208125745562e-05,
+ "loss": 0.2386,
+ "step": 40000
+ },
+ {
+ "epoch": 1.78,
+ "learning_rate": 1.748742778284565e-05,
+ "loss": 0.2404,
+ "step": 40500
+ },
+ {
+ "epoch": 1.8,
+ "learning_rate": 1.7440647439945737e-05,
+ "loss": 0.2534,
+ "step": 41000
+ },
+ {
+ "epoch": 1.82,
+ "learning_rate": 1.739386709704582e-05,
+ "loss": 0.2416,
+ "step": 41500
+ },
+ {
+ "epoch": 1.85,
+ "learning_rate": 1.734708675414591e-05,
+ "loss": 0.2517,
+ "step": 42000
+ },
+ {
+ "epoch": 1.87,
+ "learning_rate": 1.7300306411245996e-05,
+ "loss": 0.2508,
+ "step": 42500
+ },
+ {
+ "epoch": 1.89,
+ "learning_rate": 1.7253526068346083e-05,
+ "loss": 0.2515,
+ "step": 43000
+ },
+ {
+ "epoch": 1.91,
+ "learning_rate": 1.7206745725446168e-05,
+ "loss": 0.2507,
+ "step": 43500
+ },
+ {
+ "epoch": 1.93,
+ "learning_rate": 1.7159965382546255e-05,
+ "loss": 0.2528,
+ "step": 44000
+ },
+ {
+ "epoch": 1.96,
+ "learning_rate": 1.7113185039646342e-05,
+ "loss": 0.2471,
+ "step": 44500
+ },
+ {
+ "epoch": 1.98,
+ "learning_rate": 1.7066404696746426e-05,
+ "loss": 0.2443,
+ "step": 45000
+ },
+ {
+ "epoch": 2.0,
+ "eval_accuracy": 0.9004699480583725,
+ "eval_combined_score": 0.8857477945420068,
+ "eval_f1": 0.8710256410256411,
+ "eval_loss": 0.2529826760292053,
+ "eval_runtime": 71.9777,
+ "eval_samples_per_second": 561.702,
+ "eval_steps_per_second": 70.216,
+ "step": 45482
+ },
+ {
+ "epoch": 2.0,
+ "learning_rate": 1.7019624353846514e-05,
+ "loss": 0.2582,
+ "step": 45500
+ },
+ {
+ "epoch": 2.02,
+ "learning_rate": 1.69728440109466e-05,
+ "loss": 0.2023,
+ "step": 46000
+ },
+ {
+ "epoch": 2.04,
+ "learning_rate": 1.692606366804669e-05,
+ "loss": 0.2099,
+ "step": 46500
+ },
+ {
+ "epoch": 2.07,
+ "learning_rate": 1.6879283325146776e-05,
+ "loss": 0.2071,
+ "step": 47000
+ },
+ {
+ "epoch": 2.09,
+ "learning_rate": 1.683250298224686e-05,
+ "loss": 0.213,
+ "step": 47500
+ },
+ {
+ "epoch": 2.11,
+ "learning_rate": 1.6785722639346948e-05,
+ "loss": 0.2089,
+ "step": 48000
+ },
+ {
+ "epoch": 2.13,
+ "learning_rate": 1.6738942296447035e-05,
+ "loss": 0.2201,
+ "step": 48500
+ },
+ {
+ "epoch": 2.15,
+ "learning_rate": 1.6692161953547123e-05,
+ "loss": 0.1979,
+ "step": 49000
+ },
+ {
+ "epoch": 2.18,
+ "learning_rate": 1.6645381610647207e-05,
+ "loss": 0.2059,
+ "step": 49500
+ },
+ {
+ "epoch": 2.2,
+ "learning_rate": 1.6598601267747294e-05,
+ "loss": 0.2146,
+ "step": 50000
+ },
+ {
+ "epoch": 2.22,
+ "learning_rate": 1.655182092484738e-05,
+ "loss": 0.2008,
+ "step": 50500
+ },
+ {
+ "epoch": 2.24,
+ "learning_rate": 1.6505040581947466e-05,
+ "loss": 0.205,
+ "step": 51000
+ },
+ {
+ "epoch": 2.26,
+ "learning_rate": 1.6458260239047553e-05,
+ "loss": 0.2135,
+ "step": 51500
+ },
+ {
+ "epoch": 2.29,
+ "learning_rate": 1.641147989614764e-05,
+ "loss": 0.2099,
+ "step": 52000
+ },
+ {
+ "epoch": 2.31,
+ "learning_rate": 1.6364699553247728e-05,
+ "loss": 0.1999,
+ "step": 52500
+ },
+ {
+ "epoch": 2.33,
+ "learning_rate": 1.6317919210347812e-05,
+ "loss": 0.2104,
+ "step": 53000
+ },
+ {
+ "epoch": 2.35,
+ "learning_rate": 1.62711388674479e-05,
+ "loss": 0.2135,
+ "step": 53500
+ },
+ {
+ "epoch": 2.37,
+ "learning_rate": 1.6224358524547987e-05,
+ "loss": 0.2066,
+ "step": 54000
+ },
+ {
+ "epoch": 2.4,
+ "learning_rate": 1.617757818164807e-05,
+ "loss": 0.2128,
+ "step": 54500
+ },
+ {
+ "epoch": 2.42,
+ "learning_rate": 1.613079783874816e-05,
+ "loss": 0.2079,
+ "step": 55000
+ },
+ {
+ "epoch": 2.44,
+ "learning_rate": 1.6084017495848246e-05,
+ "loss": 0.2142,
+ "step": 55500
+ },
+ {
+ "epoch": 2.46,
+ "learning_rate": 1.6037237152948333e-05,
+ "loss": 0.2079,
+ "step": 56000
+ },
+ {
+ "epoch": 2.48,
+ "learning_rate": 1.5990456810048418e-05,
+ "loss": 0.2065,
+ "step": 56500
+ },
+ {
+ "epoch": 2.51,
+ "learning_rate": 1.5943676467148505e-05,
+ "loss": 0.2166,
+ "step": 57000
+ },
+ {
+ "epoch": 2.53,
+ "learning_rate": 1.5896896124248592e-05,
+ "loss": 0.2135,
+ "step": 57500
+ },
+ {
+ "epoch": 2.55,
+ "learning_rate": 1.585011578134868e-05,
+ "loss": 0.2072,
+ "step": 58000
+ },
+ {
+ "epoch": 2.57,
+ "learning_rate": 1.5803335438448767e-05,
+ "loss": 0.2132,
+ "step": 58500
+ },
+ {
+ "epoch": 2.59,
+ "learning_rate": 1.575655509554885e-05,
+ "loss": 0.2114,
+ "step": 59000
+ },
+ {
+ "epoch": 2.62,
+ "learning_rate": 1.570977475264894e-05,
+ "loss": 0.2172,
+ "step": 59500
+ },
+ {
+ "epoch": 2.64,
+ "learning_rate": 1.5662994409749023e-05,
+ "loss": 0.198,
+ "step": 60000
+ },
+ {
+ "epoch": 2.66,
+ "learning_rate": 1.561621406684911e-05,
+ "loss": 0.2013,
+ "step": 60500
+ },
+ {
+ "epoch": 2.68,
+ "learning_rate": 1.5569433723949198e-05,
+ "loss": 0.2076,
+ "step": 61000
+ },
+ {
+ "epoch": 2.7,
+ "learning_rate": 1.5522653381049285e-05,
+ "loss": 0.2113,
+ "step": 61500
+ },
+ {
+ "epoch": 2.73,
+ "learning_rate": 1.5475873038149373e-05,
+ "loss": 0.2132,
+ "step": 62000
+ },
+ {
+ "epoch": 2.75,
+ "learning_rate": 1.5429092695249457e-05,
+ "loss": 0.2152,
+ "step": 62500
+ },
+ {
+ "epoch": 2.77,
+ "learning_rate": 1.5382312352349544e-05,
+ "loss": 0.2082,
+ "step": 63000
+ },
+ {
+ "epoch": 2.79,
+ "learning_rate": 1.533553200944963e-05,
+ "loss": 0.216,
+ "step": 63500
+ },
+ {
+ "epoch": 2.81,
+ "learning_rate": 1.5288751666549716e-05,
+ "loss": 0.2154,
+ "step": 64000
+ },
+ {
+ "epoch": 2.84,
+ "learning_rate": 1.5241971323649805e-05,
+ "loss": 0.2087,
+ "step": 64500
+ },
+ {
+ "epoch": 2.86,
+ "learning_rate": 1.519519098074989e-05,
+ "loss": 0.1985,
+ "step": 65000
+ },
+ {
+ "epoch": 2.88,
+ "learning_rate": 1.5148410637849978e-05,
+ "loss": 0.2117,
+ "step": 65500
+ },
+ {
+ "epoch": 2.9,
+ "learning_rate": 1.5101630294950062e-05,
+ "loss": 0.214,
+ "step": 66000
+ },
+ {
+ "epoch": 2.92,
+ "learning_rate": 1.505484995205015e-05,
+ "loss": 0.1941,
+ "step": 66500
+ },
+ {
+ "epoch": 2.95,
+ "learning_rate": 1.5008069609150235e-05,
+ "loss": 0.208,
+ "step": 67000
+ },
+ {
+ "epoch": 2.97,
+ "learning_rate": 1.4961289266250323e-05,
+ "loss": 0.2205,
+ "step": 67500
+ },
+ {
+ "epoch": 2.99,
+ "learning_rate": 1.491450892335041e-05,
+ "loss": 0.2157,
+ "step": 68000
+ },
+ {
+ "epoch": 3.0,
+ "eval_accuracy": 0.9069502844422459,
+ "eval_combined_score": 0.8919079780420185,
+ "eval_f1": 0.876865671641791,
+ "eval_loss": 0.2643309533596039,
+ "eval_runtime": 71.5158,
+ "eval_samples_per_second": 565.33,
+ "eval_steps_per_second": 70.67,
+ "step": 68223
+ },
859
+ {
860
+ "epoch": 3.01,
861
+ "learning_rate": 1.4867728580450496e-05,
862
+ "loss": 0.1811,
863
+ "step": 68500
864
+ },
865
+ {
866
+ "epoch": 3.03,
867
+ "learning_rate": 1.4820948237550584e-05,
868
+ "loss": 0.1629,
869
+ "step": 69000
870
+ },
871
+ {
872
+ "epoch": 3.06,
873
+ "learning_rate": 1.4774167894650668e-05,
874
+ "loss": 0.1615,
875
+ "step": 69500
876
+ },
877
+ {
878
+ "epoch": 3.08,
879
+ "learning_rate": 1.4727387551750755e-05,
880
+ "loss": 0.1736,
881
+ "step": 70000
882
+ },
883
+ {
884
+ "epoch": 3.1,
885
+ "learning_rate": 1.468060720885084e-05,
886
+ "loss": 0.1738,
887
+ "step": 70500
888
+ },
889
+ {
890
+ "epoch": 3.12,
891
+ "learning_rate": 1.4633826865950928e-05,
892
+ "loss": 0.1739,
893
+ "step": 71000
894
+ },
895
+ {
896
+ "epoch": 3.14,
897
+ "learning_rate": 1.4587046523051016e-05,
898
+ "loss": 0.1712,
899
+ "step": 71500
900
+ },
901
+ {
902
+ "epoch": 3.17,
903
+ "learning_rate": 1.4540266180151101e-05,
904
+ "loss": 0.1846,
905
+ "step": 72000
906
+ },
907
+ {
908
+ "epoch": 3.19,
909
+ "learning_rate": 1.4493485837251189e-05,
910
+ "loss": 0.1683,
911
+ "step": 72500
912
+ },
913
+ {
914
+ "epoch": 3.21,
915
+ "learning_rate": 1.4446705494351275e-05,
916
+ "loss": 0.168,
917
+ "step": 73000
918
+ },
919
+ {
920
+ "epoch": 3.23,
921
+ "learning_rate": 1.4399925151451362e-05,
922
+ "loss": 0.1761,
923
+ "step": 73500
924
+ },
925
+ {
926
+ "epoch": 3.25,
927
+ "learning_rate": 1.4353144808551446e-05,
928
+ "loss": 0.1691,
929
+ "step": 74000
930
+ },
931
+ {
932
+ "epoch": 3.28,
933
+ "learning_rate": 1.4306364465651534e-05,
934
+ "loss": 0.1766,
935
+ "step": 74500
936
+ },
937
+ {
938
+ "epoch": 3.3,
939
+ "learning_rate": 1.4259584122751621e-05,
940
+ "loss": 0.1738,
941
+ "step": 75000
942
+ },
943
+ {
944
+ "epoch": 3.32,
945
+ "learning_rate": 1.4212803779851707e-05,
946
+ "loss": 0.1738,
947
+ "step": 75500
948
+ },
949
+ {
950
+ "epoch": 3.34,
951
+ "learning_rate": 1.4166023436951794e-05,
952
+ "loss": 0.1839,
953
+ "step": 76000
954
+ },
955
+ {
956
+ "epoch": 3.36,
957
+ "learning_rate": 1.411924309405188e-05,
958
+ "loss": 0.1791,
959
+ "step": 76500
960
+ },
961
+ {
962
+ "epoch": 3.39,
963
+ "learning_rate": 1.4072462751151967e-05,
964
+ "loss": 0.1773,
965
+ "step": 77000
966
+ },
967
+ {
968
+ "epoch": 3.41,
969
+ "learning_rate": 1.4025682408252053e-05,
970
+ "loss": 0.1736,
971
+ "step": 77500
972
+ },
973
+ {
974
+ "epoch": 3.43,
975
+ "learning_rate": 1.397890206535214e-05,
976
+ "loss": 0.1695,
977
+ "step": 78000
978
+ },
979
+ {
980
+ "epoch": 3.45,
981
+ "learning_rate": 1.3932121722452228e-05,
982
+ "loss": 0.1785,
983
+ "step": 78500
984
+ },
985
+ {
986
+ "epoch": 3.47,
987
+ "learning_rate": 1.3885341379552312e-05,
988
+ "loss": 0.1712,
989
+ "step": 79000
990
+ },
991
+ {
992
+ "epoch": 3.5,
993
+ "learning_rate": 1.38385610366524e-05,
994
+ "loss": 0.1847,
995
+ "step": 79500
996
+ },
997
+ {
998
+ "epoch": 3.52,
999
+ "learning_rate": 1.3791780693752485e-05,
1000
+ "loss": 0.181,
1001
+ "step": 80000
1002
+ },
1003
+ {
1004
+ "epoch": 3.54,
1005
+ "learning_rate": 1.3745000350852573e-05,
1006
+ "loss": 0.179,
1007
+ "step": 80500
1008
+ },
1009
+ {
1010
+ "epoch": 3.56,
1011
+ "learning_rate": 1.3698220007952659e-05,
1012
+ "loss": 0.1827,
1013
+ "step": 81000
1014
+ },
1015
+ {
1016
+ "epoch": 3.58,
1017
+ "learning_rate": 1.3651439665052746e-05,
1018
+ "loss": 0.1797,
1019
+ "step": 81500
1020
+ },
1021
+ {
1022
+ "epoch": 3.61,
1023
+ "learning_rate": 1.3604659322152834e-05,
1024
+ "loss": 0.1757,
1025
+ "step": 82000
1026
+ },
1027
+ {
1028
+ "epoch": 3.63,
1029
+ "learning_rate": 1.355787897925292e-05,
1030
+ "loss": 0.1703,
1031
+ "step": 82500
1032
+ },
1033
+ {
1034
+ "epoch": 3.65,
1035
+ "learning_rate": 1.3511098636353007e-05,
1036
+ "loss": 0.1751,
1037
+ "step": 83000
1038
+ },
1039
+ {
1040
+ "epoch": 3.67,
1041
+ "learning_rate": 1.346431829345309e-05,
1042
+ "loss": 0.1886,
1043
+ "step": 83500
1044
+ },
1045
+ {
1046
+ "epoch": 3.69,
1047
+ "learning_rate": 1.3417537950553178e-05,
1048
+ "loss": 0.1864,
1049
+ "step": 84000
1050
+ },
1051
+ {
1052
+ "epoch": 3.72,
1053
+ "learning_rate": 1.3370757607653264e-05,
1054
+ "loss": 0.1758,
1055
+ "step": 84500
1056
+ },
1057
+ {
1058
+ "epoch": 3.74,
1059
+ "learning_rate": 1.3323977264753351e-05,
1060
+ "loss": 0.1819,
1061
+ "step": 85000
1062
+ },
1063
+ {
1064
+ "epoch": 3.76,
1065
+ "learning_rate": 1.3277196921853439e-05,
1066
+ "loss": 0.171,
1067
+ "step": 85500
1068
+ },
1069
+ {
1070
+ "epoch": 3.78,
1071
+ "learning_rate": 1.3230416578953525e-05,
1072
+ "loss": 0.1789,
1073
+ "step": 86000
1074
+ },
1075
+ {
1076
+ "epoch": 3.8,
1077
+ "learning_rate": 1.3183636236053612e-05,
1078
+ "loss": 0.1836,
1079
+ "step": 86500
1080
+ },
1081
+ {
1082
+ "epoch": 3.83,
1083
+ "learning_rate": 1.3136855893153698e-05,
1084
+ "loss": 0.1841,
1085
+ "step": 87000
1086
+ },
1087
+ {
1088
+ "epoch": 3.85,
1089
+ "learning_rate": 1.3090075550253785e-05,
1090
+ "loss": 0.1843,
1091
+ "step": 87500
1092
+ },
1093
+ {
1094
+ "epoch": 3.87,
1095
+ "learning_rate": 1.304329520735387e-05,
1096
+ "loss": 0.1845,
1097
+ "step": 88000
1098
+ },
1099
+ {
1100
+ "epoch": 3.89,
1101
+ "learning_rate": 1.2996514864453957e-05,
1102
+ "loss": 0.1849,
1103
+ "step": 88500
1104
+ },
1105
+ {
1106
+ "epoch": 3.91,
1107
+ "learning_rate": 1.2949734521554044e-05,
1108
+ "loss": 0.1822,
1109
+ "step": 89000
1110
+ },
1111
+ {
1112
+ "epoch": 3.94,
1113
+ "learning_rate": 1.290295417865413e-05,
1114
+ "loss": 0.1859,
1115
+ "step": 89500
1116
+ },
1117
+ {
1118
+ "epoch": 3.96,
1119
+ "learning_rate": 1.2856173835754218e-05,
1120
+ "loss": 0.1928,
1121
+ "step": 90000
1122
+ },
1123
+ {
1124
+ "epoch": 3.98,
1125
+ "learning_rate": 1.2809393492854303e-05,
1126
+ "loss": 0.1838,
1127
+ "step": 90500
1128
+ },
1129
+ {
1130
+ "epoch": 4.0,
1131
+ "eval_accuracy": 0.9109077417759089,
1132
+ "eval_combined_score": 0.8961948553575517,
1133
+ "eval_f1": 0.8814819689391945,
1134
+ "eval_loss": 0.28062957525253296,
1135
+ "eval_runtime": 73.9944,
1136
+ "eval_samples_per_second": 546.393,
1137
+ "eval_steps_per_second": 68.302,
1138
+ "step": 90964
1139
+ },
1140
+ {
1141
+ "epoch": 4.0,
1142
+ "learning_rate": 1.276261314995439e-05,
1143
+ "loss": 0.1721,
1144
+ "step": 91000
1145
+ },
1146
+ {
1147
+ "epoch": 4.02,
1148
+ "learning_rate": 1.2715832807054476e-05,
1149
+ "loss": 0.1357,
1150
+ "step": 91500
1151
+ },
1152
+ {
1153
+ "epoch": 4.05,
1154
+ "learning_rate": 1.2669052464154564e-05,
1155
+ "loss": 0.1472,
1156
+ "step": 92000
1157
+ },
1158
+ {
1159
+ "epoch": 4.07,
1160
+ "learning_rate": 1.2622272121254651e-05,
1161
+ "loss": 0.1466,
1162
+ "step": 92500
1163
+ },
1164
+ {
1165
+ "epoch": 4.09,
1166
+ "learning_rate": 1.2575491778354735e-05,
1167
+ "loss": 0.1449,
1168
+ "step": 93000
1169
+ },
1170
+ {
1171
+ "epoch": 4.11,
1172
+ "learning_rate": 1.2528711435454823e-05,
1173
+ "loss": 0.1462,
1174
+ "step": 93500
1175
+ },
1176
+ {
1177
+ "epoch": 4.13,
1178
+ "learning_rate": 1.2481931092554909e-05,
1179
+ "loss": 0.1412,
1180
+ "step": 94000
1181
+ },
1182
+ {
1183
+ "epoch": 4.16,
1184
+ "learning_rate": 1.2435150749654996e-05,
1185
+ "loss": 0.1477,
1186
+ "step": 94500
1187
+ },
1188
+ {
1189
+ "epoch": 4.18,
1190
+ "learning_rate": 1.2388370406755084e-05,
1191
+ "loss": 0.1565,
1192
+ "step": 95000
1193
+ },
1194
+ {
1195
+ "epoch": 4.2,
1196
+ "learning_rate": 1.234159006385517e-05,
1197
+ "loss": 0.1465,
1198
+ "step": 95500
1199
+ },
1200
+ {
1201
+ "epoch": 4.22,
1202
+ "learning_rate": 1.2294809720955257e-05,
1203
+ "loss": 0.1398,
1204
+ "step": 96000
1205
+ },
1206
+ {
1207
+ "epoch": 4.24,
1208
+ "learning_rate": 1.224802937805534e-05,
1209
+ "loss": 0.1419,
1210
+ "step": 96500
1211
+ },
1212
+ {
1213
+ "epoch": 4.27,
1214
+ "learning_rate": 1.220124903515543e-05,
1215
+ "loss": 0.1407,
1216
+ "step": 97000
1217
+ },
1218
+ {
1219
+ "epoch": 4.29,
1220
+ "learning_rate": 1.2154468692255514e-05,
1221
+ "loss": 0.1563,
1222
+ "step": 97500
1223
+ },
1224
+ {
1225
+ "epoch": 4.31,
1226
+ "learning_rate": 1.2107688349355601e-05,
1227
+ "loss": 0.1552,
1228
+ "step": 98000
1229
+ },
1230
+ {
1231
+ "epoch": 4.33,
1232
+ "learning_rate": 1.2060908006455689e-05,
1233
+ "loss": 0.1483,
1234
+ "step": 98500
1235
+ },
1236
+ {
1237
+ "epoch": 4.35,
1238
+ "learning_rate": 1.2014127663555775e-05,
1239
+ "loss": 0.1568,
1240
+ "step": 99000
1241
+ },
1242
+ {
1243
+ "epoch": 4.38,
1244
+ "learning_rate": 1.1967347320655862e-05,
1245
+ "loss": 0.1423,
1246
+ "step": 99500
1247
+ },
1248
+ {
1249
+ "epoch": 4.4,
1250
+ "learning_rate": 1.1920566977755948e-05,
1251
+ "loss": 0.1531,
1252
+ "step": 100000
1253
+ },
1254
+ {
1255
+ "epoch": 4.42,
1256
+ "learning_rate": 1.1873786634856035e-05,
1257
+ "loss": 0.1465,
1258
+ "step": 100500
1259
+ },
1260
+ {
1261
+ "epoch": 4.44,
1262
+ "learning_rate": 1.182700629195612e-05,
1263
+ "loss": 0.1551,
1264
+ "step": 101000
1265
+ },
1266
+ {
1267
+ "epoch": 4.46,
1268
+ "learning_rate": 1.1780225949056207e-05,
1269
+ "loss": 0.1492,
1270
+ "step": 101500
1271
+ },
1272
+ {
1273
+ "epoch": 4.49,
1274
+ "learning_rate": 1.1733445606156294e-05,
1275
+ "loss": 0.1456,
1276
+ "step": 102000
1277
+ },
1278
+ {
1279
+ "epoch": 4.51,
1280
+ "learning_rate": 1.168666526325638e-05,
1281
+ "loss": 0.1488,
1282
+ "step": 102500
1283
+ },
1284
+ {
1285
+ "epoch": 4.53,
1286
+ "learning_rate": 1.1639884920356468e-05,
1287
+ "loss": 0.141,
1288
+ "step": 103000
1289
+ },
1290
+ {
1291
+ "epoch": 4.55,
1292
+ "learning_rate": 1.1593104577456553e-05,
1293
+ "loss": 0.1518,
1294
+ "step": 103500
1295
+ },
1296
+ {
1297
+ "epoch": 4.57,
1298
+ "learning_rate": 1.154632423455664e-05,
1299
+ "loss": 0.1471,
1300
+ "step": 104000
1301
+ },
1302
+ {
1303
+ "epoch": 4.6,
1304
+ "learning_rate": 1.1499543891656727e-05,
1305
+ "loss": 0.1646,
1306
+ "step": 104500
1307
+ },
1308
+ {
1309
+ "epoch": 4.62,
1310
+ "learning_rate": 1.1452763548756814e-05,
1311
+ "loss": 0.1495,
1312
+ "step": 105000
1313
+ },
1314
+ {
1315
+ "epoch": 4.64,
1316
+ "learning_rate": 1.1405983205856901e-05,
1317
+ "loss": 0.1424,
1318
+ "step": 105500
1319
+ },
1320
+ {
1321
+ "epoch": 4.66,
1322
+ "learning_rate": 1.1359202862956985e-05,
1323
+ "loss": 0.1441,
1324
+ "step": 106000
1325
+ },
1326
+ {
1327
+ "epoch": 4.68,
1328
+ "learning_rate": 1.1312422520057073e-05,
1329
+ "loss": 0.1548,
1330
+ "step": 106500
1331
+ },
1332
+ {
1333
+ "epoch": 4.71,
1334
+ "learning_rate": 1.1265642177157159e-05,
1335
+ "loss": 0.1555,
1336
+ "step": 107000
1337
+ },
1338
+ {
1339
+ "epoch": 4.73,
1340
+ "learning_rate": 1.1218861834257246e-05,
1341
+ "loss": 0.1497,
1342
+ "step": 107500
1343
+ },
1344
+ {
1345
+ "epoch": 4.75,
1346
+ "learning_rate": 1.1172081491357332e-05,
1347
+ "loss": 0.1628,
1348
+ "step": 108000
1349
+ },
1350
+ {
1351
+ "epoch": 4.77,
1352
+ "learning_rate": 1.112530114845742e-05,
1353
+ "loss": 0.1454,
1354
+ "step": 108500
1355
+ },
1356
+ {
1357
+ "epoch": 4.79,
1358
+ "learning_rate": 1.1078520805557507e-05,
1359
+ "loss": 0.1394,
1360
+ "step": 109000
1361
+ },
1362
+ {
1363
+ "epoch": 4.82,
1364
+ "learning_rate": 1.1031740462657593e-05,
1365
+ "loss": 0.1535,
1366
+ "step": 109500
1367
+ },
1368
+ {
1369
+ "epoch": 4.84,
1370
+ "learning_rate": 1.098496011975768e-05,
1371
+ "loss": 0.1521,
1372
+ "step": 110000
1373
+ },
1374
+ {
1375
+ "epoch": 4.86,
1376
+ "learning_rate": 1.0938179776857764e-05,
1377
+ "loss": 0.1429,
1378
+ "step": 110500
1379
+ },
1380
+ {
1381
+ "epoch": 4.88,
1382
+ "learning_rate": 1.0891399433957852e-05,
1383
+ "loss": 0.1428,
1384
+ "step": 111000
1385
+ },
1386
+ {
1387
+ "epoch": 4.9,
1388
+ "learning_rate": 1.0844619091057937e-05,
1389
+ "loss": 0.1519,
1390
+ "step": 111500
1391
+ },
1392
+ {
1393
+ "epoch": 4.93,
1394
+ "learning_rate": 1.0797838748158025e-05,
1395
+ "loss": 0.1456,
1396
+ "step": 112000
1397
+ },
1398
+ {
1399
+ "epoch": 4.95,
1400
+ "learning_rate": 1.0751058405258112e-05,
1401
+ "loss": 0.1496,
1402
+ "step": 112500
1403
+ },
1404
+ {
1405
+ "epoch": 4.97,
1406
+ "learning_rate": 1.0704278062358198e-05,
1407
+ "loss": 0.1513,
1408
+ "step": 113000
1409
+ },
1410
+ {
1411
+ "epoch": 4.99,
1412
+ "learning_rate": 1.0657497719458285e-05,
1413
+ "loss": 0.146,
1414
+ "step": 113500
1415
+ },
1416
+ {
+ "epoch": 5.0,
+ "eval_accuracy": 0.9112540192926045,
+ "eval_combined_score": 0.8960850249134777,
+ "eval_f1": 0.8809160305343511,
+ "eval_loss": 0.3276694118976593,
+ "eval_runtime": 71.6884,
+ "eval_samples_per_second": 563.969,
+ "eval_steps_per_second": 70.5,
+ "step": 113705
+ },
+ {
+ "epoch": 5.01,
+ "learning_rate": 1.0610717376558371e-05,
+ "loss": 0.128,
+ "step": 114000
+ },
+ {
+ "epoch": 5.03,
+ "learning_rate": 1.0563937033658459e-05,
+ "loss": 0.116,
+ "step": 114500
+ },
+ {
+ "epoch": 5.06,
+ "learning_rate": 1.0517156690758543e-05,
+ "loss": 0.1148,
+ "step": 115000
+ },
+ {
+ "epoch": 5.08,
+ "learning_rate": 1.047037634785863e-05,
+ "loss": 0.1074,
+ "step": 115500
+ },
+ {
+ "epoch": 5.1,
+ "learning_rate": 1.0423596004958718e-05,
+ "loss": 0.114,
+ "step": 116000
+ },
+ {
+ "epoch": 5.12,
+ "learning_rate": 1.0376815662058803e-05,
+ "loss": 0.1092,
+ "step": 116500
+ },
+ {
+ "epoch": 5.14,
+ "learning_rate": 1.033003531915889e-05,
+ "loss": 0.1183,
+ "step": 117000
+ },
+ {
+ "epoch": 5.17,
+ "learning_rate": 1.0283254976258977e-05,
+ "loss": 0.1137,
+ "step": 117500
+ },
+ {
+ "epoch": 5.19,
+ "learning_rate": 1.0236474633359064e-05,
+ "loss": 0.1144,
+ "step": 118000
+ },
+ {
+ "epoch": 5.21,
+ "learning_rate": 1.018969429045915e-05,
+ "loss": 0.1203,
+ "step": 118500
+ },
+ {
+ "epoch": 5.23,
+ "learning_rate": 1.0142913947559237e-05,
+ "loss": 0.1175,
+ "step": 119000
+ },
+ {
+ "epoch": 5.25,
+ "learning_rate": 1.0096133604659325e-05,
+ "loss": 0.1159,
+ "step": 119500
+ },
+ {
+ "epoch": 5.28,
+ "learning_rate": 1.0049353261759409e-05,
+ "loss": 0.1237,
+ "step": 120000
+ },
+ {
+ "epoch": 5.3,
+ "learning_rate": 1.0002572918859496e-05,
+ "loss": 0.1203,
+ "step": 120500
+ },
+ {
+ "epoch": 5.32,
+ "learning_rate": 9.955792575959584e-06,
+ "loss": 0.115,
+ "step": 121000
+ },
+ {
+ "epoch": 5.34,
+ "learning_rate": 9.90901223305967e-06,
+ "loss": 0.1209,
+ "step": 121500
+ },
+ {
+ "epoch": 5.36,
+ "learning_rate": 9.862231890159755e-06,
+ "loss": 0.1205,
+ "step": 122000
+ },
+ {
+ "epoch": 5.39,
+ "learning_rate": 9.815451547259843e-06,
+ "loss": 0.1253,
+ "step": 122500
+ },
+ {
+ "epoch": 5.41,
+ "learning_rate": 9.768671204359928e-06,
+ "loss": 0.121,
+ "step": 123000
+ },
+ {
+ "epoch": 5.43,
+ "learning_rate": 9.721890861460014e-06,
+ "loss": 0.1167,
+ "step": 123500
+ },
+ {
+ "epoch": 5.45,
+ "learning_rate": 9.675110518560102e-06,
+ "loss": 0.1199,
+ "step": 124000
+ },
+ {
+ "epoch": 5.47,
+ "learning_rate": 9.628330175660189e-06,
+ "loss": 0.1216,
+ "step": 124500
+ },
+ {
+ "epoch": 5.5,
+ "learning_rate": 9.581549832760275e-06,
+ "loss": 0.1203,
+ "step": 125000
+ },
+ {
+ "epoch": 5.52,
+ "learning_rate": 9.534769489860362e-06,
+ "loss": 0.1055,
+ "step": 125500
+ },
+ {
+ "epoch": 5.54,
+ "learning_rate": 9.487989146960448e-06,
+ "loss": 0.1211,
+ "step": 126000
+ },
+ {
+ "epoch": 5.56,
+ "learning_rate": 9.441208804060534e-06,
+ "loss": 0.1259,
+ "step": 126500
+ },
+ {
+ "epoch": 5.58,
+ "learning_rate": 9.394428461160621e-06,
+ "loss": 0.1184,
+ "step": 127000
+ },
+ {
+ "epoch": 5.61,
+ "learning_rate": 9.347648118260709e-06,
+ "loss": 0.1151,
+ "step": 127500
+ },
+ {
+ "epoch": 5.63,
+ "learning_rate": 9.300867775360794e-06,
+ "loss": 0.1182,
+ "step": 128000
+ },
+ {
+ "epoch": 5.65,
+ "learning_rate": 9.25408743246088e-06,
+ "loss": 0.1234,
+ "step": 128500
+ },
+ {
+ "epoch": 5.67,
+ "learning_rate": 9.207307089560968e-06,
+ "loss": 0.124,
+ "step": 129000
+ },
+ {
+ "epoch": 5.69,
+ "learning_rate": 9.160526746661053e-06,
+ "loss": 0.1233,
+ "step": 129500
+ },
+ {
+ "epoch": 5.72,
+ "learning_rate": 9.11374640376114e-06,
+ "loss": 0.1249,
+ "step": 130000
+ },
+ {
+ "epoch": 5.74,
+ "learning_rate": 9.066966060861227e-06,
+ "loss": 0.1294,
+ "step": 130500
+ },
+ {
+ "epoch": 5.76,
+ "learning_rate": 9.020185717961314e-06,
+ "loss": 0.1195,
+ "step": 131000
+ },
+ {
+ "epoch": 5.78,
+ "learning_rate": 8.9734053750614e-06,
+ "loss": 0.121,
+ "step": 131500
+ },
+ {
+ "epoch": 5.8,
+ "learning_rate": 8.926625032161487e-06,
+ "loss": 0.1222,
+ "step": 132000
+ },
+ {
+ "epoch": 5.83,
+ "learning_rate": 8.879844689261573e-06,
+ "loss": 0.1253,
+ "step": 132500
+ },
+ {
+ "epoch": 5.85,
+ "learning_rate": 8.833064346361659e-06,
+ "loss": 0.1286,
+ "step": 133000
+ },
+ {
+ "epoch": 5.87,
+ "learning_rate": 8.786284003461746e-06,
+ "loss": 0.1246,
+ "step": 133500
+ },
+ {
+ "epoch": 5.89,
+ "learning_rate": 8.739503660561832e-06,
+ "loss": 0.1243,
+ "step": 134000
+ },
+ {
+ "epoch": 5.91,
+ "learning_rate": 8.69272331766192e-06,
+ "loss": 0.1237,
+ "step": 134500
+ },
+ {
+ "epoch": 5.94,
+ "learning_rate": 8.645942974762007e-06,
+ "loss": 0.1131,
+ "step": 135000
+ },
+ {
+ "epoch": 5.96,
+ "learning_rate": 8.599162631862093e-06,
+ "loss": 0.126,
+ "step": 135500
+ },
+ {
+ "epoch": 5.98,
+ "learning_rate": 8.552382288962178e-06,
+ "loss": 0.1262,
+ "step": 136000
+ },
+ {
+ "epoch": 6.0,
+ "eval_accuracy": 0.9112787534009399,
+ "eval_combined_score": 0.8962263843527706,
+ "eval_f1": 0.8811740153046013,
+ "eval_loss": 0.3939257562160492,
+ "eval_runtime": 71.8403,
+ "eval_samples_per_second": 562.776,
+ "eval_steps_per_second": 70.35,
+ "step": 136446
+ },
+ {
+ "epoch": 6.0,
+ "learning_rate": 8.505601946062266e-06,
+ "loss": 0.1149,
+ "step": 136500
+ },
+ {
+ "epoch": 6.02,
+ "learning_rate": 8.458821603162352e-06,
+ "loss": 0.0775,
+ "step": 137000
+ },
+ {
+ "epoch": 6.05,
+ "learning_rate": 8.412041260262437e-06,
+ "loss": 0.0807,
+ "step": 137500
+ },
+ {
+ "epoch": 6.07,
+ "learning_rate": 8.365260917362525e-06,
+ "loss": 0.0882,
+ "step": 138000
+ },
+ {
+ "epoch": 6.09,
+ "learning_rate": 8.318480574462612e-06,
+ "loss": 0.0916,
+ "step": 138500
+ },
+ {
+ "epoch": 6.11,
+ "learning_rate": 8.271700231562698e-06,
+ "loss": 0.0919,
+ "step": 139000
+ },
+ {
+ "epoch": 6.13,
+ "learning_rate": 8.224919888662784e-06,
+ "loss": 0.0863,
+ "step": 139500
+ },
+ {
+ "epoch": 6.16,
+ "learning_rate": 8.178139545762871e-06,
+ "loss": 0.0955,
+ "step": 140000
+ },
+ {
+ "epoch": 6.18,
+ "learning_rate": 8.131359202862957e-06,
+ "loss": 0.0889,
+ "step": 140500
+ },
+ {
+ "epoch": 6.2,
+ "learning_rate": 8.084578859963044e-06,
+ "loss": 0.0957,
+ "step": 141000
+ },
+ {
+ "epoch": 6.22,
+ "learning_rate": 8.037798517063132e-06,
+ "loss": 0.1,
+ "step": 141500
+ },
+ {
+ "epoch": 6.24,
+ "learning_rate": 7.991018174163218e-06,
+ "loss": 0.0933,
+ "step": 142000
+ },
+ {
+ "epoch": 6.27,
+ "learning_rate": 7.944237831263303e-06,
+ "loss": 0.0844,
+ "step": 142500
+ },
+ {
+ "epoch": 6.29,
+ "learning_rate": 7.89745748836339e-06,
+ "loss": 0.0914,
+ "step": 143000
+ },
+ {
+ "epoch": 6.31,
+ "learning_rate": 7.850677145463477e-06,
+ "loss": 0.0885,
+ "step": 143500
+ },
+ {
+ "epoch": 6.33,
+ "learning_rate": 7.803896802563562e-06,
+ "loss": 0.0895,
+ "step": 144000
+ },
+ {
+ "epoch": 6.35,
+ "learning_rate": 7.75711645966365e-06,
+ "loss": 0.0869,
+ "step": 144500
+ },
+ {
+ "epoch": 6.38,
+ "learning_rate": 7.710336116763737e-06,
+ "loss": 0.0933,
+ "step": 145000
+ },
+ {
+ "epoch": 6.4,
+ "learning_rate": 7.663555773863823e-06,
+ "loss": 0.0841,
+ "step": 145500
+ },
+ {
+ "epoch": 6.42,
+ "learning_rate": 7.61677543096391e-06,
+ "loss": 0.0908,
+ "step": 146000
+ },
+ {
+ "epoch": 6.44,
+ "learning_rate": 7.569995088063996e-06,
+ "loss": 0.0942,
+ "step": 146500
+ },
+ {
+ "epoch": 6.46,
+ "learning_rate": 7.523214745164083e-06,
+ "loss": 0.0872,
+ "step": 147000
+ },
+ {
+ "epoch": 6.49,
+ "learning_rate": 7.476434402264169e-06,
+ "loss": 0.0894,
+ "step": 147500
+ },
+ {
+ "epoch": 6.51,
+ "learning_rate": 7.429654059364255e-06,
+ "loss": 0.0888,
+ "step": 148000
+ },
+ {
+ "epoch": 6.53,
+ "learning_rate": 7.382873716464343e-06,
+ "loss": 0.093,
+ "step": 148500
+ },
+ {
+ "epoch": 6.55,
+ "learning_rate": 7.336093373564429e-06,
+ "loss": 0.0885,
+ "step": 149000
+ },
+ {
+ "epoch": 6.57,
+ "learning_rate": 7.289313030664516e-06,
+ "loss": 0.092,
+ "step": 149500
+ },
+ {
+ "epoch": 6.6,
+ "learning_rate": 7.242532687764602e-06,
+ "loss": 0.0956,
+ "step": 150000
+ },
+ {
+ "epoch": 6.62,
+ "learning_rate": 7.195752344864688e-06,
+ "loss": 0.1009,
+ "step": 150500
+ },
+ {
+ "epoch": 6.64,
+ "learning_rate": 7.148972001964775e-06,
+ "loss": 0.0956,
+ "step": 151000
+ },
+ {
+ "epoch": 6.66,
+ "learning_rate": 7.102191659064862e-06,
+ "loss": 0.0908,
+ "step": 151500
+ },
+ {
+ "epoch": 6.68,
+ "learning_rate": 7.055411316164949e-06,
+ "loss": 0.0878,
+ "step": 152000
+ },
+ {
+ "epoch": 6.71,
+ "learning_rate": 7.008630973265035e-06,
+ "loss": 0.1011,
+ "step": 152500
+ },
+ {
+ "epoch": 6.73,
+ "learning_rate": 6.961850630365121e-06,
+ "loss": 0.0898,
+ "step": 153000
+ },
+ {
+ "epoch": 6.75,
+ "learning_rate": 6.915070287465208e-06,
+ "loss": 0.0948,
+ "step": 153500
+ },
+ {
+ "epoch": 6.77,
+ "learning_rate": 6.868289944565294e-06,
+ "loss": 0.0908,
+ "step": 154000
+ },
+ {
+ "epoch": 6.79,
+ "learning_rate": 6.82150960166538e-06,
+ "loss": 0.0935,
+ "step": 154500
+ },
+ {
+ "epoch": 6.82,
+ "learning_rate": 6.774729258765468e-06,
+ "loss": 0.0987,
+ "step": 155000
+ },
+ {
+ "epoch": 6.84,
+ "learning_rate": 6.727948915865554e-06,
+ "loss": 0.0971,
+ "step": 155500
+ },
+ {
+ "epoch": 6.86,
+ "learning_rate": 6.681168572965641e-06,
+ "loss": 0.0894,
+ "step": 156000
+ },
+ {
+ "epoch": 6.88,
+ "learning_rate": 6.634388230065727e-06,
+ "loss": 0.0925,
+ "step": 156500
+ },
+ {
+ "epoch": 6.9,
+ "learning_rate": 6.587607887165813e-06,
+ "loss": 0.0831,
+ "step": 157000
+ },
+ {
+ "epoch": 6.93,
+ "learning_rate": 6.5408275442659e-06,
+ "loss": 0.1021,
+ "step": 157500
+ },
+ {
+ "epoch": 6.95,
+ "learning_rate": 6.4940472013659864e-06,
+ "loss": 0.0974,
+ "step": 158000
+ },
+ {
+ "epoch": 6.97,
+ "learning_rate": 6.447266858466074e-06,
+ "loss": 0.0993,
+ "step": 158500
+ },
+ {
+ "epoch": 6.99,
+ "learning_rate": 6.40048651556616e-06,
+ "loss": 0.0867,
+ "step": 159000
+ },
+ {
+ "epoch": 7.0,
+ "eval_accuracy": 0.9152609448429384,
+ "eval_combined_score": 0.9009873932600381,
+ "eval_f1": 0.8867138416771377,
+ "eval_loss": 0.44352227449417114,
+ "eval_runtime": 71.6368,
+ "eval_samples_per_second": 564.374,
+ "eval_steps_per_second": 70.55,
+ "step": 159187
+ },
+ {
+ "epoch": 7.01,
+ "learning_rate": 6.353706172666246e-06,
+ "loss": 0.0648,
+ "step": 159500
+ },
+ {
+ "epoch": 7.04,
+ "learning_rate": 6.306925829766333e-06,
+ "loss": 0.0651,
+ "step": 160000
+ },
+ {
+ "epoch": 7.06,
+ "learning_rate": 6.2601454868664195e-06,
+ "loss": 0.0692,
+ "step": 160500
+ },
+ {
+ "epoch": 7.08,
+ "learning_rate": 6.213365143966505e-06,
+ "loss": 0.0699,
+ "step": 161000
+ },
+ {
+ "epoch": 7.1,
+ "learning_rate": 6.166584801066592e-06,
+ "loss": 0.0668,
+ "step": 161500
+ },
+ {
+ "epoch": 7.12,
+ "learning_rate": 6.119804458166679e-06,
+ "loss": 0.0606,
+ "step": 162000
+ },
+ {
+ "epoch": 7.15,
+ "learning_rate": 6.073024115266766e-06,
+ "loss": 0.0728,
+ "step": 162500
+ },
+ {
+ "epoch": 7.17,
+ "learning_rate": 6.0262437723668525e-06,
+ "loss": 0.0627,
+ "step": 163000
+ },
+ {
+ "epoch": 7.19,
+ "learning_rate": 5.979463429466938e-06,
+ "loss": 0.0691,
+ "step": 163500
+ },
+ {
+ "epoch": 7.21,
+ "learning_rate": 5.932683086567025e-06,
+ "loss": 0.0653,
+ "step": 164000
+ },
+ {
+ "epoch": 7.23,
+ "learning_rate": 5.8859027436671114e-06,
+ "loss": 0.0659,
+ "step": 164500
+ },
+ {
+ "epoch": 7.26,
+ "learning_rate": 5.839122400767197e-06,
+ "loss": 0.0613,
+ "step": 165000
+ },
+ {
+ "epoch": 7.28,
+ "learning_rate": 5.7923420578672855e-06,
+ "loss": 0.0599,
+ "step": 165500
+ },
+ {
+ "epoch": 7.3,
+ "learning_rate": 5.745561714967371e-06,
+ "loss": 0.0723,
+ "step": 166000
+ },
+ {
+ "epoch": 7.32,
+ "learning_rate": 5.698781372067458e-06,
+ "loss": 0.0709,
+ "step": 166500
+ },
+ {
+ "epoch": 7.34,
+ "learning_rate": 5.6520010291675445e-06,
+ "loss": 0.0714,
+ "step": 167000
+ },
+ {
+ "epoch": 7.37,
+ "learning_rate": 5.60522068626763e-06,
+ "loss": 0.0637,
+ "step": 167500
+ },
+ {
+ "epoch": 7.39,
+ "learning_rate": 5.558440343367717e-06,
+ "loss": 0.0726,
+ "step": 168000
+ },
+ {
+ "epoch": 7.41,
+ "learning_rate": 5.5116600004678034e-06,
+ "loss": 0.0632,
+ "step": 168500
+ },
+ {
+ "epoch": 7.43,
+ "learning_rate": 5.464879657567891e-06,
+ "loss": 0.0735,
+ "step": 169000
+ },
+ {
+ "epoch": 7.45,
+ "learning_rate": 5.4180993146679775e-06,
+ "loss": 0.0693,
+ "step": 169500
+ },
+ {
+ "epoch": 7.48,
+ "learning_rate": 5.371318971768063e-06,
+ "loss": 0.07,
+ "step": 170000
+ },
+ {
+ "epoch": 7.5,
+ "learning_rate": 5.32453862886815e-06,
+ "loss": 0.0699,
+ "step": 170500
+ },
+ {
+ "epoch": 7.52,
+ "learning_rate": 5.2777582859682365e-06,
+ "loss": 0.0617,
+ "step": 171000
+ },
+ {
+ "epoch": 7.54,
+ "learning_rate": 5.230977943068323e-06,
+ "loss": 0.056,
+ "step": 171500
+ },
+ {
+ "epoch": 7.56,
+ "learning_rate": 5.184197600168409e-06,
+ "loss": 0.0644,
+ "step": 172000
+ },
+ {
+ "epoch": 7.59,
+ "learning_rate": 5.137417257268496e-06,
+ "loss": 0.0694,
+ "step": 172500
+ },
+ {
+ "epoch": 7.61,
+ "learning_rate": 5.090636914368583e-06,
+ "loss": 0.0642,
+ "step": 173000
+ },
+ {
+ "epoch": 7.63,
+ "learning_rate": 5.0438565714686695e-06,
+ "loss": 0.0673,
+ "step": 173500
+ },
+ {
+ "epoch": 7.65,
+ "learning_rate": 4.997076228568756e-06,
+ "loss": 0.0682,
+ "step": 174000
+ },
+ {
+ "epoch": 7.67,
+ "learning_rate": 4.950295885668842e-06,
+ "loss": 0.0671,
+ "step": 174500
+ },
+ {
+ "epoch": 7.7,
+ "learning_rate": 4.903515542768929e-06,
+ "loss": 0.0734,
+ "step": 175000
+ },
+ {
+ "epoch": 7.72,
+ "learning_rate": 4.856735199869015e-06,
+ "loss": 0.0665,
+ "step": 175500
+ },
+ {
+ "epoch": 7.74,
+ "learning_rate": 4.809954856969102e-06,
+ "loss": 0.0644,
+ "step": 176000
+ },
+ {
+ "epoch": 7.76,
+ "learning_rate": 4.763174514069189e-06,
+ "loss": 0.0681,
+ "step": 176500
+ },
+ {
+ "epoch": 7.78,
+ "learning_rate": 4.716394171169275e-06,
+ "loss": 0.0755,
+ "step": 177000
+ },
+ {
+ "epoch": 7.81,
+ "learning_rate": 4.6696138282693615e-06,
+ "loss": 0.0758,
+ "step": 177500
+ },
+ {
+ "epoch": 7.83,
+ "learning_rate": 4.622833485369448e-06,
+ "loss": 0.0633,
+ "step": 178000
+ },
+ {
+ "epoch": 7.85,
+ "learning_rate": 4.576053142469535e-06,
+ "loss": 0.0603,
+ "step": 178500
+ },
+ {
+ "epoch": 7.87,
+ "learning_rate": 4.529272799569621e-06,
+ "loss": 0.0703,
+ "step": 179000
+ },
+ {
+ "epoch": 7.89,
+ "learning_rate": 4.482492456669708e-06,
+ "loss": 0.062,
+ "step": 179500
+ },
+ {
+ "epoch": 7.92,
+ "learning_rate": 4.4357121137697945e-06,
+ "loss": 0.0733,
+ "step": 180000
+ },
+ {
+ "epoch": 7.94,
+ "learning_rate": 4.388931770869881e-06,
+ "loss": 0.0717,
+ "step": 180500
+ },
+ {
+ "epoch": 7.96,
+ "learning_rate": 4.342151427969967e-06,
+ "loss": 0.0612,
+ "step": 181000
+ },
+ {
+ "epoch": 7.98,
+ "learning_rate": 4.295371085070054e-06,
+ "loss": 0.0757,
+ "step": 181500
+ },
+ {
+ "epoch": 8.0,
+ "eval_accuracy": 0.9147415285678951,
+ "eval_combined_score": 0.8995565346371729,
+ "eval_f1": 0.8843715407064506,
+ "eval_loss": 0.4811749756336212,
+ "eval_runtime": 72.7906,
+ "eval_samples_per_second": 555.429,
+ "eval_steps_per_second": 69.432,
+ "step": 181928
+ },
+ {
+ "epoch": 8.0,
+ "learning_rate": 4.248590742170141e-06,
+ "loss": 0.0629,
+ "step": 182000
+ },
+ {
+ "epoch": 8.03,
+ "learning_rate": 4.201810399270227e-06,
+ "loss": 0.0438,
+ "step": 182500
+ },
+ {
+ "epoch": 8.05,
+ "learning_rate": 4.155030056370313e-06,
+ "loss": 0.0419,
+ "step": 183000
+ },
+ {
+ "epoch": 8.07,
+ "learning_rate": 4.1082497134704e-06,
+ "loss": 0.0343,
+ "step": 183500
+ },
+ {
+ "epoch": 8.09,
+ "learning_rate": 4.0614693705704865e-06,
+ "loss": 0.0557,
+ "step": 184000
+ },
+ {
+ "epoch": 8.11,
+ "learning_rate": 4.014689027670573e-06,
+ "loss": 0.052,
+ "step": 184500
+ },
+ {
+ "epoch": 8.14,
+ "learning_rate": 3.96790868477066e-06,
+ "loss": 0.0446,
+ "step": 185000
+ },
+ {
+ "epoch": 8.16,
+ "learning_rate": 3.921128341870746e-06,
+ "loss": 0.0473,
+ "step": 185500
+ },
+ {
+ "epoch": 8.18,
+ "learning_rate": 3.874347998970833e-06,
+ "loss": 0.0455,
+ "step": 186000
+ },
+ {
+ "epoch": 8.2,
+ "learning_rate": 3.827567656070919e-06,
+ "loss": 0.0486,
+ "step": 186500
+ },
+ {
+ "epoch": 8.22,
+ "learning_rate": 3.780787313171006e-06,
+ "loss": 0.0412,
+ "step": 187000
+ },
+ {
+ "epoch": 8.25,
+ "learning_rate": 3.7340069702710923e-06,
+ "loss": 0.0502,
+ "step": 187500
+ },
+ {
+ "epoch": 8.27,
+ "learning_rate": 3.687226627371179e-06,
+ "loss": 0.0435,
+ "step": 188000
+ },
+ {
+ "epoch": 8.29,
+ "learning_rate": 3.6404462844712655e-06,
+ "loss": 0.0391,
+ "step": 188500
+ },
+ {
+ "epoch": 8.31,
+ "learning_rate": 3.593665941571352e-06,
+ "loss": 0.0476,
+ "step": 189000
+ },
+ {
+ "epoch": 8.33,
+ "learning_rate": 3.5468855986714383e-06,
+ "loss": 0.0468,
+ "step": 189500
+ },
+ {
+ "epoch": 8.35,
+ "learning_rate": 3.500105255771525e-06,
+ "loss": 0.043,
+ "step": 190000
+ },
+ {
+ "epoch": 8.38,
+ "learning_rate": 3.453324912871612e-06,
+ "loss": 0.0457,
+ "step": 190500
+ },
+ {
+ "epoch": 8.4,
+ "learning_rate": 3.406544569971698e-06,
+ "loss": 0.0445,
+ "step": 191000
+ },
+ {
+ "epoch": 8.42,
+ "learning_rate": 3.3597642270717847e-06,
+ "loss": 0.0532,
+ "step": 191500
+ },
+ {
+ "epoch": 8.44,
+ "learning_rate": 3.3129838841718713e-06,
+ "loss": 0.049,
+ "step": 192000
+ },
+ {
+ "epoch": 8.46,
+ "learning_rate": 3.266203541271958e-06,
+ "loss": 0.0451,
+ "step": 192500
+ },
+ {
+ "epoch": 8.49,
+ "learning_rate": 3.219423198372044e-06,
+ "loss": 0.0493,
+ "step": 193000
+ },
+ {
+ "epoch": 8.51,
+ "learning_rate": 3.172642855472131e-06,
+ "loss": 0.0387,
+ "step": 193500
+ },
+ {
+ "epoch": 8.53,
+ "learning_rate": 3.1258625125722173e-06,
+ "loss": 0.0597,
+ "step": 194000
+ },
+ {
+ "epoch": 8.55,
+ "learning_rate": 3.079082169672304e-06,
+ "loss": 0.0549,
+ "step": 194500
+ },
+ {
+ "epoch": 8.57,
+ "learning_rate": 3.03230182677239e-06,
+ "loss": 0.0455,
+ "step": 195000
+ },
+ {
+ "epoch": 8.6,
+ "learning_rate": 2.985521483872477e-06,
+ "loss": 0.0487,
+ "step": 195500
+ },
+ {
+ "epoch": 8.62,
+ "learning_rate": 2.9387411409725637e-06,
+ "loss": 0.0485,
+ "step": 196000
+ },
+ {
+ "epoch": 8.64,
+ "learning_rate": 2.89196079807265e-06,
+ "loss": 0.0471,
+ "step": 196500
+ },
+ {
+ "epoch": 8.66,
+ "learning_rate": 2.845180455172737e-06,
+ "loss": 0.0508,
+ "step": 197000
+ },
+ {
+ "epoch": 8.68,
+ "learning_rate": 2.798400112272823e-06,
+ "loss": 0.0489,
+ "step": 197500
+ },
+ {
+ "epoch": 8.71,
+ "learning_rate": 2.7516197693729097e-06,
+ "loss": 0.0498,
+ "step": 198000
+ },
+ {
+ "epoch": 8.73,
+ "learning_rate": 2.704839426472996e-06,
+ "loss": 0.0457,
+ "step": 198500
+ },
+ {
+ "epoch": 8.75,
+ "learning_rate": 2.658059083573083e-06,
+ "loss": 0.0434,
+ "step": 199000
+ },
+ {
+ "epoch": 8.77,
+ "learning_rate": 2.6112787406731695e-06,
+ "loss": 0.0394,
+ "step": 199500
+ },
+ {
+ "epoch": 8.79,
+ "learning_rate": 2.5644983977732557e-06,
+ "loss": 0.0422,
+ "step": 200000
+ },
+ {
+ "epoch": 8.82,
+ "learning_rate": 2.5177180548733427e-06,
+ "loss": 0.0505,
+ "step": 200500
+ },
+ {
+ "epoch": 8.84,
+ "learning_rate": 2.470937711973429e-06,
+ "loss": 0.0574,
+ "step": 201000
+ },
+ {
+ "epoch": 8.86,
+ "learning_rate": 2.4241573690735155e-06,
+ "loss": 0.0445,
+ "step": 201500
+ },
+ {
+ "epoch": 8.88,
+ "learning_rate": 2.377377026173602e-06,
+ "loss": 0.0565,
+ "step": 202000
+ },
+ {
+ "epoch": 8.9,
+ "learning_rate": 2.3305966832736887e-06,
+ "loss": 0.0519,
+ "step": 202500
+ },
+ {
+ "epoch": 8.93,
+ "learning_rate": 2.283816340373775e-06,
+ "loss": 0.0508,
+ "step": 203000
+ },
+ {
+ "epoch": 8.95,
+ "learning_rate": 2.237035997473862e-06,
+ "loss": 0.0467,
+ "step": 203500
+ },
+ {
+ "epoch": 8.97,
+ "learning_rate": 2.190255654573948e-06,
+ "loss": 0.0466,
+ "step": 204000
+ },
+ {
+ "epoch": 8.99,
+ "learning_rate": 2.1434753116740347e-06,
+ "loss": 0.0479,
+ "step": 204500
+ },
+ {
+ "epoch": 9.0,
+ "eval_accuracy": 0.9150878060845906,
+ "eval_combined_score": 0.9010856587900834,
+ "eval_f1": 0.8870835114955762,
+ "eval_loss": 0.5081153512001038,
+ "eval_runtime": 92.6456,
+ "eval_samples_per_second": 436.394,
+ "eval_steps_per_second": 54.552,
+ "step": 204669
+ },
+ {
+ "epoch": 9.01,
+ "learning_rate": 2.0966949687741213e-06,
+ "loss": 0.0354,
+ "step": 205000
+ },
+ {
+ "epoch": 9.04,
+ "learning_rate": 2.049914625874208e-06,
+ "loss": 0.037,
+ "step": 205500
+ },
+ {
+ "epoch": 9.06,
+ "learning_rate": 2.0031342829742946e-06,
+ "loss": 0.0354,
+ "step": 206000
+ },
+ {
+ "epoch": 9.08,
+ "learning_rate": 1.9563539400743807e-06,
+ "loss": 0.029,
+ "step": 206500
+ },
+ {
+ "epoch": 9.1,
+ "learning_rate": 1.9095735971744673e-06,
+ "loss": 0.0363,
+ "step": 207000
+ },
+ {
+ "epoch": 9.12,
+ "learning_rate": 1.862793254274554e-06,
+ "loss": 0.0331,
+ "step": 207500
+ },
+ {
+ "epoch": 9.15,
+ "learning_rate": 1.8160129113746405e-06,
+ "loss": 0.03,
+ "step": 208000
+ },
+ {
+ "epoch": 9.17,
+ "learning_rate": 1.7692325684747272e-06,
+ "loss": 0.0314,
+ "step": 208500
+ },
+ {
+ "epoch": 9.19,
+ "learning_rate": 1.7224522255748135e-06,
+ "loss": 0.033,
+ "step": 209000
+ },
+ {
+ "epoch": 9.21,
+ "learning_rate": 1.6756718826749001e-06,
+ "loss": 0.0307,
+ "step": 209500
+ },
+ {
+ "epoch": 9.23,
+ "learning_rate": 1.6288915397749865e-06,
+ "loss": 0.0284,
+ "step": 210000
+ },
+ {
+ "epoch": 9.26,
+ "learning_rate": 1.5821111968750731e-06,
+ "loss": 0.0311,
+ "step": 210500
+ },
+ {
+ "epoch": 9.28,
+ "learning_rate": 1.5353308539751595e-06,
+ "loss": 0.036,
+ "step": 211000
+ },
+ {
+ "epoch": 9.3,
+ "learning_rate": 1.4885505110752464e-06,
+ "loss": 0.0321,
+ "step": 211500
+ },
+ {
+ "epoch": 9.32,
+ "learning_rate": 1.441770168175333e-06,
+ "loss": 0.0303,
+ "step": 212000
+ },
+ {
+ "epoch": 9.34,
+ "learning_rate": 1.3949898252754194e-06,
2656
+ "loss": 0.0326,
2657
+ "step": 212500
2658
+ },
2659
+ {
2660
+ "epoch": 9.37,
2661
+ "learning_rate": 1.348209482375506e-06,
2662
+ "loss": 0.03,
2663
+ "step": 213000
2664
+ },
2665
+ {
2666
+ "epoch": 9.39,
2667
+ "learning_rate": 1.3014291394755924e-06,
2668
+ "loss": 0.0325,
2669
+ "step": 213500
2670
+ },
2671
+ {
2672
+ "epoch": 9.41,
2673
+ "learning_rate": 1.254648796575679e-06,
2674
+ "loss": 0.034,
2675
+ "step": 214000
2676
+ },
2677
+ {
2678
+ "epoch": 9.43,
2679
+ "learning_rate": 1.2078684536757656e-06,
2680
+ "loss": 0.0423,
2681
+ "step": 214500
2682
+ },
2683
+ {
2684
+ "epoch": 9.45,
2685
+ "learning_rate": 1.161088110775852e-06,
2686
+ "loss": 0.0353,
2687
+ "step": 215000
2688
+ },
2689
+ {
2690
+ "epoch": 9.48,
2691
+ "learning_rate": 1.1143077678759386e-06,
2692
+ "loss": 0.0301,
2693
+ "step": 215500
2694
+ },
2695
+ {
2696
+ "epoch": 9.5,
2697
+ "learning_rate": 1.0675274249760252e-06,
2698
+ "loss": 0.0315,
2699
+ "step": 216000
2700
+ },
+ {
+ "epoch": 9.52,
+ "learning_rate": 1.0207470820761118e-06,
+ "loss": 0.0361,
+ "step": 216500
+ },
+ {
+ "epoch": 9.54,
+ "learning_rate": 9.739667391761982e-07,
+ "loss": 0.0388,
+ "step": 217000
+ },
+ {
+ "epoch": 9.56,
+ "learning_rate": 9.271863962762848e-07,
+ "loss": 0.0309,
+ "step": 217500
+ },
+ {
+ "epoch": 9.59,
+ "learning_rate": 8.804060533763713e-07,
+ "loss": 0.0299,
+ "step": 218000
+ },
+ {
+ "epoch": 9.61,
+ "learning_rate": 8.336257104764578e-07,
+ "loss": 0.0357,
+ "step": 218500
+ },
+ {
+ "epoch": 9.63,
+ "learning_rate": 7.868453675765445e-07,
+ "loss": 0.0285,
+ "step": 219000
+ },
+ {
+ "epoch": 9.65,
+ "learning_rate": 7.40065024676631e-07,
+ "loss": 0.0238,
+ "step": 219500
+ },
+ {
+ "epoch": 9.67,
+ "learning_rate": 6.932846817767175e-07,
+ "loss": 0.0284,
+ "step": 220000
+ },
+ {
+ "epoch": 9.7,
+ "learning_rate": 6.46504338876804e-07,
+ "loss": 0.033,
+ "step": 220500
+ },
+ {
+ "epoch": 9.72,
+ "learning_rate": 5.997239959768906e-07,
+ "loss": 0.031,
+ "step": 221000
+ },
+ {
+ "epoch": 9.74,
+ "learning_rate": 5.529436530769771e-07,
+ "loss": 0.028,
+ "step": 221500
+ },
+ {
+ "epoch": 9.76,
+ "learning_rate": 5.061633101770636e-07,
+ "loss": 0.0264,
+ "step": 222000
+ },
+ {
+ "epoch": 9.78,
+ "learning_rate": 4.5938296727715023e-07,
+ "loss": 0.0341,
+ "step": 222500
+ },
+ {
+ "epoch": 9.81,
+ "learning_rate": 4.1260262437723673e-07,
+ "loss": 0.0252,
+ "step": 223000
+ },
+ {
+ "epoch": 9.83,
+ "learning_rate": 3.6582228147732323e-07,
+ "loss": 0.0348,
+ "step": 223500
+ },
+ {
+ "epoch": 9.85,
+ "learning_rate": 3.1904193857740983e-07,
+ "loss": 0.0315,
+ "step": 224000
+ },
+ {
+ "epoch": 9.87,
+ "learning_rate": 2.7226159567749633e-07,
+ "loss": 0.0352,
+ "step": 224500
+ },
+ {
+ "epoch": 9.89,
+ "learning_rate": 2.2548125277758288e-07,
+ "loss": 0.0342,
+ "step": 225000
+ },
+ {
+ "epoch": 9.92,
+ "learning_rate": 1.787009098776694e-07,
+ "loss": 0.0313,
+ "step": 225500
+ },
+ {
+ "epoch": 9.94,
+ "learning_rate": 1.3192056697775596e-07,
+ "loss": 0.0324,
+ "step": 226000
+ },
+ {
+ "epoch": 9.96,
+ "learning_rate": 8.51402240778425e-08,
+ "loss": 0.0274,
+ "step": 226500
+ },
+ {
+ "epoch": 9.98,
+ "learning_rate": 3.835988117792904e-08,
+ "loss": 0.0379,
+ "step": 227000
+ },
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.9148899332179075,
+ "eval_combined_score": 0.9003480960683503,
+ "eval_f1": 0.8858062589187933,
+ "eval_loss": 0.5646682977676392,
+ "eval_runtime": 85.1463,
+ "eval_samples_per_second": 474.83,
+ "eval_steps_per_second": 59.357,
+ "step": 227410
+ },
+ {
+ "epoch": 10.0,
+ "step": 227410,
+ "total_flos": 2.393297626212864e+17,
+ "train_loss": 0.1477880122620598,
+ "train_runtime": 22868.2253,
+ "train_samples_per_second": 159.105,
+ "train_steps_per_second": 9.944
+ }
+ ],
+ "max_steps": 227410,
+ "num_train_epochs": 10,
+ "total_flos": 2.393297626212864e+17,
+ "trial_name": null,
+ "trial_params": null
+ }
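In this `log_history` excerpt, `learning_rate` drops by the same amount (about 4.678e-08) every 500 steps. Extrapolating that trend reaches zero at `max_steps` (227410), which is the signature of a linear decay schedule. The sketch below is illustrative and assumes that reading of the log. It uses a hand-trimmed subset of the entries above, with values copied verbatim. It checks that `learning_rate / (max_steps - step)` stays constant, and it picks the epoch-end evaluation entry with the highest `eval_combined_score`. The helper names are hypothetical and are not part of the Hugging Face `Trainer` API.

```python
import json

# Hand-trimmed excerpt of trainer_state.json (values copied from the diff above).
# With the real file you would json.load() it instead of parsing this string.
TRAINER_STATE_JSON = """
{
  "max_steps": 227410,
  "log_history": [
    {"epoch": 8.75, "learning_rate": 2.658059083573083e-06, "loss": 0.0434, "step": 199000},
    {"epoch": 8.99, "learning_rate": 2.1434753116740347e-06, "loss": 0.0479, "step": 204500},
    {"epoch": 9.0, "eval_combined_score": 0.9010856587900834, "eval_loss": 0.5081153512001038, "step": 204669},
    {"epoch": 9.5, "learning_rate": 1.0675274249760252e-06, "loss": 0.0315, "step": 216000},
    {"epoch": 9.98, "learning_rate": 3.835988117792904e-08, "loss": 0.0379, "step": 227000},
    {"epoch": 10.0, "eval_combined_score": 0.9003480960683503, "eval_loss": 0.5646682977676392, "step": 227410}
  ]
}
"""


def split_history(state):
    """Hypothetical helper: separate periodic training logs from evaluation entries."""
    history = state["log_history"]
    train = [e for e in history if "learning_rate" in e]
    evals = [e for e in history if "eval_loss" in e]
    return train, evals


def linear_decay_slopes(train, max_steps):
    """Under linear decay to zero at max_steps, lr / (max_steps - step) is constant."""
    return [e["learning_rate"] / (max_steps - e["step"]) for e in train]


state = json.loads(TRAINER_STATE_JSON)
train, evals = split_history(state)
slopes = linear_decay_slopes(train, state["max_steps"])
# Relative spread of the per-remaining-step slope; near zero means the decay is linear.
spread = max(slopes) / min(slopes) - 1.0
best = max(evals, key=lambda e: e["eval_combined_score"])

print(f"lr per remaining step ~ {slopes[0]:.4e}, relative spread {spread:.1e}")
print(f"best epoch-end eval: epoch {best['epoch']}, step {best['step']}")
```

The same helpers should work unchanged on the full file. The closing summary entry (the one carrying `train_runtime` and `total_flos`) has neither `learning_rate` nor `eval_loss`, so it is skipped by both filters.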