emendes3 committed on
Commit
ae2966c
1 Parent(s): 6b2f015

Model save

README.md CHANGED
@@ -1,27 +1,19 @@
  ---
  library_name: peft
  tags:
- - liuhaotian/llava-v1.5-13b_20.0
  - generated_from_trainer
  base_model: liuhaotian/llava-v1.5-13b
  model-index:
- - name: liuhaotian/llava-v1.5-13b_20.0
+ - name: llava_13b_exact_location_name_synthetic
  results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # liuhaotian/llava-v1.5-13b_20.0
+ # llava_13b_exact_location_name_synthetic

- This model is a fine-tuned version of [liuhaotian/llava-v1.5-13b_20.0](https://huggingface.co/liuhaotian/llava-v1.5-13b_20.0) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - eval_loss: 0.0032
- - eval_runtime: 40.8515
- - eval_samples_per_second: 11.064
- - eval_steps_per_second: 0.367
- - epoch: 19.0
- - step: 285
+ This model is a fine-tuned version of [liuhaotian/llava-v1.5-13b](https://huggingface.co/liuhaotian/llava-v1.5-13b) on an unknown dataset.

  ## Model description

adapter_config.json CHANGED
@@ -20,13 +20,13 @@
    "rank_pattern": {},
    "revision": null,
    "target_modules": [
-     "k_proj",
-     "o_proj",
-     "down_proj",
-     "q_proj",
-     "gate_proj",
      "up_proj",
-     "v_proj"
+     "q_proj",
+     "down_proj",
+     "o_proj",
+     "v_proj",
+     "k_proj",
+     "gate_proj"
    ],
    "task_type": "CAUSAL_LM",
    "use_dora": false,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b16f933c06fd96912807f969fa6dbb4aebaf5b278e0b9f04892d2746a8407fab
+ oid sha256:557f643fb9ea39a222565056a227833e88cb61103173e8fb3aa44388922c23bb
  size 1001466944
num_examples=400/llava-v1.5-13b_1.0/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ library_name: peft
+ base_model: liuhaotian/llava-v1.5-13b
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.10.0
num_examples=400/llava-v1.5-13b_1.0/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "liuhaotian/llava-v1.5-13b",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 256,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 128,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "up_proj",
+     "q_proj",
+     "down_proj",
+     "o_proj",
+     "v_proj",
+     "k_proj",
+     "gate_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_dora": false,
+   "use_rslora": false
+ }
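The adapter configuration above sets `r` to 128 and `lora_alpha` to 256 with `use_rslora` disabled. As a rough illustration of what those numbers imply, the sketch below computes the factor by which the low-rank update is scaled. It assumes the usual LoRA conventions (`lora_alpha / r` for plain LoRA, `lora_alpha / sqrt(r)` for rank-stabilized LoRA); the helper name `lora_scaling` is hypothetical and not part of this repository.

```python
import json
import math

# Subset of the adapter_config.json fields shown in the diff above.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 256,
  "lora_dropout": 0.05,
  "peft_type": "LORA",
  "r": 128,
  "use_rslora": false
}
""")


def lora_scaling(config: dict) -> float:
    """Return the factor applied to the low-rank update (hypothetical helper)."""
    alpha = config["lora_alpha"]
    rank = config["r"]
    if config.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return alpha / math.sqrt(rank)
    return alpha / rank


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"scaling = {scaling}")
```

With the values in this commit the update is scaled by 2.0, so the adapter's contribution is doubled relative to an unscaled low-rank product.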
num_examples=400/llava-v1.5-13b_1.0/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:557f643fb9ea39a222565056a227833e88cb61103173e8fb3aa44388922c23bb
+ size 1001466944
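The three lines above are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, a SHA-256 object ID, and the byte size. The same `oid` appears for the top-level `adapter_model.safetensors` in this commit, which suggests the two files have identical content. A minimal sketch of reading such a pointer, assuming the simple `key value` line layout seen here; `parse_lfs_pointer` is a hypothetical helper name:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:557f643fb9ea39a222565056a227833e88cb61103173e8fb3aa44388922c23bb
size 1001466944
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

Comparing the `digest` field across two pointers is a cheap way to check whether two tracked files are byte-identical without downloading either one.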
num_examples=400/llava-v1.5-13b_1.0/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<unk>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
num_examples=400/llava-v1.5-13b_1.0/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
num_examples=400/llava-v1.5-13b_1.0/tokenizer_config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": false,
+   "model_max_length": 2048,
+   "pad_token": "<unk>",
+   "padding_side": "right",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
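The tokenizer configuration above reuses `<unk>` (ID 0) as the padding token, pads on the right, and caps sequences at 2048 tokens. The sketch below illustrates what that combination means for a batch, using plain Python lists rather than the real tokenizer. The function name `pad_batch` and the example token IDs other than 0, 1, and 2 are hypothetical.

```python
PAD_ID = 0      # "<unk>", reused as pad_token in tokenizer_config.json
BOS_ID = 1      # "<s>"
MAX_LEN = 2048  # model_max_length


def pad_batch(sequences, max_length=MAX_LEN, pad_id=PAD_ID):
    """Right-pad token ID lists to a common width (hypothetical helper)."""
    # Truncate first so no sequence exceeds the configured maximum length.
    truncated = [seq[:max_length] for seq in sequences]
    width = max(len(seq) for seq in truncated)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in truncated]
    # The mask marks real tokens with 1 and padding with 0.
    attention_mask = [
        [1] * len(seq) + [0] * (width - len(seq)) for seq in truncated
    ]
    return input_ids, attention_mask


ids, mask = pad_batch([[BOS_ID, 10, 11], [BOS_ID, 12]])
print(ids)
print(mask)
```

Because the pad token shares an ID with `<unk>`, the attention mask (rather than the token ID) is what distinguishes padding from a genuine unknown token.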
num_examples=400/llava-v1.5-13b_1.0/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b6c3553537bc1d3d0652cf3b2bdcfd5226cf4b6778fd749d352ca45570436029
+ size 6840
trainer_state.json CHANGED
@@ -3,3907 +3,1987 @@
3
  "best_model_checkpoint": null,
4
  "epoch": 20.0,
5
  "eval_steps": 500,
6
- "global_step": 620,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
- "epoch": 0.03,
13
  "learning_rate": 0.0,
14
- "loss": 1.5209,
15
  "step": 1
16
  },
17
  {
18
- "epoch": 0.06,
19
- "learning_rate": 4.708178267332765e-05,
20
- "loss": 1.2229,
21
  "step": 2
22
  },
23
  {
24
- "epoch": 0.1,
25
- "learning_rate": 7.46228600043274e-05,
26
- "loss": 1.2327,
27
  "step": 3
28
  },
29
  {
30
- "epoch": 0.13,
31
- "learning_rate": 9.41635653466553e-05,
32
- "loss": 1.1423,
33
  "step": 4
34
  },
35
  {
36
- "epoch": 0.16,
37
- "learning_rate": 0.00010932051394658049,
38
- "loss": 1.0992,
39
  "step": 5
40
  },
41
  {
42
- "epoch": 0.19,
43
- "learning_rate": 0.00012170464267765504,
44
- "loss": 1.0988,
45
  "step": 6
46
  },
47
  {
48
- "epoch": 0.23,
49
- "learning_rate": 0.0001321752743272128,
50
- "loss": 1.0443,
51
  "step": 7
52
  },
53
  {
54
- "epoch": 0.26,
55
- "learning_rate": 0.00014124534801998294,
56
- "loss": 1.05,
57
  "step": 8
58
  },
59
  {
60
- "epoch": 0.29,
61
- "learning_rate": 0.0001492457200086548,
62
- "loss": 0.9695,
63
  "step": 9
64
  },
65
  {
66
- "epoch": 0.32,
67
- "learning_rate": 0.00015640229661990817,
68
- "loss": 1.0095,
69
  "step": 10
70
  },
71
  {
72
- "epoch": 0.35,
73
- "learning_rate": 0.00016287620764191934,
74
- "loss": 1.0081,
75
  "step": 11
76
  },
77
  {
78
- "epoch": 0.39,
79
- "learning_rate": 0.00016878642535098271,
80
- "loss": 0.9324,
81
  "step": 12
82
  },
83
  {
84
- "epoch": 0.42,
85
- "learning_rate": 0.00017422329860526873,
86
- "loss": 0.9096,
87
  "step": 13
88
  },
89
  {
90
- "epoch": 0.45,
91
- "learning_rate": 0.00017925705700054043,
92
- "loss": 0.8283,
93
  "step": 14
94
  },
95
  {
96
- "epoch": 0.48,
97
- "learning_rate": 0.0001839433739509079,
98
- "loss": 0.8708,
99
  "step": 15
100
  },
101
  {
102
- "epoch": 0.52,
103
- "learning_rate": 0.0001883271306933106,
104
- "loss": 0.854,
105
  "step": 16
106
  },
107
  {
108
- "epoch": 0.55,
109
- "learning_rate": 0.00019244503717705084,
110
- "loss": 0.8107,
111
  "step": 17
112
  },
113
  {
114
- "epoch": 0.58,
115
- "learning_rate": 0.00019632750268198243,
116
- "loss": 0.7396,
117
  "step": 18
118
  },
119
  {
120
- "epoch": 0.61,
121
  "learning_rate": 0.0002,
122
- "loss": 0.7374,
123
  "step": 19
124
  },
125
  {
126
- "epoch": 0.65,
127
  "learning_rate": 0.0002,
128
- "loss": 0.642,
129
  "step": 20
130
  },
131
  {
132
- "epoch": 0.68,
133
  "learning_rate": 0.0002,
134
- "loss": 0.7746,
135
  "step": 21
136
  },
137
  {
138
- "epoch": 0.71,
139
  "learning_rate": 0.0002,
140
- "loss": 0.6737,
141
  "step": 22
142
  },
143
  {
144
- "epoch": 0.74,
145
  "learning_rate": 0.0002,
146
- "loss": 0.6024,
147
  "step": 23
148
  },
149
  {
150
- "epoch": 0.77,
151
  "learning_rate": 0.0002,
152
- "loss": 0.6862,
153
  "step": 24
154
  },
155
  {
156
- "epoch": 0.81,
157
  "learning_rate": 0.0002,
158
- "loss": 0.5941,
159
  "step": 25
160
  },
161
  {
162
- "epoch": 0.84,
163
  "learning_rate": 0.0002,
164
- "loss": 0.5712,
165
  "step": 26
166
  },
167
  {
168
- "epoch": 0.87,
169
  "learning_rate": 0.0002,
170
- "loss": 0.5542,
171
  "step": 27
172
  },
173
  {
174
- "epoch": 0.9,
175
  "learning_rate": 0.0002,
176
- "loss": 0.4663,
177
  "step": 28
178
  },
179
  {
180
- "epoch": 0.94,
181
  "learning_rate": 0.0002,
182
- "loss": 0.4901,
183
  "step": 29
184
  },
185
  {
186
- "epoch": 0.97,
187
  "learning_rate": 0.0002,
188
- "loss": 0.4259,
189
  "step": 30
190
  },
191
  {
192
- "epoch": 1.0,
193
- "learning_rate": 0.0002,
194
- "loss": 0.3824,
195
- "step": 31
196
  },
197
  {
198
- "epoch": 1.0,
199
- "eval_loss": 0.27588292956352234,
200
- "eval_runtime": 79.0786,
201
- "eval_samples_per_second": 12.279,
202
- "eval_steps_per_second": 0.392,
203
  "step": 31
204
  },
205
  {
206
- "epoch": 1.03,
207
  "learning_rate": 0.0002,
208
- "loss": 0.2758,
209
  "step": 32
210
  },
211
  {
212
- "epoch": 1.06,
213
  "learning_rate": 0.0002,
214
- "loss": 0.3008,
215
  "step": 33
216
  },
217
  {
218
- "epoch": 1.1,
219
  "learning_rate": 0.0002,
220
- "loss": 0.2365,
221
  "step": 34
222
  },
223
  {
224
- "epoch": 1.13,
225
  "learning_rate": 0.0002,
226
- "loss": 0.2172,
227
  "step": 35
228
  },
229
  {
230
- "epoch": 1.16,
231
  "learning_rate": 0.0002,
232
- "loss": 0.2279,
233
  "step": 36
234
  },
235
  {
236
- "epoch": 1.19,
237
  "learning_rate": 0.0002,
238
- "loss": 0.2497,
239
  "step": 37
240
  },
241
  {
242
- "epoch": 1.23,
243
  "learning_rate": 0.0002,
244
- "loss": 0.1943,
245
  "step": 38
246
  },
247
  {
248
- "epoch": 1.26,
249
  "learning_rate": 0.0002,
250
- "loss": 0.2189,
251
  "step": 39
252
  },
253
  {
254
- "epoch": 1.29,
255
  "learning_rate": 0.0002,
256
- "loss": 0.1136,
257
  "step": 40
258
  },
259
  {
260
- "epoch": 1.32,
261
  "learning_rate": 0.0002,
262
- "loss": 0.1728,
263
  "step": 41
264
  },
265
  {
266
- "epoch": 1.35,
267
  "learning_rate": 0.0002,
268
- "loss": 0.1786,
269
  "step": 42
270
  },
271
  {
272
- "epoch": 1.39,
273
  "learning_rate": 0.0002,
274
- "loss": 0.1389,
275
  "step": 43
276
  },
277
  {
278
- "epoch": 1.42,
279
  "learning_rate": 0.0002,
280
- "loss": 0.1352,
281
  "step": 44
282
  },
283
  {
284
- "epoch": 1.45,
285
  "learning_rate": 0.0002,
286
- "loss": 0.1489,
287
  "step": 45
288
  },
289
  {
290
- "epoch": 1.48,
291
  "learning_rate": 0.0002,
292
- "loss": 0.1242,
293
  "step": 46
294
  },
295
  {
296
- "epoch": 1.52,
297
  "learning_rate": 0.0002,
298
- "loss": 0.1498,
299
  "step": 47
300
  },
301
  {
302
- "epoch": 1.55,
303
  "learning_rate": 0.0002,
304
- "loss": 0.1477,
305
  "step": 48
306
  },
307
  {
308
- "epoch": 1.58,
309
  "learning_rate": 0.0002,
310
- "loss": 0.0588,
311
  "step": 49
312
  },
313
  {
314
- "epoch": 1.61,
315
  "learning_rate": 0.0002,
316
- "loss": 0.1447,
317
  "step": 50
318
  },
319
  {
320
- "epoch": 1.65,
321
  "learning_rate": 0.0002,
322
- "loss": 0.1178,
323
  "step": 51
324
  },
325
  {
326
- "epoch": 1.68,
327
  "learning_rate": 0.0002,
328
- "loss": 0.1571,
329
  "step": 52
330
  },
331
  {
332
- "epoch": 1.71,
333
  "learning_rate": 0.0002,
334
- "loss": 0.1102,
335
  "step": 53
336
  },
337
  {
338
- "epoch": 1.74,
339
  "learning_rate": 0.0002,
340
- "loss": 0.0951,
341
  "step": 54
342
  },
343
  {
344
- "epoch": 1.77,
345
  "learning_rate": 0.0002,
346
- "loss": 0.1482,
347
  "step": 55
348
  },
349
  {
350
- "epoch": 1.81,
351
  "learning_rate": 0.0002,
352
- "loss": 0.0877,
353
  "step": 56
354
  },
355
  {
356
- "epoch": 1.84,
357
  "learning_rate": 0.0002,
358
- "loss": 0.1067,
359
  "step": 57
360
  },
361
  {
362
- "epoch": 1.87,
363
  "learning_rate": 0.0002,
364
- "loss": 0.0974,
365
  "step": 58
366
  },
367
  {
368
- "epoch": 1.9,
369
  "learning_rate": 0.0002,
370
- "loss": 0.0631,
371
  "step": 59
372
  },
373
  {
374
- "epoch": 1.94,
375
  "learning_rate": 0.0002,
376
- "loss": 0.098,
377
  "step": 60
378
  },
379
  {
380
- "epoch": 1.97,
381
- "learning_rate": 0.0002,
382
- "loss": 0.1148,
383
- "step": 61
384
  },
385
  {
386
- "epoch": 2.0,
387
  "learning_rate": 0.0002,
388
- "loss": 0.0442,
389
- "step": 62
390
  },
391
  {
392
- "epoch": 2.0,
393
- "eval_loss": 0.04755338281393051,
394
- "eval_runtime": 79.4962,
395
- "eval_samples_per_second": 12.214,
396
- "eval_steps_per_second": 0.39,
397
  "step": 62
398
  },
399
  {
400
- "epoch": 2.03,
401
  "learning_rate": 0.0002,
402
- "loss": 0.0526,
403
  "step": 63
404
  },
405
  {
406
- "epoch": 2.06,
407
  "learning_rate": 0.0002,
408
- "loss": 0.0471,
409
  "step": 64
410
  },
411
  {
412
- "epoch": 2.1,
413
  "learning_rate": 0.0002,
414
- "loss": 0.036,
415
  "step": 65
416
  },
417
  {
418
- "epoch": 2.13,
419
  "learning_rate": 0.0002,
420
- "loss": 0.037,
421
  "step": 66
422
  },
423
  {
424
- "epoch": 2.16,
425
  "learning_rate": 0.0002,
426
- "loss": 0.0293,
427
  "step": 67
428
  },
429
  {
430
- "epoch": 2.19,
431
  "learning_rate": 0.0002,
432
- "loss": 0.0316,
433
  "step": 68
434
  },
435
  {
436
- "epoch": 2.23,
437
  "learning_rate": 0.0002,
438
- "loss": 0.0268,
439
  "step": 69
440
  },
441
  {
442
- "epoch": 2.26,
443
  "learning_rate": 0.0002,
444
- "loss": 0.0366,
445
  "step": 70
446
  },
447
  {
448
- "epoch": 2.29,
449
  "learning_rate": 0.0002,
450
- "loss": 0.0305,
451
  "step": 71
452
  },
453
  {
454
- "epoch": 2.32,
455
  "learning_rate": 0.0002,
456
- "loss": 0.0336,
457
  "step": 72
458
  },
459
  {
460
- "epoch": 2.35,
461
  "learning_rate": 0.0002,
462
- "loss": 0.0497,
463
  "step": 73
464
  },
465
  {
466
- "epoch": 2.39,
467
  "learning_rate": 0.0002,
468
- "loss": 0.0275,
469
  "step": 74
470
  },
471
  {
472
- "epoch": 2.42,
473
  "learning_rate": 0.0002,
474
- "loss": 0.0385,
475
  "step": 75
476
  },
477
  {
478
- "epoch": 2.45,
479
  "learning_rate": 0.0002,
480
- "loss": 0.0221,
481
  "step": 76
482
  },
483
  {
484
- "epoch": 2.48,
485
  "learning_rate": 0.0002,
486
- "loss": 0.0184,
487
  "step": 77
488
  },
489
  {
490
- "epoch": 2.52,
491
  "learning_rate": 0.0002,
492
- "loss": 0.0354,
493
  "step": 78
494
  },
495
  {
496
- "epoch": 2.55,
497
  "learning_rate": 0.0002,
498
- "loss": 0.0148,
499
  "step": 79
500
  },
501
  {
502
- "epoch": 2.58,
503
  "learning_rate": 0.0002,
504
- "loss": 0.0254,
505
  "step": 80
506
  },
507
  {
508
- "epoch": 2.61,
509
  "learning_rate": 0.0002,
510
- "loss": 0.0356,
511
  "step": 81
512
  },
513
  {
514
- "epoch": 2.65,
515
  "learning_rate": 0.0002,
516
- "loss": 0.0237,
517
  "step": 82
518
  },
519
  {
520
- "epoch": 2.68,
521
  "learning_rate": 0.0002,
522
- "loss": 0.0259,
523
  "step": 83
524
  },
525
  {
526
- "epoch": 2.71,
527
  "learning_rate": 0.0002,
528
- "loss": 0.0276,
529
  "step": 84
530
  },
531
  {
532
- "epoch": 2.74,
533
  "learning_rate": 0.0002,
534
- "loss": 0.0212,
535
  "step": 85
536
  },
537
  {
538
- "epoch": 2.77,
539
  "learning_rate": 0.0002,
540
- "loss": 0.0351,
541
  "step": 86
542
  },
543
  {
544
- "epoch": 2.81,
545
  "learning_rate": 0.0002,
546
- "loss": 0.0332,
547
  "step": 87
548
  },
549
  {
550
- "epoch": 2.84,
551
  "learning_rate": 0.0002,
552
- "loss": 0.0225,
553
  "step": 88
554
  },
555
  {
556
- "epoch": 2.87,
557
  "learning_rate": 0.0002,
558
- "loss": 0.0113,
559
  "step": 89
560
  },
561
  {
562
- "epoch": 2.9,
563
  "learning_rate": 0.0002,
564
- "loss": 0.045,
565
  "step": 90
566
  },
567
  {
568
- "epoch": 2.94,
569
  "learning_rate": 0.0002,
570
- "loss": 0.0228,
571
  "step": 91
572
  },
573
  {
574
- "epoch": 2.97,
575
  "learning_rate": 0.0002,
576
- "loss": 0.0216,
577
  "step": 92
578
  },
579
  {
580
- "epoch": 3.0,
581
  "learning_rate": 0.0002,
582
- "loss": 0.0111,
583
- "step": 93
584
- },
585
- {
586
- "epoch": 3.0,
587
- "eval_loss": 0.013120166026055813,
588
- "eval_runtime": 79.6214,
589
- "eval_samples_per_second": 12.195,
590
- "eval_steps_per_second": 0.389,
591
  "step": 93
592
  },
593
  {
594
- "epoch": 3.03,
595
  "learning_rate": 0.0002,
596
- "loss": 0.009,
597
  "step": 94
598
  },
599
  {
600
- "epoch": 3.06,
601
  "learning_rate": 0.0002,
602
- "loss": 0.0111,
603
  "step": 95
604
  },
605
  {
606
- "epoch": 3.1,
607
  "learning_rate": 0.0002,
608
- "loss": 0.0147,
609
  "step": 96
610
  },
611
  {
612
- "epoch": 3.13,
613
  "learning_rate": 0.0002,
614
- "loss": 0.0185,
615
  "step": 97
616
  },
617
  {
618
- "epoch": 3.16,
619
  "learning_rate": 0.0002,
620
- "loss": 0.0155,
621
  "step": 98
622
  },
623
  {
624
- "epoch": 3.19,
625
  "learning_rate": 0.0002,
626
- "loss": 0.0091,
627
  "step": 99
628
  },
629
  {
630
- "epoch": 3.23,
631
  "learning_rate": 0.0002,
632
- "loss": 0.0072,
633
  "step": 100
634
  },
635
  {
636
- "epoch": 3.26,
637
  "learning_rate": 0.0002,
638
- "loss": 0.0126,
639
  "step": 101
640
  },
641
  {
642
- "epoch": 3.29,
643
  "learning_rate": 0.0002,
644
- "loss": 0.0106,
645
  "step": 102
646
  },
647
  {
648
- "epoch": 3.32,
649
  "learning_rate": 0.0002,
650
- "loss": 0.0106,
651
  "step": 103
652
  },
653
  {
654
- "epoch": 3.35,
655
  "learning_rate": 0.0002,
656
- "loss": 0.0145,
657
  "step": 104
658
  },
659
  {
660
- "epoch": 3.39,
661
  "learning_rate": 0.0002,
662
- "loss": 0.0063,
663
  "step": 105
664
  },
665
  {
666
- "epoch": 3.42,
667
  "learning_rate": 0.0002,
668
- "loss": 0.0165,
669
  "step": 106
670
  },
671
  {
672
- "epoch": 3.45,
673
  "learning_rate": 0.0002,
674
- "loss": 0.0066,
675
  "step": 107
676
  },
677
  {
678
- "epoch": 3.48,
679
  "learning_rate": 0.0002,
680
- "loss": 0.0148,
681
  "step": 108
682
  },
683
  {
684
- "epoch": 3.52,
685
  "learning_rate": 0.0002,
686
- "loss": 0.0112,
687
  "step": 109
688
  },
689
  {
690
- "epoch": 3.55,
691
  "learning_rate": 0.0002,
692
- "loss": 0.0106,
693
  "step": 110
694
  },
695
  {
696
- "epoch": 3.58,
697
  "learning_rate": 0.0002,
698
- "loss": 0.0109,
699
  "step": 111
700
  },
701
  {
702
- "epoch": 3.61,
703
  "learning_rate": 0.0002,
704
- "loss": 0.0189,
705
  "step": 112
706
  },
707
  {
708
- "epoch": 3.65,
709
  "learning_rate": 0.0002,
710
- "loss": 0.0131,
711
  "step": 113
712
  },
713
  {
714
- "epoch": 3.68,
715
  "learning_rate": 0.0002,
716
- "loss": 0.0115,
717
  "step": 114
718
  },
719
  {
720
- "epoch": 3.71,
721
  "learning_rate": 0.0002,
722
- "loss": 0.0108,
723
  "step": 115
724
  },
725
  {
726
- "epoch": 3.74,
727
  "learning_rate": 0.0002,
728
- "loss": 0.008,
729
  "step": 116
730
  },
731
  {
732
- "epoch": 3.77,
733
  "learning_rate": 0.0002,
734
- "loss": 0.0143,
735
  "step": 117
736
  },
737
  {
738
- "epoch": 3.81,
739
  "learning_rate": 0.0002,
740
- "loss": 0.0085,
741
  "step": 118
742
  },
743
  {
744
- "epoch": 3.84,
745
  "learning_rate": 0.0002,
746
- "loss": 0.0122,
747
  "step": 119
748
  },
749
  {
750
- "epoch": 3.87,
751
  "learning_rate": 0.0002,
752
- "loss": 0.0086,
753
  "step": 120
754
  },
755
  {
756
- "epoch": 3.9,
757
  "learning_rate": 0.0002,
758
- "loss": 0.013,
759
  "step": 121
760
  },
761
  {
762
- "epoch": 3.94,
763
  "learning_rate": 0.0002,
764
- "loss": 0.0049,
765
  "step": 122
766
  },
767
  {
768
- "epoch": 3.97,
769
  "learning_rate": 0.0002,
770
- "loss": 0.0102,
771
  "step": 123
772
  },
773
  {
774
- "epoch": 4.0,
775
  "learning_rate": 0.0002,
776
- "loss": 0.0043,
777
- "step": 124
778
- },
779
- {
780
- "epoch": 4.0,
781
- "eval_loss": 0.0071370587684214115,
782
- "eval_runtime": 79.2858,
783
- "eval_samples_per_second": 12.247,
784
- "eval_steps_per_second": 0.391,
785
  "step": 124
786
  },
787
  {
788
- "epoch": 4.03,
789
  "learning_rate": 0.0002,
790
- "loss": 0.0066,
791
  "step": 125
792
  },
793
  {
794
- "epoch": 4.06,
795
  "learning_rate": 0.0002,
796
- "loss": 0.0049,
797
  "step": 126
798
  },
799
  {
800
- "epoch": 4.1,
801
  "learning_rate": 0.0002,
802
- "loss": 0.0064,
803
  "step": 127
804
  },
805
  {
806
- "epoch": 4.13,
807
  "learning_rate": 0.0002,
808
- "loss": 0.0086,
809
  "step": 128
810
  },
811
  {
812
- "epoch": 4.16,
813
  "learning_rate": 0.0002,
814
- "loss": 0.0079,
815
  "step": 129
816
  },
817
  {
818
- "epoch": 4.19,
819
  "learning_rate": 0.0002,
820
- "loss": 0.0066,
821
  "step": 130
822
  },
823
  {
824
- "epoch": 4.23,
825
  "learning_rate": 0.0002,
826
- "loss": 0.0061,
827
  "step": 131
828
  },
829
  {
830
- "epoch": 4.26,
831
  "learning_rate": 0.0002,
832
- "loss": 0.004,
833
  "step": 132
834
  },
835
  {
836
- "epoch": 4.29,
837
  "learning_rate": 0.0002,
838
- "loss": 0.0079,
839
  "step": 133
840
  },
841
  {
842
- "epoch": 4.32,
843
  "learning_rate": 0.0002,
844
- "loss": 0.0105,
845
  "step": 134
846
  },
847
  {
848
- "epoch": 4.35,
849
  "learning_rate": 0.0002,
850
- "loss": 0.0035,
851
  "step": 135
852
  },
853
  {
854
- "epoch": 4.39,
855
  "learning_rate": 0.0002,
856
- "loss": 0.0087,
857
  "step": 136
858
  },
859
  {
860
- "epoch": 4.42,
861
  "learning_rate": 0.0002,
862
- "loss": 0.0043,
863
  "step": 137
864
  },
865
  {
866
- "epoch": 4.45,
867
  "learning_rate": 0.0002,
868
- "loss": 0.0054,
869
  "step": 138
870
  },
871
  {
872
- "epoch": 4.48,
873
  "learning_rate": 0.0002,
874
- "loss": 0.0064,
875
  "step": 139
876
  },
877
  {
878
- "epoch": 4.52,
879
  "learning_rate": 0.0002,
880
- "loss": 0.0077,
881
  "step": 140
882
  },
883
  {
884
- "epoch": 4.55,
885
  "learning_rate": 0.0002,
886
- "loss": 0.0103,
887
  "step": 141
888
  },
889
  {
890
- "epoch": 4.58,
891
  "learning_rate": 0.0002,
892
- "loss": 0.0089,
893
  "step": 142
894
  },
895
  {
896
- "epoch": 4.61,
897
  "learning_rate": 0.0002,
898
- "loss": 0.0051,
899
  "step": 143
900
  },
901
  {
902
- "epoch": 4.65,
903
  "learning_rate": 0.0002,
904
- "loss": 0.0034,
905
  "step": 144
906
  },
907
  {
908
- "epoch": 4.68,
909
  "learning_rate": 0.0002,
910
- "loss": 0.0046,
911
  "step": 145
912
  },
913
  {
914
- "epoch": 4.71,
915
  "learning_rate": 0.0002,
916
- "loss": 0.0093,
917
  "step": 146
918
  },
919
  {
920
- "epoch": 4.74,
921
  "learning_rate": 0.0002,
922
- "loss": 0.0084,
923
  "step": 147
924
  },
925
  {
926
- "epoch": 4.77,
927
  "learning_rate": 0.0002,
928
- "loss": 0.0059,
929
  "step": 148
930
  },
931
  {
932
- "epoch": 4.81,
933
  "learning_rate": 0.0002,
934
- "loss": 0.0041,
935
  "step": 149
936
  },
937
  {
938
- "epoch": 4.84,
939
  "learning_rate": 0.0002,
940
- "loss": 0.0071,
941
  "step": 150
942
  },
943
  {
944
- "epoch": 4.87,
945
  "learning_rate": 0.0002,
946
- "loss": 0.0093,
947
  "step": 151
948
  },
949
  {
950
- "epoch": 4.9,
951
  "learning_rate": 0.0002,
952
- "loss": 0.0069,
953
  "step": 152
954
  },
955
  {
956
- "epoch": 4.94,
957
  "learning_rate": 0.0002,
958
- "loss": 0.0079,
959
  "step": 153
960
  },
961
  {
962
- "epoch": 4.97,
963
  "learning_rate": 0.0002,
964
- "loss": 0.0091,
965
  "step": 154
966
  },
967
  {
968
- "epoch": 5.0,
969
  "learning_rate": 0.0002,
970
- "loss": 0.0064,
971
- "step": 155
972
- },
973
- {
974
- "epoch": 5.0,
975
- "eval_loss": 0.004239812958985567,
976
- "eval_runtime": 79.1869,
977
- "eval_samples_per_second": 12.262,
978
- "eval_steps_per_second": 0.391,
979
  "step": 155
980
  },
981
  {
982
- "epoch": 5.03,
983
  "learning_rate": 0.0002,
984
- "loss": 0.0024,
985
  "step": 156
986
  },
987
  {
988
- "epoch": 5.06,
989
  "learning_rate": 0.0002,
990
- "loss": 0.0025,
991
  "step": 157
992
  },
993
  {
994
- "epoch": 5.1,
995
  "learning_rate": 0.0002,
996
- "loss": 0.0024,
997
  "step": 158
998
  },
999
  {
1000
- "epoch": 5.13,
1001
  "learning_rate": 0.0002,
1002
- "loss": 0.0052,
1003
  "step": 159
1004
  },
1005
  {
1006
- "epoch": 5.16,
1007
  "learning_rate": 0.0002,
1008
- "loss": 0.0029,
1009
  "step": 160
1010
  },
1011
  {
1012
- "epoch": 5.19,
1013
  "learning_rate": 0.0002,
1014
- "loss": 0.0045,
1015
  "step": 161
1016
  },
1017
  {
1018
- "epoch": 5.23,
1019
  "learning_rate": 0.0002,
1020
- "loss": 0.0057,
1021
  "step": 162
1022
  },
1023
  {
1024
- "epoch": 5.26,
1025
  "learning_rate": 0.0002,
1026
- "loss": 0.0023,
1027
  "step": 163
1028
  },
1029
  {
1030
- "epoch": 5.29,
1031
  "learning_rate": 0.0002,
1032
- "loss": 0.0063,
1033
  "step": 164
1034
  },
1035
  {
1036
- "epoch": 5.32,
1037
  "learning_rate": 0.0002,
1038
- "loss": 0.0073,
1039
  "step": 165
1040
  },
1041
  {
1042
- "epoch": 5.35,
1043
  "learning_rate": 0.0002,
1044
- "loss": 0.0052,
1045
  "step": 166
1046
  },
1047
  {
1048
- "epoch": 5.39,
1049
  "learning_rate": 0.0002,
1050
- "loss": 0.0039,
1051
  "step": 167
1052
  },
1053
  {
1054
- "epoch": 5.42,
1055
  "learning_rate": 0.0002,
1056
- "loss": 0.0017,
1057
  "step": 168
1058
  },
1059
  {
1060
- "epoch": 5.45,
1061
  "learning_rate": 0.0002,
1062
- "loss": 0.0032,
1063
  "step": 169
1064
  },
1065
  {
1066
- "epoch": 5.48,
1067
  "learning_rate": 0.0002,
1068
- "loss": 0.0049,
1069
  "step": 170
1070
  },
1071
  {
1072
- "epoch": 5.52,
1073
  "learning_rate": 0.0002,
1074
- "loss": 0.0067,
1075
  "step": 171
1076
  },
1077
  {
1078
- "epoch": 5.55,
1079
  "learning_rate": 0.0002,
1080
- "loss": 0.0052,
1081
  "step": 172
1082
  },
1083
  {
1084
- "epoch": 5.58,
1085
  "learning_rate": 0.0002,
1086
- "loss": 0.0052,
1087
  "step": 173
1088
  },
1089
  {
1090
- "epoch": 5.61,
1091
  "learning_rate": 0.0002,
1092
- "loss": 0.0074,
1093
  "step": 174
1094
  },
1095
  {
1096
- "epoch": 5.65,
1097
  "learning_rate": 0.0002,
1098
- "loss": 0.0064,
1099
  "step": 175
1100
  },
1101
  {
1102
- "epoch": 5.68,
1103
  "learning_rate": 0.0002,
1104
- "loss": 0.002,
1105
  "step": 176
1106
  },
1107
  {
1108
- "epoch": 5.71,
1109
  "learning_rate": 0.0002,
1110
- "loss": 0.0017,
1111
  "step": 177
1112
  },
1113
  {
1114
- "epoch": 5.74,
1115
  "learning_rate": 0.0002,
1116
- "loss": 0.0056,
1117
  "step": 178
1118
  },
1119
  {
1120
- "epoch": 5.77,
1121
  "learning_rate": 0.0002,
1122
- "loss": 0.0036,
1123
  "step": 179
1124
  },
1125
  {
1126
- "epoch": 5.81,
1127
  "learning_rate": 0.0002,
1128
- "loss": 0.0021,
1129
  "step": 180
1130
  },
1131
  {
1132
- "epoch": 5.84,
1133
  "learning_rate": 0.0002,
1134
- "loss": 0.0064,
1135
  "step": 181
1136
  },
1137
  {
1138
- "epoch": 5.87,
1139
  "learning_rate": 0.0002,
1140
- "loss": 0.0067,
1141
  "step": 182
1142
  },
1143
  {
1144
- "epoch": 5.9,
1145
  "learning_rate": 0.0002,
1146
- "loss": 0.008,
1147
  "step": 183
1148
  },
1149
  {
1150
- "epoch": 5.94,
1151
  "learning_rate": 0.0002,
1152
- "loss": 0.0039,
1153
  "step": 184
1154
  },
1155
  {
1156
- "epoch": 5.97,
1157
  "learning_rate": 0.0002,
1158
- "loss": 0.0043,
1159
  "step": 185
1160
  },
1161
  {
1162
- "epoch": 6.0,
1163
  "learning_rate": 0.0002,
1164
- "loss": 0.0048,
1165
- "step": 186
1166
- },
1167
- {
1168
- "epoch": 6.0,
1169
- "eval_loss": 0.0028802985325455666,
1170
- "eval_runtime": 79.2422,
1171
- "eval_samples_per_second": 12.254,
1172
- "eval_steps_per_second": 0.391,
1173
  "step": 186
1174
  },
1175
  {
1176
- "epoch": 6.03,
1177
  "learning_rate": 0.0002,
1178
- "loss": 0.0017,
1179
  "step": 187
1180
  },
1181
  {
1182
- "epoch": 6.06,
1183
  "learning_rate": 0.0002,
1184
- "loss": 0.0025,
1185
  "step": 188
1186
  },
1187
  {
1188
- "epoch": 6.1,
1189
  "learning_rate": 0.0002,
1190
- "loss": 0.0024,
1191
  "step": 189
1192
  },
1193
  {
1194
- "epoch": 6.13,
1195
  "learning_rate": 0.0002,
1196
- "loss": 0.0018,
1197
  "step": 190
1198
  },
1199
  {
1200
- "epoch": 6.16,
1201
  "learning_rate": 0.0002,
1202
- "loss": 0.004,
1203
  "step": 191
1204
  },
1205
  {
1206
- "epoch": 6.19,
1207
  "learning_rate": 0.0002,
1208
- "loss": 0.006,
1209
  "step": 192
1210
  },
1211
  {
1212
- "epoch": 6.23,
1213
  "learning_rate": 0.0002,
1214
- "loss": 0.0015,
1215
  "step": 193
1216
  },
1217
  {
1218
- "epoch": 6.26,
1219
  "learning_rate": 0.0002,
1220
- "loss": 0.0017,
1221
  "step": 194
1222
  },
1223
  {
1224
- "epoch": 6.29,
1225
  "learning_rate": 0.0002,
1226
- "loss": 0.0021,
1227
  "step": 195
1228
  },
1229
  {
1230
- "epoch": 6.32,
1231
  "learning_rate": 0.0002,
1232
- "loss": 0.0073,
1233
  "step": 196
1234
  },
1235
  {
1236
- "epoch": 6.35,
1237
  "learning_rate": 0.0002,
1238
- "loss": 0.0056,
1239
  "step": 197
1240
  },
1241
  {
1242
- "epoch": 6.39,
1243
  "learning_rate": 0.0002,
1244
- "loss": 0.0015,
1245
  "step": 198
1246
  },
1247
  {
1248
- "epoch": 6.42,
1249
  "learning_rate": 0.0002,
1250
- "loss": 0.0028,
1251
  "step": 199
1252
  },
1253
  {
1254
- "epoch": 6.45,
1255
  "learning_rate": 0.0002,
1256
- "loss": 0.0012,
1257
  "step": 200
1258
  },
1259
  {
1260
- "epoch": 6.48,
1261
  "learning_rate": 0.0002,
1262
- "loss": 0.006,
1263
  "step": 201
1264
  },
1265
  {
1266
- "epoch": 6.52,
1267
  "learning_rate": 0.0002,
1268
- "loss": 0.0033,
1269
  "step": 202
1270
  },
1271
  {
1272
- "epoch": 6.55,
1273
  "learning_rate": 0.0002,
1274
- "loss": 0.0035,
1275
  "step": 203
1276
  },
1277
  {
1278
- "epoch": 6.58,
1279
  "learning_rate": 0.0002,
1280
- "loss": 0.0019,
1281
  "step": 204
1282
  },
1283
  {
1284
- "epoch": 6.61,
1285
  "learning_rate": 0.0002,
1286
- "loss": 0.007,
1287
  "step": 205
1288
  },
1289
  {
1290
- "epoch": 6.65,
1291
  "learning_rate": 0.0002,
1292
- "loss": 0.0019,
1293
  "step": 206
1294
  },
1295
  {
1296
- "epoch": 6.68,
1297
  "learning_rate": 0.0002,
1298
- "loss": 0.0053,
1299
  "step": 207
1300
  },
1301
  {
1302
- "epoch": 6.71,
1303
  "learning_rate": 0.0002,
1304
- "loss": 0.0013,
1305
  "step": 208
1306
  },
1307
  {
1308
- "epoch": 6.74,
1309
  "learning_rate": 0.0002,
1310
- "loss": 0.0021,
1311
  "step": 209
1312
  },
1313
  {
1314
- "epoch": 6.77,
1315
  "learning_rate": 0.0002,
1316
- "loss": 0.0041,
1317
  "step": 210
1318
  },
1319
  {
1320
- "epoch": 6.81,
1321
  "learning_rate": 0.0002,
1322
- "loss": 0.0028,
1323
  "step": 211
1324
  },
1325
  {
1326
- "epoch": 6.84,
1327
  "learning_rate": 0.0002,
1328
- "loss": 0.0016,
1329
  "step": 212
1330
  },
1331
  {
1332
- "epoch": 6.87,
1333
  "learning_rate": 0.0002,
1334
- "loss": 0.002,
1335
  "step": 213
1336
  },
1337
  {
1338
- "epoch": 6.9,
1339
  "learning_rate": 0.0002,
1340
- "loss": 0.0026,
1341
  "step": 214
1342
  },
1343
  {
1344
- "epoch": 6.94,
1345
  "learning_rate": 0.0002,
1346
- "loss": 0.0053,
1347
  "step": 215
1348
  },
1349
  {
1350
- "epoch": 6.97,
1351
  "learning_rate": 0.0002,
1352
- "loss": 0.0024,
1353
  "step": 216
1354
  },
1355
  {
1356
- "epoch": 7.0,
1357
  "learning_rate": 0.0002,
1358
- "loss": 0.0023,
1359
- "step": 217
1360
- },
1361
- {
1362
- "epoch": 7.0,
1363
- "eval_loss": 0.003779121907427907,
1364
- "eval_runtime": 79.3546,
1365
- "eval_samples_per_second": 12.236,
1366
- "eval_steps_per_second": 0.391,
1367
  "step": 217
1368
  },
1369
  {
1370
- "epoch": 7.03,
1371
  "learning_rate": 0.0002,
1372
- "loss": 0.0031,
1373
  "step": 218
1374
  },
1375
  {
1376
- "epoch": 7.06,
1377
  "learning_rate": 0.0002,
1378
- "loss": 0.0043,
1379
  "step": 219
1380
  },
1381
  {
1382
- "epoch": 7.1,
1383
  "learning_rate": 0.0002,
1384
- "loss": 0.0056,
1385
  "step": 220
1386
  },
1387
  {
1388
- "epoch": 7.13,
1389
  "learning_rate": 0.0002,
1390
- "loss": 0.0018,
1391
  "step": 221
1392
  },
1393
  {
1394
- "epoch": 7.16,
1395
  "learning_rate": 0.0002,
1396
- "loss": 0.0013,
1397
  "step": 222
1398
  },
1399
  {
1400
- "epoch": 7.19,
1401
  "learning_rate": 0.0002,
1402
- "loss": 0.0009,
1403
  "step": 223
1404
  },
1405
  {
1406
- "epoch": 7.23,
1407
  "learning_rate": 0.0002,
1408
- "loss": 0.0019,
1409
  "step": 224
1410
  },
1411
  {
1412
- "epoch": 7.26,
1413
  "learning_rate": 0.0002,
1414
- "loss": 0.0054,
1415
  "step": 225
1416
  },
1417
  {
1418
- "epoch": 7.29,
1419
  "learning_rate": 0.0002,
1420
- "loss": 0.0027,
1421
  "step": 226
1422
  },
1423
  {
1424
- "epoch": 7.32,
1425
  "learning_rate": 0.0002,
1426
  "loss": 0.0025,
1427
  "step": 227
1428
  },
1429
  {
1430
- "epoch": 7.35,
1431
  "learning_rate": 0.0002,
1432
- "loss": 0.0031,
1433
  "step": 228
1434
  },
1435
  {
1436
- "epoch": 7.39,
1437
  "learning_rate": 0.0002,
1438
- "loss": 0.0021,
1439
  "step": 229
1440
  },
1441
  {
1442
- "epoch": 7.42,
1443
  "learning_rate": 0.0002,
1444
- "loss": 0.0025,
1445
  "step": 230
1446
  },
1447
  {
1448
- "epoch": 7.45,
1449
  "learning_rate": 0.0002,
1450
- "loss": 0.0018,
1451
  "step": 231
1452
  },
1453
  {
1454
- "epoch": 7.48,
1455
  "learning_rate": 0.0002,
1456
- "loss": 0.0015,
1457
  "step": 232
1458
  },
1459
  {
1460
- "epoch": 7.52,
1461
  "learning_rate": 0.0002,
1462
- "loss": 0.0014,
1463
  "step": 233
1464
  },
1465
  {
1466
- "epoch": 7.55,
1467
  "learning_rate": 0.0002,
1468
- "loss": 0.0036,
1469
  "step": 234
1470
  },
1471
  {
1472
- "epoch": 7.58,
1473
  "learning_rate": 0.0002,
1474
- "loss": 0.0006,
1475
  "step": 235
1476
  },
1477
  {
1478
- "epoch": 7.61,
1479
  "learning_rate": 0.0002,
1480
- "loss": 0.0022,
1481
  "step": 236
1482
  },
1483
  {
1484
- "epoch": 7.65,
1485
  "learning_rate": 0.0002,
1486
- "loss": 0.0039,
1487
  "step": 237
1488
  },
1489
  {
1490
- "epoch": 7.68,
1491
  "learning_rate": 0.0002,
1492
- "loss": 0.0026,
1493
  "step": 238
1494
  },
1495
  {
1496
- "epoch": 7.71,
1497
  "learning_rate": 0.0002,
1498
- "loss": 0.0011,
1499
  "step": 239
1500
  },
1501
  {
1502
- "epoch": 7.74,
1503
  "learning_rate": 0.0002,
1504
- "loss": 0.0045,
1505
  "step": 240
1506
  },
1507
  {
1508
- "epoch": 7.77,
1509
  "learning_rate": 0.0002,
1510
- "loss": 0.0028,
1511
  "step": 241
1512
  },
1513
  {
1514
- "epoch": 7.81,
1515
  "learning_rate": 0.0002,
1516
- "loss": 0.0026,
1517
  "step": 242
1518
  },
1519
  {
1520
- "epoch": 7.84,
1521
  "learning_rate": 0.0002,
1522
- "loss": 0.0083,
1523
  "step": 243
1524
  },
1525
  {
1526
- "epoch": 7.87,
1527
  "learning_rate": 0.0002,
1528
- "loss": 0.0009,
1529
  "step": 244
1530
  },
1531
  {
1532
- "epoch": 7.9,
1533
  "learning_rate": 0.0002,
1534
- "loss": 0.0012,
1535
  "step": 245
1536
  },
1537
  {
1538
- "epoch": 7.94,
1539
  "learning_rate": 0.0002,
1540
- "loss": 0.0033,
1541
  "step": 246
1542
  },
1543
  {
1544
- "epoch": 7.97,
1545
  "learning_rate": 0.0002,
1546
- "loss": 0.0016,
1547
  "step": 247
1548
  },
1549
  {
1550
- "epoch": 8.0,
1551
  "learning_rate": 0.0002,
1552
- "loss": 0.0024,
1553
- "step": 248
1554
- },
1555
- {
1556
- "epoch": 8.0,
1557
- "eval_loss": 0.00199125986546278,
1558
- "eval_runtime": 79.2097,
1559
- "eval_samples_per_second": 12.259,
1560
- "eval_steps_per_second": 0.391,
1561
  "step": 248
1562
  },
1563
  {
1564
- "epoch": 8.03,
1565
  "learning_rate": 0.0002,
1566
- "loss": 0.0017,
1567
  "step": 249
1568
  },
1569
  {
1570
- "epoch": 8.06,
1571
  "learning_rate": 0.0002,
1572
- "loss": 0.0041,
1573
  "step": 250
1574
  },
1575
  {
1576
- "epoch": 8.1,
1577
  "learning_rate": 0.0002,
1578
- "loss": 0.0019,
1579
  "step": 251
1580
  },
1581
  {
1582
- "epoch": 8.13,
1583
  "learning_rate": 0.0002,
1584
- "loss": 0.0016,
1585
  "step": 252
1586
  },
1587
  {
1588
- "epoch": 8.16,
1589
  "learning_rate": 0.0002,
1590
- "loss": 0.0008,
1591
  "step": 253
1592
  },
1593
  {
1594
- "epoch": 8.19,
1595
  "learning_rate": 0.0002,
1596
- "loss": 0.0008,
1597
  "step": 254
1598
  },
1599
  {
1600
- "epoch": 8.23,
1601
  "learning_rate": 0.0002,
1602
- "loss": 0.0026,
1603
  "step": 255
1604
  },
1605
  {
1606
- "epoch": 8.26,
1607
  "learning_rate": 0.0002,
1608
- "loss": 0.0024,
1609
  "step": 256
1610
  },
1611
  {
1612
- "epoch": 8.29,
1613
  "learning_rate": 0.0002,
1614
- "loss": 0.0021,
1615
  "step": 257
1616
  },
1617
  {
1618
- "epoch": 8.32,
1619
  "learning_rate": 0.0002,
1620
- "loss": 0.0034,
1621
  "step": 258
1622
  },
1623
  {
1624
- "epoch": 8.35,
1625
  "learning_rate": 0.0002,
1626
- "loss": 0.0029,
1627
  "step": 259
1628
  },
1629
  {
1630
- "epoch": 8.39,
1631
  "learning_rate": 0.0002,
1632
- "loss": 0.0021,
1633
  "step": 260
1634
  },
1635
  {
1636
- "epoch": 8.42,
1637
  "learning_rate": 0.0002,
1638
- "loss": 0.0055,
1639
  "step": 261
1640
  },
1641
  {
1642
- "epoch": 8.45,
1643
  "learning_rate": 0.0002,
1644
- "loss": 0.0019,
1645
  "step": 262
1646
  },
1647
  {
1648
- "epoch": 8.48,
1649
  "learning_rate": 0.0002,
1650
- "loss": 0.0025,
1651
  "step": 263
1652
  },
1653
  {
1654
- "epoch": 8.52,
1655
  "learning_rate": 0.0002,
1656
- "loss": 0.0067,
1657
  "step": 264
1658
  },
1659
  {
1660
- "epoch": 8.55,
1661
  "learning_rate": 0.0002,
1662
- "loss": 0.0036,
1663
  "step": 265
1664
  },
1665
  {
1666
- "epoch": 8.58,
1667
  "learning_rate": 0.0002,
1668
- "loss": 0.0037,
1669
  "step": 266
1670
  },
1671
  {
1672
- "epoch": 8.61,
1673
  "learning_rate": 0.0002,
1674
- "loss": 0.0033,
1675
  "step": 267
1676
  },
1677
  {
1678
- "epoch": 8.65,
1679
  "learning_rate": 0.0002,
1680
- "loss": 0.0019,
1681
  "step": 268
1682
  },
1683
  {
1684
- "epoch": 8.68,
1685
  "learning_rate": 0.0002,
1686
- "loss": 0.0034,
1687
  "step": 269
1688
  },
1689
  {
1690
- "epoch": 8.71,
1691
  "learning_rate": 0.0002,
1692
- "loss": 0.003,
1693
  "step": 270
1694
  },
1695
  {
1696
- "epoch": 8.74,
1697
  "learning_rate": 0.0002,
1698
- "loss": 0.0045,
1699
  "step": 271
1700
  },
1701
  {
1702
- "epoch": 8.77,
1703
  "learning_rate": 0.0002,
1704
- "loss": 0.0058,
1705
  "step": 272
1706
  },
1707
  {
1708
- "epoch": 8.81,
1709
  "learning_rate": 0.0002,
1710
- "loss": 0.0019,
1711
  "step": 273
1712
  },
1713
  {
1714
- "epoch": 8.84,
1715
  "learning_rate": 0.0002,
1716
- "loss": 0.0044,
1717
  "step": 274
1718
  },
1719
  {
1720
- "epoch": 8.87,
1721
  "learning_rate": 0.0002,
1722
- "loss": 0.0037,
1723
  "step": 275
1724
  },
1725
  {
1726
- "epoch": 8.9,
1727
  "learning_rate": 0.0002,
1728
- "loss": 0.0037,
1729
  "step": 276
1730
  },
1731
  {
1732
- "epoch": 8.94,
1733
  "learning_rate": 0.0002,
1734
- "loss": 0.0074,
1735
  "step": 277
1736
  },
1737
  {
1738
- "epoch": 8.97,
1739
  "learning_rate": 0.0002,
1740
- "loss": 0.0025,
1741
  "step": 278
1742
  },
1743
  {
1744
- "epoch": 9.0,
1745
  "learning_rate": 0.0002,
1746
- "loss": 0.0032,
1747
  "step": 279
1748
  },
1749
  {
1750
- "epoch": 9.0,
1751
- "eval_loss": 0.0030502593144774437,
1752
- "eval_runtime": 79.2563,
1753
- "eval_samples_per_second": 12.251,
1754
- "eval_steps_per_second": 0.391,
1755
- "step": 279
1756
- },
1757
- {
1758
- "epoch": 9.03,
1759
  "learning_rate": 0.0002,
1760
- "loss": 0.0028,
1761
  "step": 280
1762
  },
1763
  {
1764
- "epoch": 9.06,
1765
  "learning_rate": 0.0002,
1766
- "loss": 0.0036,
1767
  "step": 281
1768
  },
1769
  {
1770
- "epoch": 9.1,
1771
  "learning_rate": 0.0002,
1772
- "loss": 0.0033,
1773
  "step": 282
1774
  },
1775
  {
1776
- "epoch": 9.13,
1777
  "learning_rate": 0.0002,
1778
- "loss": 0.0032,
1779
  "step": 283
1780
  },
1781
  {
1782
- "epoch": 9.16,
1783
  "learning_rate": 0.0002,
1784
- "loss": 0.002,
1785
  "step": 284
1786
  },
1787
  {
1788
- "epoch": 9.19,
1789
  "learning_rate": 0.0002,
1790
- "loss": 0.0028,
1791
  "step": 285
1792
  },
1793
  {
1794
- "epoch": 9.23,
1795
  "learning_rate": 0.0002,
1796
- "loss": 0.0025,
1797
  "step": 286
1798
  },
1799
  {
1800
- "epoch": 9.26,
1801
  "learning_rate": 0.0002,
1802
- "loss": 0.0012,
1803
  "step": 287
1804
  },
1805
  {
1806
- "epoch": 9.29,
1807
  "learning_rate": 0.0002,
1808
- "loss": 0.0014,
1809
  "step": 288
1810
  },
1811
  {
1812
- "epoch": 9.32,
1813
  "learning_rate": 0.0002,
1814
- "loss": 0.0032,
1815
  "step": 289
1816
  },
1817
  {
1818
- "epoch": 9.35,
1819
  "learning_rate": 0.0002,
1820
- "loss": 0.0016,
1821
  "step": 290
1822
  },
1823
  {
1824
- "epoch": 9.39,
1825
  "learning_rate": 0.0002,
1826
  "loss": 0.0019,
1827
  "step": 291
1828
  },
1829
  {
1830
- "epoch": 9.42,
1831
  "learning_rate": 0.0002,
1832
- "loss": 0.0022,
1833
  "step": 292
1834
  },
1835
  {
1836
- "epoch": 9.45,
1837
  "learning_rate": 0.0002,
1838
- "loss": 0.0022,
1839
  "step": 293
1840
  },
1841
  {
1842
- "epoch": 9.48,
1843
  "learning_rate": 0.0002,
1844
- "loss": 0.0016,
1845
  "step": 294
1846
  },
1847
  {
1848
- "epoch": 9.52,
1849
  "learning_rate": 0.0002,
1850
- "loss": 0.003,
1851
  "step": 295
1852
  },
1853
  {
1854
- "epoch": 9.55,
1855
  "learning_rate": 0.0002,
1856
- "loss": 0.0036,
1857
  "step": 296
1858
  },
1859
  {
1860
- "epoch": 9.58,
1861
  "learning_rate": 0.0002,
1862
- "loss": 0.0064,
1863
  "step": 297
1864
  },
1865
  {
1866
- "epoch": 9.61,
1867
  "learning_rate": 0.0002,
1868
- "loss": 0.0053,
1869
  "step": 298
1870
  },
1871
  {
1872
- "epoch": 9.65,
1873
  "learning_rate": 0.0002,
1874
- "loss": 0.003,
1875
  "step": 299
1876
  },
1877
  {
1878
- "epoch": 9.68,
1879
  "learning_rate": 0.0002,
1880
- "loss": 0.0034,
1881
  "step": 300
1882
  },
1883
- {
1884
- "epoch": 9.71,
1885
- "learning_rate": 0.0002,
1886
- "loss": 0.0017,
1887
- "step": 301
1888
- },
1889
- {
1890
- "epoch": 9.74,
1891
- "learning_rate": 0.0002,
1892
- "loss": 0.0013,
1893
- "step": 302
1894
- },
1895
- {
1896
- "epoch": 9.77,
1897
- "learning_rate": 0.0002,
1898
- "loss": 0.0028,
1899
- "step": 303
1900
- },
1901
- {
1902
- "epoch": 9.81,
1903
- "learning_rate": 0.0002,
1904
- "loss": 0.0055,
1905
- "step": 304
1906
- },
1907
- {
1908
- "epoch": 9.84,
1909
- "learning_rate": 0.0002,
1910
- "loss": 0.0043,
1911
- "step": 305
1912
- },
1913
- {
1914
- "epoch": 9.87,
1915
- "learning_rate": 0.0002,
1916
- "loss": 0.0014,
1917
- "step": 306
1918
- },
1919
- {
1920
- "epoch": 9.9,
1921
- "learning_rate": 0.0002,
1922
- "loss": 0.0014,
1923
- "step": 307
1924
- },
1925
- {
1926
- "epoch": 9.94,
1927
- "learning_rate": 0.0002,
1928
- "loss": 0.0033,
1929
- "step": 308
1930
- },
1931
- {
1932
- "epoch": 9.97,
1933
- "learning_rate": 0.0002,
1934
- "loss": 0.0023,
1935
- "step": 309
1936
- },
1937
- {
1938
- "epoch": 10.0,
1939
- "learning_rate": 0.0002,
1940
- "loss": 0.0016,
1941
- "step": 310
1942
- },
1943
- {
1944
- "epoch": 10.0,
1945
- "eval_loss": 0.002159351482987404,
1946
- "eval_runtime": 79.3857,
1947
- "eval_samples_per_second": 12.231,
1948
- "eval_steps_per_second": 0.39,
1949
- "step": 310
1950
- },
1951
- {
1952
- "epoch": 10.03,
1953
- "learning_rate": 0.0002,
1954
- "loss": 0.0014,
1955
- "step": 311
1956
- },
1957
- {
1958
- "epoch": 10.06,
1959
- "learning_rate": 0.0002,
1960
- "loss": 0.0028,
1961
- "step": 312
1962
- },
1963
- {
1964
- "epoch": 10.1,
1965
- "learning_rate": 0.0002,
1966
- "loss": 0.0034,
1967
- "step": 313
1968
- },
1969
- {
1970
- "epoch": 10.13,
1971
- "learning_rate": 0.0002,
1972
- "loss": 0.0012,
1973
- "step": 314
1974
- },
1975
- {
1976
- "epoch": 10.16,
1977
- "learning_rate": 0.0002,
1978
- "loss": 0.0022,
1979
- "step": 315
1980
- },
1981
- {
1982
- "epoch": 10.19,
1983
- "learning_rate": 0.0002,
1984
- "loss": 0.0008,
1985
- "step": 316
1986
- },
1987
- {
1988
- "epoch": 10.23,
1989
- "learning_rate": 0.0002,
1990
- "loss": 0.0049,
1991
- "step": 317
1992
- },
1993
- {
1994
- "epoch": 10.26,
1995
- "learning_rate": 0.0002,
1996
- "loss": 0.0023,
1997
- "step": 318
1998
- },
1999
- {
2000
- "epoch": 10.29,
2001
- "learning_rate": 0.0002,
2002
- "loss": 0.0012,
2003
- "step": 319
2004
- },
2005
- {
2006
- "epoch": 10.32,
2007
- "learning_rate": 0.0002,
2008
- "loss": 0.003,
2009
- "step": 320
2010
- },
2011
- {
2012
- "epoch": 10.35,
2013
- "learning_rate": 0.0002,
2014
- "loss": 0.0018,
2015
- "step": 321
2016
- },
2017
- {
2018
- "epoch": 10.39,
2019
- "learning_rate": 0.0002,
2020
- "loss": 0.0005,
2021
- "step": 322
2022
- },
2023
- {
2024
- "epoch": 10.42,
2025
- "learning_rate": 0.0002,
2026
- "loss": 0.0014,
2027
- "step": 323
2028
- },
2029
- {
2030
- "epoch": 10.45,
2031
- "learning_rate": 0.0002,
2032
- "loss": 0.0025,
2033
- "step": 324
2034
- },
2035
- {
2036
- "epoch": 10.48,
2037
- "learning_rate": 0.0002,
2038
- "loss": 0.0023,
2039
- "step": 325
2040
- },
2041
- {
2042
- "epoch": 10.52,
2043
- "learning_rate": 0.0002,
2044
- "loss": 0.002,
2045
- "step": 326
2046
- },
2047
- {
2048
- "epoch": 10.55,
2049
- "learning_rate": 0.0002,
2050
- "loss": 0.0038,
2051
- "step": 327
2052
- },
2053
- {
2054
- "epoch": 10.58,
2055
- "learning_rate": 0.0002,
2056
- "loss": 0.0014,
2057
- "step": 328
2058
- },
2059
- {
2060
- "epoch": 10.61,
2061
- "learning_rate": 0.0002,
2062
- "loss": 0.0053,
2063
- "step": 329
2064
- },
2065
- {
2066
- "epoch": 10.65,
2067
- "learning_rate": 0.0002,
2068
- "loss": 0.0033,
2069
- "step": 330
2070
- },
2071
- {
2072
- "epoch": 10.68,
2073
- "learning_rate": 0.0002,
2074
- "loss": 0.005,
2075
- "step": 331
2076
- },
2077
- {
2078
- "epoch": 10.71,
2079
- "learning_rate": 0.0002,
2080
- "loss": 0.0043,
2081
- "step": 332
2082
- },
2083
- {
2084
- "epoch": 10.74,
2085
- "learning_rate": 0.0002,
2086
- "loss": 0.004,
2087
- "step": 333
2088
- },
2089
- {
2090
- "epoch": 10.77,
2091
- "learning_rate": 0.0002,
2092
- "loss": 0.0037,
2093
- "step": 334
2094
- },
2095
- {
2096
- "epoch": 10.81,
2097
- "learning_rate": 0.0002,
2098
- "loss": 0.0048,
2099
- "step": 335
2100
- },
2101
- {
2102
- "epoch": 10.84,
2103
- "learning_rate": 0.0002,
2104
- "loss": 0.0087,
2105
- "step": 336
2106
- },
2107
- {
2108
- "epoch": 10.87,
2109
- "learning_rate": 0.0002,
2110
- "loss": 0.0051,
2111
- "step": 337
2112
- },
2113
- {
2114
- "epoch": 10.9,
2115
- "learning_rate": 0.0002,
2116
- "loss": 0.0051,
2117
- "step": 338
2118
- },
2119
- {
2120
- "epoch": 10.94,
2121
- "learning_rate": 0.0002,
2122
- "loss": 0.0045,
2123
- "step": 339
2124
- },
2125
- {
2126
- "epoch": 10.97,
2127
- "learning_rate": 0.0002,
2128
- "loss": 0.0088,
2129
- "step": 340
2130
- },
2131
- {
2132
- "epoch": 11.0,
2133
- "learning_rate": 0.0002,
2134
- "loss": 0.0049,
2135
- "step": 341
2136
- },
2137
- {
2138
- "epoch": 11.0,
2139
- "eval_loss": 0.0032238007988780737,
2140
- "eval_runtime": 79.3714,
2141
- "eval_samples_per_second": 12.234,
2142
- "eval_steps_per_second": 0.391,
2143
- "step": 341
2144
- },
2145
- {
2146
- "epoch": 11.03,
2147
- "learning_rate": 0.0002,
2148
- "loss": 0.0023,
2149
- "step": 342
2150
- },
2151
- {
2152
- "epoch": 11.06,
2153
- "learning_rate": 0.0002,
2154
- "loss": 0.0023,
2155
- "step": 343
2156
- },
2157
- {
2158
- "epoch": 11.1,
2159
- "learning_rate": 0.0002,
2160
- "loss": 0.0035,
2161
- "step": 344
2162
- },
2163
- {
2164
- "epoch": 11.13,
2165
- "learning_rate": 0.0002,
2166
- "loss": 0.0022,
2167
- "step": 345
2168
- },
2169
- {
2170
- "epoch": 11.16,
2171
- "learning_rate": 0.0002,
2172
- "loss": 0.0032,
2173
- "step": 346
2174
- },
2175
- {
2176
- "epoch": 11.19,
2177
- "learning_rate": 0.0002,
2178
- "loss": 0.0042,
2179
- "step": 347
2180
- },
2181
- {
2182
- "epoch": 11.23,
2183
- "learning_rate": 0.0002,
2184
- "loss": 0.0017,
2185
- "step": 348
2186
- },
2187
- {
2188
- "epoch": 11.26,
2189
- "learning_rate": 0.0002,
2190
- "loss": 0.0066,
2191
- "step": 349
2192
- },
2193
- {
2194
- "epoch": 11.29,
2195
- "learning_rate": 0.0002,
2196
- "loss": 0.0079,
2197
- "step": 350
2198
- },
2199
- {
2200
- "epoch": 11.32,
2201
- "learning_rate": 0.0002,
2202
- "loss": 0.0031,
2203
- "step": 351
2204
- },
2205
- {
2206
- "epoch": 11.35,
2207
- "learning_rate": 0.0002,
2208
- "loss": 0.0081,
2209
- "step": 352
2210
- },
2211
- {
2212
- "epoch": 11.39,
2213
- "learning_rate": 0.0002,
2214
- "loss": 0.0062,
2215
- "step": 353
2216
- },
2217
- {
2218
- "epoch": 11.42,
2219
- "learning_rate": 0.0002,
2220
- "loss": 0.0035,
2221
- "step": 354
2222
- },
2223
- {
2224
- "epoch": 11.45,
2225
- "learning_rate": 0.0002,
2226
- "loss": 0.0036,
2227
- "step": 355
2228
- },
2229
- {
2230
- "epoch": 11.48,
2231
- "learning_rate": 0.0002,
2232
- "loss": 0.0029,
2233
- "step": 356
2234
- },
2235
- {
2236
- "epoch": 11.52,
2237
- "learning_rate": 0.0002,
2238
- "loss": 0.0034,
2239
- "step": 357
2240
- },
2241
- {
2242
- "epoch": 11.55,
2243
- "learning_rate": 0.0002,
2244
- "loss": 0.0037,
2245
- "step": 358
2246
- },
2247
- {
2248
- "epoch": 11.58,
2249
- "learning_rate": 0.0002,
2250
- "loss": 0.0067,
2251
- "step": 359
2252
- },
2253
- {
2254
- "epoch": 11.61,
2255
- "learning_rate": 0.0002,
2256
- "loss": 0.0015,
2257
- "step": 360
2258
- },
2259
- {
2260
- "epoch": 11.65,
2261
- "learning_rate": 0.0002,
2262
- "loss": 0.0053,
2263
- "step": 361
2264
- },
2265
- {
2266
- "epoch": 11.68,
2267
- "learning_rate": 0.0002,
2268
- "loss": 0.0025,
2269
- "step": 362
2270
- },
2271
- {
2272
- "epoch": 11.71,
2273
- "learning_rate": 0.0002,
2274
- "loss": 0.0073,
2275
- "step": 363
2276
- },
2277
- {
2278
- "epoch": 11.74,
2279
- "learning_rate": 0.0002,
2280
- "loss": 0.0043,
2281
- "step": 364
2282
- },
2283
- {
2284
- "epoch": 11.77,
2285
- "learning_rate": 0.0002,
2286
- "loss": 0.004,
2287
- "step": 365
2288
- },
2289
- {
2290
- "epoch": 11.81,
2291
- "learning_rate": 0.0002,
2292
- "loss": 0.0061,
2293
- "step": 366
2294
- },
2295
- {
2296
- "epoch": 11.84,
2297
- "learning_rate": 0.0002,
2298
- "loss": 0.0074,
2299
- "step": 367
2300
- },
2301
- {
2302
- "epoch": 11.87,
2303
- "learning_rate": 0.0002,
2304
- "loss": 0.007,
2305
- "step": 368
2306
- },
2307
- {
2308
- "epoch": 11.9,
2309
- "learning_rate": 0.0002,
2310
- "loss": 0.004,
2311
- "step": 369
2312
- },
2313
- {
2314
- "epoch": 11.94,
2315
- "learning_rate": 0.0002,
2316
- "loss": 0.0066,
2317
- "step": 370
2318
- },
2319
- {
2320
- "epoch": 11.97,
2321
- "learning_rate": 0.0002,
2322
- "loss": 0.0051,
2323
- "step": 371
2324
- },
2325
- {
2326
- "epoch": 12.0,
2327
- "learning_rate": 0.0002,
2328
- "loss": 0.0022,
2329
- "step": 372
2330
- },
2331
- {
2332
- "epoch": 12.0,
2333
- "eval_loss": 0.004942539148032665,
2334
- "eval_runtime": 79.161,
2335
- "eval_samples_per_second": 12.266,
2336
- "eval_steps_per_second": 0.392,
2337
- "step": 372
2338
- },
2339
- {
2340
- "epoch": 12.03,
2341
- "learning_rate": 0.0002,
2342
- "loss": 0.0037,
2343
- "step": 373
2344
- },
2345
- {
2346
- "epoch": 12.06,
2347
- "learning_rate": 0.0002,
2348
- "loss": 0.004,
2349
- "step": 374
2350
- },
2351
- {
2352
- "epoch": 12.1,
2353
- "learning_rate": 0.0002,
2354
- "loss": 0.003,
2355
- "step": 375
2356
- },
2357
- {
2358
- "epoch": 12.13,
2359
- "learning_rate": 0.0002,
2360
- "loss": 0.0033,
2361
- "step": 376
2362
- },
2363
- {
2364
- "epoch": 12.16,
2365
- "learning_rate": 0.0002,
2366
- "loss": 0.0071,
2367
- "step": 377
2368
- },
2369
- {
2370
- "epoch": 12.19,
2371
- "learning_rate": 0.0002,
2372
- "loss": 0.0052,
2373
- "step": 378
2374
- },
2375
- {
2376
- "epoch": 12.23,
2377
- "learning_rate": 0.0002,
2378
- "loss": 0.0047,
2379
- "step": 379
2380
- },
2381
- {
2382
- "epoch": 12.26,
2383
- "learning_rate": 0.0002,
2384
- "loss": 0.006,
2385
- "step": 380
2386
- },
2387
- {
2388
- "epoch": 12.29,
2389
- "learning_rate": 0.0002,
2390
- "loss": 0.0058,
2391
- "step": 381
2392
- },
2393
- {
2394
- "epoch": 12.32,
2395
- "learning_rate": 0.0002,
2396
- "loss": 0.0065,
2397
- "step": 382
2398
- },
2399
- {
2400
- "epoch": 12.35,
2401
- "learning_rate": 0.0002,
2402
- "loss": 0.0039,
2403
- "step": 383
2404
- },
2405
- {
2406
- "epoch": 12.39,
2407
- "learning_rate": 0.0002,
2408
- "loss": 0.006,
2409
- "step": 384
2410
- },
2411
- {
2412
- "epoch": 12.42,
2413
- "learning_rate": 0.0002,
2414
- "loss": 0.0038,
2415
- "step": 385
2416
- },
2417
- {
2418
- "epoch": 12.45,
2419
- "learning_rate": 0.0002,
2420
- "loss": 0.005,
2421
- "step": 386
2422
- },
2423
- {
2424
- "epoch": 12.48,
2425
- "learning_rate": 0.0002,
2426
- "loss": 0.0053,
2427
- "step": 387
2428
- },
2429
- {
2430
- "epoch": 12.52,
2431
- "learning_rate": 0.0002,
2432
- "loss": 0.0033,
2433
- "step": 388
2434
- },
2435
- {
2436
- "epoch": 12.55,
2437
- "learning_rate": 0.0002,
2438
- "loss": 0.005,
2439
- "step": 389
2440
- },
2441
- {
2442
- "epoch": 12.58,
2443
- "learning_rate": 0.0002,
2444
- "loss": 0.0038,
2445
- "step": 390
2446
- },
2447
- {
2448
- "epoch": 12.61,
2449
- "learning_rate": 0.0002,
2450
- "loss": 0.0025,
2451
- "step": 391
2452
- },
2453
- {
2454
- "epoch": 12.65,
2455
- "learning_rate": 0.0002,
2456
- "loss": 0.0069,
2457
- "step": 392
2458
- },
2459
- {
2460
- "epoch": 12.68,
2461
- "learning_rate": 0.0002,
2462
- "loss": 0.0068,
2463
- "step": 393
2464
- },
2465
- {
2466
- "epoch": 12.71,
2467
- "learning_rate": 0.0002,
2468
- "loss": 0.0069,
2469
- "step": 394
2470
- },
2471
- {
2472
- "epoch": 12.74,
2473
- "learning_rate": 0.0002,
2474
- "loss": 0.0038,
2475
- "step": 395
2476
- },
2477
- {
2478
- "epoch": 12.77,
2479
- "learning_rate": 0.0002,
2480
- "loss": 0.0036,
2481
- "step": 396
2482
- },
2483
- {
2484
- "epoch": 12.81,
2485
- "learning_rate": 0.0002,
2486
- "loss": 0.0067,
2487
- "step": 397
2488
- },
2489
- {
2490
- "epoch": 12.84,
2491
- "learning_rate": 0.0002,
2492
- "loss": 0.0045,
2493
- "step": 398
2494
- },
2495
- {
2496
- "epoch": 12.87,
2497
- "learning_rate": 0.0002,
2498
- "loss": 0.0056,
2499
- "step": 399
2500
- },
2501
- {
2502
- "epoch": 12.9,
2503
- "learning_rate": 0.0002,
2504
- "loss": 0.0059,
2505
- "step": 400
2506
- },
2507
- {
2508
- "epoch": 12.94,
2509
- "learning_rate": 0.0002,
2510
- "loss": 0.003,
2511
- "step": 401
2512
- },
2513
- {
2514
- "epoch": 12.97,
2515
- "learning_rate": 0.0002,
2516
- "loss": 0.0064,
2517
- "step": 402
2518
- },
2519
- {
2520
- "epoch": 13.0,
2521
- "learning_rate": 0.0002,
2522
- "loss": 0.0035,
2523
- "step": 403
2524
- },
2525
- {
2526
- "epoch": 13.0,
2527
- "eval_loss": 0.007653499487787485,
2528
- "eval_runtime": 79.6263,
2529
- "eval_samples_per_second": 12.194,
2530
- "eval_steps_per_second": 0.389,
2531
- "step": 403
2532
- },
2533
- {
2534
- "epoch": 13.03,
2535
- "learning_rate": 0.0002,
2536
- "loss": 0.0063,
2537
- "step": 404
2538
- },
2539
- {
2540
- "epoch": 13.06,
2541
- "learning_rate": 0.0002,
2542
- "loss": 0.0052,
2543
- "step": 405
2544
- },
2545
- {
2546
- "epoch": 13.1,
2547
- "learning_rate": 0.0002,
2548
- "loss": 0.0048,
2549
- "step": 406
2550
- },
2551
- {
2552
- "epoch": 13.13,
2553
- "learning_rate": 0.0002,
2554
- "loss": 0.0066,
2555
- "step": 407
2556
- },
2557
- {
2558
- "epoch": 13.16,
2559
- "learning_rate": 0.0002,
2560
- "loss": 0.0127,
2561
- "step": 408
2562
- },
2563
- {
2564
- "epoch": 13.19,
2565
- "learning_rate": 0.0002,
2566
- "loss": 0.007,
2567
- "step": 409
2568
- },
2569
- {
2570
- "epoch": 13.23,
2571
- "learning_rate": 0.0002,
2572
- "loss": 0.0106,
2573
- "step": 410
2574
- },
2575
- {
2576
- "epoch": 13.26,
2577
- "learning_rate": 0.0002,
2578
- "loss": 0.0053,
2579
- "step": 411
2580
- },
2581
- {
2582
- "epoch": 13.29,
2583
- "learning_rate": 0.0002,
2584
- "loss": 0.0067,
2585
- "step": 412
2586
- },
2587
- {
2588
- "epoch": 13.32,
2589
- "learning_rate": 0.0002,
2590
- "loss": 0.0051,
2591
- "step": 413
2592
- },
2593
- {
2594
- "epoch": 13.35,
2595
- "learning_rate": 0.0002,
2596
- "loss": 0.0029,
2597
- "step": 414
2598
- },
2599
- {
2600
- "epoch": 13.39,
2601
- "learning_rate": 0.0002,
2602
- "loss": 0.0036,
2603
- "step": 415
2604
- },
2605
- {
2606
- "epoch": 13.42,
2607
- "learning_rate": 0.0002,
2608
- "loss": 0.0039,
2609
- "step": 416
2610
- },
2611
- {
2612
- "epoch": 13.45,
2613
- "learning_rate": 0.0002,
2614
- "loss": 0.0077,
2615
- "step": 417
2616
- },
2617
- {
2618
- "epoch": 13.48,
2619
- "learning_rate": 0.0002,
2620
- "loss": 0.003,
2621
- "step": 418
2622
- },
2623
- {
2624
- "epoch": 13.52,
2625
- "learning_rate": 0.0002,
2626
- "loss": 0.0064,
2627
- "step": 419
2628
- },
2629
- {
2630
- "epoch": 13.55,
2631
- "learning_rate": 0.0002,
2632
- "loss": 0.0104,
2633
- "step": 420
2634
- },
2635
- {
2636
- "epoch": 13.58,
2637
- "learning_rate": 0.0002,
2638
- "loss": 0.0098,
2639
- "step": 421
2640
- },
2641
- {
2642
- "epoch": 13.61,
2643
- "learning_rate": 0.0002,
2644
- "loss": 0.0069,
2645
- "step": 422
2646
- },
2647
- {
2648
- "epoch": 13.65,
2649
- "learning_rate": 0.0002,
2650
- "loss": 0.0078,
2651
- "step": 423
2652
- },
2653
- {
2654
- "epoch": 13.68,
2655
- "learning_rate": 0.0002,
2656
- "loss": 0.0084,
2657
- "step": 424
2658
- },
2659
- {
2660
- "epoch": 13.71,
2661
- "learning_rate": 0.0002,
2662
- "loss": 0.0043,
2663
- "step": 425
2664
- },
2665
- {
2666
- "epoch": 13.74,
2667
- "learning_rate": 0.0002,
2668
- "loss": 0.009,
2669
- "step": 426
2670
- },
2671
- {
2672
- "epoch": 13.77,
2673
- "learning_rate": 0.0002,
2674
- "loss": 0.0058,
2675
- "step": 427
2676
- },
2677
- {
2678
- "epoch": 13.81,
2679
- "learning_rate": 0.0002,
2680
- "loss": 0.0089,
2681
- "step": 428
2682
- },
2683
- {
2684
- "epoch": 13.84,
2685
- "learning_rate": 0.0002,
2686
- "loss": 0.0098,
2687
- "step": 429
2688
- },
2689
- {
2690
- "epoch": 13.87,
2691
- "learning_rate": 0.0002,
2692
- "loss": 0.0051,
2693
- "step": 430
2694
- },
2695
- {
2696
- "epoch": 13.9,
2697
- "learning_rate": 0.0002,
2698
- "loss": 0.0043,
2699
- "step": 431
2700
- },
2701
- {
2702
- "epoch": 13.94,
2703
- "learning_rate": 0.0002,
2704
- "loss": 0.0039,
2705
- "step": 432
2706
- },
2707
- {
2708
- "epoch": 13.97,
2709
- "learning_rate": 0.0002,
2710
- "loss": 0.0032,
2711
- "step": 433
2712
- },
2713
- {
2714
- "epoch": 14.0,
2715
- "learning_rate": 0.0002,
2716
- "loss": 0.0063,
2717
- "step": 434
2718
- },
2719
- {
2720
- "epoch": 14.0,
2721
- "eval_loss": 0.0040127672255039215,
2722
- "eval_runtime": 79.5726,
2723
- "eval_samples_per_second": 12.203,
2724
- "eval_steps_per_second": 0.39,
2725
- "step": 434
2726
- },
2727
- {
2728
- "epoch": 14.03,
2729
- "learning_rate": 0.0002,
2730
- "loss": 0.0046,
2731
- "step": 435
2732
- },
2733
- {
2734
- "epoch": 14.06,
2735
- "learning_rate": 0.0002,
2736
- "loss": 0.0081,
2737
- "step": 436
2738
- },
2739
- {
2740
- "epoch": 14.1,
2741
- "learning_rate": 0.0002,
2742
- "loss": 0.0023,
2743
- "step": 437
2744
- },
2745
- {
2746
- "epoch": 14.13,
2747
- "learning_rate": 0.0002,
2748
- "loss": 0.0062,
2749
- "step": 438
2750
- },
2751
- {
2752
- "epoch": 14.16,
2753
- "learning_rate": 0.0002,
2754
- "loss": 0.0025,
2755
- "step": 439
2756
- },
2757
- {
2758
- "epoch": 14.19,
2759
- "learning_rate": 0.0002,
2760
- "loss": 0.0012,
2761
- "step": 440
2762
- },
2763
- {
2764
- "epoch": 14.23,
2765
- "learning_rate": 0.0002,
2766
- "loss": 0.0029,
2767
- "step": 441
2768
- },
2769
- {
2770
- "epoch": 14.26,
2771
- "learning_rate": 0.0002,
2772
- "loss": 0.003,
2773
- "step": 442
2774
- },
2775
- {
2776
- "epoch": 14.29,
2777
- "learning_rate": 0.0002,
2778
- "loss": 0.0045,
2779
- "step": 443
2780
- },
2781
- {
2782
- "epoch": 14.32,
2783
- "learning_rate": 0.0002,
2784
- "loss": 0.0112,
2785
- "step": 444
2786
- },
2787
- {
2788
- "epoch": 14.35,
2789
- "learning_rate": 0.0002,
2790
- "loss": 0.0083,
2791
- "step": 445
2792
- },
2793
- {
2794
- "epoch": 14.39,
2795
- "learning_rate": 0.0002,
2796
- "loss": 0.0036,
2797
- "step": 446
2798
- },
2799
- {
2800
- "epoch": 14.42,
2801
- "learning_rate": 0.0002,
2802
- "loss": 0.0023,
2803
- "step": 447
2804
- },
2805
- {
2806
- "epoch": 14.45,
2807
- "learning_rate": 0.0002,
2808
- "loss": 0.0033,
2809
- "step": 448
2810
- },
2811
- {
2812
- "epoch": 14.48,
2813
- "learning_rate": 0.0002,
2814
- "loss": 0.003,
2815
- "step": 449
2816
- },
2817
- {
2818
- "epoch": 14.52,
2819
- "learning_rate": 0.0002,
2820
- "loss": 0.0109,
2821
- "step": 450
2822
- },
2823
- {
2824
- "epoch": 14.55,
2825
- "learning_rate": 0.0002,
2826
- "loss": 0.0085,
2827
- "step": 451
2828
- },
2829
- {
2830
- "epoch": 14.58,
2831
- "learning_rate": 0.0002,
2832
- "loss": 0.0051,
2833
- "step": 452
2834
- },
2835
- {
2836
- "epoch": 14.61,
2837
- "learning_rate": 0.0002,
2838
- "loss": 0.0065,
2839
- "step": 453
2840
- },
2841
- {
2842
- "epoch": 14.65,
2843
- "learning_rate": 0.0002,
2844
- "loss": 0.0049,
2845
- "step": 454
2846
- },
2847
- {
2848
- "epoch": 14.68,
2849
- "learning_rate": 0.0002,
2850
- "loss": 0.007,
2851
- "step": 455
2852
- },
2853
- {
2854
- "epoch": 14.71,
2855
- "learning_rate": 0.0002,
2856
- "loss": 0.0024,
2857
- "step": 456
2858
- },
2859
- {
2860
- "epoch": 14.74,
2861
- "learning_rate": 0.0002,
2862
- "loss": 0.007,
2863
- "step": 457
2864
- },
2865
- {
2866
- "epoch": 14.77,
2867
- "learning_rate": 0.0002,
2868
- "loss": 0.0034,
2869
- "step": 458
2870
- },
2871
- {
2872
- "epoch": 14.81,
2873
- "learning_rate": 0.0002,
2874
- "loss": 0.0048,
2875
- "step": 459
2876
- },
2877
- {
2878
- "epoch": 14.84,
2879
- "learning_rate": 0.0002,
2880
- "loss": 0.0053,
2881
- "step": 460
2882
- },
2883
- {
2884
- "epoch": 14.87,
2885
- "learning_rate": 0.0002,
2886
- "loss": 0.0087,
2887
- "step": 461
2888
- },
2889
- {
2890
- "epoch": 14.9,
2891
- "learning_rate": 0.0002,
2892
- "loss": 0.0068,
2893
- "step": 462
2894
- },
2895
- {
2896
- "epoch": 14.94,
2897
- "learning_rate": 0.0002,
2898
- "loss": 0.007,
2899
- "step": 463
2900
- },
2901
- {
2902
- "epoch": 14.97,
2903
- "learning_rate": 0.0002,
2904
- "loss": 0.0067,
2905
- "step": 464
2906
- },
2907
- {
2908
- "epoch": 15.0,
2909
- "learning_rate": 0.0002,
2910
- "loss": 0.0076,
2911
- "step": 465
2912
- },
2913
- {
2914
- "epoch": 15.0,
2915
- "eval_loss": 0.00824675615876913,
2916
- "eval_runtime": 79.7814,
2917
- "eval_samples_per_second": 12.171,
2918
- "eval_steps_per_second": 0.389,
2919
- "step": 465
2920
- },
2921
- {
2922
- "epoch": 15.03,
2923
- "learning_rate": 0.0002,
2924
- "loss": 0.0087,
2925
- "step": 466
2926
- },
2927
- {
2928
- "epoch": 15.06,
2929
- "learning_rate": 0.0002,
2930
- "loss": 0.0094,
2931
- "step": 467
2932
- },
2933
- {
2934
- "epoch": 15.1,
2935
- "learning_rate": 0.0002,
2936
- "loss": 0.011,
2937
- "step": 468
2938
- },
2939
- {
2940
- "epoch": 15.13,
2941
- "learning_rate": 0.0002,
2942
- "loss": 0.0064,
2943
- "step": 469
2944
- },
2945
- {
2946
- "epoch": 15.16,
2947
- "learning_rate": 0.0002,
2948
- "loss": 0.0067,
2949
- "step": 470
2950
- },
2951
- {
2952
- "epoch": 15.19,
2953
- "learning_rate": 0.0002,
2954
- "loss": 0.0126,
2955
- "step": 471
2956
- },
2957
- {
2958
- "epoch": 15.23,
2959
- "learning_rate": 0.0002,
2960
- "loss": 0.0078,
2961
- "step": 472
2962
- },
2963
- {
2964
- "epoch": 15.26,
2965
- "learning_rate": 0.0002,
2966
- "loss": 0.01,
2967
- "step": 473
2968
- },
2969
- {
2970
- "epoch": 15.29,
2971
- "learning_rate": 0.0002,
2972
- "loss": 0.0185,
2973
- "step": 474
2974
- },
2975
- {
2976
- "epoch": 15.32,
2977
- "learning_rate": 0.0002,
2978
- "loss": 0.0037,
2979
- "step": 475
2980
- },
2981
- {
2982
- "epoch": 15.35,
2983
- "learning_rate": 0.0002,
2984
- "loss": 0.0077,
2985
- "step": 476
2986
- },
2987
- {
2988
- "epoch": 15.39,
2989
- "learning_rate": 0.0002,
2990
- "loss": 0.0095,
2991
- "step": 477
2992
- },
2993
- {
2994
- "epoch": 15.42,
2995
- "learning_rate": 0.0002,
2996
- "loss": 0.0053,
2997
- "step": 478
2998
- },
2999
- {
3000
- "epoch": 15.45,
3001
- "learning_rate": 0.0002,
3002
- "loss": 0.0059,
3003
- "step": 479
3004
- },
3005
- {
3006
- "epoch": 15.48,
3007
- "learning_rate": 0.0002,
3008
- "loss": 0.0083,
3009
- "step": 480
3010
- },
3011
- {
3012
- "epoch": 15.52,
3013
- "learning_rate": 0.0002,
3014
- "loss": 0.0101,
3015
- "step": 481
3016
- },
3017
- {
3018
- "epoch": 15.55,
3019
- "learning_rate": 0.0002,
3020
- "loss": 0.0043,
3021
- "step": 482
3022
- },
3023
- {
3024
- "epoch": 15.58,
3025
- "learning_rate": 0.0002,
3026
- "loss": 0.0088,
3027
- "step": 483
3028
- },
3029
- {
3030
- "epoch": 15.61,
3031
- "learning_rate": 0.0002,
3032
- "loss": 0.0085,
3033
- "step": 484
3034
- },
3035
- {
3036
- "epoch": 15.65,
3037
- "learning_rate": 0.0002,
3038
- "loss": 0.0088,
3039
- "step": 485
3040
- },
3041
- {
3042
- "epoch": 15.68,
3043
- "learning_rate": 0.0002,
3044
- "loss": 0.0083,
3045
- "step": 486
3046
- },
3047
- {
3048
- "epoch": 15.71,
3049
- "learning_rate": 0.0002,
3050
- "loss": 0.0038,
3051
- "step": 487
3052
- },
3053
- {
3054
- "epoch": 15.74,
3055
- "learning_rate": 0.0002,
3056
- "loss": 0.0132,
3057
- "step": 488
3058
- },
3059
- {
3060
- "epoch": 15.77,
3061
- "learning_rate": 0.0002,
3062
- "loss": 0.0092,
3063
- "step": 489
3064
- },
3065
- {
3066
- "epoch": 15.81,
3067
- "learning_rate": 0.0002,
3068
- "loss": 0.0084,
3069
- "step": 490
3070
- },
3071
- {
3072
- "epoch": 15.84,
3073
- "learning_rate": 0.0002,
3074
- "loss": 0.0059,
3075
- "step": 491
3076
- },
3077
- {
3078
- "epoch": 15.87,
3079
- "learning_rate": 0.0002,
3080
- "loss": 0.0051,
3081
- "step": 492
3082
- },
3083
- {
3084
- "epoch": 15.9,
3085
- "learning_rate": 0.0002,
3086
- "loss": 0.0046,
3087
- "step": 493
3088
- },
3089
- {
3090
- "epoch": 15.94,
3091
- "learning_rate": 0.0002,
3092
- "loss": 0.0159,
3093
- "step": 494
3094
- },
3095
- {
3096
- "epoch": 15.97,
3097
- "learning_rate": 0.0002,
3098
- "loss": 0.0041,
3099
- "step": 495
3100
- },
3101
- {
3102
- "epoch": 16.0,
3103
- "learning_rate": 0.0002,
3104
- "loss": 0.0063,
3105
- "step": 496
3106
- },
3107
- {
3108
- "epoch": 16.0,
3109
- "eval_loss": 0.006366066634654999,
3110
- "eval_runtime": 80.009,
3111
- "eval_samples_per_second": 12.136,
3112
- "eval_steps_per_second": 0.387,
3113
- "step": 496
3114
- },
3115
- {
3116
- "epoch": 16.03,
3117
- "learning_rate": 0.0002,
3118
- "loss": 0.0048,
3119
- "step": 497
3120
- },
3121
- {
3122
- "epoch": 16.06,
3123
- "learning_rate": 0.0002,
3124
- "loss": 0.0127,
3125
- "step": 498
3126
- },
3127
- {
3128
- "epoch": 16.1,
3129
- "learning_rate": 0.0002,
3130
- "loss": 0.0072,
3131
- "step": 499
3132
- },
3133
- {
3134
- "epoch": 16.13,
3135
- "learning_rate": 0.0002,
3136
- "loss": 0.0088,
3137
- "step": 500
3138
- },
3139
- {
3140
- "epoch": 16.16,
3141
- "learning_rate": 0.0002,
3142
- "loss": 0.015,
3143
- "step": 501
3144
- },
3145
- {
3146
- "epoch": 16.19,
3147
- "learning_rate": 0.0002,
3148
- "loss": 0.0076,
3149
- "step": 502
3150
- },
3151
- {
3152
- "epoch": 16.23,
3153
- "learning_rate": 0.0002,
3154
- "loss": 0.0071,
3155
- "step": 503
3156
- },
3157
- {
3158
- "epoch": 16.26,
3159
- "learning_rate": 0.0002,
3160
- "loss": 0.0109,
3161
- "step": 504
3162
- },
3163
- {
3164
- "epoch": 16.29,
3165
- "learning_rate": 0.0002,
3166
- "loss": 0.0094,
3167
- "step": 505
3168
- },
3169
- {
3170
- "epoch": 16.32,
3171
- "learning_rate": 0.0002,
3172
- "loss": 0.0055,
3173
- "step": 506
3174
- },
3175
- {
3176
- "epoch": 16.35,
3177
- "learning_rate": 0.0002,
3178
- "loss": 0.006,
3179
- "step": 507
3180
- },
3181
- {
3182
- "epoch": 16.39,
3183
- "learning_rate": 0.0002,
3184
- "loss": 0.0079,
3185
- "step": 508
3186
- },
3187
- {
3188
- "epoch": 16.42,
3189
- "learning_rate": 0.0002,
3190
- "loss": 0.0114,
3191
- "step": 509
3192
- },
3193
- {
3194
- "epoch": 16.45,
3195
- "learning_rate": 0.0002,
3196
- "loss": 0.0085,
3197
- "step": 510
3198
- },
3199
- {
3200
- "epoch": 16.48,
3201
- "learning_rate": 0.0002,
3202
- "loss": 0.0088,
3203
- "step": 511
3204
- },
3205
- {
3206
- "epoch": 16.52,
3207
- "learning_rate": 0.0002,
3208
- "loss": 0.0103,
3209
- "step": 512
3210
- },
3211
- {
3212
- "epoch": 16.55,
3213
- "learning_rate": 0.0002,
3214
- "loss": 0.0092,
3215
- "step": 513
3216
- },
3217
- {
3218
- "epoch": 16.58,
3219
- "learning_rate": 0.0002,
3220
- "loss": 0.0055,
3221
- "step": 514
3222
- },
3223
- {
3224
- "epoch": 16.61,
3225
- "learning_rate": 0.0002,
3226
- "loss": 0.0114,
3227
- "step": 515
3228
- },
3229
- {
3230
- "epoch": 16.65,
3231
- "learning_rate": 0.0002,
3232
- "loss": 0.0095,
3233
- "step": 516
3234
- },
3235
- {
3236
- "epoch": 16.68,
3237
- "learning_rate": 0.0002,
3238
- "loss": 0.0082,
3239
- "step": 517
3240
- },
3241
- {
3242
- "epoch": 16.71,
3243
- "learning_rate": 0.0002,
3244
- "loss": 0.0118,
3245
- "step": 518
3246
- },
3247
- {
3248
- "epoch": 16.74,
3249
- "learning_rate": 0.0002,
3250
- "loss": 0.0098,
3251
- "step": 519
3252
- },
3253
- {
3254
- "epoch": 16.77,
3255
- "learning_rate": 0.0002,
3256
- "loss": 0.0115,
3257
- "step": 520
3258
- },
3259
- {
3260
- "epoch": 16.81,
3261
- "learning_rate": 0.0002,
3262
- "loss": 0.0123,
3263
- "step": 521
3264
- },
3265
- {
3266
- "epoch": 16.84,
3267
- "learning_rate": 0.0002,
3268
- "loss": 0.0102,
3269
- "step": 522
3270
- },
3271
- {
3272
- "epoch": 16.87,
3273
- "learning_rate": 0.0002,
3274
- "loss": 0.0089,
3275
- "step": 523
3276
- },
3277
- {
3278
- "epoch": 16.9,
3279
- "learning_rate": 0.0002,
3280
- "loss": 0.0042,
3281
- "step": 524
3282
- },
3283
- {
3284
- "epoch": 16.94,
3285
- "learning_rate": 0.0002,
3286
- "loss": 0.0098,
3287
- "step": 525
3288
- },
3289
- {
3290
- "epoch": 16.97,
3291
- "learning_rate": 0.0002,
3292
- "loss": 0.0061,
3293
- "step": 526
3294
- },
3295
- {
3296
- "epoch": 17.0,
3297
- "learning_rate": 0.0002,
3298
- "loss": 0.0086,
3299
- "step": 527
3300
- },
3301
- {
3302
- "epoch": 17.0,
3303
- "eval_loss": 0.006956302560865879,
3304
- "eval_runtime": 79.5624,
3305
- "eval_samples_per_second": 12.204,
3306
- "eval_steps_per_second": 0.39,
3307
- "step": 527
3308
- },
3309
- {
3310
- "epoch": 17.03,
3311
- "learning_rate": 0.0002,
3312
- "loss": 0.0083,
3313
- "step": 528
3314
- },
3315
- {
3316
- "epoch": 17.06,
3317
- "learning_rate": 0.0002,
3318
- "loss": 0.007,
3319
- "step": 529
3320
- },
3321
- {
3322
- "epoch": 17.1,
3323
- "learning_rate": 0.0002,
3324
- "loss": 0.0096,
3325
- "step": 530
3326
- },
3327
- {
3328
- "epoch": 17.13,
3329
- "learning_rate": 0.0002,
3330
- "loss": 0.0091,
3331
- "step": 531
3332
- },
3333
- {
3334
- "epoch": 17.16,
3335
- "learning_rate": 0.0002,
3336
- "loss": 0.0083,
3337
- "step": 532
3338
- },
3339
- {
3340
- "epoch": 17.19,
3341
- "learning_rate": 0.0002,
3342
- "loss": 0.0037,
3343
- "step": 533
3344
- },
3345
- {
3346
- "epoch": 17.23,
3347
- "learning_rate": 0.0002,
3348
- "loss": 0.009,
3349
- "step": 534
3350
- },
3351
- {
3352
- "epoch": 17.26,
3353
- "learning_rate": 0.0002,
3354
- "loss": 0.0045,
3355
- "step": 535
3356
- },
3357
- {
3358
- "epoch": 17.29,
3359
- "learning_rate": 0.0002,
3360
- "loss": 0.0066,
3361
- "step": 536
3362
- },
3363
- {
3364
- "epoch": 17.32,
3365
- "learning_rate": 0.0002,
3366
- "loss": 0.0081,
3367
- "step": 537
3368
- },
3369
- {
3370
- "epoch": 17.35,
3371
- "learning_rate": 0.0002,
3372
- "loss": 0.0092,
3373
- "step": 538
3374
- },
3375
- {
3376
- "epoch": 17.39,
3377
- "learning_rate": 0.0002,
3378
- "loss": 0.0037,
3379
- "step": 539
3380
- },
3381
- {
3382
- "epoch": 17.42,
3383
- "learning_rate": 0.0002,
3384
- "loss": 0.0046,
3385
- "step": 540
3386
- },
3387
- {
3388
- "epoch": 17.45,
3389
- "learning_rate": 0.0002,
3390
- "loss": 0.006,
3391
- "step": 541
3392
- },
3393
- {
3394
- "epoch": 17.48,
3395
- "learning_rate": 0.0002,
3396
- "loss": 0.0095,
3397
- "step": 542
3398
- },
3399
- {
3400
- "epoch": 17.52,
3401
- "learning_rate": 0.0002,
3402
- "loss": 0.0033,
3403
- "step": 543
3404
- },
3405
- {
3406
- "epoch": 17.55,
3407
- "learning_rate": 0.0002,
3408
- "loss": 0.0074,
3409
- "step": 544
3410
- },
3411
- {
3412
- "epoch": 17.58,
3413
- "learning_rate": 0.0002,
3414
- "loss": 0.005,
3415
- "step": 545
3416
- },
3417
- {
3418
- "epoch": 17.61,
3419
- "learning_rate": 0.0002,
3420
- "loss": 0.0054,
3421
- "step": 546
3422
- },
3423
- {
3424
- "epoch": 17.65,
3425
- "learning_rate": 0.0002,
3426
- "loss": 0.0064,
3427
- "step": 547
3428
- },
3429
- {
3430
- "epoch": 17.68,
3431
- "learning_rate": 0.0002,
3432
- "loss": 0.0124,
3433
- "step": 548
3434
- },
3435
- {
3436
- "epoch": 17.71,
3437
- "learning_rate": 0.0002,
3438
- "loss": 0.0063,
3439
- "step": 549
3440
- },
3441
- {
3442
- "epoch": 17.74,
3443
- "learning_rate": 0.0002,
3444
- "loss": 0.0102,
3445
- "step": 550
3446
- },
3447
- {
3448
- "epoch": 17.77,
3449
- "learning_rate": 0.0002,
3450
- "loss": 0.0053,
3451
- "step": 551
3452
- },
3453
- {
3454
- "epoch": 17.81,
3455
- "learning_rate": 0.0002,
3456
- "loss": 0.0087,
3457
- "step": 552
3458
- },
3459
- {
3460
- "epoch": 17.84,
3461
- "learning_rate": 0.0002,
3462
- "loss": 0.0066,
3463
- "step": 553
3464
- },
3465
- {
3466
- "epoch": 17.87,
3467
- "learning_rate": 0.0002,
3468
- "loss": 0.0066,
3469
- "step": 554
3470
- },
3471
- {
3472
- "epoch": 17.9,
3473
- "learning_rate": 0.0002,
3474
- "loss": 0.0051,
3475
- "step": 555
3476
- },
3477
- {
3478
- "epoch": 17.94,
3479
- "learning_rate": 0.0002,
3480
- "loss": 0.007,
3481
- "step": 556
3482
- },
3483
- {
3484
- "epoch": 17.97,
3485
- "learning_rate": 0.0002,
3486
- "loss": 0.0089,
3487
- "step": 557
3488
- },
3489
- {
3490
- "epoch": 18.0,
3491
- "learning_rate": 0.0002,
3492
- "loss": 0.0112,
3493
- "step": 558
3494
- },
3495
- {
3496
- "epoch": 18.0,
3497
- "eval_loss": 0.0062157167121768,
3498
- "eval_runtime": 79.5293,
3499
- "eval_samples_per_second": 12.209,
3500
- "eval_steps_per_second": 0.39,
3501
- "step": 558
3502
- },
3503
- {
3504
- "epoch": 18.03,
3505
- "learning_rate": 0.0002,
3506
- "loss": 0.0099,
3507
- "step": 559
3508
- },
3509
- {
3510
- "epoch": 18.06,
3511
- "learning_rate": 0.0002,
3512
- "loss": 0.0032,
3513
- "step": 560
3514
- },
3515
- {
3516
- "epoch": 18.1,
3517
- "learning_rate": 0.0002,
3518
- "loss": 0.0114,
3519
- "step": 561
3520
- },
3521
- {
3522
- "epoch": 18.13,
3523
- "learning_rate": 0.0002,
3524
- "loss": 0.0087,
3525
- "step": 562
3526
- },
3527
- {
3528
- "epoch": 18.16,
3529
- "learning_rate": 0.0002,
3530
- "loss": 0.0056,
3531
- "step": 563
3532
- },
3533
- {
3534
- "epoch": 18.19,
3535
- "learning_rate": 0.0002,
3536
- "loss": 0.0095,
3537
- "step": 564
3538
- },
3539
- {
3540
- "epoch": 18.23,
3541
- "learning_rate": 0.0002,
3542
- "loss": 0.0047,
3543
- "step": 565
3544
- },
3545
- {
3546
- "epoch": 18.26,
3547
- "learning_rate": 0.0002,
3548
- "loss": 0.0033,
3549
- "step": 566
3550
- },
3551
- {
3552
- "epoch": 18.29,
3553
- "learning_rate": 0.0002,
3554
- "loss": 0.0031,
3555
- "step": 567
3556
- },
3557
- {
3558
- "epoch": 18.32,
3559
- "learning_rate": 0.0002,
3560
- "loss": 0.0065,
3561
- "step": 568
3562
- },
3563
- {
3564
- "epoch": 18.35,
3565
- "learning_rate": 0.0002,
3566
- "loss": 0.0062,
3567
- "step": 569
3568
- },
3569
- {
3570
- "epoch": 18.39,
3571
- "learning_rate": 0.0002,
3572
- "loss": 0.0082,
3573
- "step": 570
3574
- },
3575
- {
3576
- "epoch": 18.42,
3577
- "learning_rate": 0.0002,
3578
- "loss": 0.0059,
3579
- "step": 571
3580
- },
3581
- {
3582
- "epoch": 18.45,
3583
- "learning_rate": 0.0002,
3584
- "loss": 0.009,
3585
- "step": 572
3586
- },
3587
- {
3588
- "epoch": 18.48,
3589
- "learning_rate": 0.0002,
3590
- "loss": 0.0072,
3591
- "step": 573
3592
- },
3593
- {
3594
- "epoch": 18.52,
3595
- "learning_rate": 0.0002,
3596
- "loss": 0.0049,
3597
- "step": 574
3598
- },
3599
- {
3600
- "epoch": 18.55,
3601
- "learning_rate": 0.0002,
3602
- "loss": 0.009,
3603
- "step": 575
3604
- },
3605
- {
3606
- "epoch": 18.58,
3607
- "learning_rate": 0.0002,
3608
- "loss": 0.0115,
3609
- "step": 576
3610
- },
3611
- {
3612
- "epoch": 18.61,
3613
- "learning_rate": 0.0002,
3614
- "loss": 0.0045,
3615
- "step": 577
3616
- },
3617
- {
3618
- "epoch": 18.65,
3619
- "learning_rate": 0.0002,
3620
- "loss": 0.0054,
3621
- "step": 578
3622
- },
3623
- {
3624
- "epoch": 18.68,
3625
- "learning_rate": 0.0002,
3626
- "loss": 0.0108,
3627
- "step": 579
3628
- },
3629
- {
3630
- "epoch": 18.71,
3631
- "learning_rate": 0.0002,
3632
- "loss": 0.0035,
3633
- "step": 580
3634
- },
3635
- {
3636
- "epoch": 18.74,
3637
- "learning_rate": 0.0002,
3638
- "loss": 0.0087,
3639
- "step": 581
3640
- },
3641
- {
3642
- "epoch": 18.77,
3643
- "learning_rate": 0.0002,
3644
- "loss": 0.0101,
3645
- "step": 582
3646
- },
3647
- {
3648
- "epoch": 18.81,
3649
- "learning_rate": 0.0002,
3650
- "loss": 0.0034,
3651
- "step": 583
3652
- },
3653
- {
3654
- "epoch": 18.84,
3655
- "learning_rate": 0.0002,
3656
- "loss": 0.0119,
3657
- "step": 584
3658
- },
3659
- {
3660
- "epoch": 18.87,
3661
- "learning_rate": 0.0002,
3662
- "loss": 0.005,
3663
- "step": 585
3664
- },
3665
- {
3666
- "epoch": 18.9,
3667
- "learning_rate": 0.0002,
3668
- "loss": 0.0045,
3669
- "step": 586
3670
- },
3671
- {
3672
- "epoch": 18.94,
3673
- "learning_rate": 0.0002,
3674
- "loss": 0.0114,
3675
- "step": 587
3676
- },
3677
- {
3678
- "epoch": 18.97,
3679
- "learning_rate": 0.0002,
3680
- "loss": 0.0117,
3681
- "step": 588
3682
- },
3683
- {
3684
- "epoch": 19.0,
3685
- "learning_rate": 0.0002,
3686
- "loss": 0.0075,
3687
- "step": 589
3688
- },
3689
- {
3690
- "epoch": 19.0,
3691
- "eval_loss": 0.005089292302727699,
3692
- "eval_runtime": 79.5895,
3693
- "eval_samples_per_second": 12.2,
3694
- "eval_steps_per_second": 0.389,
3695
- "step": 589
3696
- },
3697
- {
3698
- "epoch": 19.03,
3699
- "learning_rate": 0.0002,
3700
- "loss": 0.0034,
3701
- "step": 590
3702
- },
3703
- {
3704
- "epoch": 19.06,
3705
- "learning_rate": 0.0002,
3706
- "loss": 0.0027,
3707
- "step": 591
3708
- },
3709
- {
3710
- "epoch": 19.1,
3711
- "learning_rate": 0.0002,
3712
- "loss": 0.0068,
3713
- "step": 592
3714
- },
3715
- {
3716
- "epoch": 19.13,
3717
- "learning_rate": 0.0002,
3718
- "loss": 0.0094,
3719
- "step": 593
3720
- },
3721
- {
3722
- "epoch": 19.16,
3723
- "learning_rate": 0.0002,
3724
- "loss": 0.0023,
3725
- "step": 594
3726
- },
3727
- {
3728
- "epoch": 19.19,
3729
- "learning_rate": 0.0002,
3730
- "loss": 0.004,
3731
- "step": 595
3732
- },
3733
- {
3734
- "epoch": 19.23,
3735
- "learning_rate": 0.0002,
3736
- "loss": 0.0045,
3737
- "step": 596
3738
- },
3739
- {
3740
- "epoch": 19.26,
3741
- "learning_rate": 0.0002,
3742
- "loss": 0.0114,
3743
- "step": 597
3744
- },
3745
- {
3746
- "epoch": 19.29,
3747
- "learning_rate": 0.0002,
3748
- "loss": 0.0119,
3749
- "step": 598
3750
- },
3751
- {
3752
- "epoch": 19.32,
3753
- "learning_rate": 0.0002,
3754
- "loss": 0.003,
3755
- "step": 599
3756
- },
3757
- {
3758
- "epoch": 19.35,
3759
- "learning_rate": 0.0002,
3760
- "loss": 0.0062,
3761
- "step": 600
3762
- },
3763
- {
3764
- "epoch": 19.39,
3765
- "learning_rate": 0.0002,
3766
- "loss": 0.0089,
3767
- "step": 601
3768
- },
3769
- {
3770
- "epoch": 19.42,
3771
- "learning_rate": 0.0002,
3772
- "loss": 0.0075,
3773
- "step": 602
3774
- },
3775
- {
3776
- "epoch": 19.45,
3777
- "learning_rate": 0.0002,
3778
- "loss": 0.0095,
3779
- "step": 603
3780
- },
3781
- {
3782
- "epoch": 19.48,
3783
- "learning_rate": 0.0002,
3784
- "loss": 0.0086,
3785
- "step": 604
3786
- },
3787
- {
3788
- "epoch": 19.52,
3789
- "learning_rate": 0.0002,
3790
- "loss": 0.008,
3791
- "step": 605
3792
- },
3793
- {
3794
- "epoch": 19.55,
3795
- "learning_rate": 0.0002,
3796
- "loss": 0.0051,
3797
- "step": 606
3798
- },
3799
- {
3800
- "epoch": 19.58,
3801
- "learning_rate": 0.0002,
3802
- "loss": 0.0068,
3803
- "step": 607
3804
- },
3805
- {
3806
- "epoch": 19.61,
3807
- "learning_rate": 0.0002,
3808
- "loss": 0.0036,
3809
- "step": 608
3810
- },
3811
- {
3812
- "epoch": 19.65,
3813
- "learning_rate": 0.0002,
3814
- "loss": 0.0079,
3815
- "step": 609
3816
- },
3817
- {
3818
- "epoch": 19.68,
3819
- "learning_rate": 0.0002,
3820
- "loss": 0.0038,
3821
- "step": 610
3822
- },
3823
- {
3824
- "epoch": 19.71,
3825
- "learning_rate": 0.0002,
3826
- "loss": 0.0038,
3827
- "step": 611
3828
- },
3829
- {
3830
- "epoch": 19.74,
3831
- "learning_rate": 0.0002,
3832
- "loss": 0.0062,
3833
- "step": 612
3834
- },
3835
- {
3836
- "epoch": 19.77,
3837
- "learning_rate": 0.0002,
3838
- "loss": 0.0072,
3839
- "step": 613
3840
- },
3841
- {
3842
- "epoch": 19.81,
3843
- "learning_rate": 0.0002,
3844
- "loss": 0.01,
3845
- "step": 614
3846
- },
3847
- {
3848
- "epoch": 19.84,
3849
- "learning_rate": 0.0002,
3850
- "loss": 0.0056,
3851
- "step": 615
3852
- },
3853
- {
3854
- "epoch": 19.87,
3855
- "learning_rate": 0.0002,
3856
- "loss": 0.0063,
3857
- "step": 616
3858
- },
3859
- {
3860
- "epoch": 19.9,
3861
- "learning_rate": 0.0002,
3862
- "loss": 0.0067,
3863
- "step": 617
3864
- },
3865
- {
3866
- "epoch": 19.94,
3867
- "learning_rate": 0.0002,
3868
- "loss": 0.0097,
3869
- "step": 618
3870
- },
3871
- {
3872
- "epoch": 19.97,
3873
- "learning_rate": 0.0002,
3874
- "loss": 0.0061,
3875
- "step": 619
3876
- },
3877
- {
3878
- "epoch": 20.0,
3879
- "learning_rate": 0.0002,
3880
- "loss": 0.0042,
3881
- "step": 620
3882
- },
3883
  {
3884
  "epoch": 20.0,
3885
- "eval_loss": 0.0034851552918553352,
3886
- "eval_runtime": 79.4724,
3887
- "eval_samples_per_second": 12.218,
3888
- "eval_steps_per_second": 0.39,
3889
- "step": 620
3890
  },
3891
  {
3892
  "epoch": 20.0,
3893
- "step": 620,
3894
- "total_flos": 2.339039814587777e+18,
3895
- "train_loss": 0.055754700854548346,
3896
- "train_runtime": 10597.5021,
3897
- "train_samples_per_second": 1.833,
3898
- "train_steps_per_second": 0.059
3899
  }
3900
  ],
3901
  "logging_steps": 1.0,
3902
- "max_steps": 620,
3903
  "num_input_tokens_seen": 0,
3904
  "num_train_epochs": 20,
3905
  "save_steps": 50000,
3906
- "total_flos": 2.339039814587777e+18,
3907
  "train_batch_size": 4,
3908
  "trial_name": null,
3909
  "trial_params": null
 
3
  "best_model_checkpoint": null,
4
  "epoch": 20.0,
5
  "eval_steps": 500,
6
+ "global_step": 300,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
+ "epoch": 0.07,
13
  "learning_rate": 0.0,
14
+ "loss": 1.4819,
15
  "step": 1
16
  },
17
  {
18
+ "epoch": 0.13,
19
+ "learning_rate": 6.309297535714573e-05,
20
+ "loss": 1.2098,
21
  "step": 2
22
  },
23
  {
24
+ "epoch": 0.2,
25
+ "learning_rate": 0.0001,
26
+ "loss": 1.1506,
27
  "step": 3
28
  },
29
  {
30
+ "epoch": 0.27,
31
+ "learning_rate": 0.00012618595071429146,
32
+ "loss": 1.0983,
33
  "step": 4
34
  },
35
  {
36
+ "epoch": 0.33,
37
+ "learning_rate": 0.0001464973520717927,
38
+ "loss": 1.0829,
39
  "step": 5
40
  },
41
  {
42
+ "epoch": 0.4,
43
+ "learning_rate": 0.00016309297535714573,
44
+ "loss": 1.0365,
45
  "step": 6
46
  },
47
  {
48
+ "epoch": 0.47,
49
+ "learning_rate": 0.00017712437491614223,
50
+ "loss": 1.0037,
51
  "step": 7
52
  },
53
  {
54
+ "epoch": 0.53,
55
+ "learning_rate": 0.0001892789260714372,
56
+ "loss": 0.8911,
57
  "step": 8
58
  },
59
  {
60
+ "epoch": 0.6,
61
+ "learning_rate": 0.0002,
62
+ "loss": 0.8777,
63
  "step": 9
64
  },
65
  {
66
+ "epoch": 0.67,
67
+ "learning_rate": 0.0002,
68
+ "loss": 0.8581,
69
  "step": 10
70
  },
71
  {
72
+ "epoch": 0.73,
73
+ "learning_rate": 0.0002,
74
+ "loss": 0.8421,
75
  "step": 11
76
  },
77
  {
78
+ "epoch": 0.8,
79
+ "learning_rate": 0.0002,
80
+ "loss": 0.7595,
81
  "step": 12
82
  },
83
  {
84
+ "epoch": 0.87,
85
+ "learning_rate": 0.0002,
86
+ "loss": 0.6998,
87
  "step": 13
88
  },
89
  {
90
+ "epoch": 0.93,
91
+ "learning_rate": 0.0002,
92
+ "loss": 0.5836,
93
  "step": 14
94
  },
95
  {
96
+ "epoch": 1.0,
97
+ "learning_rate": 0.0002,
98
+ "loss": 0.4911,
99
+ "step": 15
100
+ },
101
+ {
102
+ "epoch": 1.0,
103
+ "eval_loss": 0.40942174196243286,
104
+ "eval_runtime": 40.6357,
105
+ "eval_samples_per_second": 11.123,
106
+ "eval_steps_per_second": 0.369,
107
  "step": 15
108
  },
109
  {
110
+ "epoch": 1.07,
111
+ "learning_rate": 0.0002,
112
+ "loss": 0.385,
113
  "step": 16
114
  },
115
  {
116
+ "epoch": 1.13,
117
+ "learning_rate": 0.0002,
118
+ "loss": 0.3817,
119
  "step": 17
120
  },
121
  {
122
+ "epoch": 1.2,
123
+ "learning_rate": 0.0002,
124
+ "loss": 0.2833,
125
  "step": 18
126
  },
127
  {
128
+ "epoch": 1.27,
129
  "learning_rate": 0.0002,
130
+ "loss": 0.3438,
131
  "step": 19
132
  },
133
  {
134
+ "epoch": 1.33,
135
  "learning_rate": 0.0002,
136
+ "loss": 0.2252,
137
  "step": 20
138
  },
139
  {
140
+ "epoch": 1.4,
141
  "learning_rate": 0.0002,
142
+ "loss": 0.2014,
143
  "step": 21
144
  },
145
  {
146
+ "epoch": 1.47,
147
  "learning_rate": 0.0002,
148
+ "loss": 0.1665,
149
  "step": 22
150
  },
151
  {
152
+ "epoch": 1.53,
153
  "learning_rate": 0.0002,
154
+ "loss": 0.1682,
155
  "step": 23
156
  },
157
  {
158
+ "epoch": 1.6,
159
  "learning_rate": 0.0002,
160
+ "loss": 0.1654,
161
  "step": 24
162
  },
163
  {
164
+ "epoch": 1.67,
165
  "learning_rate": 0.0002,
166
+ "loss": 0.1103,
167
  "step": 25
168
  },
169
  {
170
+ "epoch": 1.73,
171
  "learning_rate": 0.0002,
172
+ "loss": 0.2226,
173
  "step": 26
174
  },
175
  {
176
+ "epoch": 1.8,
177
  "learning_rate": 0.0002,
178
+ "loss": 0.1823,
179
  "step": 27
180
  },
181
  {
182
+ "epoch": 1.87,
183
  "learning_rate": 0.0002,
184
+ "loss": 0.1459,
185
  "step": 28
186
  },
187
  {
188
+ "epoch": 1.93,
189
  "learning_rate": 0.0002,
190
+ "loss": 0.1565,
191
  "step": 29
192
  },
193
  {
194
+ "epoch": 2.0,
195
  "learning_rate": 0.0002,
196
+ "loss": 0.0683,
197
  "step": 30
198
  },
199
  {
200
+ "epoch": 2.0,
201
+ "eval_loss": 0.06821632385253906,
202
+ "eval_runtime": 40.6781,
203
+ "eval_samples_per_second": 11.112,
204
+ "eval_steps_per_second": 0.369,
205
+ "step": 30
206
  },
207
  {
208
+ "epoch": 2.07,
209
+ "learning_rate": 0.0002,
210
+ "loss": 0.0823,
211
  "step": 31
212
  },
213
  {
214
+ "epoch": 2.13,
215
  "learning_rate": 0.0002,
216
+ "loss": 0.0323,
217
  "step": 32
218
  },
219
  {
220
+ "epoch": 2.2,
221
  "learning_rate": 0.0002,
222
+ "loss": 0.061,
223
  "step": 33
224
  },
225
  {
226
+ "epoch": 2.27,
227
  "learning_rate": 0.0002,
228
+ "loss": 0.0226,
229
  "step": 34
230
  },
231
  {
232
+ "epoch": 2.33,
233
  "learning_rate": 0.0002,
234
+ "loss": 0.0332,
235
  "step": 35
236
  },
237
  {
238
+ "epoch": 2.4,
239
  "learning_rate": 0.0002,
240
+ "loss": 0.02,
241
  "step": 36
242
  },
243
  {
244
+ "epoch": 2.47,
245
  "learning_rate": 0.0002,
246
+ "loss": 0.0625,
247
  "step": 37
248
  },
249
  {
250
+ "epoch": 2.53,
251
  "learning_rate": 0.0002,
252
+ "loss": 0.0611,
253
  "step": 38
254
  },
255
  {
256
+ "epoch": 2.6,
257
  "learning_rate": 0.0002,
258
+ "loss": 0.0322,
259
  "step": 39
260
  },
261
  {
262
+ "epoch": 2.67,
263
  "learning_rate": 0.0002,
264
+ "loss": 0.0483,
265
  "step": 40
266
  },
267
  {
268
+ "epoch": 2.73,
269
  "learning_rate": 0.0002,
270
+ "loss": 0.0182,
271
  "step": 41
272
  },
273
  {
274
+ "epoch": 2.8,
275
  "learning_rate": 0.0002,
276
+ "loss": 0.0249,
277
  "step": 42
278
  },
279
  {
280
+ "epoch": 2.87,
281
  "learning_rate": 0.0002,
282
+ "loss": 0.0313,
283
  "step": 43
284
  },
285
  {
286
+ "epoch": 2.93,
287
  "learning_rate": 0.0002,
288
+ "loss": 0.0323,
289
  "step": 44
290
  },
291
  {
292
+ "epoch": 3.0,
293
  "learning_rate": 0.0002,
294
+ "loss": 0.0296,
295
+ "step": 45
296
+ },
297
+ {
298
+ "epoch": 3.0,
299
+ "eval_loss": 0.015485187992453575,
300
+ "eval_runtime": 40.7892,
301
+ "eval_samples_per_second": 11.081,
302
+ "eval_steps_per_second": 0.368,
303
  "step": 45
304
  },
305
  {
306
+ "epoch": 3.07,
307
  "learning_rate": 0.0002,
308
+ "loss": 0.0104,
309
  "step": 46
310
  },
311
  {
312
+ "epoch": 3.13,
313
  "learning_rate": 0.0002,
314
+ "loss": 0.0125,
315
  "step": 47
316
  },
317
  {
318
+ "epoch": 3.2,
319
  "learning_rate": 0.0002,
320
+ "loss": 0.0125,
321
  "step": 48
322
  },
323
  {
324
+ "epoch": 3.27,
325
  "learning_rate": 0.0002,
326
+ "loss": 0.0137,
327
  "step": 49
328
  },
329
  {
330
+ "epoch": 3.33,
331
  "learning_rate": 0.0002,
332
+ "loss": 0.0071,
333
  "step": 50
334
  },
335
  {
336
+ "epoch": 3.4,
337
  "learning_rate": 0.0002,
338
+ "loss": 0.0085,
339
  "step": 51
340
  },
341
  {
342
+ "epoch": 3.47,
343
  "learning_rate": 0.0002,
344
+ "loss": 0.0184,
345
  "step": 52
346
  },
347
  {
348
+ "epoch": 3.53,
349
  "learning_rate": 0.0002,
350
+ "loss": 0.0117,
351
  "step": 53
352
  },
353
  {
354
+ "epoch": 3.6,
355
  "learning_rate": 0.0002,
356
+ "loss": 0.0078,
357
  "step": 54
358
  },
359
  {
360
+ "epoch": 3.67,
361
  "learning_rate": 0.0002,
362
+ "loss": 0.0067,
363
  "step": 55
364
  },
365
  {
366
+ "epoch": 3.73,
367
  "learning_rate": 0.0002,
368
+ "loss": 0.0126,
369
  "step": 56
370
  },
371
  {
372
+ "epoch": 3.8,
373
  "learning_rate": 0.0002,
374
+ "loss": 0.0111,
375
  "step": 57
376
  },
377
  {
378
+ "epoch": 3.87,
379
  "learning_rate": 0.0002,
380
+ "loss": 0.0105,
381
  "step": 58
382
  },
383
  {
384
+ "epoch": 3.93,
385
  "learning_rate": 0.0002,
386
+ "loss": 0.0058,
387
  "step": 59
388
  },
389
  {
390
+ "epoch": 4.0,
391
  "learning_rate": 0.0002,
392
+ "loss": 0.0097,
393
  "step": 60
394
  },
395
  {
396
+ "epoch": 4.0,
397
+ "eval_loss": 0.006230538245290518,
398
+ "eval_runtime": 40.8202,
399
+ "eval_samples_per_second": 11.073,
400
+ "eval_steps_per_second": 0.367,
401
+ "step": 60
402
  },
403
  {
404
+ "epoch": 4.07,
405
  "learning_rate": 0.0002,
406
+ "loss": 0.0043,
407
+ "step": 61
408
  },
409
  {
410
+ "epoch": 4.13,
411
+ "learning_rate": 0.0002,
412
+ "loss": 0.0068,
413
  "step": 62
414
  },
415
  {
416
+ "epoch": 4.2,
417
  "learning_rate": 0.0002,
418
+ "loss": 0.004,
419
  "step": 63
420
  },
421
  {
422
+ "epoch": 4.27,
423
  "learning_rate": 0.0002,
424
+ "loss": 0.006,
425
  "step": 64
426
  },
427
  {
428
+ "epoch": 4.33,
429
  "learning_rate": 0.0002,
430
+ "loss": 0.0098,
431
  "step": 65
432
  },
433
  {
434
+ "epoch": 4.4,
435
  "learning_rate": 0.0002,
436
+ "loss": 0.0095,
437
  "step": 66
438
  },
439
  {
440
+ "epoch": 4.47,
441
  "learning_rate": 0.0002,
442
+ "loss": 0.0054,
443
  "step": 67
444
  },
445
  {
446
+ "epoch": 4.53,
447
  "learning_rate": 0.0002,
448
+ "loss": 0.004,
449
  "step": 68
450
  },
451
  {
452
+ "epoch": 4.6,
453
  "learning_rate": 0.0002,
454
+ "loss": 0.0057,
455
  "step": 69
456
  },
457
  {
458
+ "epoch": 4.67,
459
  "learning_rate": 0.0002,
460
+ "loss": 0.005,
461
  "step": 70
462
  },
463
  {
464
+ "epoch": 4.73,
465
  "learning_rate": 0.0002,
466
+ "loss": 0.0044,
467
  "step": 71
468
  },
469
  {
470
+ "epoch": 4.8,
471
  "learning_rate": 0.0002,
472
+ "loss": 0.0034,
473
  "step": 72
474
  },
475
  {
476
+ "epoch": 4.87,
477
  "learning_rate": 0.0002,
478
+ "loss": 0.0029,
479
  "step": 73
480
  },
481
  {
482
+ "epoch": 4.93,
483
  "learning_rate": 0.0002,
484
+ "loss": 0.0065,
485
  "step": 74
486
  },
487
  {
488
+ "epoch": 5.0,
489
  "learning_rate": 0.0002,
490
+ "loss": 0.0018,
491
+ "step": 75
492
+ },
493
+ {
494
+ "epoch": 5.0,
495
+ "eval_loss": 0.002322128973901272,
496
+ "eval_runtime": 40.7591,
497
+ "eval_samples_per_second": 11.09,
498
+ "eval_steps_per_second": 0.368,
499
  "step": 75
500
  },
501
  {
502
+ "epoch": 5.07,
503
  "learning_rate": 0.0002,
504
+ "loss": 0.0009,
505
  "step": 76
506
  },
507
  {
508
+ "epoch": 5.13,
509
  "learning_rate": 0.0002,
510
+ "loss": 0.0022,
511
  "step": 77
512
  },
513
  {
514
+ "epoch": 5.2,
515
  "learning_rate": 0.0002,
516
+ "loss": 0.0013,
517
  "step": 78
518
  },
519
  {
520
+ "epoch": 5.27,
521
  "learning_rate": 0.0002,
522
+ "loss": 0.0017,
523
  "step": 79
524
  },
525
  {
526
+ "epoch": 5.33,
527
  "learning_rate": 0.0002,
528
+ "loss": 0.0039,
529
  "step": 80
530
  },
531
  {
532
+ "epoch": 5.4,
533
  "learning_rate": 0.0002,
534
+ "loss": 0.0038,
535
  "step": 81
536
  },
537
  {
538
+ "epoch": 5.47,
539
  "learning_rate": 0.0002,
540
+ "loss": 0.0031,
541
  "step": 82
542
  },
543
  {
544
+ "epoch": 5.53,
545
  "learning_rate": 0.0002,
546
+ "loss": 0.0026,
547
  "step": 83
548
  },
549
  {
550
+ "epoch": 5.6,
551
  "learning_rate": 0.0002,
552
+ "loss": 0.0011,
553
  "step": 84
554
  },
555
  {
556
+ "epoch": 5.67,
557
  "learning_rate": 0.0002,
558
+ "loss": 0.0009,
559
  "step": 85
560
  },
561
  {
562
+ "epoch": 5.73,
563
  "learning_rate": 0.0002,
564
+ "loss": 0.0043,
565
  "step": 86
566
  },
567
  {
568
+ "epoch": 5.8,
569
  "learning_rate": 0.0002,
570
+ "loss": 0.0023,
571
  "step": 87
572
  },
573
  {
574
+ "epoch": 5.87,
575
  "learning_rate": 0.0002,
576
+ "loss": 0.0021,
577
  "step": 88
578
  },
579
  {
580
+ "epoch": 5.93,
581
  "learning_rate": 0.0002,
582
+ "loss": 0.0018,
583
  "step": 89
584
  },
585
  {
586
+ "epoch": 6.0,
587
  "learning_rate": 0.0002,
588
+ "loss": 0.0011,
589
+ "step": 90
590
+ },
591
+ {
592
+ "epoch": 6.0,
593
+ "eval_loss": 0.0016255057416856289,
594
+ "eval_runtime": 40.8742,
595
+ "eval_samples_per_second": 11.058,
596
+ "eval_steps_per_second": 0.367,
597
  "step": 90
598
  },
599
  {
600
+ "epoch": 6.07,
601
  "learning_rate": 0.0002,
602
+ "loss": 0.0006,
603
  "step": 91
604
  },
605
  {
606
+ "epoch": 6.13,
607
  "learning_rate": 0.0002,
608
+ "loss": 0.001,
609
  "step": 92
610
  },
611
  {
612
+ "epoch": 6.2,
613
  "learning_rate": 0.0002,
614
+ "loss": 0.002,
615
  "step": 93
616
  },
617
  {
618
+ "epoch": 6.27,
619
  "learning_rate": 0.0002,
620
+ "loss": 0.0009,
621
  "step": 94
622
  },
623
  {
624
+ "epoch": 6.33,
625
  "learning_rate": 0.0002,
626
+ "loss": 0.0007,
627
  "step": 95
628
  },
629
  {
630
+ "epoch": 6.4,
631
  "learning_rate": 0.0002,
632
+ "loss": 0.0022,
633
  "step": 96
634
  },
635
  {
636
+ "epoch": 6.47,
637
  "learning_rate": 0.0002,
638
+ "loss": 0.0007,
639
  "step": 97
640
  },
641
  {
642
+ "epoch": 6.53,
643
  "learning_rate": 0.0002,
644
+ "loss": 0.0019,
645
  "step": 98
646
  },
647
  {
648
+ "epoch": 6.6,
649
  "learning_rate": 0.0002,
650
+ "loss": 0.0004,
651
  "step": 99
652
  },
653
  {
654
+ "epoch": 6.67,
655
  "learning_rate": 0.0002,
656
+ "loss": 0.0018,
657
  "step": 100
658
  },
659
  {
660
+ "epoch": 6.73,
661
  "learning_rate": 0.0002,
662
+ "loss": 0.0005,
663
  "step": 101
664
  },
665
  {
666
+ "epoch": 6.8,
667
  "learning_rate": 0.0002,
668
+ "loss": 0.0009,
669
  "step": 102
670
  },
671
  {
672
+ "epoch": 6.87,
673
  "learning_rate": 0.0002,
674
+ "loss": 0.0006,
675
  "step": 103
676
  },
677
  {
678
+ "epoch": 6.93,
679
  "learning_rate": 0.0002,
680
+ "loss": 0.0026,
681
  "step": 104
682
  },
683
  {
684
+ "epoch": 7.0,
685
  "learning_rate": 0.0002,
686
+ "loss": 0.0007,
687
+ "step": 105
688
+ },
689
+ {
690
+ "epoch": 7.0,
691
+ "eval_loss": 0.0005919847753830254,
692
+ "eval_runtime": 40.7613,
693
+ "eval_samples_per_second": 11.089,
694
+ "eval_steps_per_second": 0.368,
695
  "step": 105
696
  },
697
  {
698
+ "epoch": 7.07,
699
  "learning_rate": 0.0002,
700
+ "loss": 0.0004,
701
  "step": 106
702
  },
703
  {
704
+ "epoch": 7.13,
705
  "learning_rate": 0.0002,
706
+ "loss": 0.0005,
707
  "step": 107
708
  },
709
  {
710
+ "epoch": 7.2,
711
  "learning_rate": 0.0002,
712
+ "loss": 0.0007,
713
  "step": 108
714
  },
715
  {
716
+ "epoch": 7.27,
717
  "learning_rate": 0.0002,
718
+ "loss": 0.0003,
719
  "step": 109
720
  },
721
  {
722
+ "epoch": 7.33,
723
  "learning_rate": 0.0002,
724
+ "loss": 0.0005,
725
  "step": 110
726
  },
727
  {
728
+ "epoch": 7.4,
729
  "learning_rate": 0.0002,
730
+ "loss": 0.0002,
731
  "step": 111
732
  },
733
  {
734
+ "epoch": 7.47,
735
  "learning_rate": 0.0002,
736
+ "loss": 0.0006,
737
  "step": 112
738
  },
739
  {
740
+ "epoch": 7.53,
741
  "learning_rate": 0.0002,
742
+ "loss": 0.0016,
743
  "step": 113
744
  },
745
  {
746
+ "epoch": 7.6,
747
  "learning_rate": 0.0002,
748
+ "loss": 0.0003,
749
  "step": 114
750
  },
751
  {
752
+ "epoch": 7.67,
753
  "learning_rate": 0.0002,
754
+ "loss": 0.0006,
755
  "step": 115
756
  },
757
  {
758
+ "epoch": 7.73,
759
  "learning_rate": 0.0002,
760
+ "loss": 0.0024,
761
  "step": 116
762
  },
763
  {
764
+ "epoch": 7.8,
765
  "learning_rate": 0.0002,
766
+ "loss": 0.0008,
767
  "step": 117
768
  },
769
  {
770
+ "epoch": 7.87,
771
  "learning_rate": 0.0002,
772
+ "loss": 0.0008,
773
  "step": 118
774
  },
775
  {
776
+ "epoch": 7.93,
777
  "learning_rate": 0.0002,
778
+ "loss": 0.0004,
779
  "step": 119
780
  },
781
  {
782
+ "epoch": 8.0,
783
  "learning_rate": 0.0002,
784
+ "loss": 0.0003,
785
+ "step": 120
786
+ },
787
+ {
788
+ "epoch": 8.0,
789
+ "eval_loss": 0.0006732757901772857,
790
+ "eval_runtime": 40.828,
791
+ "eval_samples_per_second": 11.071,
792
+ "eval_steps_per_second": 0.367,
793
  "step": 120
794
  },
795
  {
796
+ "epoch": 8.07,
797
  "learning_rate": 0.0002,
798
+ "loss": 0.0001,
799
  "step": 121
800
  },
801
  {
802
+ "epoch": 8.13,
803
  "learning_rate": 0.0002,
804
+ "loss": 0.0002,
805
  "step": 122
806
  },
807
  {
808
+ "epoch": 8.2,
809
  "learning_rate": 0.0002,
810
+ "loss": 0.0008,
811
  "step": 123
812
  },
813
  {
814
+ "epoch": 8.27,
815
  "learning_rate": 0.0002,
816
+ "loss": 0.0002,
817
  "step": 124
818
  },
819
  {
820
+ "epoch": 8.33,
821
  "learning_rate": 0.0002,
822
+ "loss": 0.0004,
823
  "step": 125
824
  },
825
  {
826
+ "epoch": 8.4,
827
  "learning_rate": 0.0002,
828
+ "loss": 0.0005,
829
  "step": 126
830
  },
831
  {
832
+ "epoch": 8.47,
833
  "learning_rate": 0.0002,
834
+ "loss": 0.0004,
835
  "step": 127
836
  },
837
  {
838
+ "epoch": 8.53,
839
  "learning_rate": 0.0002,
840
+ "loss": 0.0014,
841
  "step": 128
842
  },
843
  {
844
+ "epoch": 8.6,
845
  "learning_rate": 0.0002,
846
+ "loss": 0.0022,
847
  "step": 129
848
  },
849
  {
850
+ "epoch": 8.67,
851
  "learning_rate": 0.0002,
852
+ "loss": 0.0007,
853
  "step": 130
854
  },
855
  {
856
+ "epoch": 8.73,
857
  "learning_rate": 0.0002,
858
+ "loss": 0.0003,
859
  "step": 131
860
  },
861
  {
862
+ "epoch": 8.8,
863
  "learning_rate": 0.0002,
864
+ "loss": 0.0008,
865
  "step": 132
866
  },
867
  {
868
+ "epoch": 8.87,
869
  "learning_rate": 0.0002,
870
+ "loss": 0.0006,
871
  "step": 133
872
  },
873
  {
874
+ "epoch": 8.93,
875
  "learning_rate": 0.0002,
876
+ "loss": 0.0028,
877
  "step": 134
878
  },
879
  {
880
+ "epoch": 9.0,
881
  "learning_rate": 0.0002,
882
+ "loss": 0.0003,
883
+ "step": 135
884
+ },
885
+ {
886
+ "epoch": 9.0,
887
+ "eval_loss": 0.0004627737507689744,
888
+ "eval_runtime": 40.7716,
889
+ "eval_samples_per_second": 11.086,
890
+ "eval_steps_per_second": 0.368,
891
  "step": 135
892
  },
893
  {
894
+ "epoch": 9.07,
895
  "learning_rate": 0.0002,
896
+ "loss": 0.0002,
897
  "step": 136
898
  },
899
  {
900
+ "epoch": 9.13,
901
  "learning_rate": 0.0002,
902
+ "loss": 0.0002,
903
  "step": 137
904
  },
905
  {
906
+ "epoch": 9.2,
907
  "learning_rate": 0.0002,
908
+ "loss": 0.0027,
909
  "step": 138
910
  },
911
  {
912
+ "epoch": 9.27,
913
  "learning_rate": 0.0002,
914
+ "loss": 0.0004,
915
  "step": 139
916
  },
917
  {
918
+ "epoch": 9.33,
919
  "learning_rate": 0.0002,
920
+ "loss": 0.0014,
921
  "step": 140
922
  },
923
  {
924
+ "epoch": 9.4,
925
  "learning_rate": 0.0002,
926
+ "loss": 0.0003,
927
  "step": 141
928
  },
929
  {
930
+ "epoch": 9.47,
931
  "learning_rate": 0.0002,
932
+ "loss": 0.0002,
933
  "step": 142
934
  },
935
  {
936
+ "epoch": 9.53,
937
  "learning_rate": 0.0002,
938
+ "loss": 0.0002,
939
  "step": 143
940
  },
941
  {
942
+ "epoch": 9.6,
943
  "learning_rate": 0.0002,
944
+ "loss": 0.0012,
945
  "step": 144
946
  },
947
  {
948
+ "epoch": 9.67,
949
  "learning_rate": 0.0002,
950
+ "loss": 0.0004,
951
  "step": 145
952
  },
953
  {
954
+ "epoch": 9.73,
955
  "learning_rate": 0.0002,
956
+ "loss": 0.0021,
957
  "step": 146
958
  },
959
  {
960
+ "epoch": 9.8,
961
  "learning_rate": 0.0002,
962
+ "loss": 0.0007,
963
  "step": 147
964
  },
965
  {
966
+ "epoch": 9.87,
967
  "learning_rate": 0.0002,
968
+ "loss": 0.0003,
969
  "step": 148
970
  },
971
  {
972
+ "epoch": 9.93,
973
  "learning_rate": 0.0002,
974
+ "loss": 0.0012,
975
  "step": 149
976
  },
977
  {
978
+ "epoch": 10.0,
979
  "learning_rate": 0.0002,
980
+ "loss": 0.0021,
981
  "step": 150
982
  },
983
  {
984
+ "epoch": 10.0,
985
+ "eval_loss": 0.0012871649814769626,
986
+ "eval_runtime": 40.7493,
987
+ "eval_samples_per_second": 11.092,
988
+ "eval_steps_per_second": 0.368,
989
+ "step": 150
990
+ },
991
+ {
992
+ "epoch": 10.07,
993
  "learning_rate": 0.0002,
994
+ "loss": 0.0005,
995
  "step": 151
996
  },
997
  {
998
+ "epoch": 10.13,
999
  "learning_rate": 0.0002,
1000
+ "loss": 0.0007,
1001
  "step": 152
1002
  },
1003
  {
1004
+ "epoch": 10.2,
1005
  "learning_rate": 0.0002,
1006
+ "loss": 0.001,
1007
  "step": 153
1008
  },
1009
  {
1010
+ "epoch": 10.27,
1011
  "learning_rate": 0.0002,
1012
+ "loss": 0.0042,
1013
  "step": 154
1014
  },
1015
  {
1016
+ "epoch": 10.33,
1017
  "learning_rate": 0.0002,
1018
+ "loss": 0.0039,
1019
  "step": 155
1020
  },
1021
  {
1022
+ "epoch": 10.4,
1023
  "learning_rate": 0.0002,
1024
+ "loss": 0.0003,
1025
  "step": 156
1026
  },
1027
  {
1028
+ "epoch": 10.47,
1029
  "learning_rate": 0.0002,
1030
+ "loss": 0.0016,
1031
  "step": 157
1032
  },
1033
  {
1034
+ "epoch": 10.53,
1035
  "learning_rate": 0.0002,
1036
+ "loss": 0.0012,
1037
  "step": 158
1038
  },
1039
  {
1040
+ "epoch": 10.6,
1041
  "learning_rate": 0.0002,
1042
+ "loss": 0.0007,
1043
  "step": 159
1044
  },
1045
  {
1046
+ "epoch": 10.67,
1047
  "learning_rate": 0.0002,
1048
+ "loss": 0.0014,
1049
  "step": 160
1050
  },
1051
  {
1052
+ "epoch": 10.73,
1053
  "learning_rate": 0.0002,
1054
+ "loss": 0.0004,
1055
  "step": 161
1056
  },
1057
  {
1058
+ "epoch": 10.8,
1059
  "learning_rate": 0.0002,
1060
+ "loss": 0.007,
1061
  "step": 162
1062
  },
1063
  {
1064
+ "epoch": 10.87,
1065
  "learning_rate": 0.0002,
1066
+ "loss": 0.0007,
1067
  "step": 163
1068
  },
1069
  {
1070
+ "epoch": 10.93,
1071
  "learning_rate": 0.0002,
1072
+ "loss": 0.0036,
1073
  "step": 164
1074
  },
1075
  {
1076
+ "epoch": 11.0,
1077
  "learning_rate": 0.0002,
1078
+ "loss": 0.001,
1079
+ "step": 165
1080
+ },
1081
+ {
1082
+ "epoch": 11.0,
1083
+ "eval_loss": 0.0014927532756701112,
1084
+ "eval_runtime": 40.8298,
1085
+ "eval_samples_per_second": 11.07,
1086
+ "eval_steps_per_second": 0.367,
1087
  "step": 165
1088
  },
1089
  {
1090
+ "epoch": 11.07,
1091
  "learning_rate": 0.0002,
1092
+ "loss": 0.0044,
1093
  "step": 166
1094
  },
1095
  {
1096
+ "epoch": 11.13,
1097
  "learning_rate": 0.0002,
1098
+ "loss": 0.0026,
1099
  "step": 167
1100
  },
1101
  {
1102
+ "epoch": 11.2,
1103
  "learning_rate": 0.0002,
1104
+ "loss": 0.0006,
1105
  "step": 168
1106
  },
1107
  {
1108
+ "epoch": 11.27,
1109
  "learning_rate": 0.0002,
1110
+ "loss": 0.0023,
1111
  "step": 169
1112
  },
1113
  {
1114
+ "epoch": 11.33,
1115
  "learning_rate": 0.0002,
1116
+ "loss": 0.0043,
1117
  "step": 170
1118
  },
1119
  {
1120
+ "epoch": 11.4,
1121
  "learning_rate": 0.0002,
1122
+ "loss": 0.0003,
1123
  "step": 171
1124
  },
1125
  {
1126
+ "epoch": 11.47,
1127
  "learning_rate": 0.0002,
1128
+ "loss": 0.0008,
1129
  "step": 172
1130
  },
1131
  {
1132
+ "epoch": 11.53,
1133
  "learning_rate": 0.0002,
1134
+ "loss": 0.002,
1135
  "step": 173
1136
  },
1137
  {
1138
+ "epoch": 11.6,
1139
  "learning_rate": 0.0002,
1140
+ "loss": 0.0018,
1141
  "step": 174
1142
  },
1143
  {
1144
+ "epoch": 11.67,
1145
  "learning_rate": 0.0002,
1146
+ "loss": 0.001,
1147
  "step": 175
1148
  },
1149
  {
1150
+ "epoch": 11.73,
1151
  "learning_rate": 0.0002,
1152
+ "loss": 0.001,
1153
  "step": 176
1154
  },
1155
  {
1156
+ "epoch": 11.8,
1157
  "learning_rate": 0.0002,
1158
+ "loss": 0.0006,
1159
  "step": 177
1160
  },
1161
  {
1162
+ "epoch": 11.87,
1163
  "learning_rate": 0.0002,
1164
+ "loss": 0.0004,
1165
  "step": 178
1166
  },
1167
  {
1168
+ "epoch": 11.93,
1169
  "learning_rate": 0.0002,
1170
+ "loss": 0.0032,
1171
  "step": 179
1172
  },
1173
  {
1174
+ "epoch": 12.0,
1175
  "learning_rate": 0.0002,
1176
+ "loss": 0.0066,
1177
+ "step": 180
1178
+ },
1179
+ {
1180
+ "epoch": 12.0,
1181
+ "eval_loss": 0.000720755138900131,
1182
+ "eval_runtime": 40.7732,
1183
+ "eval_samples_per_second": 11.086,
1184
+ "eval_steps_per_second": 0.368,
1185
  "step": 180
1186
  },
1187
  {
1188
+ "epoch": 12.07,
1189
  "learning_rate": 0.0002,
1190
+ "loss": 0.0004,
1191
  "step": 181
1192
  },
1193
  {
1194
+ "epoch": 12.13,
1195
  "learning_rate": 0.0002,
1196
+ "loss": 0.0015,
1197
  "step": 182
1198
  },
1199
  {
1200
+ "epoch": 12.2,
1201
  "learning_rate": 0.0002,
1202
+ "loss": 0.0007,
1203
  "step": 183
1204
  },
1205
  {
1206
+ "epoch": 12.27,
1207
  "learning_rate": 0.0002,
1208
+ "loss": 0.0022,
1209
  "step": 184
1210
  },
1211
  {
1212
+ "epoch": 12.33,
1213
  "learning_rate": 0.0002,
1214
+ "loss": 0.0008,
1215
  "step": 185
1216
  },
1217
  {
1218
+ "epoch": 12.4,
1219
  "learning_rate": 0.0002,
1220
+ "loss": 0.0038,
1221
  "step": 186
1222
  },
1223
  {
1224
+ "epoch": 12.47,
1225
  "learning_rate": 0.0002,
1226
+ "loss": 0.0005,
1227
  "step": 187
1228
  },
1229
  {
1230
+ "epoch": 12.53,
1231
  "learning_rate": 0.0002,
1232
+ "loss": 0.0015,
1233
  "step": 188
1234
  },
1235
  {
1236
+ "epoch": 12.6,
1237
  "learning_rate": 0.0002,
1238
+ "loss": 0.0047,
1239
  "step": 189
1240
  },
1241
  {
1242
+ "epoch": 12.67,
1243
  "learning_rate": 0.0002,
1244
+ "loss": 0.0002,
1245
  "step": 190
1246
  },
1247
  {
1248
+ "epoch": 12.73,
1249
  "learning_rate": 0.0002,
1250
+ "loss": 0.0025,
1251
  "step": 191
1252
  },
1253
  {
1254
+ "epoch": 12.8,
1255
  "learning_rate": 0.0002,
1256
+ "loss": 0.0015,
1257
  "step": 192
1258
  },
1259
  {
1260
+ "epoch": 12.87,
1261
  "learning_rate": 0.0002,
1262
+ "loss": 0.0005,
1263
  "step": 193
1264
  },
1265
  {
1266
+ "epoch": 12.93,
1267
  "learning_rate": 0.0002,
1268
+ "loss": 0.0028,
1269
  "step": 194
1270
  },
1271
  {
1272
+ "epoch": 13.0,
1273
  "learning_rate": 0.0002,
1274
+ "loss": 0.0003,
1275
+ "step": 195
1276
+ },
1277
+ {
1278
+ "epoch": 13.0,
1279
+ "eval_loss": 0.0018433822551742196,
1280
+ "eval_runtime": 40.7961,
1281
+ "eval_samples_per_second": 11.079,
1282
+ "eval_steps_per_second": 0.368,
1283
  "step": 195
1284
  },
1285
  {
1286
+ "epoch": 13.07,
1287
  "learning_rate": 0.0002,
1288
+ "loss": 0.0004,
1289
  "step": 196
1290
  },
1291
  {
1292
+ "epoch": 13.13,
1293
  "learning_rate": 0.0002,
1294
+ "loss": 0.004,
1295
  "step": 197
1296
  },
1297
  {
1298
+ "epoch": 13.2,
1299
  "learning_rate": 0.0002,
1300
+ "loss": 0.0014,
1301
  "step": 198
1302
  },
1303
  {
1304
+ "epoch": 13.27,
1305
  "learning_rate": 0.0002,
1306
+ "loss": 0.0007,
1307
  "step": 199
1308
  },
1309
  {
1310
+ "epoch": 13.33,
1311
  "learning_rate": 0.0002,
1312
+ "loss": 0.0055,
1313
  "step": 200
1314
  },
1315
  {
1316
+ "epoch": 13.4,
1317
  "learning_rate": 0.0002,
1318
+ "loss": 0.0024,
1319
  "step": 201
1320
  },
1321
  {
1322
+ "epoch": 13.47,
1323
  "learning_rate": 0.0002,
1324
+ "loss": 0.0012,
1325
  "step": 202
1326
  },
1327
  {
1328
+ "epoch": 13.53,
1329
  "learning_rate": 0.0002,
1330
+ "loss": 0.0012,
1331
  "step": 203
1332
  },
1333
  {
1334
+ "epoch": 13.6,
1335
  "learning_rate": 0.0002,
1336
+ "loss": 0.0007,
1337
  "step": 204
1338
  },
1339
  {
1340
+ "epoch": 13.67,
1341
  "learning_rate": 0.0002,
1342
+ "loss": 0.0028,
1343
  "step": 205
1344
  },
1345
  {
1346
+ "epoch": 13.73,
1347
  "learning_rate": 0.0002,
1348
+ "loss": 0.0025,
1349
  "step": 206
1350
  },
1351
  {
1352
+ "epoch": 13.8,
1353
  "learning_rate": 0.0002,
1354
+ "loss": 0.0026,
1355
  "step": 207
1356
  },
1357
  {
1358
+ "epoch": 13.87,
1359
  "learning_rate": 0.0002,
1360
+ "loss": 0.0056,
1361
  "step": 208
1362
  },
1363
  {
1364
+ "epoch": 13.93,
1365
  "learning_rate": 0.0002,
1366
+ "loss": 0.0033,
1367
  "step": 209
1368
  },
1369
  {
1370
+ "epoch": 14.0,
1371
  "learning_rate": 0.0002,
1372
+ "loss": 0.0008,
1373
+ "step": 210
1374
+ },
1375
+ {
1376
+ "epoch": 14.0,
1377
+ "eval_loss": 0.0017898541409522295,
1378
+ "eval_runtime": 40.7166,
1379
+ "eval_samples_per_second": 11.101,
1380
+ "eval_steps_per_second": 0.368,
1381
  "step": 210
1382
  },
1383
  {
1384
+ "epoch": 14.07,
1385
  "learning_rate": 0.0002,
1386
+ "loss": 0.0012,
1387
  "step": 211
1388
  },
1389
  {
1390
+ "epoch": 14.13,
1391
  "learning_rate": 0.0002,
1392
+ "loss": 0.0032,
1393
  "step": 212
1394
  },
1395
  {
1396
+ "epoch": 14.2,
1397
  "learning_rate": 0.0002,
1398
+ "loss": 0.0017,
1399
  "step": 213
1400
  },
1401
  {
1402
+ "epoch": 14.27,
1403
  "learning_rate": 0.0002,
1404
+ "loss": 0.0027,
1405
  "step": 214
1406
  },
1407
  {
1408
+ "epoch": 14.33,
1409
  "learning_rate": 0.0002,
1410
+ "loss": 0.0014,
1411
  "step": 215
1412
  },
1413
  {
1414
+ "epoch": 14.4,
1415
  "learning_rate": 0.0002,
1416
+ "loss": 0.0023,
1417
  "step": 216
1418
  },
1419
  {
1420
+ "epoch": 14.47,
1421
  "learning_rate": 0.0002,
1422
+ "loss": 0.0031,
1423
  "step": 217
1424
  },
1425
  {
1426
+ "epoch": 14.53,
1427
  "learning_rate": 0.0002,
1428
+ "loss": 0.0027,
1429
  "step": 218
1430
  },
1431
  {
1432
+ "epoch": 14.6,
1433
  "learning_rate": 0.0002,
1434
+ "loss": 0.0049,
1435
  "step": 219
1436
  },
1437
  {
1438
+ "epoch": 14.67,
1439
  "learning_rate": 0.0002,
1440
+ "loss": 0.0113,
1441
  "step": 220
1442
  },
1443
  {
1444
+ "epoch": 14.73,
1445
  "learning_rate": 0.0002,
1446
+ "loss": 0.0015,
1447
  "step": 221
1448
  },
1449
  {
1450
+ "epoch": 14.8,
1451
  "learning_rate": 0.0002,
1452
+ "loss": 0.0037,
1453
  "step": 222
1454
  },
1455
  {
1456
+ "epoch": 14.87,
1457
  "learning_rate": 0.0002,
1458
+ "loss": 0.0045,
1459
  "step": 223
1460
  },
1461
  {
1462
+ "epoch": 14.93,
1463
  "learning_rate": 0.0002,
1464
+ "loss": 0.004,
1465
  "step": 224
1466
  },
1467
  {
1468
+ "epoch": 15.0,
1469
  "learning_rate": 0.0002,
1470
+ "loss": 0.002,
1471
+ "step": 225
1472
+ },
1473
+ {
1474
+ "epoch": 15.0,
1475
+ "eval_loss": 0.001991454279050231,
1476
+ "eval_runtime": 40.8273,
1477
+ "eval_samples_per_second": 11.071,
1478
+ "eval_steps_per_second": 0.367,
1479
  "step": 225
1480
  },
1481
  {
1482
+ "epoch": 15.07,
1483
  "learning_rate": 0.0002,
1484
+ "loss": 0.0009,
1485
  "step": 226
1486
  },
1487
  {
1488
+ "epoch": 15.13,
1489
  "learning_rate": 0.0002,
1490
  "loss": 0.0025,
1491
  "step": 227
1492
  },
1493
  {
1494
+ "epoch": 15.2,
1495
  "learning_rate": 0.0002,
1496
+ "loss": 0.0021,
1497
  "step": 228
1498
  },
1499
  {
1500
+ "epoch": 15.27,
1501
  "learning_rate": 0.0002,
1502
+ "loss": 0.001,
1503
  "step": 229
1504
  },
1505
  {
1506
+ "epoch": 15.33,
1507
  "learning_rate": 0.0002,
1508
+ "loss": 0.0049,
1509
  "step": 230
1510
  },
1511
  {
1512
+ "epoch": 15.4,
1513
  "learning_rate": 0.0002,
1514
+ "loss": 0.0017,
1515
  "step": 231
1516
  },
1517
  {
1518
+ "epoch": 15.47,
1519
  "learning_rate": 0.0002,
1520
+ "loss": 0.0029,
1521
  "step": 232
1522
  },
1523
  {
1524
+ "epoch": 15.53,
1525
  "learning_rate": 0.0002,
1526
+ "loss": 0.0007,
1527
  "step": 233
1528
  },
1529
  {
1530
+ "epoch": 15.6,
1531
  "learning_rate": 0.0002,
1532
+ "loss": 0.0007,
1533
  "step": 234
1534
  },
1535
  {
1536
+ "epoch": 15.67,
1537
  "learning_rate": 0.0002,
1538
+ "loss": 0.0032,
1539
  "step": 235
1540
  },
1541
  {
1542
+ "epoch": 15.73,
1543
  "learning_rate": 0.0002,
1544
+ "loss": 0.0015,
1545
  "step": 236
1546
  },
1547
  {
1548
+ "epoch": 15.8,
1549
  "learning_rate": 0.0002,
1550
+ "loss": 0.0031,
1551
  "step": 237
1552
  },
1553
  {
1554
+ "epoch": 15.87,
1555
  "learning_rate": 0.0002,
1556
+ "loss": 0.0042,
1557
  "step": 238
1558
  },
1559
  {
1560
+ "epoch": 15.93,
1561
  "learning_rate": 0.0002,
1562
+ "loss": 0.0005,
1563
  "step": 239
1564
  },
1565
  {
1566
+ "epoch": 16.0,
1567
  "learning_rate": 0.0002,
1568
+ "loss": 0.0008,
1569
+ "step": 240
1570
+ },
1571
+ {
1572
+ "epoch": 16.0,
1573
+ "eval_loss": 0.0014073444763198495,
1574
+ "eval_runtime": 40.6782,
1575
+ "eval_samples_per_second": 11.112,
1576
+ "eval_steps_per_second": 0.369,
1577
  "step": 240
1578
  },
1579
  {
1580
+ "epoch": 16.07,
1581
  "learning_rate": 0.0002,
1582
+ "loss": 0.0005,
1583
  "step": 241
1584
  },
1585
  {
1586
+ "epoch": 16.13,
1587
  "learning_rate": 0.0002,
1588
+ "loss": 0.0018,
1589
  "step": 242
1590
  },
1591
  {
1592
+ "epoch": 16.2,
1593
  "learning_rate": 0.0002,
1594
+ "loss": 0.0011,
1595
  "step": 243
1596
  },
1597
  {
1598
+ "epoch": 16.27,
1599
  "learning_rate": 0.0002,
1600
+ "loss": 0.0021,
1601
  "step": 244
1602
  },
1603
  {
1604
+ "epoch": 16.33,
1605
  "learning_rate": 0.0002,
1606
+ "loss": 0.006,
1607
  "step": 245
1608
  },
1609
  {
1610
+ "epoch": 16.4,
1611
  "learning_rate": 0.0002,
1612
+ "loss": 0.0034,
1613
  "step": 246
1614
  },
1615
  {
1616
+ "epoch": 16.47,
1617
  "learning_rate": 0.0002,
1618
+ "loss": 0.0043,
1619
  "step": 247
1620
  },
1621
  {
1622
+ "epoch": 16.53,
1623
  "learning_rate": 0.0002,
1624
+ "loss": 0.0041,
1625
  "step": 248
1626
  },
1627
  {
1628
+ "epoch": 16.6,
1629
  "learning_rate": 0.0002,
1630
+ "loss": 0.006,
1631
  "step": 249
1632
  },
1633
  {
1634
+ "epoch": 16.67,
1635
  "learning_rate": 0.0002,
1636
+ "loss": 0.0062,
1637
  "step": 250
1638
  },
1639
  {
1640
+ "epoch": 16.73,
1641
  "learning_rate": 0.0002,
1642
+ "loss": 0.0034,
1643
  "step": 251
1644
  },
1645
  {
1646
+ "epoch": 16.8,
1647
  "learning_rate": 0.0002,
1648
+ "loss": 0.0053,
1649
  "step": 252
1650
  },
1651
  {
1652
+ "epoch": 16.87,
1653
  "learning_rate": 0.0002,
1654
+ "loss": 0.005,
1655
  "step": 253
1656
  },
1657
  {
1658
+ "epoch": 16.93,
1659
  "learning_rate": 0.0002,
1660
+ "loss": 0.0064,
1661
  "step": 254
1662
  },
1663
  {
1664
+ "epoch": 17.0,
1665
  "learning_rate": 0.0002,
1666
+ "loss": 0.0056,
1667
+ "step": 255
1668
+ },
1669
+ {
1670
+ "epoch": 17.0,
1671
+ "eval_loss": 0.0032809872645884752,
1672
+ "eval_runtime": 40.8858,
1673
+ "eval_samples_per_second": 11.055,
1674
+ "eval_steps_per_second": 0.367,
1675
  "step": 255
1676
  },
1677
  {
1678
+ "epoch": 17.07,
1679
  "learning_rate": 0.0002,
1680
+ "loss": 0.0044,
1681
  "step": 256
1682
  },
1683
  {
1684
+ "epoch": 17.13,
1685
  "learning_rate": 0.0002,
1686
+ "loss": 0.0039,
1687
  "step": 257
1688
  },
1689
  {
1690
+ "epoch": 17.2,
1691
  "learning_rate": 0.0002,
1692
+ "loss": 0.0013,
1693
  "step": 258
1694
  },
1695
  {
1696
+ "epoch": 17.27,
1697
  "learning_rate": 0.0002,
1698
+ "loss": 0.0045,
1699
  "step": 259
1700
  },
1701
  {
1702
+ "epoch": 17.33,
1703
  "learning_rate": 0.0002,
1704
+ "loss": 0.0033,
1705
  "step": 260
1706
  },
1707
  {
1708
+ "epoch": 17.4,
1709
  "learning_rate": 0.0002,
1710
+ "loss": 0.0025,
1711
  "step": 261
1712
  },
1713
  {
1714
+ "epoch": 17.47,
1715
  "learning_rate": 0.0002,
1716
+ "loss": 0.0016,
1717
  "step": 262
1718
  },
1719
  {
1720
+ "epoch": 17.53,
1721
  "learning_rate": 0.0002,
1722
+ "loss": 0.0055,
1723
  "step": 263
1724
  },
1725
  {
1726
+ "epoch": 17.6,
1727
  "learning_rate": 0.0002,
1728
+ "loss": 0.0042,
1729
  "step": 264
1730
  },
1731
  {
1732
+ "epoch": 17.67,
1733
  "learning_rate": 0.0002,
1734
+ "loss": 0.008,
1735
  "step": 265
1736
  },
1737
  {
1738
+ "epoch": 17.73,
1739
  "learning_rate": 0.0002,
1740
+ "loss": 0.0008,
1741
  "step": 266
1742
  },
1743
  {
1744
+ "epoch": 17.8,
1745
  "learning_rate": 0.0002,
1746
+ "loss": 0.0017,
1747
  "step": 267
1748
  },
1749
  {
1750
+ "epoch": 17.87,
1751
  "learning_rate": 0.0002,
1752
+ "loss": 0.0021,
1753
  "step": 268
1754
  },
1755
  {
1756
+ "epoch": 17.93,
1757
  "learning_rate": 0.0002,
1758
+ "loss": 0.002,
1759
  "step": 269
1760
  },
1761
  {
1762
+ "epoch": 18.0,
1763
  "learning_rate": 0.0002,
1764
+ "loss": 0.0067,
1765
+ "step": 270
1766
+ },
1767
+ {
1768
+ "epoch": 18.0,
1769
+ "eval_loss": 0.0025466056540608406,
1770
+ "eval_runtime": 40.8446,
1771
+ "eval_samples_per_second": 11.066,
1772
+ "eval_steps_per_second": 0.367,
1773
  "step": 270
1774
  },
1775
  {
1776
+ "epoch": 18.07,
1777
  "learning_rate": 0.0002,
1778
+ "loss": 0.0036,
1779
  "step": 271
1780
  },
1781
  {
1782
+ "epoch": 18.13,
1783
  "learning_rate": 0.0002,
1784
+ "loss": 0.0024,
1785
  "step": 272
1786
  },
1787
  {
1788
+ "epoch": 18.2,
1789
  "learning_rate": 0.0002,
1790
+ "loss": 0.0022,
1791
  "step": 273
1792
  },
1793
  {
1794
+ "epoch": 18.27,
1795
  "learning_rate": 0.0002,
1796
+ "loss": 0.0022,
1797
  "step": 274
1798
  },
1799
  {
1800
+ "epoch": 18.33,
1801
  "learning_rate": 0.0002,
1802
+ "loss": 0.0021,
1803
  "step": 275
1804
  },
1805
  {
1806
+ "epoch": 18.4,
1807
  "learning_rate": 0.0002,
1808
+ "loss": 0.0017,
1809
  "step": 276
1810
  },
1811
  {
1812
+ "epoch": 18.47,
1813
  "learning_rate": 0.0002,
1814
+ "loss": 0.0028,
1815
  "step": 277
1816
  },
1817
  {
1818
+ "epoch": 18.53,
1819
  "learning_rate": 0.0002,
1820
+ "loss": 0.0049,
1821
  "step": 278
1822
  },
1823
  {
1824
+ "epoch": 18.6,
1825
  "learning_rate": 0.0002,
1826
+ "loss": 0.0047,
1827
  "step": 279
1828
  },
1829
  {
1830
+ "epoch": 18.67,
1831
  "learning_rate": 0.0002,
1832
+ "loss": 0.0019,
1833
  "step": 280
1834
  },
1835
  {
1836
+ "epoch": 18.73,
1837
  "learning_rate": 0.0002,
1838
+ "loss": 0.0051,
1839
  "step": 281
1840
  },
1841
  {
1842
+ "epoch": 18.8,
1843
  "learning_rate": 0.0002,
1844
+ "loss": 0.0075,
1845
  "step": 282
1846
  },
1847
  {
1848
+ "epoch": 18.87,
1849
  "learning_rate": 0.0002,
1850
+ "loss": 0.0056,
1851
  "step": 283
1852
  },
1853
  {
1854
+ "epoch": 18.93,
1855
  "learning_rate": 0.0002,
1856
+ "loss": 0.0047,
1857
  "step": 284
1858
  },
1859
  {
1860
+ "epoch": 19.0,
1861
  "learning_rate": 0.0002,
1862
+ "loss": 0.0022,
1863
+ "step": 285
1864
+ },
1865
+ {
1866
+ "epoch": 19.0,
1867
+ "eval_loss": 0.0031834603287279606,
1868
+ "eval_runtime": 40.8515,
1869
+ "eval_samples_per_second": 11.064,
1870
+ "eval_steps_per_second": 0.367,
1871
  "step": 285
1872
  },
1873
  {
1874
+ "epoch": 19.07,
1875
  "learning_rate": 0.0002,
1876
+ "loss": 0.0029,
1877
  "step": 286
1878
  },
1879
  {
1880
+ "epoch": 19.13,
1881
  "learning_rate": 0.0002,
1882
+ "loss": 0.0026,
1883
  "step": 287
1884
  },
1885
  {
1886
+ "epoch": 19.2,
1887
  "learning_rate": 0.0002,
1888
+ "loss": 0.0023,
1889
  "step": 288
1890
  },
1891
  {
1892
+ "epoch": 19.27,
1893
  "learning_rate": 0.0002,
1894
+ "loss": 0.0022,
1895
  "step": 289
1896
  },
1897
  {
1898
+ "epoch": 19.33,
1899
  "learning_rate": 0.0002,
1900
+ "loss": 0.0038,
1901
  "step": 290
1902
  },
1903
  {
1904
+ "epoch": 19.4,
1905
  "learning_rate": 0.0002,
1906
  "loss": 0.0019,
1907
  "step": 291
1908
  },
1909
  {
1910
+ "epoch": 19.47,
1911
  "learning_rate": 0.0002,
1912
+ "loss": 0.0007,
1913
  "step": 292
1914
  },
1915
  {
1916
+ "epoch": 19.53,
1917
  "learning_rate": 0.0002,
1918
+ "loss": 0.0023,
1919
  "step": 293
1920
  },
1921
  {
1922
+ "epoch": 19.6,
1923
  "learning_rate": 0.0002,
1924
+ "loss": 0.0014,
1925
  "step": 294
1926
  },
1927
  {
1928
+ "epoch": 19.67,
1929
  "learning_rate": 0.0002,
1930
+ "loss": 0.0014,
1931
  "step": 295
1932
  },
1933
  {
1934
+ "epoch": 19.73,
1935
  "learning_rate": 0.0002,
1936
+ "loss": 0.0028,
1937
  "step": 296
1938
  },
1939
  {
1940
+ "epoch": 19.8,
1941
  "learning_rate": 0.0002,
1942
+ "loss": 0.0013,
1943
  "step": 297
1944
  },
1945
  {
1946
+ "epoch": 19.87,
1947
  "learning_rate": 0.0002,
1948
+ "loss": 0.0012,
1949
  "step": 298
1950
  },
1951
  {
1952
+ "epoch": 19.93,
1953
  "learning_rate": 0.0002,
1954
+ "loss": 0.0032,
1955
  "step": 299
1956
  },
1957
  {
1958
+ "epoch": 20.0,
1959
  "learning_rate": 0.0002,
1960
+ "loss": 0.0009,
1961
  "step": 300
1962
  },
  {
  "epoch": 20.0,
+ "eval_loss": 0.001072522602044046,
+ "eval_runtime": 40.8012,
+ "eval_samples_per_second": 11.078,
+ "eval_steps_per_second": 0.368,
+ "step": 300
  },
  {
  "epoch": 20.0,
+ "step": 300,
+ "total_flos": 1.1384455326209147e+18,
+ "train_loss": 0.061958263306781496,
+ "train_runtime": 6619.0096,
+ "train_samples_per_second": 1.366,
+ "train_steps_per_second": 0.045
  }
  ],
  "logging_steps": 1.0,
+ "max_steps": 300,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 20,
  "save_steps": 50000,
+ "total_flos": 1.1384455326209147e+18,
  "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0d3015f2aca66eab473a03a8d16e8fccb8d2ce900506cbb559fb09032c3d6d13
+ oid sha256:b6c3553537bc1d3d0652cf3b2bdcfd5226cf4b6778fd749d352ca45570436029
  size 6840