kyryl-georgian committed on
Commit
6f4a345
1 Parent(s): 360860b

End of training

README.md CHANGED
@@ -1,204 +1,56 @@
1
  ---
 
2
  library_name: peft
3
- base_model: google/flan-t5-base
 
 
 
 
 
4
  ---
5
 
6
- # Model Card for Model ID
 
7
 
8
- <!-- Provide a quick summary of what the model is/does. -->
9
 
 
 
 
10
 
 
11
 
12
- ## Model Details
13
 
14
- ### Model Description
15
 
16
- <!-- Provide a longer summary of what this model is. -->
17
 
 
18
 
 
19
 
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
 
28
- ### Model Sources [optional]
29
 
30
- <!-- Provide the basic links for the model. -->
 
 
 
 
 
 
 
31
 
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
 
36
- ## Uses
37
-
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
-
40
- ### Direct Use
41
-
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
-
44
- [More Information Needed]
45
-
46
- ### Downstream Use [optional]
47
-
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
-
50
- [More Information Needed]
51
-
52
- ### Out-of-Scope Use
53
-
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
-
58
- ## Bias, Risks, and Limitations
59
-
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
-
62
- [More Information Needed]
63
-
64
- ### Recommendations
65
-
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
-
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
-
70
- ## How to Get Started with the Model
71
-
72
- Use the code below to get started with the model.
73
-
74
- [More Information Needed]
75
-
76
- ## Training Details
77
-
78
- ### Training Data
79
-
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
83
-
84
- ### Training Procedure
85
-
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
-
88
- #### Preprocessing [optional]
89
-
90
- [More Information Needed]
91
-
92
-
93
- #### Training Hyperparameters
94
-
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
-
101
- [More Information Needed]
102
-
103
- ## Evaluation
104
-
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
- ### Testing Data, Factors & Metrics
108
-
109
- #### Testing Data
110
-
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
-
119
- [More Information Needed]
120
-
121
- #### Metrics
122
-
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
-
125
- [More Information Needed]
126
-
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
-
141
- ## Environmental Impact
142
-
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
-
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
-
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
-
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
-
159
- ### Compute Infrastructure
160
-
161
- [More Information Needed]
162
-
163
- #### Hardware
164
-
165
- [More Information Needed]
166
-
167
- #### Software
168
-
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
-
175
- **BibTeX:**
176
-
177
- [More Information Needed]
178
-
179
- **APA:**
180
-
181
- [More Information Needed]
182
-
183
- ## Glossary [optional]
184
-
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
-
199
- [More Information Needed]
200
 
201
 
202
  ### Framework versions
203
 
204
- - PEFT 0.7.1
 
 
 
 
 
1
  ---
2
+ license: apache-2.0
3
  library_name: peft
4
+ tags:
5
+ - generated_from_trainer
6
+ base_model: google/flan-t5-small
7
+ model-index:
8
+ - name: flan-base-sql
9
+ results: []
10
  ---
11
 
12
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
+ should probably proofread and complete it, then remove this comment. -->
14
 
15
+ # flan-base-sql
16
 
17
+ This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on the None dataset.
18
+ It achieves the following results on the evaluation set:
19
+ - Loss: 1.6025
20
 
21
+ ## Model description
22
 
23
+ More information needed
24
 
25
+ ## Intended uses & limitations
26
 
27
+ More information needed
28
 
29
+ ## Training and evaluation data
30
 
31
+ More information needed
32
 
33
+ ## Training procedure
 
 
 
 
 
 
34
 
35
+ ### Training hyperparameters
36
 
37
+ The following hyperparameters were used during training:
38
+ - learning_rate: 0.001
39
+ - train_batch_size: 16
40
+ - eval_batch_size: 16
41
+ - seed: 42
42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
43
+ - lr_scheduler_type: linear
44
+ - num_epochs: 0.001
45
 
46
+ ### Training results
 
 
47
 
48
 
49
 
50
  ### Framework versions
51
 
52
+ - PEFT 0.7.1
53
+ - Transformers 4.38.0
54
+ - Pytorch 2.1.2+cu121
55
+ - Datasets 2.17.0
56
+ - Tokenizers 0.15.2
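
The new README lists `lr_scheduler_type: linear` with `learning_rate: 0.001`. As a rough illustration of what a linear schedule does, the sketch below decays the rate to zero over a fixed number of optimizer steps. It assumes no warmup, since the card does not list any warmup setting. `linear_lr` is a hypothetical helper, not a library function, and `total_steps=5` is an illustrative value.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 0.001) -> float:
    """Linearly decay base_lr to 0 over total_steps (assumes zero warmup)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so that steps past the end of the schedule stay at 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


if __name__ == "__main__":
    # Print the learning rate at each step of a hypothetical 5-step run.
    for step in range(6):
        print(step, linear_lr(step, total_steps=5))
```

With so few steps, the rate drops by a fifth of its starting value after every update, so the run spends very little time near the configured `0.001`.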
adapter_config.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
- "base_model_name_or_path": "google/flan-t5-base",
5
  "bias": "none",
6
  "fan_in_fan_out": false,
7
  "inference_mode": true,
@@ -19,8 +19,8 @@
19
  "rank_pattern": {},
20
  "revision": null,
21
  "target_modules": [
22
- "v",
23
- "q"
24
  ],
25
  "task_type": "SEQ_2_SEQ_LM"
26
  }
 
1
  {
2
  "alpha_pattern": {},
3
  "auto_mapping": null,
4
+ "base_model_name_or_path": "google/flan-t5-small",
5
  "bias": "none",
6
  "fan_in_fan_out": false,
7
  "inference_mode": true,
 
19
  "rank_pattern": {},
20
  "revision": null,
21
  "target_modules": [
22
+ "q",
23
+ "v"
24
  ],
25
  "task_type": "SEQ_2_SEQ_LM"
26
  }
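
The `adapter_config.json` hunks touch two fields: the base model name and the order of `target_modules`. A minimal sketch of how one might check which changes are effective, assuming the order of `target_modules` carries no meaning (the list only names the modules the adapter attaches to). Only the fields visible in the hunks are included; `effective_changes` is a hypothetical helper.

```python
import json

# Only the fields that appear as changed in the diff hunks above.
OLD = json.loads(
    '{"base_model_name_or_path": "google/flan-t5-base", "target_modules": ["v", "q"]}'
)
NEW = json.loads(
    '{"base_model_name_or_path": "google/flan-t5-small", "target_modules": ["q", "v"]}'
)


def effective_changes(old: dict, new: dict, unordered_keys=("target_modules",)) -> dict:
    """Return {key: (old_value, new_value)} for keys whose values really differ.

    Lists under `unordered_keys` are compared without regard to order.
    """
    changes = {}
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key), new.get(key)
        if key in unordered_keys and isinstance(a, list) and isinstance(b, list):
            a, b = sorted(a), sorted(b)
        if a != b:
            changes[key] = (old.get(key), new.get(key))
    return changes


if __name__ == "__main__":
    print(effective_changes(OLD, NEW))
```

Under that assumption, the only effective change in this file is the switch of the base model from `google/flan-t5-base` to `google/flan-t5-small`.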
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9b3296fcff514fee3ff0d0fd95872e9945028e0a5171922fe4e120f7bc1da6fb
3
- size 7098016
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:088861e13048d008a84b5bcea780b9d9f5ad691da24ba9da77ef998a2c522c2f
3
+ size 2765880
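
The `adapter_model.safetensors` entry is a Git LFS pointer: three `key value` lines giving the spec version, the object hash, and the size in bytes. The sketch below parses the two pointers from the hunk and compares their sizes. `parse_lfs_pointer` is a hypothetical helper written for this illustration.

```python
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9b3296fcff514fee3ff0d0fd95872e9945028e0a5171922fe4e120f7bc1da6fb
size 7098016
"""

NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:088861e13048d008a84b5bcea780b9d9f5ad691da24ba9da77ef998a2c522c2f
size 2765880
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    old, new = parse_lfs_pointer(OLD_POINTER), parse_lfs_pointer(NEW_POINTER)
    print(f"adapter size ratio new/old: {new['size'] / old['size']:.3f}")
```

The smaller adapter file is consistent with the base model changing from `flan-t5-base` to `flan-t5-small`, although the pointer itself says nothing about the cause.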
all_results.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
- "epoch": 10.0,
3
- "eval_loss": 0.05540305748581886,
4
- "eval_runtime": 27.2595,
5
- "eval_samples_per_second": 288.266,
6
- "eval_steps_per_second": 18.049,
7
- "train_loss": 0.09247852528257068,
8
- "train_runtime": 8790.0948,
9
- "train_samples_per_second": 80.453,
10
- "train_steps_per_second": 5.028
11
  }
 
1
  {
2
+ "epoch": 0.0,
3
+ "eval_loss": 1.6024976968765259,
4
+ "eval_runtime": 13.4905,
5
+ "eval_samples_per_second": 582.482,
6
+ "eval_steps_per_second": 36.47,
7
+ "train_loss": 2.5696409225463865,
8
+ "train_runtime": 1.0509,
9
+ "train_samples_per_second": 67.292,
10
+ "train_steps_per_second": 4.758
11
  }
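
The throughput figures in `all_results.json` can be turned back into step and sample counts. The sketch below does that arithmetic. It assumes that `train_samples_per_second` is reported as (dataset size × epochs) / runtime, and that training ran on a single device without gradient accumulation at the batch size of 16 from the README; none of this is stated in the diff, and both helpers are hypothetical.

```python
import math


def estimate_run(train_runtime, samples_per_second, steps_per_second, num_epochs):
    """Estimate optimizer steps and dataset size from reported throughput."""
    steps = round(steps_per_second * train_runtime)
    samples_seen = samples_per_second * train_runtime
    dataset_size = samples_seen / num_epochs  # assumption: see the lead-in
    return steps, dataset_size


def expected_steps(dataset_size, batch_size, num_epochs):
    """Steps for a fractional number of epochs, rounding partial steps up."""
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return math.ceil(steps_per_epoch * num_epochs)


if __name__ == "__main__":
    # New run: num_epochs 0.001 (from the README), values from all_results.json.
    new_steps, new_size = estimate_run(1.0509, 67.292, 4.758, 0.001)
    # Old run: 10 epochs.
    old_steps, old_size = estimate_run(8790.0948, 80.453, 5.028, 10.0)
    print("new run:", new_steps, "steps, about", round(new_size), "training examples")
    print("old run:", old_steps, "steps, about", round(old_size), "training examples")
    print("steps expected for the new run:", expected_steps(round(new_size), 16, 0.001))
```

Both runs point to a training set of roughly 70.7k examples, and the new run amounts to only a handful of optimizer steps, which fits the much higher `train_loss`.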
eval_results.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
- "epoch": 10.0,
3
- "eval_loss": 0.05540305748581886,
4
- "eval_runtime": 27.2595,
5
- "eval_samples_per_second": 288.266,
6
- "eval_steps_per_second": 18.049
7
  }
 
1
  {
2
+ "epoch": 0.0,
3
+ "eval_loss": 1.6024976968765259,
4
+ "eval_runtime": 13.4905,
5
+ "eval_samples_per_second": 582.482,
6
+ "eval_steps_per_second": 36.47
7
  }
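
To put the two `eval_loss` values on a more intuitive scale, one option is to convert them to perplexity. The sketch below does so, assuming `eval_loss` is a mean per-token cross-entropy in nats (the diff does not say how the loss is computed). It also recovers the approximate evaluation set size from the reported runtime and throughput. Both helpers are hypothetical.

```python
import math


def perplexity(loss: float) -> float:
    """exp(loss); meaningful only if loss is a mean per-token cross-entropy in nats."""
    return math.exp(loss)


def eval_samples(runtime_s: float, samples_per_second: float) -> int:
    """Approximate number of evaluation examples from runtime and throughput."""
    return round(runtime_s * samples_per_second)


if __name__ == "__main__":
    # Values copied from the old and new sides of eval_results.json.
    print("old perplexity:", perplexity(0.05540305748581886))
    print("new perplexity:", perplexity(1.6024976968765259))
    print("old eval examples:", eval_samples(27.2595, 288.266))
    print("new eval examples:", eval_samples(13.4905, 582.482))
```

The two runs differ in base model and training length, so the comparison is descriptive rather than like-for-like. The matching example counts suggest that the same evaluation set was used for both.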
special_tokens_map.json ADDED
@@ -0,0 +1,125 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<extra_id_0>",
4
+ "<extra_id_1>",
5
+ "<extra_id_2>",
6
+ "<extra_id_3>",
7
+ "<extra_id_4>",
8
+ "<extra_id_5>",
9
+ "<extra_id_6>",
10
+ "<extra_id_7>",
11
+ "<extra_id_8>",
12
+ "<extra_id_9>",
13
+ "<extra_id_10>",
14
+ "<extra_id_11>",
15
+ "<extra_id_12>",
16
+ "<extra_id_13>",
17
+ "<extra_id_14>",
18
+ "<extra_id_15>",
19
+ "<extra_id_16>",
20
+ "<extra_id_17>",
21
+ "<extra_id_18>",
22
+ "<extra_id_19>",
23
+ "<extra_id_20>",
24
+ "<extra_id_21>",
25
+ "<extra_id_22>",
26
+ "<extra_id_23>",
27
+ "<extra_id_24>",
28
+ "<extra_id_25>",
29
+ "<extra_id_26>",
30
+ "<extra_id_27>",
31
+ "<extra_id_28>",
32
+ "<extra_id_29>",
33
+ "<extra_id_30>",
34
+ "<extra_id_31>",
35
+ "<extra_id_32>",
36
+ "<extra_id_33>",
37
+ "<extra_id_34>",
38
+ "<extra_id_35>",
39
+ "<extra_id_36>",
40
+ "<extra_id_37>",
41
+ "<extra_id_38>",
42
+ "<extra_id_39>",
43
+ "<extra_id_40>",
44
+ "<extra_id_41>",
45
+ "<extra_id_42>",
46
+ "<extra_id_43>",
47
+ "<extra_id_44>",
48
+ "<extra_id_45>",
49
+ "<extra_id_46>",
50
+ "<extra_id_47>",
51
+ "<extra_id_48>",
52
+ "<extra_id_49>",
53
+ "<extra_id_50>",
54
+ "<extra_id_51>",
55
+ "<extra_id_52>",
56
+ "<extra_id_53>",
57
+ "<extra_id_54>",
58
+ "<extra_id_55>",
59
+ "<extra_id_56>",
60
+ "<extra_id_57>",
61
+ "<extra_id_58>",
62
+ "<extra_id_59>",
63
+ "<extra_id_60>",
64
+ "<extra_id_61>",
65
+ "<extra_id_62>",
66
+ "<extra_id_63>",
67
+ "<extra_id_64>",
68
+ "<extra_id_65>",
69
+ "<extra_id_66>",
70
+ "<extra_id_67>",
71
+ "<extra_id_68>",
72
+ "<extra_id_69>",
73
+ "<extra_id_70>",
74
+ "<extra_id_71>",
75
+ "<extra_id_72>",
76
+ "<extra_id_73>",
77
+ "<extra_id_74>",
78
+ "<extra_id_75>",
79
+ "<extra_id_76>",
80
+ "<extra_id_77>",
81
+ "<extra_id_78>",
82
+ "<extra_id_79>",
83
+ "<extra_id_80>",
84
+ "<extra_id_81>",
85
+ "<extra_id_82>",
86
+ "<extra_id_83>",
87
+ "<extra_id_84>",
88
+ "<extra_id_85>",
89
+ "<extra_id_86>",
90
+ "<extra_id_87>",
91
+ "<extra_id_88>",
92
+ "<extra_id_89>",
93
+ "<extra_id_90>",
94
+ "<extra_id_91>",
95
+ "<extra_id_92>",
96
+ "<extra_id_93>",
97
+ "<extra_id_94>",
98
+ "<extra_id_95>",
99
+ "<extra_id_96>",
100
+ "<extra_id_97>",
101
+ "<extra_id_98>",
102
+ "<extra_id_99>"
103
+ ],
104
+ "eos_token": {
105
+ "content": "</s>",
106
+ "lstrip": false,
107
+ "normalized": false,
108
+ "rstrip": false,
109
+ "single_word": false
110
+ },
111
+ "pad_token": {
112
+ "content": "<pad>",
113
+ "lstrip": false,
114
+ "normalized": false,
115
+ "rstrip": false,
116
+ "single_word": false
117
+ },
118
+ "unk_token": {
119
+ "content": "<unk>",
120
+ "lstrip": false,
121
+ "normalized": false,
122
+ "rstrip": false,
123
+ "single_word": false
124
+ }
125
+ }
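
The `additional_special_tokens` list in `special_tokens_map.json` follows a regular pattern: one hundred sentinel tokens named `<extra_id_0>` through `<extra_id_99>`. A minimal sketch that regenerates the list, plus the ID mapping implied by the `added_tokens_decoder` entries in the `tokenizer_config.json` hunk later in this diff (ID 32000 is `<extra_id_99>`, ID 32099 is `<extra_id_0>`). Both function names are hypothetical.

```python
def sentinel_tokens(count: int = 100) -> list[str]:
    """Return the sentinel token strings <extra_id_0> ... <extra_id_{count-1}>."""
    return [f"<extra_id_{i}>" for i in range(count)]


def sentinel_id(n: int, highest_id: int = 32099) -> int:
    """Token ID of <extra_id_n>, given that IDs run downward from highest_id."""
    return highest_id - n


if __name__ == "__main__":
    tokens = sentinel_tokens()
    print(len(tokens), tokens[0], tokens[-1])
    print(sentinel_id(0), sentinel_id(99))
```

The descending ID order means the lowest-numbered sentinel has the highest ID, which is easy to get backwards when reading the decoder table.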
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,938 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "<pad>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "</s>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "2": {
20
+ "content": "<unk>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "32000": {
28
+ "content": "<extra_id_99>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "32001": {
36
+ "content": "<extra_id_98>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "32002": {
44
+ "content": "<extra_id_97>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "32003": {
52
+ "content": "<extra_id_96>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "32004": {
60
+ "content": "<extra_id_95>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "32005": {
68
+ "content": "<extra_id_94>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "32006": {
76
+ "content": "<extra_id_93>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "32007": {
84
+ "content": "<extra_id_92>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "32008": {
92
+ "content": "<extra_id_91>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "32009": {
100
+ "content": "<extra_id_90>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "32010": {
108
+ "content": "<extra_id_89>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ },
115
+ "32011": {
116
+ "content": "<extra_id_88>",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": true
122
+ },
123
+ "32012": {
124
+ "content": "<extra_id_87>",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": true
130
+ },
131
+ "32013": {
132
+ "content": "<extra_id_86>",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": true
138
+ },
139
+ "32014": {
140
+ "content": "<extra_id_85>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": true
146
+ },
147
+ "32015": {
148
+ "content": "<extra_id_84>",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": true
154
+ },
155
+ "32016": {
156
+ "content": "<extra_id_83>",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": true
162
+ },
163
+ "32017": {
164
+ "content": "<extra_id_82>",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": true
170
+ },
171
+ "32018": {
172
+ "content": "<extra_id_81>",
173
+ "lstrip": false,
174
+ "normalized": false,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": true
178
+ },
179
+ "32019": {
180
+ "content": "<extra_id_80>",
181
+ "lstrip": false,
182
+ "normalized": false,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": true
186
+ },
187
+ "32020": {
188
+ "content": "<extra_id_79>",
189
+ "lstrip": false,
190
+ "normalized": false,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": true
194
+ },
195
+ "32021": {
196
+ "content": "<extra_id_78>",
197
+ "lstrip": false,
198
+ "normalized": false,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": true
202
+ },
203
+ "32022": {
204
+ "content": "<extra_id_77>",
205
+ "lstrip": false,
206
+ "normalized": false,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": true
210
+ },
211
+ "32023": {
212
+ "content": "<extra_id_76>",
213
+ "lstrip": false,
214
+ "normalized": false,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": true
218
+ },
219
+ "32024": {
220
+ "content": "<extra_id_75>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "32025": {
228
+ "content": "<extra_id_74>",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "32026": {
236
+ "content": "<extra_id_73>",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "32027": {
244
+ "content": "<extra_id_72>",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "32028": {
252
+ "content": "<extra_id_71>",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "32029": {
260
+ "content": "<extra_id_70>",
261
+ "lstrip": false,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "32030": {
268
+ "content": "<extra_id_69>",
269
+ "lstrip": false,
270
+ "normalized": false,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": true
274
+ },
275
+ "32031": {
276
+ "content": "<extra_id_68>",
277
+ "lstrip": false,
278
+ "normalized": false,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": true
282
+ },
283
+ "32032": {
284
+ "content": "<extra_id_67>",
285
+ "lstrip": false,
286
+ "normalized": false,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": true
290
+ },
291
+ "32033": {
292
+ "content": "<extra_id_66>",
293
+ "lstrip": false,
294
+ "normalized": false,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": true
298
+ },
299
+ "32034": {
300
+ "content": "<extra_id_65>",
301
+ "lstrip": false,
302
+ "normalized": false,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": true
306
+ },
307
+ "32035": {
308
+ "content": "<extra_id_64>",
309
+ "lstrip": false,
310
+ "normalized": false,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": true
314
+ },
315
+ "32036": {
316
+ "content": "<extra_id_63>",
317
+ "lstrip": false,
318
+ "normalized": false,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": true
322
+ },
323
+ "32037": {
324
+ "content": "<extra_id_62>",
325
+ "lstrip": false,
326
+ "normalized": false,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": true
330
+ },
331
+ "32038": {
332
+ "content": "<extra_id_61>",
333
+ "lstrip": false,
334
+ "normalized": false,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": true
338
+ },
339
+ "32039": {
340
+ "content": "<extra_id_60>",
341
+ "lstrip": false,
342
+ "normalized": false,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": true
346
+ },
347
+ "32040": {
348
+ "content": "<extra_id_59>",
349
+ "lstrip": false,
350
+ "normalized": false,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": true
354
+ },
355
+ "32041": {
356
+ "content": "<extra_id_58>",
357
+ "lstrip": false,
358
+ "normalized": false,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": true
362
+ },
363
+ "32042": {
364
+ "content": "<extra_id_57>",
365
+ "lstrip": false,
366
+ "normalized": false,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": true
370
+ },
371
+ "32043": {
372
+ "content": "<extra_id_56>",
373
+ "lstrip": false,
374
+ "normalized": false,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": true
378
+ },
379
+ "32044": {
380
+ "content": "<extra_id_55>",
381
+ "lstrip": false,
382
+ "normalized": false,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": true
386
+ },
387
+ "32045": {
388
+ "content": "<extra_id_54>",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": true
394
+ },
395
+ "32046": {
396
+ "content": "<extra_id_53>",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": true
402
+ },
403
+ "32047": {
404
+ "content": "<extra_id_52>",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": true
410
+ },
411
+ "32048": {
412
+ "content": "<extra_id_51>",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": true
418
+ },
419
+ "32049": {
420
+ "content": "<extra_id_50>",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": true
426
+ },
427
+ "32050": {
428
+ "content": "<extra_id_49>",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": true
434
+ },
435
+ "32051": {
436
+ "content": "<extra_id_48>",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": true
442
+ },
443
+ "32052": {
444
+ "content": "<extra_id_47>",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": true
450
+ },
451
+ "32053": {
452
+ "content": "<extra_id_46>",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": true
458
+ },
459
+ "32054": {
460
+ "content": "<extra_id_45>",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": true
466
+ },
467
+ "32055": {
468
+ "content": "<extra_id_44>",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": true
474
+ },
475
+ "32056": {
476
+ "content": "<extra_id_43>",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": true
482
+ },
483
+ "32057": {
484
+ "content": "<extra_id_42>",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": true
490
+ },
491
+ "32058": {
492
+ "content": "<extra_id_41>",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": true
498
+ },
499
+ "32059": {
500
+ "content": "<extra_id_40>",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": true
506
+ },
507
+ "32060": {
508
+ "content": "<extra_id_39>",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": true
514
+ },
515
+ "32061": {
516
+ "content": "<extra_id_38>",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": true
522
+ },
523
+ "32062": {
524
+ "content": "<extra_id_37>",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": true
530
+ },
531
+ "32063": {
532
+ "content": "<extra_id_36>",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": true
538
+ },
539
+ "32064": {
540
+ "content": "<extra_id_35>",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": true
546
+ },
547
+ "32065": {
548
+ "content": "<extra_id_34>",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": true
554
+ },
555
+ "32066": {
556
+ "content": "<extra_id_33>",
557
+ "lstrip": false,
558
+ "normalized": false,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": true
562
+ },
563
+ "32067": {
564
+ "content": "<extra_id_32>",
565
+ "lstrip": false,
566
+ "normalized": false,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": true
570
+ },
571
+ "32068": {
572
+ "content": "<extra_id_31>",
573
+ "lstrip": false,
574
+ "normalized": false,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": true
578
+ },
579
+ "32069": {
580
+ "content": "<extra_id_30>",
581
+ "lstrip": false,
582
+ "normalized": false,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": true
586
+ },
587
+ "32070": {
588
+ "content": "<extra_id_29>",
589
+ "lstrip": false,
590
+ "normalized": false,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": true
594
+ },
595
+ "32071": {
596
+ "content": "<extra_id_28>",
597
+ "lstrip": false,
598
+ "normalized": false,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": true
602
+ },
603
+ "32072": {
604
+ "content": "<extra_id_27>",
605
+ "lstrip": false,
606
+ "normalized": false,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": true
610
+ },
611
+ "32073": {
612
+ "content": "<extra_id_26>",
613
+ "lstrip": false,
614
+ "normalized": false,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": true
618
+ },
619
+ "32074": {
620
+ "content": "<extra_id_25>",
621
+ "lstrip": false,
622
+ "normalized": false,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": true
626
+ },
627
+ "32075": {
628
+ "content": "<extra_id_24>",
629
+ "lstrip": false,
630
+ "normalized": false,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": true
634
+ },
635
+ "32076": {
636
+ "content": "<extra_id_23>",
637
+ "lstrip": false,
638
+ "normalized": false,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": true
642
+ },
643
+ "32077": {
644
+ "content": "<extra_id_22>",
645
+ "lstrip": false,
646
+ "normalized": false,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": true
650
+ },
651
+ "32078": {
652
+ "content": "<extra_id_21>",
653
+ "lstrip": false,
654
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32079": {
+ "content": "<extra_id_20>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32080": {
+ "content": "<extra_id_19>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32081": {
+ "content": "<extra_id_18>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32082": {
+ "content": "<extra_id_17>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32083": {
+ "content": "<extra_id_16>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32084": {
+ "content": "<extra_id_15>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32085": {
+ "content": "<extra_id_14>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32086": {
+ "content": "<extra_id_13>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32087": {
+ "content": "<extra_id_12>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32088": {
+ "content": "<extra_id_11>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32089": {
+ "content": "<extra_id_10>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32090": {
+ "content": "<extra_id_9>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32091": {
+ "content": "<extra_id_8>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32092": {
+ "content": "<extra_id_7>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32093": {
+ "content": "<extra_id_6>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32094": {
+ "content": "<extra_id_5>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32095": {
+ "content": "<extra_id_4>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32096": {
+ "content": "<extra_id_3>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32097": {
+ "content": "<extra_id_2>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32098": {
+ "content": "<extra_id_1>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32099": {
+ "content": "<extra_id_0>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "<extra_id_0>",
+ "<extra_id_1>",
+ "<extra_id_2>",
+ "<extra_id_3>",
+ "<extra_id_4>",
+ "<extra_id_5>",
+ "<extra_id_6>",
+ "<extra_id_7>",
+ "<extra_id_8>",
+ "<extra_id_9>",
+ "<extra_id_10>",
+ "<extra_id_11>",
+ "<extra_id_12>",
+ "<extra_id_13>",
+ "<extra_id_14>",
+ "<extra_id_15>",
+ "<extra_id_16>",
+ "<extra_id_17>",
+ "<extra_id_18>",
+ "<extra_id_19>",
+ "<extra_id_20>",
+ "<extra_id_21>",
+ "<extra_id_22>",
+ "<extra_id_23>",
+ "<extra_id_24>",
+ "<extra_id_25>",
+ "<extra_id_26>",
+ "<extra_id_27>",
+ "<extra_id_28>",
+ "<extra_id_29>",
+ "<extra_id_30>",
+ "<extra_id_31>",
+ "<extra_id_32>",
+ "<extra_id_33>",
+ "<extra_id_34>",
+ "<extra_id_35>",
+ "<extra_id_36>",
+ "<extra_id_37>",
+ "<extra_id_38>",
+ "<extra_id_39>",
+ "<extra_id_40>",
+ "<extra_id_41>",
+ "<extra_id_42>",
+ "<extra_id_43>",
+ "<extra_id_44>",
+ "<extra_id_45>",
+ "<extra_id_46>",
+ "<extra_id_47>",
+ "<extra_id_48>",
+ "<extra_id_49>",
+ "<extra_id_50>",
+ "<extra_id_51>",
+ "<extra_id_52>",
+ "<extra_id_53>",
+ "<extra_id_54>",
+ "<extra_id_55>",
+ "<extra_id_56>",
+ "<extra_id_57>",
+ "<extra_id_58>",
+ "<extra_id_59>",
+ "<extra_id_60>",
+ "<extra_id_61>",
+ "<extra_id_62>",
+ "<extra_id_63>",
+ "<extra_id_64>",
+ "<extra_id_65>",
+ "<extra_id_66>",
+ "<extra_id_67>",
+ "<extra_id_68>",
+ "<extra_id_69>",
+ "<extra_id_70>",
+ "<extra_id_71>",
+ "<extra_id_72>",
+ "<extra_id_73>",
+ "<extra_id_74>",
+ "<extra_id_75>",
+ "<extra_id_76>",
+ "<extra_id_77>",
+ "<extra_id_78>",
+ "<extra_id_79>",
+ "<extra_id_80>",
+ "<extra_id_81>",
+ "<extra_id_82>",
+ "<extra_id_83>",
+ "<extra_id_84>",
+ "<extra_id_85>",
+ "<extra_id_86>",
+ "<extra_id_87>",
+ "<extra_id_88>",
+ "<extra_id_89>",
+ "<extra_id_90>",
+ "<extra_id_91>",
+ "<extra_id_92>",
+ "<extra_id_93>",
+ "<extra_id_94>",
+ "<extra_id_95>",
+ "<extra_id_96>",
+ "<extra_id_97>",
+ "<extra_id_98>",
+ "<extra_id_99>"
+ ],
+ "clean_up_tokenization_spaces": true,
+ "eos_token": "</s>",
+ "extra_ids": 100,
+ "model_max_length": 512,
+ "pad_token": "<pad>",
+ "sp_model_kwargs": {},
+ "tokenizer_class": "T5Tokenizer",
+ "unk_token": "<unk>"
+ }
train_results.json CHANGED
@@ -1,7 +1,7 @@
  {
- "epoch": 10.0,
- "train_loss": 0.09247852528257068,
- "train_runtime": 8790.0948,
- "train_samples_per_second": 80.453,
- "train_steps_per_second": 5.028
+ "epoch": 0.0,
+ "train_loss": 2.5696409225463865,
+ "train_runtime": 1.0509,
+ "train_samples_per_second": 67.292,
+ "train_steps_per_second": 4.758
  }
trainer_state.json CHANGED
@@ -1,1349 +1,29 @@
  {
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 10.0,
  "eval_steps": 500,
- "global_step": 44200,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
- "epoch": 0.11,
- "grad_norm": 0.4132317900657654,
- "learning_rate": 0.0009886877828054299,
- "loss": 0.285,
- "step": 500
- },
- {
- "epoch": 0.11,
- "eval_loss": 0.12178385257720947,
- "eval_runtime": 27.3061,
- "eval_samples_per_second": 287.774,
- "eval_steps_per_second": 18.018,
- "step": 500
- },
- {
- "epoch": 0.23,
- "grad_norm": 0.4537642300128937,
- "learning_rate": 0.0009773755656108597,
- "loss": 0.1782,
- "step": 1000
- },
- {
- "epoch": 0.23,
- "eval_loss": 0.10424336045980453,
- "eval_runtime": 27.2202,
- "eval_samples_per_second": 288.683,
- "eval_steps_per_second": 18.075,
- "step": 1000
- },
- {
- "epoch": 0.34,
- "grad_norm": 0.4927336871623993,
- "learning_rate": 0.0009660633484162896,
- "loss": 0.1623,
- "step": 1500
- },
- {
- "epoch": 0.34,
- "eval_loss": 0.1000167727470398,
- "eval_runtime": 27.2235,
- "eval_samples_per_second": 288.648,
- "eval_steps_per_second": 18.073,
- "step": 1500
- },
- {
- "epoch": 0.45,
- "grad_norm": 0.3784632682800293,
- "learning_rate": 0.0009547511312217196,
- "loss": 0.1487,
- "step": 2000
- },
- {
- "epoch": 0.45,
- "eval_loss": 0.10203839093446732,
- "eval_runtime": 27.2422,
- "eval_samples_per_second": 288.45,
- "eval_steps_per_second": 18.06,
- "step": 2000
- },
- {
- "epoch": 0.57,
- "grad_norm": 0.4409666955471039,
- "learning_rate": 0.0009434389140271493,
- "loss": 0.1419,
- "step": 2500
- },
- {
- "epoch": 0.57,
- "eval_loss": 0.0881214290857315,
- "eval_runtime": 27.2438,
- "eval_samples_per_second": 288.432,
- "eval_steps_per_second": 18.059,
- "step": 2500
- },
- {
- "epoch": 0.68,
- "grad_norm": 0.33156296610832214,
- "learning_rate": 0.0009321266968325792,
- "loss": 0.1371,
- "step": 3000
- },
- {
- "epoch": 0.68,
- "eval_loss": 0.08762603253126144,
- "eval_runtime": 27.2331,
- "eval_samples_per_second": 288.546,
- "eval_steps_per_second": 18.066,
- "step": 3000
- },
- {
- "epoch": 0.79,
- "grad_norm": 0.26063305139541626,
- "learning_rate": 0.000920814479638009,
- "loss": 0.1366,
- "step": 3500
- },
- {
- "epoch": 0.79,
- "eval_loss": 0.08680489659309387,
- "eval_runtime": 27.2296,
- "eval_samples_per_second": 288.583,
- "eval_steps_per_second": 18.069,
- "step": 3500
- },
- {
- "epoch": 0.9,
- "grad_norm": 0.6152302622795105,
- "learning_rate": 0.0009095022624434389,
- "loss": 0.1288,
- "step": 4000
- },
- {
- "epoch": 0.9,
- "eval_loss": 0.08397215604782104,
- "eval_runtime": 27.259,
- "eval_samples_per_second": 288.272,
- "eval_steps_per_second": 18.049,
- "step": 4000
- },
- {
- "epoch": 1.02,
- "grad_norm": 0.20703616738319397,
- "learning_rate": 0.0008981900452488689,
- "loss": 0.1329,
- "step": 4500
- },
- {
- "epoch": 1.02,
- "eval_loss": 0.0818256288766861,
- "eval_runtime": 27.2728,
- "eval_samples_per_second": 288.126,
- "eval_steps_per_second": 18.04,
- "step": 4500
- },
- {
- "epoch": 1.13,
- "grad_norm": 0.47000011801719666,
- "learning_rate": 0.0008868778280542986,
- "loss": 0.1221,
- "step": 5000
- },
- {
- "epoch": 1.13,
- "eval_loss": 0.08076580613851547,
- "eval_runtime": 27.2281,
- "eval_samples_per_second": 288.599,
- "eval_steps_per_second": 18.07,
- "step": 5000
- },
- {
- "epoch": 1.24,
- "grad_norm": 0.3874566853046417,
- "learning_rate": 0.0008755656108597285,
- "loss": 0.1202,
- "step": 5500
- },
- {
- "epoch": 1.24,
- "eval_loss": 0.08285848051309586,
- "eval_runtime": 27.2548,
- "eval_samples_per_second": 288.316,
- "eval_steps_per_second": 18.052,
- "step": 5500
- },
- {
- "epoch": 1.36,
- "grad_norm": 0.35874509811401367,
- "learning_rate": 0.0008642533936651585,
- "loss": 0.1186,
- "step": 6000
- },
- {
- "epoch": 1.36,
- "eval_loss": 0.081941157579422,
- "eval_runtime": 27.2605,
- "eval_samples_per_second": 288.256,
- "eval_steps_per_second": 18.048,
- "step": 6000
- },
- {
- "epoch": 1.47,
- "grad_norm": 0.34497329592704773,
- "learning_rate": 0.0008529411764705882,
- "loss": 0.1163,
- "step": 6500
- },
- {
- "epoch": 1.47,
- "eval_loss": 0.07750312983989716,
- "eval_runtime": 27.2737,
- "eval_samples_per_second": 288.116,
- "eval_steps_per_second": 18.039,
- "step": 6500
- },
- {
- "epoch": 1.58,
- "grad_norm": 0.34537777304649353,
- "learning_rate": 0.0008416289592760181,
- "loss": 0.1213,
- "step": 7000
- },
- {
- "epoch": 1.58,
- "eval_loss": 0.0756232738494873,
- "eval_runtime": 27.2421,
- "eval_samples_per_second": 288.451,
- "eval_steps_per_second": 18.06,
- "step": 7000
- },
- {
- "epoch": 1.7,
- "grad_norm": 0.31827136874198914,
- "learning_rate": 0.000830316742081448,
- "loss": 0.1169,
- "step": 7500
- },
- {
- "epoch": 1.7,
- "eval_loss": 0.07395777106285095,
- "eval_runtime": 27.2385,
- "eval_samples_per_second": 288.489,
- "eval_steps_per_second": 18.063,
- "step": 7500
- },
- {
- "epoch": 1.81,
- "grad_norm": 0.43441176414489746,
- "learning_rate": 0.0008190045248868778,
- "loss": 0.1151,
- "step": 8000
- },
- {
- "epoch": 1.81,
- "eval_loss": 0.08032752573490143,
- "eval_runtime": 27.2392,
- "eval_samples_per_second": 288.481,
- "eval_steps_per_second": 18.062,
- "step": 8000
- },
- {
- "epoch": 1.92,
- "grad_norm": 0.403935968875885,
- "learning_rate": 0.0008076923076923078,
- "loss": 0.1162,
- "step": 8500
- },
- {
- "epoch": 1.92,
- "eval_loss": 0.07349220663309097,
- "eval_runtime": 27.2543,
- "eval_samples_per_second": 288.322,
- "eval_steps_per_second": 18.052,
- "step": 8500
- },
- {
- "epoch": 2.04,
- "grad_norm": 0.2286953330039978,
- "learning_rate": 0.0007963800904977375,
- "loss": 0.1157,
- "step": 9000
- },
- {
- "epoch": 2.04,
- "eval_loss": 0.07655400782823563,
- "eval_runtime": 27.2562,
- "eval_samples_per_second": 288.301,
- "eval_steps_per_second": 18.051,
- "step": 9000
- },
- {
- "epoch": 2.15,
- "grad_norm": 0.2294893115758896,
- "learning_rate": 0.0007850678733031674,
- "loss": 0.1121,
- "step": 9500
- },
- {
- "epoch": 2.15,
- "eval_loss": 0.07088885456323624,
- "eval_runtime": 27.2467,
- "eval_samples_per_second": 288.402,
- "eval_steps_per_second": 18.057,
- "step": 9500
- },
- {
- "epoch": 2.26,
- "grad_norm": 0.44981372356414795,
- "learning_rate": 0.0007737556561085974,
- "loss": 0.1073,
- "step": 10000
- },
- {
- "epoch": 2.26,
- "eval_loss": 0.07331141084432602,
- "eval_runtime": 27.2427,
- "eval_samples_per_second": 288.445,
- "eval_steps_per_second": 18.06,
- "step": 10000
- },
- {
- "epoch": 2.38,
- "grad_norm": 0.4742676019668579,
- "learning_rate": 0.0007624434389140271,
- "loss": 0.1063,
- "step": 10500
- },
- {
- "epoch": 2.38,
- "eval_loss": 0.07561534643173218,
- "eval_runtime": 27.2233,
- "eval_samples_per_second": 288.65,
- "eval_steps_per_second": 18.073,
- "step": 10500
- },
- {
- "epoch": 2.49,
- "grad_norm": 0.4676545262336731,
- "learning_rate": 0.0007511312217194571,
- "loss": 0.1109,
- "step": 11000
- },
- {
- "epoch": 2.49,
- "eval_loss": 0.07197986543178558,
- "eval_runtime": 27.2211,
- "eval_samples_per_second": 288.673,
- "eval_steps_per_second": 18.074,
- "step": 11000
- },
- {
- "epoch": 2.6,
- "grad_norm": 0.5688673257827759,
- "learning_rate": 0.0007398190045248869,
- "loss": 0.1072,
- "step": 11500
- },
- {
- "epoch": 2.6,
- "eval_loss": 0.07261210680007935,
- "eval_runtime": 27.2328,
- "eval_samples_per_second": 288.549,
- "eval_steps_per_second": 18.066,
- "step": 11500
- },
- {
- "epoch": 2.71,
- "grad_norm": 0.24911655485630035,
- "learning_rate": 0.0007285067873303167,
- "loss": 0.1055,
- "step": 12000
- },
- {
- "epoch": 2.71,
- "eval_loss": 0.06898481398820877,
- "eval_runtime": 27.2378,
- "eval_samples_per_second": 288.496,
- "eval_steps_per_second": 18.063,
- "step": 12000
- },
- {
- "epoch": 2.83,
- "grad_norm": 0.4301845133304596,
- "learning_rate": 0.0007171945701357467,
- "loss": 0.1004,
- "step": 12500
- },
- {
- "epoch": 2.83,
- "eval_loss": 0.06929654628038406,
- "eval_runtime": 27.2358,
- "eval_samples_per_second": 288.517,
- "eval_steps_per_second": 18.064,
- "step": 12500
- },
- {
- "epoch": 2.94,
- "grad_norm": 0.4303476810455322,
- "learning_rate": 0.0007058823529411765,
- "loss": 0.0995,
- "step": 13000
- },
- {
- "epoch": 2.94,
- "eval_loss": 0.06872580200433731,
- "eval_runtime": 27.2296,
- "eval_samples_per_second": 288.583,
- "eval_steps_per_second": 18.069,
- "step": 13000
- },
- {
- "epoch": 3.05,
- "grad_norm": 0.3978405296802521,
- "learning_rate": 0.0006945701357466064,
- "loss": 0.0999,
- "step": 13500
- },
- {
- "epoch": 3.05,
- "eval_loss": 0.06932587921619415,
- "eval_runtime": 27.2271,
- "eval_samples_per_second": 288.609,
- "eval_steps_per_second": 18.07,
- "step": 13500
- },
- {
- "epoch": 3.17,
- "grad_norm": 0.26857316493988037,
- "learning_rate": 0.0006832579185520362,
- "loss": 0.0959,
- "step": 14000
- },
- {
- "epoch": 3.17,
- "eval_loss": 0.07186341285705566,
- "eval_runtime": 27.231,
- "eval_samples_per_second": 288.569,
- "eval_steps_per_second": 18.068,
- "step": 14000
- },
- {
- "epoch": 3.28,
- "grad_norm": 0.4276795983314514,
- "learning_rate": 0.0006719457013574661,
- "loss": 0.0982,
- "step": 14500
- },
- {
- "epoch": 3.28,
- "eval_loss": 0.07152236998081207,
- "eval_runtime": 27.2215,
- "eval_samples_per_second": 288.669,
- "eval_steps_per_second": 18.074,
- "step": 14500
- },
- {
- "epoch": 3.39,
- "grad_norm": 0.41015538573265076,
- "learning_rate": 0.000660633484162896,
- "loss": 0.0969,
- "step": 15000
- },
- {
- "epoch": 3.39,
- "eval_loss": 0.07108399271965027,
- "eval_runtime": 27.2286,
- "eval_samples_per_second": 288.594,
- "eval_steps_per_second": 18.069,
- "step": 15000
- },
- {
- "epoch": 3.51,
- "grad_norm": 0.180690735578537,
- "learning_rate": 0.0006493212669683258,
- "loss": 0.0995,
- "step": 15500
- },
- {
- "epoch": 3.51,
- "eval_loss": 0.06466764211654663,
- "eval_runtime": 27.2483,
- "eval_samples_per_second": 288.385,
- "eval_steps_per_second": 18.056,
- "step": 15500
- },
- {
- "epoch": 3.62,
- "grad_norm": 0.2916184067726135,
- "learning_rate": 0.0006380090497737556,
- "loss": 0.0962,
- "step": 16000
- },
- {
- "epoch": 3.62,
- "eval_loss": 0.06967472285032272,
- "eval_runtime": 27.2534,
- "eval_samples_per_second": 288.331,
- "eval_steps_per_second": 18.053,
- "step": 16000
- },
- {
- "epoch": 3.73,
- "grad_norm": 0.444690465927124,
- "learning_rate": 0.0006266968325791855,
- "loss": 0.0959,
- "step": 16500
- },
- {
- "epoch": 3.73,
- "eval_loss": 0.06753501296043396,
- "eval_runtime": 27.2523,
- "eval_samples_per_second": 288.343,
- "eval_steps_per_second": 18.054,
- "step": 16500
- },
- {
- "epoch": 3.85,
- "grad_norm": 0.3559369146823883,
- "learning_rate": 0.0006153846153846154,
- "loss": 0.0949,
- "step": 17000
- },
- {
- "epoch": 3.85,
- "eval_loss": 0.06987947970628738,
- "eval_runtime": 27.2447,
- "eval_samples_per_second": 288.423,
- "eval_steps_per_second": 18.059,
- "step": 17000
- },
- {
- "epoch": 3.96,
- "grad_norm": 0.3376706838607788,
- "learning_rate": 0.0006040723981900453,
- "loss": 0.096,
- "step": 17500
- },
- {
- "epoch": 3.96,
- "eval_loss": 0.06431511789560318,
- "eval_runtime": 27.2316,
- "eval_samples_per_second": 288.562,
- "eval_steps_per_second": 18.067,
- "step": 17500
- },
- {
- "epoch": 4.07,
- "grad_norm": 0.4778081476688385,
- "learning_rate": 0.0005927601809954751,
- "loss": 0.0916,
- "step": 18000
- },
- {
- "epoch": 4.07,
- "eval_loss": 0.06719387322664261,
- "eval_runtime": 27.1935,
- "eval_samples_per_second": 288.967,
- "eval_steps_per_second": 18.093,
- "step": 18000
- },
- {
- "epoch": 4.19,
- "grad_norm": 0.6138429641723633,
- "learning_rate": 0.000581447963800905,
- "loss": 0.0887,
- "step": 18500
- },
- {
- "epoch": 4.19,
- "eval_loss": 0.06378566473722458,
- "eval_runtime": 27.2223,
- "eval_samples_per_second": 288.661,
- "eval_steps_per_second": 18.073,
- "step": 18500
- },
- {
- "epoch": 4.3,
- "grad_norm": 0.48502928018569946,
- "learning_rate": 0.0005701357466063349,
- "loss": 0.0902,
- "step": 19000
- },
- {
- "epoch": 4.3,
- "eval_loss": 0.06466159969568253,
- "eval_runtime": 27.224,
- "eval_samples_per_second": 288.642,
- "eval_steps_per_second": 18.072,
- "step": 19000
- },
- {
- "epoch": 4.41,
- "grad_norm": 0.28751033544540405,
- "learning_rate": 0.0005588235294117647,
- "loss": 0.089,
- "step": 19500
- },
- {
- "epoch": 4.41,
- "eval_loss": 0.06292453408241272,
- "eval_runtime": 27.2238,
- "eval_samples_per_second": 288.644,
- "eval_steps_per_second": 18.072,
- "step": 19500
- },
- {
- "epoch": 4.52,
- "grad_norm": 0.2429145723581314,
- "learning_rate": 0.0005475113122171947,
- "loss": 0.0881,
- "step": 20000
- },
- {
- "epoch": 4.52,
- "eval_loss": 0.0646950751543045,
- "eval_runtime": 27.2322,
- "eval_samples_per_second": 288.555,
- "eval_steps_per_second": 18.067,
- "step": 20000
- },
- {
- "epoch": 4.64,
- "grad_norm": 0.13486433029174805,
- "learning_rate": 0.0005361990950226244,
- "loss": 0.0875,
- "step": 20500
- },
- {
- "epoch": 4.64,
- "eval_loss": 0.06334567815065384,
- "eval_runtime": 27.2229,
- "eval_samples_per_second": 288.654,
- "eval_steps_per_second": 18.073,
- "step": 20500
- },
- {
- "epoch": 4.75,
- "grad_norm": 0.2922358512878418,
- "learning_rate": 0.0005248868778280543,
- "loss": 0.0894,
- "step": 21000
- },
- {
- "epoch": 4.75,
- "eval_loss": 0.06537148356437683,
- "eval_runtime": 27.2312,
- "eval_samples_per_second": 288.566,
- "eval_steps_per_second": 18.068,
- "step": 21000
- },
- {
- "epoch": 4.86,
- "grad_norm": 0.22684411704540253,
- "learning_rate": 0.0005135746606334842,
- "loss": 0.0901,
- "step": 21500
- },
- {
- "epoch": 4.86,
- "eval_loss": 0.06314302235841751,
- "eval_runtime": 27.237,
- "eval_samples_per_second": 288.504,
- "eval_steps_per_second": 18.064,
- "step": 21500
- },
- {
- "epoch": 4.98,
- "grad_norm": 0.641290545463562,
- "learning_rate": 0.000502262443438914,
- "loss": 0.0898,
- "step": 22000
- },
- {
- "epoch": 4.98,
- "eval_loss": 0.06266883760690689,
- "eval_runtime": 27.2238,
- "eval_samples_per_second": 288.645,
- "eval_steps_per_second": 18.072,
- "step": 22000
- },
- {
- "epoch": 5.09,
- "grad_norm": 0.31225764751434326,
- "learning_rate": 0.0004909502262443439,
- "loss": 0.0813,
- "step": 22500
- },
- {
- "epoch": 5.09,
- "eval_loss": 0.06273192167282104,
- "eval_runtime": 27.2277,
- "eval_samples_per_second": 288.603,
- "eval_steps_per_second": 18.07,
- "step": 22500
- },
- {
- "epoch": 5.2,
- "grad_norm": 0.44664525985717773,
- "learning_rate": 0.0004796380090497738,
- "loss": 0.083,
- "step": 23000
- },
- {
- "epoch": 5.2,
- "eval_loss": 0.06290117651224136,
- "eval_runtime": 27.2049,
- "eval_samples_per_second": 288.845,
- "eval_steps_per_second": 18.085,
- "step": 23000
- },
- {
- "epoch": 5.32,
- "grad_norm": 0.1560264378786087,
- "learning_rate": 0.00046832579185520365,
- "loss": 0.0833,
- "step": 23500
- },
- {
- "epoch": 5.32,
- "eval_loss": 0.06229640915989876,
- "eval_runtime": 27.2246,
- "eval_samples_per_second": 288.636,
- "eval_steps_per_second": 18.072,
- "step": 23500
- },
- {
- "epoch": 5.43,
- "grad_norm": 0.11389543116092682,
- "learning_rate": 0.00045701357466063346,
- "loss": 0.083,
- "step": 24000
- },
- {
- "epoch": 5.43,
- "eval_loss": 0.06498704105615616,
- "eval_runtime": 27.2302,
- "eval_samples_per_second": 288.576,
- "eval_steps_per_second": 18.068,
- "step": 24000
- },
- {
- "epoch": 5.54,
- "grad_norm": 0.6757131814956665,
- "learning_rate": 0.0004457013574660634,
- "loss": 0.0825,
- "step": 24500
- },
- {
- "epoch": 5.54,
- "eval_loss": 0.06173526123166084,
- "eval_runtime": 27.2094,
- "eval_samples_per_second": 288.798,
- "eval_steps_per_second": 18.082,
- "step": 24500
- },
- {
- "epoch": 5.66,
- "grad_norm": 0.2726614475250244,
- "learning_rate": 0.00043438914027149324,
- "loss": 0.0829,
- "step": 25000
- },
- {
- "epoch": 5.66,
- "eval_loss": 0.060302384197711945,
- "eval_runtime": 27.2049,
- "eval_samples_per_second": 288.845,
- "eval_steps_per_second": 18.085,
- "step": 25000
- },
- {
- "epoch": 5.77,
- "grad_norm": 0.8743285536766052,
- "learning_rate": 0.0004230769230769231,
- "loss": 0.0818,
- "step": 25500
- },
- {
- "epoch": 5.77,
- "eval_loss": 0.062085919082164764,
- "eval_runtime": 27.2011,
- "eval_samples_per_second": 288.885,
- "eval_steps_per_second": 18.087,
- "step": 25500
- },
- {
- "epoch": 5.88,
- "grad_norm": 0.2872491478919983,
- "learning_rate": 0.0004117647058823529,
- "loss": 0.0807,
- "step": 26000
- },
- {
- "epoch": 5.88,
- "eval_loss": 0.059214599430561066,
- "eval_runtime": 27.2158,
- "eval_samples_per_second": 288.73,
- "eval_steps_per_second": 18.078,
- "step": 26000
- },
- {
- "epoch": 6.0,
- "grad_norm": 0.5603688955307007,
- "learning_rate": 0.0004004524886877828,
- "loss": 0.082,
- "step": 26500
- },
- {
- "epoch": 6.0,
- "eval_loss": 0.05830477178096771,
- "eval_runtime": 27.2102,
- "eval_samples_per_second": 288.788,
- "eval_steps_per_second": 18.081,
- "step": 26500
- },
- {
- "epoch": 6.11,
- "grad_norm": 0.4404628574848175,
- "learning_rate": 0.0003891402714932127,
- "loss": 0.0763,
- "step": 27000
- },
- {
- "epoch": 6.11,
- "eval_loss": 0.05895010381937027,
- "eval_runtime": 27.2169,
- "eval_samples_per_second": 288.718,
- "eval_steps_per_second": 18.077,
- "step": 27000
- },
- {
- "epoch": 6.22,
- "grad_norm": 0.27021318674087524,
- "learning_rate": 0.00037782805429864254,
- "loss": 0.0781,
- "step": 27500
- },
- {
- "epoch": 6.22,
- "eval_loss": 0.06117743253707886,
- "eval_runtime": 27.2077,
- "eval_samples_per_second": 288.815,
- "eval_steps_per_second": 18.083,
- "step": 27500
- },
- {
- "epoch": 6.33,
- "grad_norm": 0.5952714681625366,
- "learning_rate": 0.0003665158371040724,
- "loss": 0.077,
- "step": 28000
- },
- {
- "epoch": 6.33,
- "eval_loss": 0.06172608584165573,
- "eval_runtime": 27.2143,
- "eval_samples_per_second": 288.745,
- "eval_steps_per_second": 18.079,
- "step": 28000
- },
- {
- "epoch": 6.45,
- "grad_norm": 0.11397124826908112,
- "learning_rate": 0.00035520361990950226,
- "loss": 0.0763,
- "step": 28500
- },
- {
- "epoch": 6.45,
- "eval_loss": 0.06007913500070572,
- "eval_runtime": 27.1971,
- "eval_samples_per_second": 288.928,
- "eval_steps_per_second": 18.09,
- "step": 28500
- },
- {
- "epoch": 6.56,
- "grad_norm": 0.18584699928760529,
- "learning_rate": 0.0003438914027149321,
- "loss": 0.0741,
- "step": 29000
- },
- {
- "epoch": 6.56,
- "eval_loss": 0.05769050493836403,
- "eval_runtime": 27.1856,
- "eval_samples_per_second": 289.05,
- "eval_steps_per_second": 18.098,
- "step": 29000
- },
- {
- "epoch": 6.67,
- "grad_norm": 0.26046234369277954,
- "learning_rate": 0.000332579185520362,
- "loss": 0.0746,
- "step": 29500
- },
- {
- "epoch": 6.67,
- "eval_loss": 0.05827530845999718,
- "eval_runtime": 27.1863,
- "eval_samples_per_second": 289.043,
- "eval_steps_per_second": 18.097,
- "step": 29500
- },
- {
- "epoch": 6.79,
- "grad_norm": 0.12222661823034286,
- "learning_rate": 0.0003212669683257919,
- "loss": 0.0735,
- "step": 30000
- },
- {
- "epoch": 6.79,
- "eval_loss": 0.05913107842206955,
- "eval_runtime": 27.1918,
- "eval_samples_per_second": 288.984,
- "eval_steps_per_second": 18.094,
- "step": 30000
- },
- {
- "epoch": 6.9,
- "grad_norm": 0.28610703349113464,
- "learning_rate": 0.0003099547511312217,
- "loss": 0.0726,
- "step": 30500
- },
- {
- "epoch": 6.9,
- "eval_loss": 0.05818793550133705,
- "eval_runtime": 27.2089,
- "eval_samples_per_second": 288.803,
- "eval_steps_per_second": 18.082,
- "step": 30500
- },
- {
- "epoch": 7.01,
- "grad_norm": 0.3682945966720581,
- "learning_rate": 0.00029864253393665157,
- "loss": 0.0741,
- "step": 31000
- },
- {
- "epoch": 7.01,
- "eval_loss": 0.05868174880743027,
- "eval_runtime": 27.1916,
- "eval_samples_per_second": 288.986,
- "eval_steps_per_second": 18.094,
- "step": 31000
- },
- {
- "epoch": 7.13,
- "grad_norm": 0.16477471590042114,
- "learning_rate": 0.00028733031674208143,
- "loss": 0.0715,
- "step": 31500
- },
- {
- "epoch": 7.13,
- "eval_loss": 0.05955735221505165,
- "eval_runtime": 27.1945,
- "eval_samples_per_second": 288.955,
- "eval_steps_per_second": 18.092,
- "step": 31500
- },
- {
- "epoch": 7.24,
- "grad_norm": 0.24769556522369385,
- "learning_rate": 0.00027601809954751135,
- "loss": 0.07,
- "step": 32000
- },
- {
- "epoch": 7.24,
- "eval_loss": 0.057150740176439285,
- "eval_runtime": 27.1825,
- "eval_samples_per_second": 289.083,
- "eval_steps_per_second": 18.1,
- "step": 32000
- },
- {
- "epoch": 7.35,
- "grad_norm": 0.3199273347854614,
- "learning_rate": 0.0002647058823529412,
- "loss": 0.0686,
- "step": 32500
- },
- {
- "epoch": 7.35,
- "eval_loss": 0.05786846950650215,
- "eval_runtime": 27.2001,
- "eval_samples_per_second": 288.896,
- "eval_steps_per_second": 18.088,
- "step": 32500
- },
- {
- "epoch": 7.47,
- "grad_norm": 0.3163066804409027,
- "learning_rate": 0.000253393665158371,
- "loss": 0.0703,
- "step": 33000
- },
- {
- "epoch": 7.47,
- "eval_loss": 0.05759541690349579,
- "eval_runtime": 27.1994,
- "eval_samples_per_second": 288.904,
- "eval_steps_per_second": 18.089,
- "step": 33000
- },
- {
- "epoch": 7.58,
- "grad_norm": 0.4390794336795807,
- "learning_rate": 0.0002420814479638009,
- "loss": 0.0694,
- "step": 33500
- },
- {
- "epoch": 7.58,
- "eval_loss": 0.06044788658618927,
- "eval_runtime": 27.2196,
- "eval_samples_per_second": 288.689,
- "eval_steps_per_second": 18.075,
- "step": 33500
- },
- {
- "epoch": 7.69,
- "grad_norm": 0.19777078926563263,
- "learning_rate": 0.0002307692307692308,
- "loss": 0.0683,
- "step": 34000
- },
- {
- "epoch": 7.69,
- "eval_loss": 0.05697755515575409,
- "eval_runtime": 27.2282,
- "eval_samples_per_second": 288.598,
- "eval_steps_per_second": 18.069,
- "step": 34000
- },
- {
- "epoch": 7.81,
- "grad_norm": 0.418797105550766,
- "learning_rate": 0.00021945701357466062,
- "loss": 0.0712,
- "step": 34500
- },
- {
- "epoch": 7.81,
- "eval_loss": 0.05598929896950722,
- "eval_runtime": 27.2203,
- "eval_samples_per_second": 288.682,
- "eval_steps_per_second": 18.075,
- "step": 34500
- },
- {
- "epoch": 7.92,
- "grad_norm": 0.4459814727306366,
- "learning_rate": 0.0002081447963800905,
- "loss": 0.0672,
- "step": 35000
- },
- {
- "epoch": 7.92,
- "eval_loss": 0.05849257484078407,
- "eval_runtime": 27.2068,
- "eval_samples_per_second": 288.825,
- "eval_steps_per_second": 18.084,
- "step": 35000
- },
- {
- "epoch": 8.03,
- "grad_norm": 0.2313721477985382,
- "learning_rate": 0.00019683257918552037,
- "loss": 0.0675,
- "step": 35500
- },
- {
- "epoch": 8.03,
- "eval_loss": 0.05674152076244354,
- "eval_runtime": 27.2055,
- "eval_samples_per_second": 288.839,
- "eval_steps_per_second": 18.085,
- "step": 35500
- },
- {
- "epoch": 8.14,
- "grad_norm": 0.2439548671245575,
- "learning_rate": 0.00018552036199095024,
- "loss": 0.0651,
- "step": 36000
- },
- {
- "epoch": 8.14,
- "eval_loss": 0.05658886954188347,
- "eval_runtime": 27.2233,
- "eval_samples_per_second": 288.65,
- "eval_steps_per_second": 18.073,
- "step": 36000
- },
- {
- "epoch": 8.26,
- "grad_norm": 0.3285837471485138,
- "learning_rate": 0.0001742081447963801,
- "loss": 0.0648,
- "step": 36500
- },
- {
- "epoch": 8.26,
- "eval_loss": 0.05789176747202873,
- "eval_runtime": 27.2295,
- "eval_samples_per_second": 288.584,
- "eval_steps_per_second": 18.069,
- "step": 36500
- },
- {
- "epoch": 8.37,
- "grad_norm": 0.3167458772659302,
- "learning_rate": 0.00016289592760180996,
- "loss": 0.067,
- "step": 37000
- },
- {
- "epoch": 8.37,
- "eval_loss": 0.05568605288863182,
- "eval_runtime": 27.2118,
- "eval_samples_per_second": 288.772,
- "eval_steps_per_second": 18.08,
- "step": 37000
- },
- {
- "epoch": 8.48,
- "grad_norm": 0.1530727595090866,
- "learning_rate": 0.00015158371040723982,
- "loss": 0.0651,
- "step": 37500
- },
- {
- "epoch": 8.48,
- "eval_loss": 0.057902004569768906,
- "eval_runtime": 27.219,
- "eval_samples_per_second": 288.695,
- "eval_steps_per_second": 18.076,
- "step": 37500
- },
- {
- "epoch": 8.6,
- "grad_norm": 0.21044595539569855,
- "learning_rate": 0.00014027149321266968,
- "loss": 0.0666,
- "step": 38000
- },
- {
- "epoch": 8.6,
- "eval_loss": 0.05458011105656624,
- "eval_runtime": 27.2446,
- "eval_samples_per_second": 288.424,
- "eval_steps_per_second": 18.059,
- "step": 38000
- },
- {
- "epoch": 8.71,
- "grad_norm": 0.23161017894744873,
- "learning_rate": 0.00012895927601809957,
- "loss": 0.0635,
- "step": 38500
- },
- {
- "epoch": 8.71,
- "eval_loss": 0.056671272963285446,
- "eval_runtime": 27.2141,
- "eval_samples_per_second": 288.748,
1163
- "eval_steps_per_second": 18.079,
1164
- "step": 38500
1165
- },
1166
- {
1167
- "epoch": 8.82,
1168
- "grad_norm": 0.14228539168834686,
1169
- "learning_rate": 0.00011764705882352942,
1170
- "loss": 0.0622,
1171
- "step": 39000
1172
- },
1173
- {
1174
- "epoch": 8.82,
1175
- "eval_loss": 0.05409713461995125,
1176
- "eval_runtime": 27.229,
1177
- "eval_samples_per_second": 288.59,
1178
- "eval_steps_per_second": 18.069,
1179
- "step": 39000
1180
- },
1181
- {
1182
- "epoch": 8.94,
1183
- "grad_norm": 0.19111554324626923,
1184
- "learning_rate": 0.00010633484162895928,
1185
- "loss": 0.0645,
1186
- "step": 39500
1187
- },
1188
- {
1189
- "epoch": 8.94,
1190
- "eval_loss": 0.05430610105395317,
1191
- "eval_runtime": 27.2287,
1192
- "eval_samples_per_second": 288.592,
1193
- "eval_steps_per_second": 18.069,
1194
- "step": 39500
1195
- },
1196
- {
1197
- "epoch": 9.05,
1198
- "grad_norm": 0.1508806049823761,
1199
- "learning_rate": 9.502262443438914e-05,
1200
- "loss": 0.0631,
1201
- "step": 40000
1202
- },
1203
- {
1204
- "epoch": 9.05,
1205
- "eval_loss": 0.05481436848640442,
1206
- "eval_runtime": 27.2111,
1207
- "eval_samples_per_second": 288.78,
1208
- "eval_steps_per_second": 18.081,
1209
- "step": 40000
1210
- },
1211
- {
1212
- "epoch": 9.16,
1213
- "grad_norm": 0.26917019486427307,
1214
- "learning_rate": 8.3710407239819e-05,
1215
- "loss": 0.063,
1216
- "step": 40500
1217
- },
1218
- {
1219
- "epoch": 9.16,
1220
- "eval_loss": 0.056788042187690735,
1221
- "eval_runtime": 27.2329,
1222
- "eval_samples_per_second": 288.548,
1223
- "eval_steps_per_second": 18.066,
1224
- "step": 40500
1225
- },
1226
- {
1227
- "epoch": 9.28,
1228
- "grad_norm": 0.26919251680374146,
1229
- "learning_rate": 7.239819004524887e-05,
1230
- "loss": 0.0614,
1231
- "step": 41000
1232
- },
1233
- {
1234
- "epoch": 9.28,
1235
- "eval_loss": 0.056851934641599655,
1236
- "eval_runtime": 27.2442,
1237
- "eval_samples_per_second": 288.428,
1238
- "eval_steps_per_second": 18.059,
1239
- "step": 41000
1240
- },
1241
- {
1242
- "epoch": 9.39,
1243
- "grad_norm": 0.222616046667099,
1244
- "learning_rate": 6.108597285067873e-05,
1245
- "loss": 0.0588,
1246
- "step": 41500
1247
- },
1248
- {
1249
- "epoch": 9.39,
1250
- "eval_loss": 0.05487231910228729,
1251
- "eval_runtime": 27.23,
1252
- "eval_samples_per_second": 288.579,
1253
- "eval_steps_per_second": 18.068,
1254
- "step": 41500
1255
- },
1256
- {
1257
- "epoch": 9.5,
1258
- "grad_norm": 0.2073131799697876,
1259
- "learning_rate": 4.9773755656108595e-05,
1260
- "loss": 0.0616,
1261
- "step": 42000
1262
- },
1263
- {
1264
- "epoch": 9.5,
1265
- "eval_loss": 0.05528046563267708,
1266
- "eval_runtime": 27.2327,
1267
- "eval_samples_per_second": 288.55,
1268
- "eval_steps_per_second": 18.067,
1269
- "step": 42000
1270
- },
1271
- {
1272
- "epoch": 9.62,
1273
- "grad_norm": 0.19287574291229248,
1274
- "learning_rate": 3.846153846153846e-05,
1275
- "loss": 0.0609,
1276
- "step": 42500
1277
- },
1278
- {
1279
- "epoch": 9.62,
1280
- "eval_loss": 0.055462516844272614,
1281
- "eval_runtime": 27.2342,
1282
- "eval_samples_per_second": 288.535,
1283
- "eval_steps_per_second": 18.066,
1284
- "step": 42500
1285
- },
1286
- {
1287
- "epoch": 9.73,
1288
- "grad_norm": 0.11690975725650787,
1289
- "learning_rate": 2.7149321266968327e-05,
1290
- "loss": 0.0612,
1291
- "step": 43000
1292
- },
1293
- {
1294
- "epoch": 9.73,
1295
- "eval_loss": 0.055802907794713974,
1296
- "eval_runtime": 27.263,
1297
- "eval_samples_per_second": 288.23,
1298
- "eval_steps_per_second": 18.046,
1299
- "step": 43000
1300
- },
1301
- {
1302
- "epoch": 9.84,
1303
- "grad_norm": 0.19802606105804443,
1304
- "learning_rate": 1.583710407239819e-05,
1305
- "loss": 0.0588,
1306
- "step": 43500
1307
- },
1308
- {
1309
- "epoch": 9.84,
1310
- "eval_loss": 0.05586336553096771,
1311
- "eval_runtime": 27.2516,
1312
- "eval_samples_per_second": 288.35,
1313
- "eval_steps_per_second": 18.054,
1314
- "step": 43500
1315
- },
1316
- {
1317
- "epoch": 9.95,
1318
- "grad_norm": 0.29080289602279663,
1319
- "learning_rate": 4.5248868778280546e-06,
1320
- "loss": 0.0622,
1321
- "step": 44000
1322
- },
1323
- {
1324
- "epoch": 9.95,
1325
- "eval_loss": 0.05555348098278046,
1326
- "eval_runtime": 27.2559,
1327
- "eval_samples_per_second": 288.305,
1328
- "eval_steps_per_second": 18.051,
1329
- "step": 44000
1330
- },
1331
- {
1332
- "epoch": 10.0,
1333
- "step": 44200,
1334
- "total_flos": 8.389179359649792e+16,
1335
- "train_loss": 0.09247852528257068,
1336
- "train_runtime": 8790.0948,
1337
- "train_samples_per_second": 80.453,
1338
- "train_steps_per_second": 5.028
1339
  }
1340
  ],
1341
  "logging_steps": 500,
1342
- "max_steps": 44200,
1343
  "num_input_tokens_seen": 0,
1344
- "num_train_epochs": 10,
1345
  "save_steps": 500,
1346
- "total_flos": 8.389179359649792e+16,
1347
  "train_batch_size": 16,
1348
  "trial_name": null,
1349
  "trial_params": null
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 0.0011312217194570137,
5
  "eval_steps": 500,
6
+ "global_step": 5,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
+ "epoch": 0.0,
13
+ "step": 5,
14
+ "total_flos": 2585060966400.0,
15
+ "train_loss": 2.5696409225463865,
16
+ "train_runtime": 1.0509,
17
+ "train_samples_per_second": 67.292,
18
+ "train_steps_per_second": 4.758
19
  }
20
  ],
21
  "logging_steps": 500,
22
+ "max_steps": 5,
23
  "num_input_tokens_seen": 0,
24
+ "num_train_epochs": 1,
25
  "save_steps": 500,
26
+ "total_flos": 2585060966400.0,
27
  "train_batch_size": 16,
28
  "trial_name": null,
29
  "trial_params": null
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:234fbc7bf0d159d3d95c13453a8fd74105a86470e2dc26447e416696e864f884
3
  size 5048
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e5964aa42a58f9aa0442e4ab87d38a471e1527ac951e91bea56c4d8f43acfd26
3
  size 5048