yaful committed
Commit 61d3aab · 1 parent: 0e4f5ea

update model
.gitattributes CHANGED
@@ -32,3 +32,16 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ optimizer.pt filter=lfs diff=lfs merge=lfs -text
+ pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
+ scaler.pt filter=lfs diff=lfs merge=lfs -text
+ scheduler.pt filter=lfs diff=lfs merge=lfs -text
+ special_tokens_map.json filter=lfs diff=lfs merge=lfs -text
+ training_args.bin filter=lfs diff=lfs merge=lfs -text
+ config.json filter=lfs diff=lfs merge=lfs -text
+ rng_state.pth filter=lfs diff=lfs merge=lfs -text
+ tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ trainer_state.json filter=lfs diff=lfs merge=lfs -text
+ vocab.json filter=lfs diff=lfs merge=lfs -text
+ merges.txt filter=lfs diff=lfs merge=lfs -text
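These new rules route every checkpoint and tokenizer file in the repository through Git LFS; they are the entries that `git lfs track <pattern>` writes into `.gitattributes`. As a rough sanity check of which files the patterns catch, here is a minimal sketch in plain Python with a hypothetical file list (note `fnmatch` only approximates git's wildmatch semantics):

```python
from fnmatch import fnmatch

# Patterns from the .gitattributes hunk above: three pre-existing globs
# (context lines) plus the exact filenames added in this commit.
lfs_patterns = [
    "*.zip", "*.zst", "*tfevents*",
    "optimizer.pt", "pytorch_model.bin", "scaler.pt", "scheduler.pt",
    "special_tokens_map.json", "training_args.bin", "config.json",
    "rng_state.pth", "tokenizer_config.json", "tokenizer.json",
    "trainer_state.json", "vocab.json", "merges.txt",
]

# Hypothetical working tree: which files would be stored as LFS pointers?
for name in ["pytorch_model.bin", "README.md", "events.out.tfevents.123"]:
    tracked = any(fnmatch(name, p) for p in lfs_patterns)
    print(f"{name}: {'LFS pointer' if tracked else 'regular git object'}")
```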
README.md DELETED
@@ -1,35 +0,0 @@
- This is a deepfake text detector (Longformer) trained on the testbeds from the GitHub project 🏃 [Deepfake Text Detection in the Wild](https://github.com/yafuly/DeepfakeTextDetect).
- Here is a simple example of how to use the detector.
- However, we recommend using the full detection pipeline available in our GitHub repository, which includes text preprocessing.
-
-
- ```python
- import torch
- import os
- from transformers import AutoModelForSequenceClassification, AutoTokenizer
-
-
- device = 'cuda:0'
- model_dir = "nealcly/detection-longformer"
- tokenizer = AutoTokenizer.from_pretrained(model_dir)
- model = AutoModelForSequenceClassification.from_pretrained(model_dir).to(device)
-
- label2decisions = {
-     0: "machine-generated",
-     1: "human-written",
- }
- def detect(input_text, th=-3.08583984375):
-     tokenize_input = tokenizer(input_text)
-     tensor_input = torch.tensor([tokenize_input["input_ids"]]).to(device)
-     outputs = model(tensor_input)
-     is_machine = -outputs.logits[0][0].item()
-     if is_machine < th:
-         decision = 0
-     else:
-         decision = 1
-     print(f"The text is {label2decisions[decision]}.")
-
- input_text = "Researchers at Stanford University and the SLAC National Accelerator Laboratory have discovered a way to transform a substance found in fossil fuels into diamonds with pressure and low heat. Diamond synthesis usually requires a large amount of energy, time, or the addition of a catalyst, which adds impurities. Diamondoids are tiny, odorless, and slightly sticky powders that resemble rock salt. They are made up of atoms arranged in the same pattern as diamonds, but they contain hydrogen. Diamondoids can reorganize into diamonds with surprisingly little energy, without passing through other forms of carbon, such as graphite. The method is currently only able to make specks of diamonds, and it is impractical until larger crystals can be formed." # human-written
- input_text = "Reddit Talk is a new social audio product that allows subreddit moderators to start Clubhouse-like Talks. While moderators will have control over who can speak in the sessions, anybody on Reddit or Discord can join and listen in. It's like an open mic with your own personal mods in charge of taking care of everything else (like banning trolls). The idea is to create more friendly and interactive conversations among users rather than just endless battles between assholes. There are even 'subreddits for each type of topic moderated by their in context moderation team members."" The current moderation was created very quickly as popularity spiked within days after Reddit acquired it back in February 2019. We think this could be a great way to keep discussions active without having someone run them off into the abyss." # machine-generated
- detect(input_text)
- ```
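One caveat about the deleted example: `tokenizer(input_text)` applies no truncation, so an input longer than the model's 4,096-token window can fail at the position-embedding lookup (the config below stores `max_position_embeddings: 4098`; the 4096 usable-token limit is an assumption based on the longformer_base_4096 checkpoint name). A minimal guard using standard `transformers` tokenizer arguments:

```python
# Truncate overly long inputs before building the tensor; truncation and
# max_length are standard transformers tokenizer arguments.
tokenize_input = tokenizer(input_text, truncation=True, max_length=4096)
tensor_input = torch.tensor([tokenize_input["input_ids"]]).to(device)
```

The full pipeline linked from the README handles this kind of preprocessing, which is presumably why the authors recommend it over the minimal snippet.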
config.json CHANGED
@@ -1,53 +1,3 @@
- {
-   "_name_or_path": "/apdcephfs/share_916081/effidit_shared_data/yafuli/Codes/LLM-results/classfication/models/longformer_base_4096",
-   "architectures": [
-     "LongformerForSequenceClassification"
-   ],
-   "attention_mode": "longformer",
-   "attention_probs_dropout_prob": 0.1,
-   "attention_window": [
-     512,
-     512,
-     512,
-     512,
-     512,
-     512,
-     512,
-     512,
-     512,
-     512,
-     512,
-     512
-   ],
-   "bos_token_id": 0,
-   "eos_token_id": 2,
-   "gradient_checkpointing": false,
-   "hidden_act": "gelu",
-   "hidden_dropout_prob": 0.1,
-   "hidden_size": 768,
-   "id2label": {
-     "0": 0,
-     "1": 1
-   },
-   "ignore_attention_mask": false,
-   "initializer_range": 0.02,
-   "intermediate_size": 3072,
-   "label2id": {
-     "0": 0,
-     "1": 1
-   },
-   "layer_norm_eps": 1e-05,
-   "max_position_embeddings": 4098,
-   "model_type": "longformer",
-   "num_attention_heads": 12,
-   "num_hidden_layers": 12,
-   "onnx_export": false,
-   "pad_token_id": 1,
-   "position_embedding_type": "absolute",
-   "problem_type": "single_label_classification",
-   "sep_token_id": 2,
-   "torch_dtype": "float32",
-   "transformers_version": "4.27.4",
-   "type_vocab_size": 1,
-   "vocab_size": 50265
- }
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce51c2bbfd353433e384ea042883ef20a67c458aa3f8d88b95fc582728382328
+ size 1161
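After this commit, config.json (like every file below) is stored as a Git LFS pointer rather than as plain content: a three-line stub recording the spec version, the SHA-256 of the real payload, and its size in bytes. Hub downloads resolve these pointers transparently, so `from_pretrained` still receives the real files; the stub is only what appears in a raw clone without git-lfs installed. A minimal sketch of reading such a pointer (hypothetical local path, assuming the three key/value lines shown in the diffs):

```python
from pathlib import Path

def parse_lfs_pointer(path: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in Path(path).read_text().splitlines():
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value
    return fields

# e.g. {'version': 'https://git-lfs.github.com/spec/v1',
#       'oid': 'sha256:ce51c2bb...', 'size': '1161'}
pointer = parse_lfs_pointer("config.json")  # hypothetical un-smudged checkout
print(pointer["oid"], int(pointer["size"]))
```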
merges.txt CHANGED
The diff for this file is too large to render. See raw diff
 
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f6b9b31fc7deb68db3738f6a96fe2b0adc075b9e0592418ce49056340664b658
+ oid sha256:82e8a624dc1273ba5eb689856ab591869eadd100edbd884ec5d4b7026fdebdc1
  size 1189446589
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9a8853ee3563ba5d6f251dba1421a028dd2251ff80c0ba110650dfed5a53766e
+ oid sha256:99b8df5eff67a96c9cd47c53bd914d31dd80bbe9a3c3b8b5fb101c60458a03bf
  size 594737055
rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:bae425a9a22152b4243f1294e3c4f6f1ac182f90d2c74645f2626619079787a7
+ oid sha256:525961baca9138129946e3341b7059e0fb39e609f2aa53c27e1291a8e85cb774
  size 21579
scaler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4eb96d3c159ec49bfc76551332a7f52392c99a69b649688e4dc3f1979f6d527b
+ oid sha256:9a7b2e65afa3c51b1ad587b8bc79d39d4d0b86949cde1725008fb133af677ff8
  size 559
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5d95c1e1c6ec2c56daac9bd96a1fee70862322dcd845123edfab43941f796769
+ oid sha256:ecdb69babb24440944c1b22dc3e00801f9b10a77365526be86a60fd6738e17d4
  size 623
special_tokens_map.json CHANGED
@@ -1,15 +1,3 @@
- {
-   "bos_token": "<s>",
-   "cls_token": "<s>",
-   "eos_token": "</s>",
-   "mask_token": {
-     "content": "<mask>",
-     "lstrip": true,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   },
-   "pad_token": "<pad>",
-   "sep_token": "</s>",
-   "unk_token": "<unk>"
- }
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:06e405a36dfe4b9604f484f6a1e619af1a7f7d09e34a8555eb0b77b66318067f
+ size 280
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json CHANGED
@@ -1,15 +1,3 @@
- {
-   "add_prefix_space": false,
-   "bos_token": "<s>",
-   "cls_token": "<s>",
-   "eos_token": "</s>",
-   "errors": "replace",
-   "mask_token": "<mask>",
-   "model_max_length": 1000000000000000019884624838656,
-   "pad_token": "<pad>",
-   "sep_token": "</s>",
-   "special_tokens_map_file": null,
-   "tokenizer_class": "LongformerTokenizer",
-   "trim_offsets": true,
-   "unk_token": "<unk>"
- }
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e667b8a2c557b7012bfdec7dd4cbb6066cf6ec9069f0515db0c4fc8cdb162c0f
+ size 377
trainer_state.json CHANGED
@@ -1,478 +1,3 @@
- {
-   "best_metric": 0.262811541557312,
-   "best_model_checkpoint": "./output_samples_202302241350_lfbase/checkpoint-6000",
-   "epoch": 4.245465071401004,
-   "global_step": 22000,
-   "is_hyper_param_search": false,
-   "is_local_process_zero": true,
-   "is_world_process_zero": true,
-   "log_history": [
-     {
-       "epoch": 0.1,
-       "learning_rate": 2.94245465071401e-05,
-       "loss": 0.2869,
-       "step": 500
-     },
-     {
-       "epoch": 0.19,
-       "learning_rate": 2.8845619451949058e-05,
-       "loss": 0.1797,
-       "step": 1000
-     },
-     {
-       "epoch": 0.19,
-       "eval_accuracy": 0.8138957619667053,
-       "eval_loss": 0.49187058210372925,
-       "eval_runtime": 490.7956,
-       "eval_samples_per_second": 119.795,
-       "eval_steps_per_second": 1.872,
-       "step": 1000
-     },
-     {
-       "epoch": 0.29,
-       "learning_rate": 2.826669239675801e-05,
-       "loss": 0.1589,
-       "step": 1500
-     },
-     {
-       "epoch": 0.39,
-       "learning_rate": 2.7687765341566965e-05,
-       "loss": 0.1369,
-       "step": 2000
-     },
-     {
-       "epoch": 0.39,
-       "eval_accuracy": 0.8265668749809265,
-       "eval_loss": 0.4808556139469147,
-       "eval_runtime": 472.7272,
-       "eval_samples_per_second": 124.374,
-       "eval_steps_per_second": 1.944,
-       "step": 2000
-     },
-     {
-       "epoch": 0.48,
-       "learning_rate": 2.71099961404863e-05,
-       "loss": 0.128,
-       "step": 2500
-     },
-     {
-       "epoch": 0.58,
-       "learning_rate": 2.6531069085295255e-05,
-       "loss": 0.1274,
-       "step": 3000
-     },
-     {
-       "epoch": 0.58,
-       "eval_accuracy": 0.8054426312446594,
-       "eval_loss": 0.6082525253295898,
-       "eval_runtime": 485.5844,
-       "eval_samples_per_second": 121.081,
-       "eval_steps_per_second": 1.893,
-       "step": 3000
-     },
-     {
-       "epoch": 0.68,
-       "learning_rate": 2.5953299884214592e-05,
-       "loss": 0.1161,
-       "step": 3500
-     },
-     {
-       "epoch": 0.77,
-       "learning_rate": 2.5374372829023545e-05,
-       "loss": 0.1109,
-       "step": 4000
-     },
-     {
-       "epoch": 0.77,
-       "eval_accuracy": 0.8235734105110168,
-       "eval_loss": 0.510552704334259,
-       "eval_runtime": 488.158,
-       "eval_samples_per_second": 120.443,
-       "eval_steps_per_second": 1.883,
-       "step": 4000
-     },
-     {
-       "epoch": 0.87,
-       "learning_rate": 2.47954457738325e-05,
-       "loss": 0.1078,
-       "step": 4500
-     },
-     {
-       "epoch": 0.96,
-       "learning_rate": 2.4216518718641452e-05,
-       "loss": 0.1036,
-       "step": 5000
-     },
-     {
-       "epoch": 0.96,
-       "eval_accuracy": 0.8834084272384644,
-       "eval_loss": 0.312045156955719,
-       "eval_runtime": 486.8674,
-       "eval_samples_per_second": 120.762,
-       "eval_steps_per_second": 1.888,
-       "step": 5000
-     },
-     {
-       "epoch": 1.06,
-       "learning_rate": 2.3637591663450405e-05,
-       "loss": 0.0823,
-       "step": 5500
-     },
-     {
-       "epoch": 1.16,
-       "learning_rate": 2.305866460825936e-05,
-       "loss": 0.0718,
-       "step": 6000
-     },
-     {
-       "epoch": 1.16,
-       "eval_accuracy": 0.9076281785964966,
-       "eval_loss": 0.262811541557312,
-       "eval_runtime": 488.7776,
-       "eval_samples_per_second": 120.29,
-       "eval_steps_per_second": 1.88,
-       "step": 6000
-     },
-     {
-       "epoch": 1.25,
-       "learning_rate": 2.2479737553068312e-05,
-       "loss": 0.0714,
-       "step": 6500
-     },
-     {
-       "epoch": 1.35,
-       "learning_rate": 2.190081049787727e-05,
-       "loss": 0.0729,
-       "step": 7000
-     },
-     {
-       "epoch": 1.35,
-       "eval_accuracy": 0.8230121731758118,
-       "eval_loss": 0.6299644112586975,
-       "eval_runtime": 480.4556,
-       "eval_samples_per_second": 122.373,
-       "eval_steps_per_second": 1.913,
-       "step": 7000
-     },
-     {
-       "epoch": 1.45,
-       "learning_rate": 2.1321883442686222e-05,
-       "loss": 0.0687,
-       "step": 7500
-     },
-     {
-       "epoch": 1.54,
-       "learning_rate": 2.0742956387495175e-05,
-       "loss": 0.0664,
-       "step": 8000
-     },
-     {
-       "epoch": 1.54,
-       "eval_accuracy": 0.8701080083847046,
-       "eval_loss": 0.5767059922218323,
-       "eval_runtime": 472.3986,
-       "eval_samples_per_second": 124.461,
-       "eval_steps_per_second": 1.945,
-       "step": 8000
-     },
-     {
-       "epoch": 1.64,
-       "learning_rate": 2.0164029332304132e-05,
-       "loss": 0.0665,
-       "step": 8500
-     },
-     {
-       "epoch": 1.74,
-       "learning_rate": 1.9585102277113085e-05,
-       "loss": 0.0622,
-       "step": 9000
-     },
-     {
-       "epoch": 1.74,
-       "eval_accuracy": 0.8938174843788147,
-       "eval_loss": 0.46610942482948303,
-       "eval_runtime": 472.4306,
-       "eval_samples_per_second": 124.452,
-       "eval_steps_per_second": 1.945,
-       "step": 9000
-     },
-     {
-       "epoch": 1.83,
-       "learning_rate": 1.9007333076032422e-05,
-       "loss": 0.0639,
-       "step": 9500
-     },
-     {
-       "epoch": 1.93,
-       "learning_rate": 1.8428406020841375e-05,
-       "loss": 0.0638,
-       "step": 10000
-     },
-     {
-       "epoch": 1.93,
-       "eval_accuracy": 0.913428008556366,
-       "eval_loss": 0.27155911922454834,
-       "eval_runtime": 484.5228,
-       "eval_samples_per_second": 121.346,
-       "eval_steps_per_second": 1.897,
-       "step": 10000
-     },
-     {
-       "epoch": 2.03,
-       "learning_rate": 1.784947896565033e-05,
-       "loss": 0.0564,
-       "step": 10500
-     },
-     {
-       "epoch": 2.12,
-       "learning_rate": 1.7270551910459282e-05,
-       "loss": 0.0385,
-       "step": 11000
-     },
-     {
-       "epoch": 2.12,
-       "eval_accuracy": 0.9116081595420837,
-       "eval_loss": 0.38816747069358826,
-       "eval_runtime": 488.2826,
-       "eval_samples_per_second": 120.412,
-       "eval_steps_per_second": 1.882,
-       "step": 11000
-     },
-     {
-       "epoch": 2.22,
-       "learning_rate": 1.669162485526824e-05,
-       "loss": 0.0412,
-       "step": 11500
-     },
-     {
-       "epoch": 2.32,
-       "learning_rate": 1.6113855654187572e-05,
-       "loss": 0.0406,
-       "step": 12000
-     },
-     {
-       "epoch": 2.32,
-       "eval_accuracy": 0.863440752029419,
-       "eval_loss": 0.6258434653282166,
-       "eval_runtime": 473.3305,
-       "eval_samples_per_second": 124.216,
-       "eval_steps_per_second": 1.942,
-       "step": 12000
-     },
-     {
-       "epoch": 2.41,
-       "learning_rate": 1.553492859899653e-05,
-       "loss": 0.0391,
-       "step": 12500
-     },
-     {
-       "epoch": 2.51,
-       "learning_rate": 1.495600154380548e-05,
-       "loss": 0.037,
-       "step": 13000
-     },
-     {
-       "epoch": 2.51,
-       "eval_accuracy": 0.891011118888855,
-       "eval_loss": 0.48729509115219116,
-       "eval_runtime": 473.8644,
-       "eval_samples_per_second": 124.076,
-       "eval_steps_per_second": 1.939,
-       "step": 13000
-     },
-     {
-       "epoch": 2.61,
-       "learning_rate": 1.4377074488614436e-05,
-       "loss": 0.04,
-       "step": 13500
-     },
-     {
-       "epoch": 2.7,
-       "learning_rate": 1.3798147433423389e-05,
-       "loss": 0.0382,
-       "step": 14000
-     },
-     {
-       "epoch": 2.7,
-       "eval_accuracy": 0.8833404183387756,
-       "eval_loss": 0.5917666554450989,
-       "eval_runtime": 473.8023,
-       "eval_samples_per_second": 124.092,
-       "eval_steps_per_second": 1.94,
-       "step": 14000
-     },
-     {
-       "epoch": 2.8,
-       "learning_rate": 1.3220378232342726e-05,
-       "loss": 0.0392,
-       "step": 14500
-     },
-     {
-       "epoch": 2.89,
-       "learning_rate": 1.2642609031262061e-05,
-       "loss": 0.0367,
-       "step": 15000
-     },
-     {
-       "epoch": 2.89,
-       "eval_accuracy": 0.8793604969978333,
-       "eval_loss": 0.569683313369751,
-       "eval_runtime": 474.4494,
-       "eval_samples_per_second": 123.923,
-       "eval_steps_per_second": 1.937,
-       "step": 15000
-     },
-     {
-       "epoch": 2.99,
-       "learning_rate": 1.2063681976071016e-05,
-       "loss": 0.0358,
-       "step": 15500
-     },
-     {
-       "epoch": 3.09,
-       "learning_rate": 1.148475492087997e-05,
-       "loss": 0.0216,
-       "step": 16000
-     },
-     {
-       "epoch": 3.09,
-       "eval_accuracy": 0.8624373078346252,
-       "eval_loss": 0.8281469345092773,
-       "eval_runtime": 473.8459,
-       "eval_samples_per_second": 124.08,
-       "eval_steps_per_second": 1.939,
-       "step": 16000
-     },
-     {
-       "epoch": 3.18,
-       "learning_rate": 1.0905827865688923e-05,
-       "loss": 0.0211,
-       "step": 16500
-     },
-     {
-       "epoch": 3.28,
-       "learning_rate": 1.032805866460826e-05,
-       "loss": 0.0227,
-       "step": 17000
-     },
-     {
-       "epoch": 3.28,
-       "eval_accuracy": 0.8539331555366516,
-       "eval_loss": 0.9032775163650513,
-       "eval_runtime": 474.5611,
-       "eval_samples_per_second": 123.893,
-       "eval_steps_per_second": 1.937,
-       "step": 17000
-     },
-     {
-       "epoch": 3.38,
-       "learning_rate": 9.749131609417213e-06,
-       "loss": 0.0225,
-       "step": 17500
-     },
-     {
-       "epoch": 3.47,
-       "learning_rate": 9.170204554226166e-06,
-       "loss": 0.0216,
-       "step": 18000
-     },
-     {
-       "epoch": 3.47,
-       "eval_accuracy": 0.888221800327301,
-       "eval_loss": 0.6849754452705383,
-       "eval_runtime": 487.296,
-       "eval_samples_per_second": 120.656,
-       "eval_steps_per_second": 1.886,
-       "step": 18000
-     },
-     {
-       "epoch": 3.57,
-       "learning_rate": 8.591277499035123e-06,
-       "loss": 0.0217,
-       "step": 18500
-     },
-     {
-       "epoch": 3.67,
-       "learning_rate": 8.012350443844076e-06,
-       "loss": 0.0199,
-       "step": 19000
-     },
-     {
-       "epoch": 3.67,
-       "eval_accuracy": 0.8972702026367188,
-       "eval_loss": 0.6052428483963013,
-       "eval_runtime": 490.4243,
-       "eval_samples_per_second": 119.886,
-       "eval_steps_per_second": 1.874,
-       "step": 19000
-     },
-     {
-       "epoch": 3.76,
-       "learning_rate": 7.43342338865303e-06,
-       "loss": 0.0207,
-       "step": 19500
-     },
-     {
-       "epoch": 3.86,
-       "learning_rate": 6.855654187572366e-06,
-       "loss": 0.0206,
-       "step": 20000
-     },
-     {
-       "epoch": 3.86,
-       "eval_accuracy": 0.8873884081840515,
-       "eval_loss": 0.6865639090538025,
-       "eval_runtime": 487.3484,
-       "eval_samples_per_second": 120.643,
-       "eval_steps_per_second": 1.886,
-       "step": 20000
-     },
-     {
-       "epoch": 3.96,
-       "learning_rate": 6.276727132381321e-06,
-       "loss": 0.0187,
-       "step": 20500
-     },
-     {
-       "epoch": 4.05,
-       "learning_rate": 5.697800077190274e-06,
-       "loss": 0.0148,
-       "step": 21000
-     },
-     {
-       "epoch": 4.05,
-       "eval_accuracy": 0.8811633586883545,
-       "eval_loss": 0.9089483618736267,
-       "eval_runtime": 487.5135,
-       "eval_samples_per_second": 120.602,
-       "eval_steps_per_second": 1.885,
-       "step": 21000
-     },
-     {
-       "epoch": 4.15,
-       "learning_rate": 5.118873021999228e-06,
-       "loss": 0.0107,
-       "step": 21500
-     },
-     {
-       "epoch": 4.25,
-       "learning_rate": 4.5399459668081825e-06,
-       "loss": 0.0121,
-       "step": 22000
-     },
-     {
-       "epoch": 4.25,
-       "eval_accuracy": 0.9049409031867981,
-       "eval_loss": 0.641972541809082,
-       "eval_runtime": 487.5431,
-       "eval_samples_per_second": 120.594,
-       "eval_steps_per_second": 1.885,
-       "step": 22000
-     }
-   ],
-   "max_steps": 25910,
-   "num_train_epochs": 5,
-   "total_flos": 1.8496468450847293e+18,
-   "trial_name": null,
-   "trial_params": null
- }
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff464999e57de1ac0bd298f3fa3ce56feaa891ca78b348e781a8786778d55fe3
+ size 3362
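The deleted trainer_state.json is a standard `transformers` Trainer log: the best checkpoint landed at step 6000 (eval_loss 0.2628, eval_accuracy 0.908), and training had reached step 22000 of 25910 (about 4.25 of the 5 scheduled epochs) when this state was written. A minimal sketch for pulling the evaluation curve out of the `log_history` list (assuming a local copy of the file, i.e. after LFS resolution):

```python
import json

with open("trainer_state.json") as f:  # local copy, after LFS smudging
    state = json.load(f)

print(state["best_model_checkpoint"], state["best_metric"])

# Evaluation records are the log_history entries carrying eval_loss;
# the others are training-loss/learning-rate logs.
evals = [(e["step"], e["eval_loss"], e["eval_accuracy"])
         for e in state["log_history"] if "eval_loss" in e]
for step, loss, acc in evals:
    print(f"step {step:>6}: eval_loss={loss:.4f}  eval_accuracy={acc:.4f}")
```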
vocab.json CHANGED
The diff for this file is too large to render. See raw diff