sanchit-gandhi HF staff committed on
Commit
e8e564f
•
1 Parent(s): 8507953

Model save

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ wandb/run-20240122_192258-yf2elmz6/run-yf2elmz6.wandb filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,78 @@
1
+ ---
2
+ license: apache-2.0
3
+ base_model: openai/whisper-large-v3
4
+ tags:
5
+ - generated_from_trainer
6
+ datasets:
7
+ - common_voice_16_0
8
+ metrics:
9
+ - wer
10
+ model-index:
11
+ - name: openai/whisper-large-v3
12
+ results:
13
+ - task:
14
+ name: Automatic Speech Recognition
15
+ type: automatic-speech-recognition
16
+ dataset:
17
+ name: common_voice_16_0
18
+ type: common_voice_16_0
19
+ config: mn
20
+ split: test
21
+ args: mn
22
+ metrics:
23
+ - name: Wer
24
+ type: wer
25
+ value: 41.048913043478265
26
+ ---
27
+
28
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
29
+ should probably proofread and complete it, then remove this comment. -->
30
+
31
+ # openai/whisper-large-v3
32
+
33
+ This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the common_voice_16_0 dataset.
34
+ It achieves the following results on the evaluation set:
35
+ - Loss: 0.5425
36
+ - Wer: 41.0489
37
+
38
+ ## Model description
39
+
40
+ More information needed
41
+
42
+ ## Intended uses & limitations
43
+
44
+ More information needed
45
+
46
+ ## Training and evaluation data
47
+
48
+ More information needed
49
+
50
+ ## Training procedure
51
+
52
+ ### Training hyperparameters
53
+
54
+ The following hyperparameters were used during training:
55
+ - learning_rate: 0.0001
56
+ - train_batch_size: 32
57
+ - eval_batch_size: 32
58
+ - seed: 42
59
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
60
+ - lr_scheduler_type: linear
61
+ - lr_scheduler_warmup_steps: 500
62
+ - num_epochs: 10.0
63
+ - mixed_precision_training: Native AMP
64
+
65
+ ### Training results
66
+
67
+ | Training Loss | Epoch | Step | Validation Loss | Wer |
68
+ |:-------------:|:-----:|:----:|:---------------:|:-------:|
69
+ | 0.1378 | 4.35 | 500 | 0.5576 | 51.2554 |
70
+ | 0.0024 | 8.7 | 1000 | 0.5425 | 41.0489 |
71
+
72
+
73
+ ### Framework versions
74
+
75
+ - Transformers 4.38.0.dev0
76
+ - Pytorch 2.1.2+cu121
77
+ - Datasets 2.16.2.dev0
78
+ - Tokenizers 0.15.0
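The `lr_scheduler_type: linear` and `lr_scheduler_warmup_steps: 500` entries in the model card can be sanity-checked against the learning rate that `wandb-summary.json` in this commit records for step 1000 (`2.307692307692308e-05`). The sketch below assumes the usual shape of a linear schedule, a linear warmup to the peak rate followed by a linear decay to zero, over the 1150 total steps shown in `output.log`. The function name `linear_lr` is illustrative and not taken from any library.

```python
PEAK_LR = 1e-4        # learning_rate from the model card
WARMUP_STEPS = 500    # lr_scheduler_warmup_steps from the model card
TOTAL_STEPS = 1150    # final step count shown in output.log


def linear_lr(step: int) -> float:
    """Assumed schedule: linear warmup to PEAK_LR, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# Print the assumed learning rate at a few steps of interest.
for step in (250, 500, 1000, 1150):
    print(step, linear_lr(step))
```

Under these assumptions, step 1000 gives `1e-4 * 150 / 650`, which agrees with the logged value, and step 1150 gives `0.0`, matching the final `'learning_rate': 0.0` log entry.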
generation_config.json ADDED
@@ -0,0 +1,269 @@
1
+ {
2
+ "alignment_heads": [
3
+ [
4
+ 7,
5
+ 0
6
+ ],
7
+ [
8
+ 10,
9
+ 17
10
+ ],
11
+ [
12
+ 12,
13
+ 18
14
+ ],
15
+ [
16
+ 13,
17
+ 12
18
+ ],
19
+ [
20
+ 16,
21
+ 1
22
+ ],
23
+ [
24
+ 17,
25
+ 14
26
+ ],
27
+ [
28
+ 19,
29
+ 11
30
+ ],
31
+ [
32
+ 21,
33
+ 4
34
+ ],
35
+ [
36
+ 24,
37
+ 1
38
+ ],
39
+ [
40
+ 25,
41
+ 6
42
+ ]
43
+ ],
44
+ "begin_suppress_tokens": [
45
+ 220,
46
+ 50257
47
+ ],
48
+ "bos_token_id": 50257,
49
+ "decoder_start_token_id": 50258,
50
+ "eos_token_id": 50257,
51
+ "forced_decoder_ids": [
52
+ [
53
+ 1,
54
+ 50259
55
+ ],
56
+ [
57
+ 2,
58
+ 50360
59
+ ],
60
+ [
61
+ 3,
62
+ 50364
63
+ ]
64
+ ],
65
+ "is_multilingual": true,
66
+ "lang_to_id": {
67
+ "<|af|>": 50327,
68
+ "<|am|>": 50334,
69
+ "<|ar|>": 50272,
70
+ "<|as|>": 50350,
71
+ "<|az|>": 50304,
72
+ "<|ba|>": 50355,
73
+ "<|be|>": 50330,
74
+ "<|bg|>": 50292,
75
+ "<|bn|>": 50302,
76
+ "<|bo|>": 50347,
77
+ "<|br|>": 50309,
78
+ "<|bs|>": 50315,
79
+ "<|ca|>": 50270,
80
+ "<|cs|>": 50283,
81
+ "<|cy|>": 50297,
82
+ "<|da|>": 50285,
83
+ "<|de|>": 50261,
84
+ "<|el|>": 50281,
85
+ "<|en|>": 50259,
86
+ "<|es|>": 50262,
87
+ "<|et|>": 50307,
88
+ "<|eu|>": 50310,
89
+ "<|fa|>": 50300,
90
+ "<|fi|>": 50277,
91
+ "<|fo|>": 50338,
92
+ "<|fr|>": 50265,
93
+ "<|gl|>": 50319,
94
+ "<|gu|>": 50333,
95
+ "<|haw|>": 50352,
96
+ "<|ha|>": 50354,
97
+ "<|he|>": 50279,
98
+ "<|hi|>": 50276,
99
+ "<|hr|>": 50291,
100
+ "<|ht|>": 50339,
101
+ "<|hu|>": 50286,
102
+ "<|hy|>": 50312,
103
+ "<|id|>": 50275,
104
+ "<|is|>": 50311,
105
+ "<|it|>": 50274,
106
+ "<|ja|>": 50266,
107
+ "<|jw|>": 50356,
108
+ "<|ka|>": 50329,
109
+ "<|kk|>": 50316,
110
+ "<|km|>": 50323,
111
+ "<|kn|>": 50306,
112
+ "<|ko|>": 50264,
113
+ "<|la|>": 50294,
114
+ "<|lb|>": 50345,
115
+ "<|ln|>": 50353,
116
+ "<|lo|>": 50336,
117
+ "<|lt|>": 50293,
118
+ "<|lv|>": 50301,
119
+ "<|mg|>": 50349,
120
+ "<|mi|>": 50295,
121
+ "<|mk|>": 50308,
122
+ "<|ml|>": 50296,
123
+ "<|mn|>": 50314,
124
+ "<|mr|>": 50320,
125
+ "<|ms|>": 50282,
126
+ "<|mt|>": 50343,
127
+ "<|my|>": 50346,
128
+ "<|ne|>": 50313,
129
+ "<|nl|>": 50271,
130
+ "<|nn|>": 50342,
131
+ "<|no|>": 50288,
132
+ "<|oc|>": 50328,
133
+ "<|pa|>": 50321,
134
+ "<|pl|>": 50269,
135
+ "<|ps|>": 50340,
136
+ "<|pt|>": 50267,
137
+ "<|ro|>": 50284,
138
+ "<|ru|>": 50263,
139
+ "<|sa|>": 50344,
140
+ "<|sd|>": 50332,
141
+ "<|si|>": 50322,
142
+ "<|sk|>": 50298,
143
+ "<|sl|>": 50305,
144
+ "<|sn|>": 50324,
145
+ "<|so|>": 50326,
146
+ "<|sq|>": 50317,
147
+ "<|sr|>": 50303,
148
+ "<|su|>": 50357,
149
+ "<|sv|>": 50273,
150
+ "<|sw|>": 50318,
151
+ "<|ta|>": 50287,
152
+ "<|te|>": 50299,
153
+ "<|tg|>": 50331,
154
+ "<|th|>": 50289,
155
+ "<|tk|>": 50341,
156
+ "<|tl|>": 50348,
157
+ "<|tr|>": 50268,
158
+ "<|tt|>": 50351,
159
+ "<|uk|>": 50280,
160
+ "<|ur|>": 50290,
161
+ "<|uz|>": 50337,
162
+ "<|vi|>": 50278,
163
+ "<|yi|>": 50335,
164
+ "<|yo|>": 50325,
165
+ "<|yue|>": 50358,
166
+ "<|zh|>": 50260
167
+ },
168
+ "max_initial_timestamp_index": 50,
169
+ "max_length": 448,
170
+ "no_timestamps_token_id": 50364,
171
+ "pad_token_id": 50257,
172
+ "prev_sot_token_id": 50362,
173
+ "return_timestamps": false,
174
+ "suppress_tokens": [
175
+ 1,
176
+ 2,
177
+ 7,
178
+ 8,
179
+ 9,
180
+ 10,
181
+ 14,
182
+ 25,
183
+ 26,
184
+ 27,
185
+ 28,
186
+ 29,
187
+ 31,
188
+ 58,
189
+ 59,
190
+ 60,
191
+ 61,
192
+ 62,
193
+ 63,
194
+ 90,
195
+ 91,
196
+ 92,
197
+ 93,
198
+ 359,
199
+ 503,
200
+ 522,
201
+ 542,
202
+ 873,
203
+ 893,
204
+ 902,
205
+ 918,
206
+ 922,
207
+ 931,
208
+ 1350,
209
+ 1853,
210
+ 1982,
211
+ 2460,
212
+ 2627,
213
+ 3246,
214
+ 3253,
215
+ 3268,
216
+ 3536,
217
+ 3846,
218
+ 3961,
219
+ 4183,
220
+ 4667,
221
+ 6585,
222
+ 6647,
223
+ 7273,
224
+ 9061,
225
+ 9383,
226
+ 10428,
227
+ 10929,
228
+ 11938,
229
+ 12033,
230
+ 12331,
231
+ 12562,
232
+ 13793,
233
+ 14157,
234
+ 14635,
235
+ 15265,
236
+ 15618,
237
+ 16553,
238
+ 16604,
239
+ 18362,
240
+ 18956,
241
+ 20075,
242
+ 21675,
243
+ 22520,
244
+ 26130,
245
+ 26161,
246
+ 26435,
247
+ 28279,
248
+ 29464,
249
+ 31650,
250
+ 32302,
251
+ 32470,
252
+ 36865,
253
+ 42863,
254
+ 47425,
255
+ 49870,
256
+ 50254,
257
+ 50258,
258
+ 50359,
259
+ 50360,
260
+ 50361,
261
+ 50362,
262
+ 50363
263
+ ],
264
+ "task_to_id": {
265
+ "transcribe": 50360,
266
+ "translate": 50359
267
+ },
268
+ "transformers_version": "4.38.0.dev0"
269
+ }
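To read the `forced_decoder_ids` entries in the file above, it can help to map each token ID back to its name using the `lang_to_id`, `task_to_id`, and `no_timestamps_token_id` fields of the same file. The following is a minimal sketch that copies only a subset of those values; the helper `describe_forced_ids` and the label `no_timestamps` are illustrative names, not part of the Transformers API.

```python
# Subset of values copied from generation_config.json in this commit.
lang_to_id = {"<|en|>": 50259, "<|mn|>": 50314, "<|zh|>": 50260}
task_to_id = {"transcribe": 50360, "translate": 50359}
no_timestamps_token_id = 50364
forced_decoder_ids = [[1, 50259], [2, 50360], [3, 50364]]


def describe_forced_ids(forced):
    """Return (position, name) pairs for each forced decoder token."""
    names = {token_id: name for name, token_id in lang_to_id.items()}
    names.update({token_id: name for name, token_id in task_to_id.items()})
    names[no_timestamps_token_id] = "no_timestamps"
    return [(pos, names.get(tok, f"unknown:{tok}")) for pos, tok in forced]


described = describe_forced_ids(forced_decoder_ids)
print(described)
```

With these values the forced tokens resolve to `<|en|>`, `transcribe`, and the no-timestamps token. The model card reports evaluation on the `mn` config, whose language token is 50314; whether the `<|en|>` default matters in practice depends on how generation is invoked, which this commit does not show.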
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:935d3be09e427feb5bfadd9b6ffca450f35bf726852737a9d17d29a53b110be0
3
+ size 4993448880
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3c7f273083f1c88a66832619a4bbe6e5d96347b16fc1c37f58c04df43c4962fb
3
+ size 1180663192
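The two `.safetensors` entries above are Git LFS pointer files rather than the weights themselves: each holds a `version`, an `oid`, and a `size` line. As a rough sketch, the pointers can be parsed to recover the total checkpoint size; `parse_lfs_pointer` is a hypothetical helper written for this illustration.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from this commit.
shard_1 = """version https://git-lfs.github.com/spec/v1
oid sha256:935d3be09e427feb5bfadd9b6ffca450f35bf726852737a9d17d29a53b110be0
size 4993448880"""
shard_2 = """version https://git-lfs.github.com/spec/v1
oid sha256:3c7f273083f1c88a66832619a4bbe6e5d96347b16fc1c37f58c04df43c4962fb
size 1180663192"""

pointers = [parse_lfs_pointer(p) for p in (shard_1, shard_2)]
total_bytes = sum(p["size"] for p in pointers)
print(total_bytes)
```

The first shard is just under 5,000,000,000 bytes, which is consistent with the 5GB per-checkpoint limit mentioned in `output.log`.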
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
runs/Jan22_19-21-13_hf-dgx-01/events.out.tfevents.1705947776.hf-dgx-01.122884.0 CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:eb9cbc3107fcf33514115ff94fb983ab240b1e6f96b39a69d8f00b6408d48f1b
3
- size 12038
2
+ oid sha256:c8e83828e291d4e7f702d494cb64c2051f06c34477d5fa69e7ab4856e6e7624d
3
+ size 13334
wandb/debug-internal.log CHANGED
The diff for this file is too large to render. See raw diff
 
wandb/run-20240122_192258-yf2elmz6/files/config.yaml CHANGED
@@ -84,6 +84,26 @@ _wandb:
84
  5: 1
85
  6:
86
  - 1
87
+ - 1: train/train_runtime
88
+ 5: 1
89
+ 6:
90
+ - 1
91
+ - 1: train/train_samples_per_second
92
+ 5: 1
93
+ 6:
94
+ - 1
95
+ - 1: train/train_steps_per_second
96
+ 5: 1
97
+ 6:
98
+ - 1
99
+ - 1: train/total_flos
100
+ 5: 1
101
+ 6:
102
+ - 1
103
+ - 1: train/train_loss
104
+ 5: 1
105
+ 6:
106
+ - 1
107
  vocab_size:
108
  desc: null
109
  value: 51866
wandb/run-20240122_192258-yf2elmz6/files/output.log CHANGED
@@ -1025,3 +1025,172 @@ Non-default generation parameters: {'max_length': 448, 'begin_suppress_tokens':
1025
  [INFO|configuration_utils.py:595] 2024-01-22 20:18:14,572 >> Configuration saved in ./tmp-checkpoint-1000/generation_config.json
1026
  {'eval_loss': 0.5424743294715881, 'eval_wer': 41.048913043478265, 'eval_runtime': 781.806, 'eval_samples_per_second': 2.421, 'eval_steps_per_second': 0.077, 'epoch': 8.7}
1027
  [INFO|modeling_utils.py:2503] 2024-01-22 20:18:29,139 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./tmp-checkpoint-1000/model.safetensors.index.json.
1028
+ [INFO|feature_extraction_utils.py:425] 2024-01-22 20:18:29,140 >> Feature extractor saved in ./tmp-checkpoint-1000/preprocessor_config.json
1029
+ [INFO|feature_extraction_utils.py:425] 2024-01-22 20:18:46,797 >> Feature extractor saved in ./preprocessor_config.json
1030
+ /home/sanchit/hf/lib/python3.8/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
1031
+ warnings.warn(
1032
+ /home/sanchit/hf/lib/python3.8/site-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
1033
+ warnings.warn(
1034
+
1035
+
1036
+
1037
+
1038
+
1039
+
1040
+
1041
+
1042
+
1043
+
1044
+
1045
+
1046
+
1047
+
1048
+
1049
+
1050
+
1051
+
1052
+
1053
+
1054
+
1055
+ 89% | 1025/1150 [56:36<04:06, 1.97s/it]
1056
+
1057
+
1058
+
1059
+
1060
+
1061
+
1062
+
1063
+
1064
+
1065
+
1066
+
1067
+
1068
+
1069
+
1070
+
1071
+
1072
+
1073
+
1074
+
1075
+
1076
+
1077
+ 91% | 1050/1150 [57:22<02:57, 1.78s/it]
1078
+
1079
+
1080
+
1081
+
1082
+
1083
+
1084
+
1085
+
1086
+
1087
+
1088
+
1089
+
1090
+
1091
+
1092
+
1093
+
1094
+
1095
+
1096
+
1097
+
1098
+
1099
+
1100
+
1101
+ 93% | 1074/1150 [58:08<02:17, 1.81s/it]
1102
+
1103
+
1104
+
1105
+
1106
+
1107
+
1108
+
1109
+
1110
+
1111
+
1112
+
1113
+
1114
+
1115
+
1116
+
1117
+
1118
+
1119
+
1120
+
1121
+
1122
+
1123
+
1124
+
1125
+
1126
+ 96% | 1099/1150 [58:56<01:44, 2.06s/it]
1127
+
1128
+
1129
+
1130
+
1131
+
1132
+
1133
+
1134
+
1135
+
1136
+
1137
+
1138
+
1139
+
1140
+
1141
+
1142
+
1143
+
1144
+
1145
+
1146
+
1147
+
1148
+
1149
+
1150
+
1151
+
1152
+ 98% | 1125/1150 [59:46<00:46, 1.87s/it]
1153
+
1154
+
1155
+
1156
+
1157
+
1158
+
1159
+
1160
+
1161
+
1162
+
1163
+
1164
+
1165
+
1166
+
1167
+
1168
+
1169
+
1170
+
1171
+
1172
+
1173
+
1174
+ 100% | 1148/1150 [1:00:31<00:04, 2.05s/it]
1175
+ {'loss': 0.0008, 'learning_rate': 0.0, 'epoch': 10.0}
1176
+ 100% | 1150/1150 [1:00:34<00:00, 1.72s/it][INFO|trainer.py:1962] 2024-01-22 20:23:33,135 >>
1177
+ Training completed. Do not forget to share your model on huggingface.co/models =)
1178
+ 100% | 1150/1150 [1:00:34<00:00, 3.16s/it]
1179
+ [INFO|trainer.py:2926] 2024-01-22 20:23:33,137 >> Saving model checkpoint to ./
1180
+ [WARNING|configuration_utils.py:449] 2024-01-22 20:23:33,137 >> Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
1181
+ Non-default generation parameters: {'max_length': 448, 'begin_suppress_tokens': [220, 50257]}
1182
+ [INFO|configuration_utils.py:473] 2024-01-22 20:23:33,138 >> Configuration saved in ./config.json
1183
+ [INFO|configuration_utils.py:595] 2024-01-22 20:23:33,138 >> Configuration saved in ./generation_config.json
1184
+ [INFO|modeling_utils.py:2503] 2024-01-22 20:23:45,892 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./model.safetensors.index.json.
1185
+ [INFO|feature_extraction_utils.py:425] 2024-01-22 20:23:45,893 >> Feature extractor saved in ./preprocessor_config.json
1186
+ [INFO|trainer.py:2926] 2024-01-22 20:23:45,894 >> Saving model checkpoint to ./
1187
+ [WARNING|configuration_utils.py:449] 2024-01-22 20:23:45,894 >> Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
1188
+ Non-default generation parameters: {'max_length': 448, 'begin_suppress_tokens': [220, 50257]}
1189
+ [INFO|configuration_utils.py:473] 2024-01-22 20:23:45,895 >> Configuration saved in ./config.json
1190
+ [INFO|configuration_utils.py:595] 2024-01-22 20:23:45,895 >> Configuration saved in ./generation_config.json
1191
+ [INFO|modeling_utils.py:2503] 2024-01-22 20:24:01,028 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./model.safetensors.index.json.
1192
+ [INFO|feature_extraction_utils.py:425] 2024-01-22 20:24:01,030 >> Feature extractor saved in ./preprocessor_config.json
1193
+ events.out.tfevents.1705947776.hf-dgx-01.122884.0: 100% | 13.3k/13.3k [00:00<00:00, 95.8kB/s]
1194
+ run-yf2elmz6.wandb: 100% | 1.05M/1.05M [00:00<00:00, 1.52MB/s]
1195
+ model-00001-of-00002.safetensors: 1% | 29.1M/4.99G [00:01<02:52, 28.7MB/s]
1196
+ model-00002-of-00002.safetensors: 2% | 29.3M/1.18G [00:01<00:50, 22.8MB/s]
wandb/run-20240122_192258-yf2elmz6/files/wandb-summary.json CHANGED
@@ -1 +1 @@
1
- {"train/loss": 0.0024, "train/learning_rate": 2.307692307692308e-05, "train/epoch": 8.7, "train/global_step": 1000, "_timestamp": 1705951094.569476, "_runtime": 3316.540144920349, "_step": 41, "eval/loss": 0.5424743294715881, "eval/wer": 41.048913043478265, "eval/runtime": 781.806, "eval/samples_per_second": 2.421, "eval/steps_per_second": 0.077}
 
1
+ {"train/loss": 0.0008, "train/learning_rate": 0.0, "train/epoch": 10.0, "train/global_step": 1150, "_timestamp": 1705951413.136174, "_runtime": 3635.10684299469, "_step": 48, "eval/loss": 0.5424743294715881, "eval/wer": 41.048913043478265, "eval/runtime": 781.806, "eval/samples_per_second": 2.421, "eval/steps_per_second": 0.077, "train/train_runtime": 3636.3151, "train/train_samples_per_second": 10.071, "train/train_steps_per_second": 0.316, "train/total_flos": 1.244163879862272e+20, "train/train_loss": 0.1943072140314009}
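The summary values above can be cross-checked against the model card with a little arithmetic. The sketch below assumes every epoch has the same number of optimizer steps (1150 steps over 10 epochs); the variable names are illustrative.

```python
total_steps = 1150          # train/global_step in the updated summary
num_epochs = 10             # num_epochs from the model card
train_runtime_s = 3636.3151 # train/train_runtime in the updated summary

# Assumption: steps are spread evenly across epochs.
steps_per_epoch = total_steps / num_epochs

# Epoch values reported at the two evaluation steps in the results table.
epoch_at_500 = round(500 / steps_per_epoch, 2)
epoch_at_1000 = round(1000 / steps_per_epoch, 2)

# Throughput implied by the total step count and runtime.
steps_per_second = round(total_steps / train_runtime_s, 3)

print(steps_per_epoch, epoch_at_500, epoch_at_1000, steps_per_second)
```

Under that assumption, the computed epochs agree with the 4.35 and 8.7 shown in the training results table, and the throughput agrees with the logged `train/train_steps_per_second` of 0.316.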
wandb/run-20240122_192258-yf2elmz6/logs/debug-internal.log CHANGED
The diff for this file is too large to render. See raw diff
 
wandb/run-20240122_192258-yf2elmz6/run-yf2elmz6.wandb CHANGED
Binary files a/wandb/run-20240122_192258-yf2elmz6/run-yf2elmz6.wandb and b/wandb/run-20240122_192258-yf2elmz6/run-yf2elmz6.wandb differ