---
tags:
- espnet
- audio
- automatic-speech-recognition
language: en
datasets:
- swbd_da
license: cc-by-4.0
---

## ESPnet2 ASR model

### `akreal/espnet2_swbd_da_hubert_conformer`

This model was trained by Pavel Denisov using the `swbd_da` recipe in [espnet](https://github.com/espnet/espnet/).

### Demo: How to use in ESPnet2

```bash
cd espnet
git checkout 08c6efbc6299c972301236625f9abafe087c9f9c
pip install -e .
cd egs2/swbd_da/asr1
./run.sh --skip_data_prep false --skip_train true --download_model akreal/espnet2_swbd_da_hubert_conformer
```
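
If you prefer to run inference directly from Python instead of the recipe script, a minimal sketch using the `espnet_model_zoo` downloader together with ESPnet2's `Speech2Text` interface might look like the following. The model name comes from this card; `utt.wav` is a placeholder for your own single-channel 16 kHz recording.

```python
import soundfile

from espnet2.bin.asr_inference import Speech2Text
from espnet_model_zoo.downloader import ModelDownloader

# Download the packed model and build the inference wrapper.
d = ModelDownloader()
speech2text = Speech2Text(
    **d.download_and_unpack("akreal/espnet2_swbd_da_hubert_conformer"),
    device="cpu",
)

# "utt.wav" is a placeholder: one utterance, mono, 16 kHz.
speech, rate = soundfile.read("utt.wav")
nbests = speech2text(speech)
text, tokens, token_ids, hyp = nbests[0]
print(text)  # one of the dialogue-act labels from the token list below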
27
+
28
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
29
+ # RESULTS
30
+ ## Environments
31
+ - date: `Thu Jan 20 19:31:21 CET 2022`
32
+ - python version: `3.8.12 (default, Aug 30 2021, 00:00:00) [GCC 11.2.1 20210728 (Red Hat 11.2.1-1)]`
33
+ - espnet version: `espnet 0.10.6a1`
34
+ - pytorch version: `pytorch 1.10.1+cu113`
35
+ - Git hash: `08c6efbc6299c972301236625f9abafe087c9f9c`
36
+ - Commit date: `Tue Jan 4 13:40:33 2022 +0100`
37
+
38
+ ## asr_train_asr_raw_en_word_sp
39
+ ### WER
40
+
41
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
42
+ |---|---|---|---|---|---|---|---|---|
43
+ |decode_asr_asr_model_valid.loss.ave/test_context3|2379|2379|66.3|33.7|0.0|0.0|33.7|33.7|
44
+ |decode_asr_asr_model_valid.loss.ave/valid_context3|8116|8116|69.5|30.5|0.0|0.0|30.5|30.5|
45
+
46
+ ### CER
47
+
48
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
49
+ |---|---|---|---|---|---|---|---|---|
50
+ |decode_asr_asr_model_valid.loss.ave/test_context3|2379|19440|76.1|17.7|6.2|8.1|32.0|33.7|
51
+ |decode_asr_asr_model_valid.loss.ave/valid_context3|8116|66353|79.5|16.1|4.4|8.0|28.5|30.5|
52
+
53
+ ### TER
54
+
55
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
56
+ |---|---|---|---|---|---|---|---|---|
57
+
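In these tables the `Err` column is the sum of the substitution, deletion, and insertion rates (and in the WER tables `Snt` equals `Wrd`, which is consistent with the model emitting one dialogue-act label per utterance). A quick sanity check of the CER rows, as a runnable sketch:

```python
# Each row reports percentages; Err should equal Sub + Del + Ins.
cer_rows = [
    # (Sub, Del, Ins, Err) for test_context3 and valid_context3
    (17.7, 6.2, 8.1, 32.0),
    (16.1, 4.4, 8.0, 28.5),
]
for sub, dele, ins, err in cer_rows:
    # Allow a small tolerance for rounding in the reported figures.
    assert abs((sub + dele + ins) - err) < 0.05
```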
## ASR config

<details><summary>expand</summary>

```yaml
config: conf/tuning/train_asr_conformer_hubert_context3.yaml
print_config: false
log_level: INFO
dry_run: false
iterator_type: sequence
output_dir: exp/asr_train_asr_conformer_hubert_context3_raw_en_word_sp
ngpu: 1
seed: 0
num_workers: 1
num_att_plot: 3
dist_backend: nccl
dist_init_method: env://
dist_world_size: null
dist_rank: null
local_rank: 0
dist_master_addr: null
dist_master_port: null
dist_launcher: null
multiprocessing_distributed: false
unused_parameters: false
sharded_ddp: false
cudnn_enabled: true
cudnn_benchmark: false
cudnn_deterministic: true
collect_stats: false
write_collected_feats: false
max_epoch: 35
patience: null
val_scheduler_criterion:
- valid
- loss
early_stopping_criterion:
- valid
- loss
- min
best_model_criterion:
- - valid
  - loss
  - min
keep_nbest_models: 7
nbest_averaging_interval: 0
grad_clip: 5.0
grad_clip_type: 2.0
grad_noise: false
accum_grad: 1
no_forward_run: false
resume: true
train_dtype: float32
use_amp: false
log_interval: null
use_matplotlib: true
use_tensorboard: true
use_wandb: false
wandb_project: null
wandb_id: null
wandb_entity: null
wandb_name: null
wandb_model_log_interval: -1
detect_anomaly: false
pretrain_path: null
init_param: []
ignore_init_mismatch: false
freeze_param:
- frontend.upstream
num_iters_per_epoch: null
batch_size: 20
valid_batch_size: null
batch_bins: 4000000
valid_batch_bins: null
train_shape_file:
- exp/asr_stats_context3_raw_en_word_sp/train/speech_shape
- exp/asr_stats_context3_raw_en_word_sp/train/text_shape.word
valid_shape_file:
- exp/asr_stats_context3_raw_en_word_sp/valid/speech_shape
- exp/asr_stats_context3_raw_en_word_sp/valid/text_shape.word
batch_type: numel
valid_batch_type: null
fold_length:
- 80000
- 150
sort_in_batch: descending
sort_batch: descending
multiple_iterator: false
chunk_length: 500
chunk_shift_ratio: 0.5
num_cache_chunks: 1024
train_data_path_and_name_and_type:
- - dump/raw/train_context3_sp/wav.scp
  - speech
  - sound
- - dump/raw/train_context3_sp/text
  - text
  - text
valid_data_path_and_name_and_type:
- - dump/raw/valid_context3/wav.scp
  - speech
  - sound
- - dump/raw/valid_context3/text
  - text
  - text
allow_variable_data_keys: false
max_cache_size: 0.0
max_cache_fd: 32
valid_max_cache_size: null
optim: adam
optim_conf:
  lr: 0.0001
scheduler: warmuplr
scheduler_conf:
  warmup_steps: 25000
token_list:
- <blank>
- <unk>
- statement
- backchannel
- opinion
- abandon
- agree
- yn_q
- apprec
- 'yes'
- uninterp
- close
- wh_q
- acknowledge
- 'no'
- yn_decl_q
- hedge
- backchannel_q
- sum
- quote
- affirm
- other
- directive
- repeat
- open_q
- completion
- rhet_q
- hold
- reject
- answer
- neg
- ans_dispref
- repeat_q
- open
- or
- commit
- maybe
- decl_q
- third_pty
- self_talk
- thank
- apology
- tag_q
- downplay
- <sos/eos>
init: null
input_size: null
ctc_conf:
  dropout_rate: 0.0
  ctc_type: builtin
  reduce: true
  ignore_nan_grad: true
joint_net_conf: null
model_conf:
  ctc_weight: 0.0
  extract_feats_in_collect_stats: false
use_preprocessor: true
token_type: word
bpemodel: null
non_linguistic_symbols: null
cleaner: null
g2p: null
speech_volume_normalize: null
rir_scp: null
rir_apply_prob: 1.0
noise_scp: null
noise_apply_prob: 1.0
noise_db_range: '13_15'
frontend: s3prl
frontend_conf:
  frontend_conf:
    upstream: hubert_large_ll60k
  download_dir: ./hub
  multilayer_feature: true
  fs: 16k
specaug: specaug
specaug_conf:
  apply_time_warp: true
  time_warp_window: 5
  time_warp_mode: bicubic
  apply_freq_mask: true
  freq_mask_width_range:
  - 0
  - 30
  num_freq_mask: 2
  apply_time_mask: true
  time_mask_width_range:
  - 0
  - 40
  num_time_mask: 2
normalize: utterance_mvn
normalize_conf: {}
preencoder: linear
preencoder_conf:
  input_size: 1024
  output_size: 80
encoder: conformer
encoder_conf:
  output_size: 512
  attention_heads: 8
  linear_units: 2048
  num_blocks: 12
  dropout_rate: 0.1
  positional_dropout_rate: 0.1
  attention_dropout_rate: 0.1
  input_layer: conv2d
  normalize_before: true
  macaron_style: true
  pos_enc_layer_type: rel_pos
  selfattention_layer_type: rel_selfattn
  activation_type: swish
  use_cnn_module: true
  cnn_module_kernel: 31
postencoder: null
postencoder_conf: {}
decoder: transformer
decoder_conf:
  attention_heads: 8
  linear_units: 2048
  num_blocks: 6
  dropout_rate: 0.1
  positional_dropout_rate: 0.1
  self_attention_dropout_rate: 0.1
  src_attention_dropout_rate: 0.1
required:
- output_dir
- token_list
version: 0.10.5a1
distributed: false
```

</details>
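
A detail worth noting in the config above: the HuBERT frontend (`frontend.upstream`) is frozen, and a linear preencoder projects its 1024-dimensional features down to 80 dimensions before the conformer encoder. As a standalone sketch (plain PyTorch, not the ESPnet class itself), that projection stage is just:

```python
import torch

# Sketch of the "preencoder: linear" stage from the config:
# project 1024-dim HuBERT-Large features to 80 dims per frame.
preencoder = torch.nn.Linear(1024, 80)

feats = torch.randn(1, 50, 1024)  # (batch, frames, feature_dim)
projected = preencoder(feats)
print(projected.shape)  # torch.Size([1, 50, 80])
```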

### Citing ESPnet

```bibtex
@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
```

or arXiv:

```bibtex
@misc{watanabe2018espnet,
  title={ESPnet: End-to-End Speech Processing Toolkit},
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  year={2018},
  eprint={1804.00015},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```