roberta-base-fa/preprocessor/tokenizer_config.yaml

name: bpe_tokenizer
config_type: preprocessor
truncation_strategy: no_truncation
padding_strategy: no_padding
continuing_subword_prefix: ''
end_of_word_suffix: ''
fuse_unk: false
train_config:
  name: bpe_tokenizer
  config_type: preprocessor
  vocab_size: 30000
  min_frequency: 2
  limit_alphabet: 1000
  show_progress: true
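
The file does not show how the owning project consumes this config, so the following is only a sketch of one plausible mapping. It assumes the top-level keys `continuing_subword_prefix`, `end_of_word_suffix`, and `fuse_unk` are meant for a BPE model, and that the `train_config` keys are meant for a BPE trainer, as in the Hugging Face `tokenizers` package. The names `CONFIG`, `split_config`, and `build_tokenizer` are hypothetical. The dict mirrors the values listed above, with the training keys grouped under `train_config`.

```python
# Hypothetical mirror of the config values as a plain dict (no YAML parser needed).
CONFIG = {
    "name": "bpe_tokenizer",
    "config_type": "preprocessor",
    "truncation_strategy": "no_truncation",
    "padding_strategy": "no_padding",
    "continuing_subword_prefix": "",
    "end_of_word_suffix": "",
    "fuse_unk": False,
    "train_config": {
        "name": "bpe_tokenizer",
        "config_type": "preprocessor",
        "vocab_size": 30000,
        "min_frequency": 2,
        "limit_alphabet": 1000,
        "show_progress": True,
    },
}

# Assumption: these keys line up with BPE model and BpeTrainer arguments.
MODEL_KEYS = ("continuing_subword_prefix", "end_of_word_suffix", "fuse_unk")
TRAINER_KEYS = ("vocab_size", "min_frequency", "limit_alphabet", "show_progress")


def split_config(config):
    """Separate model-level options from training options."""
    model_kwargs = {key: config[key] for key in MODEL_KEYS}
    trainer_kwargs = {key: config["train_config"][key] for key in TRAINER_KEYS}
    return model_kwargs, trainer_kwargs


def build_tokenizer(config):
    """Build an untrained tokenizer and its trainer.

    Assumes the third-party `tokenizers` package is installed; it is
    imported lazily so that split_config works without it.
    """
    from tokenizers import Tokenizer
    from tokenizers.models import BPE
    from tokenizers.trainers import BpeTrainer

    model_kwargs, trainer_kwargs = split_config(config)
    tokenizer = Tokenizer(BPE(**model_kwargs))
    trainer = BpeTrainer(**trainer_kwargs)
    return tokenizer, trainer
```

With `tokenizers` installed, `tokenizer, trainer = build_tokenizer(CONFIG)` followed by `tokenizer.train(files, trainer)` would train a vocabulary of up to 30000 entries, where `files` is a list of text file paths that this config does not specify. The `truncation_strategy` and `padding_strategy` values are left unmapped here because the file does not say how they are applied.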