
Vocabulary Trimmed google/mt5-large: seonjeongh/mt5-large-ko

This model is a trimmed version of google/mt5-large, produced by vocabtrimmer, a tool that trims the vocabulary of a language model to compress its size. The following table summarizes the trimming process.

|                            | google/mt5-large | seonjeongh/mt5-large-ko |
|:---------------------------|-----------------:|------------------------:|
| parameter_size_full        | 1,229,581,312    | 867,585,024             |
| parameter_size_embedding   | 512,229,376      | 150,233,088             |
| vocab_size                 | 250,112          | 73,356                  |
| compression_rate_full      | 100.0            | 70.56                   |
| compression_rate_embedding | 100.0            | 29.33                   |
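The compression rates above follow directly from the parameter counts (trimmed size / original size × 100), and the counts also show that the entire parameter reduction comes from the embedding matrix. A quick check:

```python
# Sanity check of the compression rates reported in the table above.
full_orig, full_trim = 1_229_581_312, 867_585_024   # parameter_size_full
emb_orig, emb_trim = 512_229_376, 150_233_088       # parameter_size_embedding

rate_full = full_trim / full_orig * 100
rate_emb = emb_trim / emb_orig * 100

print(f"compression_rate_full: {rate_full:.2f}")       # 70.56
print(f"compression_rate_embedding: {rate_emb:.2f}")   # 29.33

# The full-model savings equal the embedding savings exactly:
# only the embedding rows were removed, not the transformer layers.
print(full_orig - full_trim == emb_orig - emb_trim)    # True
```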

The following table shows the parameters used to trim the vocabulary.

| language | dataset                     | dataset_column | dataset_name | dataset_split | target_vocab_size | min_frequency |
|:---------|:----------------------------|:---------------|:-------------|:--------------|:------------------|--------------:|
| ko       | vocabtrimmer/mc4_validation | text           | ko           | validation    |                   | 2             |
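To illustrate what `min_frequency` does, here is a minimal sketch of frequency-based vocabulary trimming. This is not vocabtrimmer's actual implementation; `trim_vocab` and its signature are illustrative. Token ids that occur fewer than `min_frequency` times in the corpus are dropped (special tokens are always kept), and the embedding matrix is rebuilt from only the surviving rows.

```python
from collections import Counter

def trim_vocab(corpus_token_ids, embedding, min_frequency=2, keep_ids=()):
    """Keep token ids seen >= min_frequency times (plus special ids in
    keep_ids), and slice the embedding matrix to the surviving rows.
    Returns the trimmed embedding and an old-id -> new-id mapping."""
    counts = Counter(tid for doc in corpus_token_ids for tid in doc)
    kept = sorted(set(keep_ids) |
                  {tid for tid, c in counts.items() if c >= min_frequency})
    old_to_new = {tid: new for new, tid in enumerate(kept)}
    trimmed_embedding = [embedding[tid] for tid in kept]
    return trimmed_embedding, old_to_new

# Toy example: a 6-token vocabulary where token 5 appears only once,
# so it is dropped at min_frequency=2; token 0 is kept as a special id.
corpus = [[0, 1, 2, 2], [1, 2, 5]]
emb = [[float(i)] for i in range(6)]  # 6 x 1 "embedding matrix"
new_emb, mapping = trim_vocab(corpus, emb, min_frequency=2, keep_ids=(0,))
print(len(new_emb))   # 3 rows survive: ids 0, 1, 2
```

The real tool applies the same idea at scale: token frequencies are counted over the mC4 Korean validation split, and mT5's shared embedding shrinks from 250,112 to 73,356 rows.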