Update README.md
README.md CHANGED
@@ -181,7 +181,7 @@ For datasets with predefined `train`, `validation`, and `test` sets, we simply t
 The evaluation results are shown in the table.
 `#Param.` represents the number of parameters in both the input embedding layer and the Transformer layers, while `#Param. w/o Emb.` indicates the number of parameters in the Transformer layers only.
 
-According to our evaluation results, our ModernBERT-Ja-310M archives
+According to our evaluation results, **our ModernBERT-Ja-310M achieves state-of-the-art performance** across the evaluation tasks, even when compared with much larger models.
 Despite being a long-context model capable of processing sequences of up to 8,192 tokens, our ModernBERT-Ja-310M also exhibited strong performance in short-sequence evaluations.
 
 ## Ethical Considerations
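As a quick illustration of the 8,192-token context window mentioned in the changed text, here is a minimal sketch of loading the model and running a long input through it with the standard `transformers` AutoModel API. The Hub repo id `sbintuitions/modernbert-ja-310m` and the placeholder input text are assumptions, not part of the commit; check the model card for the published name.

```python
# Minimal sketch (not part of the commit) of exercising the stated
# 8,192-token context window. The Hub repo id below is an assumption.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "sbintuitions/modernbert-ja-310m"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
model.eval()

# Tokenize a long input, capping at the model's stated 8,192-token limit.
long_text = "これはテストです。" * 2000  # placeholder long Japanese text
inputs = tokenizer(long_text, truncation=True, max_length=8192, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

print(inputs["input_ids"].shape)  # e.g. torch.Size([1, 8192])
print(logits.shape)               # (batch, seq_len, vocab_size)
```

The same call works unchanged for short sequences, which is consistent with the claim in the diff that the model performs well on short-sequence evaluations as well.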