Update README.md
README.md CHANGED
@@ -181,6 +181,7 @@ For datasets with predefined `train`, `validation`, and `test` sets, we simply t
 The evaluation results are shown in the table.
 `#Param.` represents the number of parameters in both the input embedding layer and the Transformer layers, while `#Param. w/o Emb.` indicates the number of parameters in the Transformer layers only.
 
+According to our evaluation results, our ModernBERT-Ja-310M achieves **state-of-the-art** performance across the evaluation tasks, even when compared with much larger models.
 Despite being a long-context model capable of processing sequences of up to 8,192 tokens, our ModernBERT-Ja-310M also exhibited strong performance in short-sequence evaluations.
 
 ## Ethical Considerations
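For reference, here is a minimal sketch of how the `#Param.` accounting described above could be approximated, and of basic masked-token prediction with the model. It assumes the model is hosted on Hugging Face under `sbintuitions/modernbert-ja-310m` (an id inferred from the model name, not stated in this diff); the parameter split is an approximation, since the README's exact accounting (e.g., tied or head weights) may differ.

```python
# Hedged sketch, not the authors' evaluation code.
# Assumption: the model is hosted as "sbintuitions/modernbert-ja-310m";
# adjust the id if the actual repository differs.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "sbintuitions/modernbert-ja-310m"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
model.eval()

# Approximate the README's two counts: "#Param." (embeddings + Transformer
# layers) vs. "#Param. w/o Emb." (Transformer layers only). The exact
# bookkeeping in the README may differ slightly.
total = sum(p.numel() for p in model.parameters())
emb = sum(p.numel() for p in model.get_input_embeddings().parameters())
print(f"#Param.: {total:,}  #Param. w/o Emb.: {total - emb:,}")

# Masked-token prediction. Using tokenizer.mask_token avoids hard-coding
# the mask token's literal form.
text = f"日本の首都は{tokenizer.mask_token}です。"  # "The capital of Japan is <mask>."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring token at the mask position.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```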