Commit 6284427 by indiejoseph
1 Parent(s): ad9a733

Update README.md
README.md CHANGED
@@ -25,9 +25,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # bart-base-cantonese
 
-This model is a continue pre-train version of [fnlp/bart-base-chinese](https://huggingface.co/fnlp/bart-base-chinese) on filtered Cantonese common crawl dataset with
+This model is a continue pre-train version of [fnlp/bart-base-chinese](https://huggingface.co/fnlp/bart-base-chinese) on filtered Cantonese common crawl dataset with 472M tokens.
 
-This tokenizer has extended the Bert tokenizer from fnlp/bart-base-chinese with
+This tokenizer has extended the Bert tokenizer from fnlp/bart-base-chinese with 100 more Chinese characters commonly found in Cantonese
 
 ## Intended uses & limitations
 
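
Note on the tokenizer change: the second added line describes extending an existing vocabulary with extra characters. The commit does not show how this was done, so the following is only a minimal sketch of the general idea, not the author's method. The function `extend_vocab`, the toy vocabulary, and the sample characters are all hypothetical and chosen for illustration.

```python
def extend_vocab(vocab, new_chars):
    """Return a copy of `vocab` with unseen characters appended at new ids.

    Existing token ids are left untouched; each new character takes the
    next free id after the current maximum.
    """
    extended = dict(vocab)
    next_id = max(extended.values(), default=-1) + 1
    for ch in new_chars:
        if ch not in extended:  # skip characters the vocabulary already has
            extended[ch] = next_id
            next_id += 1
    return extended


# Hypothetical toy vocabulary and characters, for illustration only.
base_vocab = {"[PAD]": 0, "[UNK]": 1, "我": 2, "你": 3}
cantonese_chars = ["嘅", "咗", "你", "嚟"]

extended_vocab = extend_vocab(base_vocab, cantonese_chars)
added = len(extended_vocab) - len(base_vocab)
print(f"added {added} new tokens")
```

Appending new ids at the end keeps the ids of the original tokens stable, so embeddings already learned for them remain valid. A model using the extended vocabulary would also need its embedding matrix enlarged to cover the new ids.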