Commit a9fee4d by monsoon-nlp
1 parent: 0803e25
Commit message: warning
README.md CHANGED
@@ -7,6 +7,8 @@ language: dv
 Pretrained from scratch on Dhivehi (language of the Maldives)
 with ByT5, Google's new byte-level tokenizer strategy.
 
+**Use byt5-dv for now; this is less accurate**
+
 Corpus: Sofwath's Dhivehi corpus https://github.com/Sofwath/DhivehiDatasets
 
 Pretraining Notebook:
@@ -17,3 +19,7 @@ https://colab.research.google.com/drive/1ERIZ1PyHn-yN_jo7dTQeODn22vrt-d1d?usp=sh
 On Dhivehi news classification task
 
 https://colab.research.google.com/drive/11u5SafR4bKICmArgDl6KQ9vqfYtDpyWp?usp=sharing
+
+## Issues
+
+There was an issue with the vocabulary size, final layer, and/or accuracy on fine-tuning.
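
Note on the byte-level tokenizer mentioned in the README: the sketch below illustrates the general ByT5-style idea of mapping UTF-8 bytes directly to token ids. It is not the code from the linked notebooks. It assumes the usual ByT5 convention of three reserved ids (pad=0, eos=1, unk=2), so byte `b` becomes id `b + 3`; the names `encode`, `decode`, and `SPECIAL_OFFSET` are hypothetical helpers chosen for illustration.

```python
from typing import List

# Assumption: ByT5-style layout with 3 reserved ids (pad=0, eos=1, unk=2).
SPECIAL_OFFSET = 3
EOS_ID = 1


def encode(text: str) -> List[int]:
    """Map each UTF-8 byte to an id (byte value + offset) and append EOS."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def decode(ids: List[int]) -> str:
    """Drop reserved ids, shift the rest back to byte values, decode as UTF-8."""
    raw = bytes(
        i - SPECIAL_OFFSET
        for i in ids
        if SPECIAL_OFFSET <= i < SPECIAL_OFFSET + 256
    )
    return raw.decode("utf-8", errors="ignore")


# Thaana-script sample (the word "Dhivehi"), written with escapes for clarity.
sample = "\u078b\u07a8\u0788\u07ac\u0780\u07a8"
ids = encode(sample)

# Each Thaana code point takes 2 bytes in UTF-8, so sequences are roughly
# twice as long as the character count, plus one EOS id.
print(len(sample), "characters ->", len(ids), "token ids")
print(decode(ids) == sample)
```

Because every byte is a token, the base vocabulary is small and fixed, which is one reason a mismatch between the tokenizer's id range and the model's embedding or output layer size would be worth checking when debugging the kind of issue described above.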