README.md CHANGED
@@ -20,7 +20,7 @@ Despite being a long-context model evaluated on a short-context benchmark, MEGA
| mega-encoder-small-16k-v1 | 122M | 16384 | 0.777 |
| bert-base-uncased | 110M | 512 | 0.7905 |
| roberta-base | 125M | 514 | 0.86 |
-| bert-plus-L8-4096-v1.0 | 88.1M | 4096 | 0.8278 |
+| [bert-plus-L8-4096-v1.0](https://huggingface.co/BEE-spoke-data/bert-plus-L8-4096-v1.0) | 88.1M | 4096 | 0.8278 |

<details>
<summary><strong>GLUE Details</strong></summary>
@@ -56,6 +56,7 @@ Details:
  - We observed poor performance/inexplicable 'walls' in previous experiments using rotary positional embeddings with MEGA as an encoder
6. BART tokenizer: we use the tokenizer from `facebook/bart-large`
  - This choice was motivated mostly by the desire to use the MEGA encoder in combination with a decoder model in the [HF EncoderDecoderModel class](https://huggingface.co/docs/transformers/model_doc/encoder-decoder) in a "huggingface-native" way. BART is supported as a decoder for this class, **and** BART's tokenizer has the necessary preprocessing for encoder training.
+  - Example usage of MEGA+BART to create an encoder-decoder model is [here](https://colab.research.google.com/gist/pszemraj/4bac8635361543b66207d73e4b25a13a/mega-encoder-small-16k-v1-for-text2text.ipynb)
  - The tokenizer's vocab is **exactly** the same as RoBERTa's
</details>
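For readers who want to try the pairing the added line describes without opening the linked notebook, here is a minimal sketch of wiring the MEGA encoder to a BART decoder through `EncoderDecoderModel`. It is not the notebook's exact code: the encoder repo ID below is an assumption (substitute the actual hub ID of `mega-encoder-small-16k-v1`), it presumes a `transformers` version that still ships MEGA support, and the freshly initialized cross-attention weights mean the model needs seq2seq fine-tuning before its outputs are meaningful.

```python
# Sketch: pair the MEGA encoder with a BART decoder via HF's EncoderDecoderModel.
from transformers import AutoTokenizer, EncoderDecoderModel

encoder_id = "BEE-spoke-data/mega-encoder-small-16k-v1"  # assumed hub ID
decoder_id = "facebook/bart-large"

# BART's tokenizer already has the preprocessing the encoder was trained with.
tokenizer = AutoTokenizer.from_pretrained(decoder_id)

# Load pretrained encoder + decoder; the cross-attention layers are freshly
# initialized and must be fine-tuned on a seq2seq task before real use.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(encoder_id, decoder_id)

# Generation needs the decoder's special tokens set on the config.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id

# Smoke test: long input in, short (untrained, hence nonsensical) output out.
inputs = tokenizer("long document text " * 200, return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```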
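The final context line claims the BART tokenizer's vocab is identical to RoBERTa's; that is easy to spot-check. A small sketch, assuming both hub repos are reachable:

```python
# Spot-check: compare the BART-large tokenizer's vocab with roberta-base's.
from transformers import AutoTokenizer

bart_tok = AutoTokenizer.from_pretrained("facebook/bart-large")
roberta_tok = AutoTokenizer.from_pretrained("roberta-base")

bart_vocab = bart_tok.get_vocab()
roberta_vocab = roberta_tok.get_vocab()

print(len(bart_vocab), len(roberta_vocab))  # both 50265 if the claim holds
print(bart_vocab == roberta_vocab)          # True if the token->id maps match exactly
```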