Fill-Mask · Transformers · Safetensors · English · mega · 16384 · 16k · Inference Endpoints
pszemraj committed (verified) · Commit 86ff964 · 1 Parent(s): aa7e669

Files changed (1): README.md (+2 -1)

README.md CHANGED
@@ -20,7 +20,7 @@ Despite being a long-context model evaluated on a short-context benchmark, MEGA
 | mega-encoder-small-16k-v1 | 122M | 16384 | 0.777 |
 | bert-base-uncased | 110M | 512 | 0.7905 |
 | roberta-base | 125M | 514 | 0.86 |
-| bert-plus-L8-4096-v1.0 | 88.1M | 4096 | 0.8278 |
+| [bert-plus-L8-4096-v1.0](https://huggingface.co/BEE-spoke-data/bert-plus-L8-4096-v1.0) | 88.1M | 4096 | 0.8278 |
 
 <details>
 <summary><strong>GLUE Details</strong></summary>
@@ -56,6 +56,7 @@ Details:
 - We observed poor performance/inexplicable 'walls' in previous experiments using rotary positional embeddings with MEGA as an encoder
 6. BART tokenizer: we use the tokenizer from `facebook/bart-large`
 - This choice was motivated mostly by the desire to use the MEGA encoder in combination with a decoder model in the [HF EncoderDecoderModel class](https://huggingface.co/docs/transformers/model_doc/encoder-decoder) in a "huggingface-native" way. BART is supported as a decoder for this class, **and** BART's tokenizer has the necessary preprocessing for encoder training.
+  - Example usage of MEGA+BART to create an encoder-decoder model is shown [in this notebook](https://colab.research.google.com/gist/pszemraj/4bac8635361543b66207d73e4b25a13a/mega-encoder-small-16k-v1-for-text2text.ipynb)
 - The tokenizer's vocab is **exactly** the same as RoBERTa's
 </details>
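The linked notebook covers the author's full text2text setup; as orientation, here is a minimal sketch of gluing the two checkpoints together with `EncoderDecoderModel`. The encoder repo id `pszemraj/mega-encoder-small-16k-v1` is an assumption inferred from the gist title, and this requires a `transformers` version that still includes MEGA support.

```python
# Sketch: combining the MEGA encoder with a BART decoder via
# EncoderDecoderModel. The encoder repo id is an assumption inferred
# from the gist title; see the linked notebook for the actual setup.
from transformers import AutoTokenizer, EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "pszemraj/mega-encoder-small-16k-v1",  # assumed encoder repo id
    "facebook/bart-large",                 # BART decoder, per the README
)

# BART's tokenizer handles preprocessing for both sides (see the
# tokenizer note in the diff above).
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")

# generate() needs these set on a freshly combined model.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("A long input document ...", return_tensors="pt")
# NOTE: the cross-attention weights are randomly initialized at this
# point, so outputs are meaningless until the model is fine-tuned.
summary_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
```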
 
 
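The diff's last bullet claims the BART tokenizer's vocab is exactly RoBERTa's; that is cheap to verify. A small sketch, assuming both checkpoints are reachable:

```python
# Check that facebook/bart-large and roberta-base share an identical
# vocabulary (token -> id mapping), as the README bullet asserts.
from transformers import AutoTokenizer

bart = AutoTokenizer.from_pretrained("facebook/bart-large")
roberta = AutoTokenizer.from_pretrained("roberta-base")

# Compare the full mappings rather than just the vocab sizes.
print(bart.get_vocab() == roberta.get_vocab())  # expected: True
```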