README.md CHANGED
@@ -20,7 +20,7 @@ Despite being a long-context model evaluated on a short-context benchmark, MEGA
| mega-encoder-small-16k-v1 | 122M | 16384 | 0.777 |
| bert-base-uncased | 110M | 512 | 0.7905 |
| roberta-base | 125M | 514 | 0.86 |
-| bert-plus-L8-4096-v1.0 | 88.1M | 4096 | 0.8278 |
+| [bert-plus-L8-4096-v1.0](https://huggingface.co/BEE-spoke-data/bert-plus-L8-4096-v1.0) | 88.1M | 4096 | 0.8278 |

<details>
<summary><strong>GLUE Details</strong></summary>
@@ -56,6 +56,7 @@ Details:
  - We observed poor performance/inexplicable 'walls' in previous experiments using rotary positional embeddings with MEGA as an encoder
6. BART tokenizer: we use the tokenizer from `facebook/bart-large`
  - This choice was motivated mostly by the desire to use the MEGA encoder in combination with a decoder model in the [HF EncoderDecoderModel class](https://huggingface.co/docs/transformers/model_doc/encoder-decoder) in a "huggingface-native" way. BART is supported as a decoder for this class, **and** BART's tokenizer has the necessary preprocessing for encoder training.
+  - Example usage of MEGA+BART to create an encoder-decoder model is [here](https://colab.research.google.com/gist/pszemraj/4bac8635361543b66207d73e4b25a13a/mega-encoder-small-16k-v1-for-text2text.ipynb)
  - The tokenizer's vocab is **exactly** the same as RoBERTa's
</details>
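For readers who want to try the pairing the added line describes without opening the linked notebook, here is a minimal sketch of wiring the MEGA encoder to a BART decoder through `EncoderDecoderModel`. It is not the notebook's exact code: the encoder repo ID below is an assumption (substitute the actual hub ID of `mega-encoder-small-16k-v1`), it presumes a `transformers` version that still ships MEGA support, and the freshly initialized cross-attention weights mean the model needs seq2seq fine-tuning before its outputs are meaningful.

```python
# Sketch: pair the MEGA encoder with a BART decoder via HF's EncoderDecoderModel.
from transformers import AutoTokenizer, EncoderDecoderModel

encoder_id = "BEE-spoke-data/mega-encoder-small-16k-v1"  # assumed hub ID
decoder_id = "facebook/bart-large"

# BART's tokenizer already has the preprocessing the encoder was trained with.
tokenizer = AutoTokenizer.from_pretrained(decoder_id)

# Load pretrained encoder + decoder; the cross-attention layers are freshly
# initialized and must be fine-tuned on a seq2seq task before real use.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(encoder_id, decoder_id)

# Generation needs the decoder's special tokens set on the config.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id

# Smoke test: long input in, short (untrained, hence nonsensical) output out.
inputs = tokenizer("long document text " * 200, return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```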
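The final context line claims the BART tokenizer's vocab is identical to RoBERTa's; that is easy to spot-check. A small sketch, assuming both hub repos are reachable:

```python
# Spot-check: compare the BART-large tokenizer's vocab with roberta-base's.
from transformers import AutoTokenizer

bart_tok = AutoTokenizer.from_pretrained("facebook/bart-large")
roberta_tok = AutoTokenizer.from_pretrained("roberta-base")

bart_vocab = bart_tok.get_vocab()
roberta_vocab = roberta_tok.get_vocab()

print(len(bart_vocab), len(roberta_vocab))  # both 50265 if the claim holds
print(bart_vocab == roberta_vocab)          # True if the token->id maps match exactly
```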