BEE-spoke-data/mega-ar-126m-4k

Text Generation

Inference Endpoints


pszemraj committed on Dec 27, 2023

Commit 0b99e8d (parent: 11384f1)

Update README.md

Files changed (1)

README.md +1 -0

README.md CHANGED

@@ -69,6 +69,7 @@ Details:
 - 768 hidden size, 12 layers
 - no MEGA chunking, 4096 context length
 - EMA dimension 16, shared dimension 192
+- tokenizer: GPT NeoX
 - train-from-scratch
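The sketch below restates the "Details" hyperparameters from this hunk in structured form. It assumes a plain Python dataclass. The class name `MegaAR126M4kSpec` and its field names are hypothetical and chosen for illustration; they are not the keys of the model's actual config file or of any library config class. If you render the spec back into the README's bullet format, you get the updated list, including the tokenizer line this commit adds.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical summary of the README "Details" list; field names are illustrative
# only and do not correspond to the model's real config keys.
@dataclass(frozen=True)
class MegaAR126M4kSpec:
    hidden_size: int = 768        # "768 hidden size"
    num_layers: int = 12          # "12 layers"
    use_chunking: bool = False    # "no MEGA chunking"
    context_length: int = 4096    # "4096 context length"
    ema_dim: int = 16             # "EMA dimension 16"
    shared_dim: int = 192         # "shared dimension 192"
    tokenizer: str = "GPT NeoX"   # the line added by commit 0b99e8d
    from_scratch: bool = True     # "train-from-scratch"

    def readme_bullets(self) -> list[str]:
        """Render the spec in the same bullet format the README uses."""
        chunking = "MEGA chunking" if self.use_chunking else "no MEGA chunking"
        lines = [
            f"- {self.hidden_size} hidden size, {self.num_layers} layers",
            f"- {chunking}, {self.context_length} context length",
            f"- EMA dimension {self.ema_dim}, shared dimension {self.shared_dim}",
            f"- tokenizer: {self.tokenizer}",
        ]
        if self.from_scratch:
            lines.append("- train-from-scratch")
        return lines


if __name__ == "__main__":
    # Prints the five bullets of the post-commit "Details" list.
    print("\n".join(MegaAR126M4kSpec().readme_bullets()))
```

The dataclass is frozen because the values describe a fixed, already-trained checkpoint rather than something meant to be mutated at runtime.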