---
license: apache-2.0
datasets:
- JeanKaddour/minipile
language:
- en
pipeline_tag: text-generation
---
# megalodon-200m: minipile
Small pretraining experiment:
- 8192-token context, approximately 1 epoch
- codebase: https://github.com/pszemraj/megalodon/tree/dataload-fixes
- [training logs](https://huggingface.co/pszemraj/megalodon-200m-minipile/raw/main/train.log)
### Model Configuration
- **Number of Layers:** 12
- **Model Dimension:** 1024
- **Z Dimension:** 256
- **Value Dimension:** 2048
- **Number of Heads:** 1
- **FFN Hidden Dimension:** 2560
- **CEMA NDIM:** 16
- **Chunk Size:** 2048
- **Efficient Attention:** None
- **Initialization Mode:** He
- **Vocabulary Size:** 20480
- **Output Size:** 20480
- **Normalization Groups:** 32
- **Normalization Affine:** True
- **Normalization Epsilon:** 1e-05
- **ROPE Base:** None
- **Dropout:** 0.0
- **Hidden Dropout:** 0.0
- **Attention Dropout:** 0.0
- **SWIGLU:** False
- **Rescale NFFN:** False
- **Scale Embedding:** False
- **Share Embedding:** False
- **Layerwise Checkpointing:** False
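
For quick sanity checks, the values above can be collected in a small Python structure. This is only a sketch: the class name `MegalodonConfigSketch`, its field names, and the two helper functions are hypothetical and are not taken from the training codebase. Only the numbers come from this card. The helpers derive two quantities that follow from the listed values: how many attention chunks an 8192-token sequence splits into, and how many channels fall into each normalization group.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MegalodonConfigSketch:
    """Hypothetical container; field names are illustrative, values are from the card."""
    num_layers: int = 12
    model_dim: int = 1024
    z_dim: int = 256
    value_dim: int = 2048
    num_heads: int = 1
    ffn_hidden_dim: int = 2560
    cema_ndim: int = 16
    chunk_size: int = 2048
    vocab_size: int = 20480
    output_size: int = 20480
    norm_num_groups: int = 32
    norm_affine: bool = True
    norm_eps: float = 1e-5
    rope_base: Optional[float] = None
    dropout: float = 0.0
    hidden_dropout: float = 0.0
    attention_dropout: float = 0.0
    swiglu: bool = False
    share_embedding: bool = False


def chunks_per_sequence(cfg: MegalodonConfigSketch, seq_len: int) -> int:
    """Number of fixed-size chunks a sequence of seq_len tokens divides into."""
    if seq_len % cfg.chunk_size != 0:
        raise ValueError("seq_len must be a multiple of chunk_size")
    return seq_len // cfg.chunk_size


def channels_per_norm_group(cfg: MegalodonConfigSketch) -> int:
    """Channels in each normalization group, assuming groups split model_dim evenly."""
    if cfg.model_dim % cfg.norm_num_groups != 0:
        raise ValueError("model_dim must be divisible by norm_num_groups")
    return cfg.model_dim // cfg.norm_num_groups


cfg = MegalodonConfigSketch()
n_chunks = chunks_per_sequence(cfg, 8192)   # 8192-token training context
group_width = channels_per_norm_group(cfg)
print(n_chunks, group_width)
```

With the listed values, an 8192-token sequence covers four chunks of 2048 tokens, and each of the 32 normalization groups spans 32 of the 1024 model channels.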