---
license: apache-2.0
datasets:
- JeanKaddour/minipile
language:
- en
pipeline_tag: text-generation
---


# megalodon-200m: minipile

Small pretraining experiment:


- 8192-token context, approximately 1 epoch
- codebase: https://github.com/pszemraj/megalodon/tree/dataload-fixes
- [training logs](https://huggingface.co/pszemraj/megalodon-200m-minipile/raw/main/train.log)
 

### Model Configuration
- **Number of Layers:** 12  
- **Model Dimension:** 1024  
- **Z Dimension:** 256  
- **Value Dimension:** 2048  
- **Number of Heads:** 1  
- **FFN Hidden Dimension:** 2560  
- **CEMA NDIM:** 16  
- **Chunk Size:** 2048  
- **Efficient Attention:** None  
- **Initialization Mode:** He  
- **Vocabulary Size:** 20480  
- **Output Size:** 20480  
- **Normalization Groups:** 32  
- **Normalization Affine:** True  
- **Normalization Epsilon:** 1e-05  
- **ROPE Base:** None  
- **Dropout:** 0.0  
- **Hidden Dropout:** 0.0  
- **Attention Dropout:** 0.0  
- **SWIGLU:** False  
- **Rescale NFFN:** False  
- **Scale Embedding:** False  
- **Share Embedding:** False  
- **Layerwise Checkpointing:** False
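
As a rough sanity check on the numbers above, the following sketch mirrors part of the configuration in a plain Python dataclass and derives two quantities from it: how many chunks an 8192-token sequence splits into, and how many parameters sit in the embedding tables. The class `MegalodonConfigSketch`, its field names, and both helper functions are hypothetical and exist only for illustration; they are not the API of the linked codebase. The parameter count also assumes plain weight matrices with no bias terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MegalodonConfigSketch:
    """Hypothetical container for the values listed in this card.

    Field names are illustrative and are not taken from the training codebase.
    """

    num_layers: int = 12
    model_dim: int = 1024
    z_dim: int = 256
    value_dim: int = 2048
    num_heads: int = 1
    ffn_hidden_dim: int = 2560
    cema_ndim: int = 16
    chunk_size: int = 2048
    vocab_size: int = 20480
    output_size: int = 20480
    share_emb: bool = False


def chunks_per_sequence(cfg: MegalodonConfigSketch, seq_len: int) -> int:
    """Number of fixed-size chunks a sequence of `seq_len` tokens splits into."""
    if seq_len % cfg.chunk_size != 0:
        raise ValueError("seq_len must be a multiple of chunk_size")
    return seq_len // cfg.chunk_size


def embedding_params(cfg: MegalodonConfigSketch) -> int:
    """Parameters in the input embedding plus the output projection.

    Assumes weight matrices only (no biases). When embeddings are shared,
    the output projection adds no parameters.
    """
    input_emb = cfg.vocab_size * cfg.model_dim
    output_proj = 0 if cfg.share_emb else cfg.output_size * cfg.model_dim
    return input_emb + output_proj


if __name__ == "__main__":
    cfg = MegalodonConfigSketch()
    print("chunks per 8192-token sequence:", chunks_per_sequence(cfg, 8192))
    print("embedding + output parameters:", embedding_params(cfg))
```

Under these assumptions, an 8192-token training sequence covers four 2048-token chunks, and because **Share Embedding** is False the input and output tables are counted separately.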