metadata
license: apache-2.0
datasets:
  - JeanKaddour/minipile
language:
  - en
pipeline_tag: text-generation

megalodon-200m: minipile

Small pretraining experiment: a Megalodon model at the 200M scale, pretrained on the MiniPile dataset (JeanKaddour/minipile).

Model Configuration

  • Number of Layers: 12
  • Model Dimension: 1024
  • Z Dimension: 256
  • Value Dimension: 2048
  • Number of Heads: 1
  • FFN Hidden Dimension: 2560
  • CEMA NDIM: 16
  • Chunk Size: 2048
  • Efficient Attention: None
  • Initialization Mode: He
  • Vocabulary Size: 20480
  • Output Size: 20480
  • Normalization Groups: 32
  • Normalization Affine: True
  • Normalization Epsilon: 1e-05
  • ROPE Base: None
  • Dropout: 0.0
  • Hidden Dropout: 0.0
  • Attention Dropout: 0.0
  • SWIGLU: False
  • Rescale NFFN: False
  • Scale Embedding: False
  • Share Embedding: False
  • Layerwise Checkpointing: False
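For reference, the settings above can be collected into a single config object. The following is a minimal sketch, not the Megalodon codebase's own config class: the class name `MegalodonConfigSketch`, its field names, and the helper `embedding_params` are all hypothetical and chosen only for illustration. The values are copied from the list. The helper assumes the output projection is a plain bias-free `output_size × model_dim` matrix, which this card does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MegalodonConfigSketch:
    # Values copied from the configuration list; field names are illustrative only.
    num_layers: int = 12
    model_dim: int = 1024
    z_dim: int = 256
    value_dim: int = 2048
    num_heads: int = 1
    ffn_hidden_dim: int = 2560
    cema_ndim: int = 16
    chunk_size: int = 2048
    efficient_attn: Optional[str] = None
    init_mode: str = "he"
    vocab_size: int = 20480
    output_size: int = 20480
    norm_num_groups: int = 32
    norm_affine: bool = True
    norm_eps: float = 1e-5
    rope_base: Optional[float] = None
    dropout: float = 0.0
    hidden_dropout: float = 0.0
    attention_dropout: float = 0.0
    swiglu: bool = False
    rescale_nffn: bool = False
    scale_emb: bool = False
    share_emb: bool = False
    layerwise_ckpt: bool = False


def embedding_params(cfg: MegalodonConfigSketch) -> int:
    """Input-embedding plus output-projection weights (assumed bias-free)."""
    input_emb = cfg.vocab_size * cfg.model_dim
    # With untied embeddings the output projection is counted separately.
    output_proj = 0 if cfg.share_emb else cfg.output_size * cfg.model_dim
    return input_emb + output_proj


cfg = MegalodonConfigSketch()

# Grouped normalization needs the model dimension to split evenly into groups.
assert cfg.model_dim % cfg.norm_num_groups == 0
features_per_group = cfg.model_dim // cfg.norm_num_groups

print(features_per_group)     # 1024 // 32 = 32
print(embedding_params(cfg))  # 2 * 20480 * 1024 = 41943040
```

Under these assumptions, the untied input and output embeddings alone account for about 42M weights, a noticeable share of a model at this scale.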