---
license: apache-2.0
---
# Grok-1
---
_This repository contains the weights of the Grok-1 open-weights model._
**To get started with using the model, follow the instructions at** `github.com/xai-org/grok-1`.

<small>The cover image was generated using [Midjourney](https://www.midjourney.com) based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.</small>

---
```
╔═════════════════════════╗
║                  _____  ║
║           /\    |_   _| ║
║ __  __   /  \     | |   ║
║ \ \/ /  / /\ \    | |   ║
║  >  <  / ____ \  _| |_  ║
║ /_/\_\/_/    \_\|_____| ║
║                         ║
║ Understand the Universe ║
║     [https://x.ai]      ║
╚═════════════════════════╝

╔═══════════════════╗
║ xAI Grok-1 (314B) ║
╚═══════════════════╝

╔═══════════════════════════════════════════╗
║ 314B parameter Mixture of Experts model   ║
║ - Base model (not finetuned)              ║
║ - 8 experts (2 active)                    ║
║ - 86B active parameters                   ║
║ - Apache 2.0 license                      ║
║ - Code: https://github.com/xai-org/grok-1 ║
║ - Happy coding!                           ║
╚═══════════════════════════════════════════╝
```
## Model Configuration Details
**Vocabulary Size**: 131,072

**Special Tokens**:
- Pad Token: 0
- End of Sequence Token: 2

**Sequence Length (context window)**: 8,192
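The tokenizer values above can be checked directly against the `tokenizer.model` file that ships with the weights (referenced again in the inference configuration below). This is a minimal sketch, not part of the official example code, and it assumes the file sits in the working directory:

```python
# Minimal sketch: inspect the SentencePiece tokenizer shipped with the checkpoint.
# Assumes `tokenizer.model` has been downloaded into the current directory.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

print(sp.vocab_size())  # expected: 131072
print(sp.pad_id())      # expected: 0
print(sp.eos_id())      # expected: 2

# Round-trip a short string through the tokenizer.
ids = sp.encode("Understand the Universe", out_type=int)
print(ids)
print(sp.decode(ids))
```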
### **Model Architecture**: MoE
- **Embedding Size**: 6,144
- Rotary Embedding (RoPE)
- **Layers**: 64
- **Experts**: 8
- **Selected Experts**: 2 (two of the eight experts are active per token; see the routing sketch after this list)
- **Widening Factor**: 8
- **Key Size**: 128
- **Query Heads**: 48
- **Key Value Heads**: 8
- **Activation Sharding**: Data-wise, Model-wise
- **Tokenizer**: SentencePiece tokenizer
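The expert counts above describe top-2 Mixture-of-Experts routing: for each token, a router scores all 8 experts and only the 2 best-scoring ones are evaluated, which is why only 86B of the 314B parameters are active. The snippet below is an illustrative NumPy sketch of that routing step, not xAI's implementation; only the 6,144 / 8 / 2 sizes are taken from this card.

```python
# Illustrative top-2 MoE routing in NumPy (not the official Grok-1 code).
# Sizes follow the card: 6,144-dim embeddings, 8 experts, 2 selected per token.
import numpy as np

EMBED_DIM, NUM_EXPERTS, TOP_K = 6144, 8, 2

rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, EMBED_DIM))               # 4 example token embeddings
router_w = rng.standard_normal((EMBED_DIM, NUM_EXPERTS))   # toy router weights

logits = tokens @ router_w                                  # (4, 8) routing scores
top2 = np.argsort(logits, axis=-1)[:, -TOP_K:]              # indices of the 2 chosen experts

# Softmax over the selected experts only, giving per-token mixing weights.
chosen = np.take_along_axis(logits, top2, axis=-1)
weights = np.exp(chosen - chosen.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

for i, (experts, w) in enumerate(zip(top2, weights)):
    print(f"token {i}: experts {experts.tolist()} with weights {np.round(w, 3).tolist()}")
```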
### **Inference Configuration**:
- Batch Size per Device: 0.125
- Tokenizer: `./tokenizer.model`
- Local Mesh: 1x8 (see the device-mesh sketch below)
- Between Hosts: 1x1
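These settings describe a single host with eight local accelerators; the fractional batch size per device (0.125 = 1/8) presumably means one sequence is split across all eight devices. Below is a hedged sketch of building such a 1x8 mesh with standard JAX utilities; it mirrors the listed shape rather than the repository's actual loading code, and the axis names are illustrative choices matching the data-wise / model-wise sharding noted above.

```python
# Sketch: a 1x8 single-host device mesh matching "Local Mesh: 1x8" / "Between Hosts: 1x1".
import jax
from jax.sharding import Mesh
from jax.experimental import mesh_utils

assert jax.device_count() >= 8, "Grok-1 inference expects 8 local accelerators"

devices = mesh_utils.create_device_mesh((1, 8), devices=jax.devices()[:8])
mesh = Mesh(devices, axis_names=("data", "model"))
print(mesh)
```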
## Inference Details
Make sure to download the `int8` checkpoint to the `checkpoints` directory and run
```shell
pip install -r requirements.txt
python transformer.py
```
to test the code.
You should see output from the language model.
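For the download step above, one option is the `huggingface_hub` client. The sketch below assumes the weights are hosted in this Hugging Face repository under the id `xai-org/grok-1`; check the repository's file listing for the exact checkpoint layout, and note that the int8 weights alone are on the order of 300 GB.

```python
# Sketch: fetch the repository contents into ./checkpoints with huggingface_hub.
# The repo id below is assumed from this model card; adjust it if it differs.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="xai-org/grok-1",
    repo_type="model",
    local_dir="checkpoints",
)
```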
Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.
**p.s. we're hiring: https://x.ai/careers**