Add training details
README.md
CHANGED
@@ -119,11 +119,12 @@ Please refer to [togethercomputer/RedPajama-Data-1T](https://huggingface.co/data
 
 **Training Procedure**
 
-- **Hardware:**
-- **Optimizer:**
-- **
+- **Hardware:** 256 nodes of 6xV100 (IBM Power9), on the OLCF Summit cluster
+- **Optimizer:** Apex FusedAdam
+- **Parallelism:** Pipeline parallel 6, tensor parallel 2
+- **Gradient Accumulations**: 8 (global batch size 4M tokens)
 - **Num of Tokens:** 800B Tokens
-- **Learning rate:**
+- **Learning rate:** 0.00016
 
 ## Community
 
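Reviewer note: the figures added in this change can be cross-checked with a little arithmetic. The sketch below is not taken from the training code. It assumes that all 256 x 6 GPUs took part in training, that "4M tokens" means 4 x 2^20, and that "800B" means 800 x 10^9; the class name `TrainingSetup` and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hypothetical container for the numbers listed in the README change."""

    nodes: int = 256
    gpus_per_node: int = 6
    pipeline_parallel: int = 6
    tensor_parallel: int = 2
    grad_accum_steps: int = 8
    # Assumption: "4M tokens" is read as 4 * 2**20, not 4 * 10**6.
    global_batch_tokens: int = 4 * 2**20
    # Assumption: "800B Tokens" is read as 800 * 10**9.
    total_tokens: int = 800 * 10**9

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node

    @property
    def gpus_per_replica(self) -> int:
        # One model replica spans pipeline stages times tensor shards.
        return self.pipeline_parallel * self.tensor_parallel

    @property
    def data_parallel(self) -> int:
        # Assumption: every GPU belongs to exactly one model replica.
        if self.total_gpus % self.gpus_per_replica != 0:
            raise ValueError("GPU count is not divisible by the replica size")
        return self.total_gpus // self.gpus_per_replica

    @property
    def tokens_per_micro_batch(self) -> float:
        # Tokens one replica processes per forward/backward pass.
        return self.global_batch_tokens / (self.data_parallel * self.grad_accum_steps)

    @property
    def optimizer_steps(self) -> float:
        # Number of global batches needed to consume the token budget.
        return self.total_tokens / self.global_batch_tokens


setup = TrainingSetup()
print("total GPUs:", setup.total_gpus)
print("data-parallel replicas:", setup.data_parallel)
print("tokens per micro-batch per replica:", setup.tokens_per_micro_batch)
print("approximate optimizer steps:", round(setup.optimizer_steps))
```

Under these assumptions the layout is self-consistent: 1536 GPUs divide evenly into replicas of 12 GPUs each, and the global batch splits into a whole number of tokens per micro-batch. If "4M" instead means 4 x 10^6, the per-replica figure is no longer an integer, which is why the power-of-two reading is used here.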