Muennighoff committed • Commit 7b5134c • Parent(s): ef09465

Update README.md (#10)

- Update README.md (9f8fbfd40745829b76f70843bd2349110c66261f)
- Update README.md (6773df99000cd2a545c238e6a9ec17ea7fa64177)
README.md CHANGED

@@ -122,13 +122,15 @@
 
 * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
 
-*
+* 559,214,592 parameters:
+
+    * 256,901,120 embedding parameters
 
 * 24 layers, 16 attention heads
 
 * Hidden layers are 1024-dimensional
 
-* Sequence length of 2048 tokens
+* Sequence length of 2048 tokens (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
 
 **Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).
 
@@ -165,18 +167,9 @@
 
 #### **Training**
 
-
-_In progress._
-
-Current training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/tr11-176B-ml-logs/)
-
-- Checkpoint size:
-
-    - Bf16 weights: 329GB
-
-    - Full checkpoint with optimizer states: 2.3TB
+Training logs: [Tensorboard link](https://huggingface.co/bigscience/tr11e-350M-logs)
 
-- Training throughput: About 150
+- Training throughput: About 150 TFLOPs per GPU
 
 - Number of epochs: 1 (*current target*)
 
@@ -184,9 +177,9 @@
 
 - Started 11th March, 2022 11:42am PST
 
--
+- Ended 5th July, 2022
 
-- Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments)
+- Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments and other model sizes)
 
 - Server training location: Île-de-France, France
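The ALiBi encodings referenced in the README replace learned positional embeddings with a per-head linear bias added to attention scores. A minimal sketch of that bias, assuming the power-of-two head count used here (16 heads) and the slope schedule from the ALiBi paper; `alibi_bias` is a hypothetical helper, not code from this repository:

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """ALiBi bias for power-of-two head counts.

    Head h gets slope 2^(-8*(h+1)/n_heads); each query-key pair is
    penalized by slope * distance, so nearby tokens are favored.
    """
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    # distance[i, j] = i - j for keys at or before the query;
    # future positions are clamped to 0 (the causal mask hides them anyway)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)
    # shape (n_heads, seq_len, seq_len), added to attention scores before softmax
    return -slopes[:, None, None] * distance

bias = alibi_bias(n_heads=16, seq_len=2048)
```

The bias is zero on the diagonal and grows more negative with distance, which is what lets ALiBi-trained models extrapolate past their training sequence length.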
Resulting file (architecture section):

* ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions

* 559,214,592 parameters:

    * 256,901,120 embedding parameters

* 24 layers, 16 attention heads

* Hidden layers are 1024-dimensional

* Sequence length of 2048 tokens (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))

**Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).
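The parameter counts in the README are consistent with a standard BLOOM-style decoder stack. A back-of-the-envelope check, assuming the 250,880-token BLOOM tokenizer vocabulary, tied input/output embeddings, ALiBi (so no positional-embedding parameters), fused QKV projections, a 4x MLP expansion, and an embedding LayerNorm plus final LayerNorm outside the blocks:

```python
vocab, hidden, layers = 250_880, 1024, 24

embedding = vocab * hidden                  # tied word embeddings

# per transformer block (weights + biases):
attn = hidden * 3 * hidden + 3 * hidden    # fused QKV projection
attn += hidden * hidden + hidden           # attention output projection
mlp = hidden * 4 * hidden + 4 * hidden     # up projection (4x expansion)
mlp += 4 * hidden * hidden + hidden        # down projection
norms = 2 * 2 * hidden                     # two LayerNorms (weight + bias)
block = attn + mlp + norms

extra_norms = 2 * 2 * hidden               # embedding LayerNorm + final LayerNorm

total = embedding + layers * block + extra_norms
print(f"{embedding:,} embedding / {total:,} total")
# → 256,901,120 embedding / 559,214,592 total
```

Both figures match the bullets above exactly, which supports the assumed layer layout.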
Resulting file (training section):

#### **Training**

Training logs: [Tensorboard link](https://huggingface.co/bigscience/tr11e-350M-logs)

- Training throughput: About 150 TFLOPs per GPU

- Number of epochs: 1 (*current target*)

[…]

- Started 11th March, 2022 11:42am PST

- Ended 5th July, 2022

- Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments and other model sizes)

- Server training location: Île-de-France, France
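The objective function named in the README is PyTorch's cross-entropy loss with `reduction="mean"` (the default). A minimal illustration of how it is applied in causal language modeling; the shapes are illustrative, and the random logits stand in for a real model's output:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, seq_len, batch = 250_880, 8, 2

# Stand-in for model output: one logit vector per position.
logits = torch.randn(batch, seq_len, vocab_size)
tokens = torch.randint(vocab_size, (batch, seq_len))

# Causal LM: position t predicts token t+1, so shift logits and labels by one.
shift_logits = logits[:, :-1, :].reshape(-1, vocab_size)
shift_labels = tokens[:, 1:].reshape(-1)

# reduction="mean" averages the per-token losses into a single scalar.
loss_fn = nn.CrossEntropyLoss(reduction="mean")
loss = loss_fn(shift_logits, shift_labels)
print(loss.item())  # scalar; near ln(vocab_size) for untrained random logits
```

With uniform random logits the loss sits near ln(250,880) ≈ 12.4 nats, which is the usual sanity check that a freshly initialized model starts at chance level.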