Achal Dave committed
Commit: 77ec709
Parent(s): 39adea4
README updates

README.md CHANGED
@@ -7,9 +7,9 @@ license: apache-2.0
 <img src="https://cdn-uploads.huggingface.co/production/uploads/63118add64939fabc0108b28/BB42g4V8HTxb5dR4tcy8A.png" alt="DCLM Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>


-# Model Card for DCLM-
+# Model Card for DCLM-1B

-DCLM-
+DCLM-1B is a 1.4 billion parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark. This model is designed to showcase the effectiveness of systematic data curation techniques for improving language model performance.

 ## Model Details

@@ -48,12 +48,13 @@ The model was trained using the following setup:
 - **Total Training Tokens:** 2.6T
 - **Hardware:** Trained on H100 GPUs

-For more detailed training information, please refer to
+For more detailed training information, please refer to Appendix P.3 of the
+paper.
 To ensure our trained model is broadly useful, including for math and coding tasks, we combine our 3.8T [DCLM-BASELINE](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0) with the [StarCoder](https://huggingface.co/datasets/bigcode/starcoderdata) and [ProofPile2](https://huggingface.co/datasets/EleutherAI/proof-pile-2) data to arrive at a 4.1T token dataset.

 ## Evaluation

-Here are the evaluation results for DCLM-
+Here are the evaluation results for DCLM-1B on various tasks (using [llm-foundry](https://github.com/mosaicml/llm-foundry) eval suite)

 | Task | Score |
 |------------------------------------------|---------|
@@ -116,7 +117,7 @@ Note: All scores are presented as decimal values between 0 and 1, representing t

 ## Limitations and Biases

-While DCLM-
+While DCLM-1B demonstrates strong performance across a range of tasks, it's important to note:

 1. The model may exhibit biases present in its training data, which is derived from web crawl data.
 2. It has not undergone specific alignment or safety fine-tuning, so outputs should be used with caution.