Achal Dave committed on
Commit
77ec709
1 Parent(s): 39adea4

README updates

Files changed (1):
README.md +6 -5
README.md CHANGED
@@ -7,9 +7,9 @@ license: apache-2.0
 <img src="https://cdn-uploads.huggingface.co/production/uploads/63118add64939fabc0108b28/BB42g4V8HTxb5dR4tcy8A.png" alt="DCLM Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
 
 
-# Model Card for DCLM-Baseline-1B
+# Model Card for DCLM-1B
 
-DCLM-Baseline-1B is a 1.4 billion parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark. This model is designed to showcase the effectiveness of systematic data curation techniques for improving language model performance.
+DCLM-1B is a 1.4 billion parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark. This model is designed to showcase the effectiveness of systematic data curation techniques for improving language model performance.
 
 ## Model Details
 
@@ -48,12 +48,13 @@ The model was trained using the following setup:
 - **Total Training Tokens:** 2.6T
 - **Hardware:** Trained on H100 GPUs
 
-For more detailed training information, please refer to Section 3.4 and Appendix F of the DCLM paper.
+For more detailed training information, please refer to Appendix P.3 of the
+paper.
 To ensure our trained model is broadly useful, including for math and coding tasks, we combine our 3.8T [DCLM-BASELINE](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0) with the [StarCoder](https://huggingface.co/datasets/bigcode/starcoderdata) and [ProofPile2](https://huggingface.co/datasets/EleutherAI/proof-pile-2) data to arrive at a 4.1T token dataset.
 
 ## Evaluation
 
-Here are the evaluation results for DCLM-Baseline-7B on various tasks (using [llm-foundry](https://github.com/mosaicml/llm-foundry) eval suite)
+Here are the evaluation results for DCLM-1B on various tasks (using [llm-foundry](https://github.com/mosaicml/llm-foundry) eval suite)
 
 | Task | Score |
 |------------------------------------------|---------|
@@ -116,7 +117,7 @@ Note: All scores are presented as decimal values between 0 and 1, representing t
 
 ## Limitations and Biases
 
-While DCLM-Baseline-1B demonstrates strong performance across a range of tasks, it's important to note:
+While DCLM-1B demonstrates strong performance across a range of tasks, it's important to note:
 
 1. The model may exhibit biases present in its training data, which is derived from web crawl data.
 2. It has not undergone specific alignment or safety fine-tuning, so outputs should be used with caution.
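The updated card states that the 3.8T DCLM-BASELINE corpus is combined with StarCoder and ProofPile2 to reach a roughly 4.1T-token training set. As a rough illustration of that kind of mixing (not the actual DCLM data pipeline), here is a minimal streaming sketch with the Hugging Face `datasets` library; the sampling probabilities, default configs, and `train` split names are assumptions for illustration only.

```python
# Illustrative sketch only: streams the three corpora named in the card and
# interleaves them. The probabilities are placeholders, NOT the DCLM-1B
# mixture; split/config names are assumptions. In practice the text columns
# would also need to be normalized to a common schema before mixing.
from datasets import load_dataset, interleave_datasets

dclm = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True)
starcoder = load_dataset("bigcode/starcoderdata", split="train", streaming=True)
proofpile = load_dataset("EleutherAI/proof-pile-2", split="train", streaming=True)

# Sample from the three streams with placeholder weights that reflect
# DCLM-BASELINE dominating the ~4.1T-token mixture.
mixed = interleave_datasets(
    [dclm, starcoder, proofpile],
    probabilities=[0.93, 0.05, 0.02],
    seed=42,
)

# Peek at a few interleaved records.
for record in mixed.take(3):
    print(sorted(record.keys()))
```

Streaming keeps the sketch runnable without downloading multi-terabyte corpora; the tokenization and mixing actually used for DCLM-1B are described in the DCLM paper and repository, not here.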
 
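The scores in the card come from the [llm-foundry](https://github.com/mosaicml/llm-foundry) eval suite. For simply trying the model out, a hedged quick-start sketch with `transformers` is below; the repo id `TRI-ML/DCLM-1B` and the `trust_remote_code=True` requirement are assumptions not confirmed by this commit, so check the model page before relying on them.

```python
# Hypothetical quick-start sketch, not from the commit: load DCLM-1B and
# generate a short completion. The repo id and trust_remote_code requirement
# are assumptions; verify both on the actual model page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TRI-ML/DCLM-1B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Systematic data curation improves language models because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the checkpoint ships a custom architecture (as other DCLM releases do), transformers needs `trust_remote_code=True` to load it; if not, the flag is simply unnecessary.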