Update link to tech report
README.md CHANGED
@@ -10,7 +10,7 @@ library_name: transformers
 
 ## Model Overview
 
-Mistral-NeMo-Minitron-8B-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks. It is a large language model (LLM) obtained by pruning and distilling the Mistral-NeMo 12B; specifically, we prune the embedding dimension and MLP intermediate dimension in the model. Following pruning, we perform continued training with distillation using 380 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose.
+Mistral-NeMo-Minitron-8B-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks. It is a large language model (LLM) obtained by pruning and distilling the Mistral-NeMo 12B; specifically, we prune the embedding dimension and MLP intermediate dimension in the model. Following pruning, we perform continued training with distillation using 380 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose. Please refer to our [technical report](https://arxiv.org/abs/2408.11796) for more details.
 
 **Model Developer:** NVIDIA
 
@@ -140,5 +140,6 @@ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.
 
 
 ## References
+
 * [Minitron: Compact Language Models via Pruning and Knowledge Distillation](https://arxiv.org/abs/2407.14679)
-* [LLM Pruning and Distillation in Practice: The Minitron Approach](https://
+* [LLM Pruning and Distillation in Practice: The Minitron Approach](https://arxiv.org/abs/2408.11796)
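Since the card declares `library_name: transformers`, a minimal usage sketch of the model described in the updated overview is shown below. The Hugging Face repo id `nvidia/Mistral-NeMo-Minitron-8B-Base`, the bfloat16 precision, the `device_map` setting, and the example prompt are assumptions for illustration and are not part of the diff above.

```python
# Minimal sketch (assumptions noted above): load the base model with transformers
# and run generation on a short prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Mistral-NeMo-Minitron-8B-Base"  # assumed repo id, taken from the model name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision
    device_map="auto",           # requires the `accelerate` package
)

prompt = "Model compression by pruning and knowledge distillation works by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```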