Text Generation · Transformers · NeMo · Safetensors · mistral · text-generation-inference · Inference Endpoints
srvm committed on
Commit
cc94637
1 Parent(s): 4ea0f55

Update link to tech report

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -10,7 +10,7 @@ library_name: transformers
 
 ## Model Overview
 
-Mistral-NeMo-Minitron-8B-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks. It is a large language model (LLM) obtained by pruning and distilling the Mistral-NeMo 12B; specifically, we prune the embedding dimension and MLP intermediate dimension in the model. Following pruning, we perform continued training with distillation using 380 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose.
+Mistral-NeMo-Minitron-8B-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks. It is a large language model (LLM) obtained by pruning and distilling the Mistral-NeMo 12B; specifically, we prune the embedding dimension and MLP intermediate dimension in the model. Following pruning, we perform continued training with distillation using 380 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose. Please refer to our [technical report](https://arxiv.org/abs/2408.11796) for more details.
 
 **Model Developer:** NVIDIA
 
@@ -140,5 +140,6 @@ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.
 
 
 ## References
+
 * [Minitron: Compact Language Models via Pruning and Knowledge Distillation](https://arxiv.org/abs/2407.14679)
-* [LLM Pruning and Distillation in Practice: The Minitron Approach](https://research.nvidia.com/publication/_llm-pruning-and-distillation-practice-minitron-approach)
+* [LLM Pruning and Distillation in Practice: The Minitron Approach](https://arxiv.org/abs/2408.11796)
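The model overview paragraph touched by this commit describes a checkpoint obtained by pruning the embedding and MLP intermediate dimensions of Mistral-NeMo 12B and then distilling on 380 billion tokens. As a minimal sketch (not part of this commit), the snippet below loads the checkpoint with the Transformers library, prints the two config fields affected by that pruning, and runs a short greedy generation; the repository id `nvidia/Mistral-NeMo-Minitron-8B-Base`, the bf16 dtype, and the prompt are assumptions, not asserted by the README diff.

```python
# Minimal sketch, assuming the repo id below; adjust dtype/device to your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Mistral-NeMo-Minitron-8B-Base"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 is assumed here; use float16/float32 if preferred
    device_map="auto",           # requires the `accelerate` package
)

# The pruning described in the model overview reduces these two widths
# relative to Mistral-NeMo 12B (embedding width and MLP intermediate width).
print("hidden_size:", model.config.hidden_size)
print("intermediate_size:", model.config.intermediate_size)

# Short greedy generation as a smoke test.
inputs = tokenizer("The Minitron approach compresses a model by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```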