juliehunter
committed on
Update README.md
README.md
CHANGED
@@ -31,11 +31,11 @@ pipeline_tag: text-generation
 
 ## Model Description
 
-Lucie-7B-Instruct is a fine-tuned version of [Lucie-7B](), an open-source, multilingual causal language model created by OpenLLM-France.
+Lucie-7B-Instruct is a fine-tuned version of [Lucie-7B](https://huggingface.co/OpenLLM-France/Lucie-7B), an open-source, multilingual causal language model created by OpenLLM-France.
 
 Lucie-7B-Instruct is fine-tuned on synthetic instructions produced by ChatGPT and Gemma and a small set of customized prompts about OpenLLM and Lucie.
 
-
+While Lucie-7B-Instruct is trained on sequences of 4096 tokens, its base model, Lucie-7B, has a context size of 32K tokens. Based on Needle-in-a-haystack evaluations, Lucie-7B-Instruct maintains the capacity of the base model to handle 32K-token context windows.
 
 
 ## Training details
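The added paragraph cites Needle-in-a-haystack evaluations for the 32K-token claim. A minimal sketch of such a probe follows, assuming the model is published as `OpenLLM-France/Lucie-7B-Instruct` and loads through the standard `transformers` causal-LM API; the filler text, needle wording, haystack size, and needle depth are illustrative choices, not the authors' evaluation protocol.

```python
# Minimal needle-in-a-haystack probe (illustrative; not the authors' protocol).
# Assumes the repo id below exists and the model fits in available GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "OpenLLM-France/Lucie-7B-Instruct"  # assumed from the page context

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a long haystack of filler sentences and hide one fact in the middle.
filler = "The sky was grey and the streets were quiet that morning. "
needle = "The secret passphrase is 'heliotrope'. "
n_filler = 2000  # roughly 25K tokens; tune so the prompt stays under 32K
sentences = [filler] * n_filler
sentences.insert(n_filler // 2, needle)
haystack = "".join(sentences)

prompt = haystack + "\n\nQuestion: What is the secret passphrase mentioned above?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(f"prompt length: {inputs['input_ids'].shape[1]} tokens")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
# Print only the continuation; a model that handles the full window should
# name the passphrase.
print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Repeating the probe with the needle at different depths gives a rough picture of retrieval quality across the window.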
@@ -61,11 +61,12 @@ Lucie-7B-Instruct is trained on the following datasets:
 ### Training procedure
 
 The model architecture and hyperparameters are the same as for [Lucie-7B](https://huggingface.co/OpenLLM-France/Lucie-7B) during the annealing phase with the following exceptions:
-* context length: 4096
+* context length: 4096<sup>*</sup>
 * batch size: 1024
 * max learning rate: 3e-5
 * min learning rate: 3e-6
 
+(<sup>*</sup>As noted above, while Lucie-7B-Instruct is trained on sequences of 4096 tokens, it maintains the capacity of its base model, Lucie-7B, to handle context sizes of up to 32K tokens.)
 
 ## Testing the model
 
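The hunk above gives only the endpoints of the learning-rate schedule (3e-5 down to 3e-6), not its shape. As a rough illustration, the sketch below assumes a cosine decay over a hypothetical step count; only the max/min values come from the README.

```python
# Sketch of a cosine learning-rate decay between the endpoints listed in the
# README. The cosine shape and the step count are assumptions for illustration.
import math

MAX_LR = 3e-5
MIN_LR = 3e-6
TOTAL_STEPS = 1000  # hypothetical

def lr_at(step: int) -> float:
    """Cosine decay from MAX_LR to MIN_LR over TOTAL_STEPS steps."""
    progress = min(step / TOTAL_STEPS, 1.0)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

for step in (0, 250, 500, 750, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

A linear decay between the same endpoints would be an equally plausible reading; the README does not say which schedule was used.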
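The diff stops just above the README's "Testing the model" section. In that spirit, here is a minimal generation sketch, assuming the repository id `OpenLLM-France/Lucie-7B-Instruct` and that the repo ships a chat template; see the model card itself for the authors' own snippet.

```python
# Minimal generation example (a sketch, not the model card's own snippet).
# The repo id and the presence of a chat template are assumptions.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="OpenLLM-France/Lucie-7B-Instruct",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Passing a list of messages lets pipeline() apply the tokenizer's chat
# template, which instruction-tuned models generally expect.
messages = [{"role": "user", "content": "Qui a créé le modèle Lucie-7B ?"}]
out = generator(messages, max_new_tokens=100, do_sample=False)
print(out[0]["generated_text"][-1]["content"])
```

Untemplated raw prompts tend to produce weaker completions from instruction-tuned checkpoints, so the chat-message form is usually the safer default.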