juliehunter
committed on
Update README.md
README.md
CHANGED
@@ -31,11 +31,11 @@ pipeline_tag: text-generation
 
 ## Model Description
 
-Lucie-7B-Instruct is a fine-tuned version of [Lucie-7B](), an open-source, multilingual causal language model created by OpenLLM-France.
+Lucie-7B-Instruct is a fine-tuned version of [Lucie-7B](https://huggingface.co/OpenLLM-France/Lucie-7B), an open-source, multilingual causal language model created by OpenLLM-France.
 
 Lucie-7B-Instruct is fine-tuned on synthetic instructions produced by ChatGPT and Gemma and a small set of customized prompts about OpenLLM and Lucie.
 
-
+While Lucie-7B-Instruct is trained on sequences of 4096 tokens, its base model, Lucie-7B, has a context size of 32K tokens. Based on Needle-in-a-haystack evaluations, Lucie-7B-Instruct maintains the capacity of the base model to handle 32K-token context windows.
 
 
 ## Training details
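The added paragraph cites Needle-in-a-haystack evaluations for the 32K-token claim. A minimal sketch of such a probe follows, assuming the model is published as `OpenLLM-France/Lucie-7B-Instruct` and loads through the standard `transformers` causal-LM API; the filler text, needle wording, haystack size, and needle depth are illustrative choices, not the authors' evaluation protocol.

```python
# Minimal needle-in-a-haystack probe (illustrative; not the authors' protocol).
# Assumes the repo id below exists and the model fits in available GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "OpenLLM-France/Lucie-7B-Instruct"  # assumed from the page context

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a long haystack of filler sentences and hide one fact in the middle.
filler = "The sky was grey and the streets were quiet that morning. "
needle = "The secret passphrase is 'heliotrope'. "
n_filler = 2000  # roughly 25K tokens; tune so the prompt stays under 32K
sentences = [filler] * n_filler
sentences.insert(n_filler // 2, needle)
haystack = "".join(sentences)

prompt = haystack + "\n\nQuestion: What is the secret passphrase mentioned above?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(f"prompt length: {inputs['input_ids'].shape[1]} tokens")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
# Print only the continuation; a model that handles the full window should
# name the passphrase.
print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Repeating the probe with the needle at different depths gives a rough picture of retrieval quality across the window.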
@@ -61,11 +61,12 @@ Lucie-7B-Instruct is trained on the following datasets:
 ### Training procedure
 
 The model architecture and hyperparameters are the same as for [Lucie-7B](https://huggingface.co/OpenLLM-France/Lucie-7B) during the annealing phase with the following exceptions:
-* context length: 4096
+* context length: 4096<sup>*</sup>
 * batch size: 1024
 * max learning rate: 3e-5
 * min learning rate: 3e-6
 
+(<sup>*</sup>As noted above, while Lucie-7B-Instruct is trained on sequences of 4096 tokens, it maintains the capacity of its base model, Lucie-7B, to handle context sizes of up to 32K tokens.)
 
 ## Testing the model
 
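The hunk above gives only the endpoints of the learning-rate schedule (3e-5 down to 3e-6), not its shape. As a rough illustration, the sketch below assumes a cosine decay over a hypothetical step count; only the max/min values come from the README.

```python
# Sketch of a cosine learning-rate decay between the endpoints listed in the
# README. The cosine shape and the step count are assumptions for illustration.
import math

MAX_LR = 3e-5
MIN_LR = 3e-6
TOTAL_STEPS = 1000  # hypothetical

def lr_at(step: int) -> float:
    """Cosine decay from MAX_LR to MIN_LR over TOTAL_STEPS steps."""
    progress = min(step / TOTAL_STEPS, 1.0)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

for step in (0, 250, 500, 750, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

A linear decay between the same endpoints would be an equally plausible reading; the README does not say which schedule was used.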
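The diff stops just above the README's "Testing the model" section. In that spirit, here is a minimal generation sketch, assuming the repository id `OpenLLM-France/Lucie-7B-Instruct` and that the repo ships a chat template; see the model card itself for the authors' own snippet.

```python
# Minimal generation example (a sketch, not the model card's own snippet).
# The repo id and the presence of a chat template are assumptions.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="OpenLLM-France/Lucie-7B-Instruct",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Passing a list of messages lets pipeline() apply the tokenizer's chat
# template, which instruction-tuned models generally expect.
messages = [{"role": "user", "content": "Qui a créé le modèle Lucie-7B ?"}]
out = generator(messages, max_new_tokens=100, do_sample=False)
print(out[0]["generated_text"][-1]["content"])
```

Untemplated raw prompts tend to produce weaker completions from instruction-tuned checkpoints, so the chat-message form is usually the safer default.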