juliehunter committed
Commit 826086c · verified · 1 Parent(s): 4632991

Update README.md

Files changed (1): README.md (+4 -3)
README.md CHANGED
@@ -31,11 +31,11 @@ pipeline_tag: text-generation
 
 ## Model Description
 
-Lucie-7B-Instruct is a fine-tuned version of [Lucie-7B](), an open-source, multilingual causal language model created by OpenLLM-France.
+Lucie-7B-Instruct is a fine-tuned version of [Lucie-7B](https://huggingface.co/OpenLLM-France/Lucie-7B), an open-source, multilingual causal language model created by OpenLLM-France.
 
 Lucie-7B-Instruct is fine-tuned on synthetic instructions produced by ChatGPT and Gemma and a small set of customized prompts about OpenLLM and Lucie.
 
-
+While Lucie-7B-Instruct is trained on sequences of 4096 tokens, its base model, Lucie-7B, has a context size of 32K tokens. Based on needle-in-a-haystack evaluations, Lucie-7B-Instruct maintains the capacity of the base model to handle 32K-token context windows.
 
 
 ## Training details
@@ -61,11 +61,12 @@ Lucie-7B-Instruct is trained on the following datasets:
 ### Training procedure
 
 The model architecture and hyperparameters are the same as for [Lucie-7B](https://huggingface.co/OpenLLM-France/Lucie-7B) during the annealing phase, with the following exceptions:
-* context length: 4096
+* context length: 4096<sup>*</sup>
 * batch size: 1024
 * max learning rate: 3e-5
 * min learning rate: 3e-6
 
+(<sup>*</sup>As noted above, while Lucie-7B-Instruct is trained on sequences of 4096 tokens, it maintains the capacity of the base model, Lucie-7B, to handle context sizes of up to 32K tokens.)
 
 ## Testing the model
 
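For context, the body of the `## Testing the model` section is not included in this diff. Below is a minimal sketch of how the model might be loaded and queried with the standard 🤗 Transformers API; the model ID matches the repository, but the prompt, dtype, and generation settings are illustrative assumptions rather than the snippet actually shipped in the README.

```python
# Minimal sketch (not the README's official example): load Lucie-7B-Instruct
# with Hugging Face Transformers and generate a short reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenLLM-France/Lucie-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bfloat16 inference is sufficient
    device_map="auto",           # requires `accelerate`; places weights on available devices
)

# Illustrative prompt; the exact prompt/chat format expected by the model is
# defined by its tokenizer configuration, which is not shown in this diff.
prompt = "Quelle est la capitale de la France ?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The base model supports a 32K-token context; only a short answer is generated here.
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

In bfloat16 a 7B-parameter model needs roughly 14 GB of memory for the weights alone, so `device_map="auto"` is a convenient default when a single GPU may not hold everything.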