artidoro committed on
Commit db0835a
1 Parent(s): 24a4437

update readme

Files changed (1)
1. README.md +15 -20
README.md CHANGED
@@ -6,7 +6,7 @@ license: cc-by-nc-4.0
 
  | [Paper](https://arxiv.org/abs/2305.14314) | [Code](https://github.com/artidoro/qlora) |
 
- **The `LLaMA-2 QLoRA OpenOrca` are open-source models obtained through 4-bit QLoRA tuning of LLaMA-2 base models 240k exmaples of OpenOrca.**
+ **The `LLaMA-2 QLoRA OpenOrca` models are open-source models obtained through 4-bit QLoRA tuning of LLaMA-2 base models on 240k examples of [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca).**
 
  ⚠️ These models are purely intended for research purposes and could produce problematic outputs.
 
@@ -17,7 +17,7 @@ license: cc-by-nc-4.0
  - **Lightweight** checkpoints which only contain adapter weights.
 
  ## License and Intended Use
- Note the use of these adapter weights, requires access to the LLaMA-2 model weighs and therefore should be used according to the LLaMA-2 license.
+ Note that use of these adapter weights requires access to the LLaMA-2 model weights, and they should therefore be used according to the LLaMA-2 license. The adapter weights are trained on data obtained from the OpenAI GPT-3.5 and GPT-4 models (see the Finetuning Data section for more details). As such, any use of these adapters should also follow OpenAI's terms of use.
 
  ## Usage
  Here is an example of how you would load the model in 4 bits:
@@ -47,28 +47,21 @@ tokenizer = AutoTokenizer.from_pretrained(model_name)
  ```
  Inference can then be performed as usual with HF models as follows:
  ```python
- prompt = "Introduce yourself"
+ question = "Explain Einstein's theory of special relativity."
  formatted_prompt = (
-     f"A chat between a curious human and an artificial intelligence assistant."
-     f"The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
-     f"### Human: {prompt} ### Assistant:"
+     f"### Instruction: {question}\n\n### Response:"
  )
  inputs = tokenizer(formatted_prompt, return_tensors="pt").to("cuda:0")
  outputs = model.generate(inputs=inputs.input_ids, max_new_tokens=20)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```
- Expected output similar to the following:
- ```
- A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
- ### Human: Introduce yourself ### Assistant: I am an artificial intelligence assistant. I am here to help you with any questions you may have.
- ```
 
  ## Model Card
  **Architecture**: The models released here are LoRA adapters to be used on top of LLaMA-2 models. They are added to all layers. For all model sizes, we use $r=64$.
 
  **Base Model**: These models use LLaMA-2 as the base model. LLaMA-2 is a causal language model pretrained on a large corpus of text. See the [LLaMA-2 paper](https://arxiv.org/abs/2307.09288) for more details. Note that these models can inherit the biases and limitations of the base model.
 
- **Finetuning Data**: These models are finetuned on 240k examples of the [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) dataset.
+ **Finetuning Data**: These models are finetuned on 240k examples of the [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) dataset. The OpenOrca dataset is a replication of the [Orca](https://arxiv.org/abs/2306.02707) dataset, which uses FLAN v2 prompts with GPT-3.5/GPT-4 completions.
 
  **Languages**: The different datasets cover different languages. We refer to the various papers and resources describing the datasets for more details.
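
The loading code referenced in the Usage section is mostly elided by this diff. As a rough sketch of the standard `transformers` + `peft` pattern it relies on — the model and adapter identifiers below are placeholders, not this repo's actual ids, and the NF4/double-quantization settings are assumed from the QLoRA paper's defaults:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model = "meta-llama/Llama-2-7b-hf"          # placeholder base-model id
adapter_path = "path/to/qlora-openorca-adapter"  # placeholder adapter checkpoint

# 4-bit NF4 quantization with double quantization and bf16 compute,
# the defaults described in the QLoRA paper (assumed here).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model, then attach the LoRA adapter weights on top.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_path)
tokenizer = AutoTokenizer.from_pretrained(base_model)
```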
@@ -84,13 +77,20 @@ For the finetuning process, we use constant learning rate schedule and paged AdamW
  ### Training hyperparameters
  | Parameters | Dataset | Batch size | LR | Steps | Source Length | Target Length |
  |------------|----------|------------|------|-------|---------------|---------------|
- | 7B | All | 16 | 2e-4 | 10000 | 384 | 128 |
- | 13B | All | 16 | 2e-4 | 10000 | 384 | 128 |
- | 70B | All | 64 | 1e-4 | 2500 | 384 | 128 |
+ | 7B | OpenOrca | 16 | 2e-4 | 15000 | 384 | 128 |
+ | 13B | OpenOrca | 16 | 2e-4 | 15000 | 384 | 128 |
+ | 70B | OpenOrca | 64 | 1e-4 | 3750 | 384 | 128 |
 
  ### Evaluation
  We use the MMLU benchmark to measure performance on a range of language understanding tasks. This is a multiple-choice benchmark covering 57 tasks including elementary mathematics, US history, computer science, law, and more. We report 5-shot test accuracy.
 
+ Dataset | 7B | 13B | 34B | 70B
+ ---|---|---|---|---
+ LLaMA-2 no tuning | 45.3 | 54.8 | 62.6 | 68.9
+ OpenOrca | 45.0 | | | 69.0
+
+ For reference, here are the MMLU results of QLoRA finetuning on other datasets:
+
  Dataset | 7B | 13B | 33B | 65B
  ---|---|---|---|---
  LLaMA-1 no tuning | 35.1 | 46.9 | 57.8 | 63.4
@@ -103,11 +103,6 @@ We use the MMLU benchmark to measure performance on a range of language understanding tasks.
  Alpaca | 38.8 | 47.8 | 57.3 | 62.5
  FLAN v2 | 44.5 | 51.4 | 59.2 | 63.9
 
- Dataset | 7B | 13B | 34B | 70B
- ---|---|---|---|---
- LLaMA-2 no tuning | 45.3 | 54.8 | 62.6 | 68.9
- OpenOrca | 45.0 | | | 69.0
-
 
  ## Citation
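
Taken together, the details in this card — LoRA adapters on all layers with $r=64$, a constant learning rate schedule, and the paged AdamW optimizer — correspond roughly to the following `peft`/`transformers` configuration. This is a sketch under stated assumptions, not the repo's actual training script: `lora_alpha`, `lora_dropout`, and the exact `target_modules` list are guesses.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapters on all linear projections of every transformer block, r=64.
# lora_alpha, lora_dropout, and target_modules are assumptions, not from the card.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Constant learning-rate schedule with paged AdamW, using the 7B/13B row
# of the hyperparameter table (batch size 16, LR 2e-4, 15000 steps).
training_args = TrainingArguments(
    output_dir="./llama-2-qlora-openorca",  # placeholder
    per_device_train_batch_size=16,         # table batch size, ignoring accumulation
    learning_rate=2e-4,
    max_steps=15000,
    lr_scheduler_type="constant",
    optim="paged_adamw_32bit",
)
```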
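
The MMLU numbers in the tables above are 5-shot multiple-choice accuracy. A minimal sketch of one common way to score such a benchmark — comparing the model's next-token logits for the four answer letters; the actual prompt template and evaluation harness behind these tables are not specified in this card:

```python
import torch

def mmlu_choice(model, tokenizer, few_shot_prompt, question, choices):
    """Pick A/B/C/D by comparing next-token logits after the prompt.

    `few_shot_prompt` is assumed to contain five solved examples in the
    same format; the template here is illustrative, not the paper's.
    """
    letters = ["A", "B", "C", "D"]
    options = "\n".join(f"{l}. {c}" for l, c in zip(letters, choices))
    prompt = f"{few_shot_prompt}\n{question}\n{options}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # distribution over next token
    # A question counts as correct when the highest-scoring answer-letter
    # token matches the gold letter.
    ids = [tokenizer.encode(f" {l}", add_special_tokens=False)[-1] for l in letters]
    return letters[int(torch.argmax(logits[ids]))]
```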