fix multiple typo in README (#2)
- fix multiple typo in README (ca5f034d4be3ffcc1fbc682ffc30cdea0f55de7e)
Co-authored-by: wen wen <[email protected]>
README.md
CHANGED
@@ -103,9 +103,7 @@ tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
 messages = [
 {"role": "system", "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user."},
 {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
-{"role": "
-{"role": "system", "content": "1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey."},
-{"role": "system", "content": "2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
+{"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
 {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
 ]
 
@@ -129,8 +127,7 @@ print(output[0]['generated_text'])
 Note that by default the model use flash attention which requires certain types of GPU to run. If you want to run the model on:
 
 + V100 or earlier generation GPUs: call `AutoModelForCausalLM.from_pretrained()` with `attn_implementation="eager"`
-+
-+ Optimized inference: use the **ONNX** models [4K](https://aka.ms/Phi3-mini-128k-instruct-onnx)
++ Optimized inference: use the **ONNX** models [128K](https://aka.ms/phi3-mini-128k-instruct-onnx)
 
 ## Responsible AI Considerations
 
@@ -156,7 +153,7 @@ Developers should apply responsible AI best practices and are responsible for en
 
 * Architecture: Phi-3 Mini-128K-Instruct has 3.8B parameters and is a dense decoder-only Transformer model. The model is fine-tuned with Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidlines.
 * Inputs: Text. It is best suited for prompts using chat format.
-* Context length:
+* Context length: 128K tokens
 * GPUs: 512 H100-80G
 * Training time: 7 days
 * Training data: 3.3T tokens
@@ -187,7 +184,7 @@ More specifically, we do not change prompts, pick different few-shot examples, c
 
 The number of k–shot examples is listed per-benchmark.
 
-| | Phi-3-Mini-
+| | Phi-3-Mini-128K-In<br>3.8b | Phi-3-Small<br>7b (preview) | Phi-3-Medium<br>14b (preview) | Phi-2<br>2.7b | Mistral<br>7b | Gemma<br>7b | Llama-3-In<br>8b | Mixtral<br>8x7b | GPT-3.5<br>version 1106 |
 |---|---|---|---|---|---|---|---|---|---|
 | MMLU <br>5-Shot | 68.8 | 75.3 | 78.2 | 56.3 | 61.7 | 63.6 | 66.0 | 68.4 | 71.4 |
 | HellaSwag <br> 5-Shot | 76.7 | 78.7 | 83.2 | 53.6 | 58.5 | 49.8 | 69.5 | 70.4 | 78.8 |
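
Review note: the first hunk replaces mid-conversation `system` entries with a single `assistant` reply. A minimal sketch of how that kind of role mix-up could be caught before it reaches a README example is given below. The helper name `check_chat_roles` is hypothetical and not part of this repository or of `transformers`. The alternation rule it checks (an optional leading `system` message, then strictly alternating `user` and `assistant`) is an assumption about the intended chat format, not something the README states. Message contents are abbreviated.

```python
def check_chat_roles(messages):
    """Return a list of problems found in the role sequence of a chat.

    Assumed rule (hypothetical, for illustration only): an optional
    "system" message at index 0, followed by roles that strictly
    alternate "user", "assistant", "user", ...
    """
    problems = []
    expected = "user"
    for index, message in enumerate(messages):
        role = message["role"]
        if index == 0 and role == "system":
            continue  # a leading system prompt is allowed
        if role != expected:
            problems.append(
                f"message {index}: expected role '{expected}', got '{role}'"
            )
        # Advance the expectation regardless of whether this turn matched.
        expected = "assistant" if expected == "user" else "user"
    return problems


# Role sequence after this commit (contents abbreviated).
fixed = [
    {"role": "system", "content": "You are a helpful digital assistant."},
    {"role": "user", "content": "Ways to eat bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! 1. Smoothie ... 2. Salad ..."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

# Two of the removed entries used "system" where a reply was intended.
broken = [
    {"role": "system", "content": "You are a helpful digital assistant."},
    {"role": "user", "content": "Ways to eat bananas and dragonfruits?"},
    {"role": "system", "content": "1. Banana and dragonfruit smoothie ..."},
    {"role": "system", "content": "2. Banana and dragonfruit salad ..."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

if __name__ == "__main__":
    print(check_chat_roles(fixed))
    for problem in check_chat_roles(broken):
        print(problem)
```

Under these assumptions the fixed list yields no problems, while the broken list flags every message after the first `user` turn, including the final `user` message that no `assistant` reply precedes.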