Update README.md
README.md
@@ -167,6 +167,7 @@ model-index:
       source:
         url: https://huggingface.co/spaces/lmsys/mt-bench
 ---
+
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
@@ -235,9 +236,12 @@ Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:
 # Install transformers from source - only needed for versions <= v4.34
 # pip install git+https://github.com/huggingface/transformers.git
 # pip install accelerate
+
 import torch
 from transformers import pipeline
+
 pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto")
+
 # We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
 messages = [
     {
@@ -299,6 +303,8 @@ The following hyperparameters were used during training:
 ### Training results
 
 The table below shows the full set of DPO training metrics:
+
+
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
 | 0.6284 | 0.05 | 100 | 0.6098 | 0.0425 | -0.1872 | 0.7344 | 0.2297 | -258.8416 | -253.8099 | -2.7976 | -2.8234 |
@@ -360,6 +366,7 @@ The table below shows the full set of DPO training metrics:
 | 0.0094 | 2.94 | 5700 | 0.7527 | -4.5542 | -8.3509 | 0.7812 | 3.7967 | -340.4790 | -299.7773 | -2.3062 | -2.3510 |
 | 0.0054 | 2.99 | 5800 | 0.7520 | -4.5169 | -8.3079 | 0.7812 | 3.7911 | -340.0493 | -299.4038 | -2.3081 | -2.3530 |
 
+
 ### Framework versions
 
 - Transformers 4.35.0.dev0
@@ -370,6 +377,7 @@ The table below shows the full set of DPO training metrics:
 ## Citation
 
 If you find Zephyr-7B-β is useful in your work, please cite it with:
+
 ```
 @misc{tunstall2023zephyr,
     title={Zephyr: Direct Distillation of LM Alignment},
@@ -382,6 +390,7 @@ If you find Zephyr-7B-β is useful in your work, please cite it with:
 ```
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta)
+
 | Metric | Value |
 |-----------------------|---------------------------|
 | Avg. | 52.15 |
@@ -391,4 +400,4 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta)
 | TruthfulQA (0-shot) | 57.45 |
 | Winogrande (5-shot) | 77.74 |
 | GSM8K (5-shot) | 12.74 |
-| DROP (3-shot) | 9.66 |
+| DROP (3-shot) | 9.66 |
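The usage snippet in the second hunk is cut off by the diff context at `messages = [`. For reference, here is a minimal sketch of how that snippet typically continues, using the standard Transformers chat-template API; the system and user message contents below are illustrative placeholders, not the card's exact text.

```python
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto")

# Format the conversation with the tokenizer's chat template so the prompt
# carries the special role tokens Zephyr was fine-tuned on.
messages = [
    {"role": "system", "content": "You are a friendly chatbot."},   # illustrative system prompt
    {"role": "user", "content": "Explain DPO in one sentence."},    # illustrative user turn
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Sample a completion; the generation kwargs here are reasonable defaults, not prescribed by the card.
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

Using `apply_chat_template` rather than hand-building the prompt keeps inference consistent with the chat format the model saw during training.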
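One way to read the DPO metrics table in this diff: with these column names (as logged by TRL-style DPO trainers), Rewards/margins should equal Rewards/chosen minus Rewards/rejected, and Rewards/accuracies is the fraction of preference pairs where the chosen response scores higher. A quick sanity sketch against the rows visible here, allowing for rounding in the logged values:

```python
# Check Rewards/margins = Rewards/chosen - Rewards/rejected using
# (chosen, rejected, margin) triples from the table rows shown in this diff.
rows = [
    (0.0425, -0.1872, 0.2297),   # step 100
    (-4.5542, -8.3509, 3.7967),  # step 5700
    (-4.5169, -8.3079, 3.7911),  # step 5800
]
for chosen, rejected, margin in rows:
    assert abs((chosen - rejected) - margin) < 1e-3  # rounding tolerance
```

All three rows check out, e.g. 0.0425 - (-0.1872) = 0.2297 at step 100.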
|