---
license: llama2
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: WizardCoder-Python-34B-V1.0
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.732
      verified: false
---

<p align="center">
πŸ€— <a href="https://huggingface.co./WizardLM" target="_blank">HF Repo</a>  β€’πŸ± <a href="https://github.com/nlpxucan/WizardLM" target="_blank">Github Repo</a> β€’ 🐦 <a href="https://twitter.com/WizardLM_AI" target="_blank">Twitter</a> β€’ πŸ“ƒ <a href="https://arxiv.org/abs/2304.12244" target="_blank">[WizardLM]</a>  β€’ πŸ“ƒ <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>    β€’ πŸ“ƒ <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a> <br>
</p>
<p align="center">
    πŸ‘‹ Join our <a href="https://discord.gg/VZjjHtWrKs" target="_blank">Discord</a>
</p>

## News

- πŸ”₯πŸ”₯πŸ”₯[2023/08/26] We released **WizardCoder-Python-34B-V1.0**, which achieves **73.2 pass@1** on the [HumanEval Benchmarks](https://github.com/openai/human-eval) and surpasses **GPT4 (2023/03/15)**, **ChatGPT-3.5**, and **Claude2**.
- [2023/06/16] We released **WizardCoder-15B-V1.0**, which achieves **57.3 pass@1** on the [HumanEval Benchmarks](https://github.com/openai/human-eval) and surpasses **Claude-Plus (+6.8)**, **Bard (+15.3)**, and **InstructCodeT5+ (+22.3)**.

❗Note: There are two sets of HumanEval results for GPT4 and ChatGPT-3.5. The 67.0 and 48.1 scores are reported in the official GPT4 Report (2023/03/15) from [OpenAI](https://arxiv.org/abs/2303.08774). The 82.0 and 72.5 scores were measured by us with the latest API (2023/08/26).


|  Model  |  Checkpoint  | Paper    | HumanEval  |   MBPP | Demo | License |
| ----- |------| ---- |------|-------| ----- |  ----- | 
|  WizardCoder-Python-34B-V1.0  |   πŸ€— <a href="https://huggingface.co./WizardLM/WizardCoder-Python-34B-V1.0" target="_blank">HF Link</a>   |  πŸ“ƒ <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>  |  73.2   | 61.2 | [Demo](http://47.103.63.15:50085/) |  <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama2</a>  |
|  WizardCoder-15B-V1.0  |   πŸ€— <a href="https://huggingface.co./WizardLM/WizardCoder-15B-V1.0" target="_blank">HF Link</a>   |  πŸ“ƒ <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>  |  59.8   |50.6 | -- |  <a href="https://huggingface.co./spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a>  |
|  WizardCoder-Python-13B-V1.0  |   πŸ€— <a href="https://huggingface.co./WizardLM/WizardCoder-Python-13B-V1.0" target="_blank">HF Link</a>   |  πŸ“ƒ <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>  |  64.0   | 55.6 | -- |  <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama2</a>  |
|  WizardCoder-3B-V1.0  |   πŸ€— <a href="https://huggingface.co./WizardLM/WizardCoder-3B-V1.0" target="_blank">HF Link</a>   |  πŸ“ƒ <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>  |  34.8   |37.4 | [Demo](http://47.103.63.15:50086/) |  <a href="https://huggingface.co./spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a>  |
|  WizardCoder-1B-V1.0  |   πŸ€— <a href="https://huggingface.co./WizardLM/WizardCoder-1B-V1.0" target="_blank">HF Link</a>   |  πŸ“ƒ <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>  |  23.8   |28.6 | -- |  <a href="https://huggingface.co./spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a>  |


-  Our **WizardMath-70B-V1.0** model slightly outperforms some closed-source LLMs on GSM8K, including **ChatGPT 3.5**, **Claude Instant 1**, and **PaLM 2 540B**.
-  Our **WizardMath-70B-V1.0** model achieves  **81.6 pass@1** on the [GSM8k Benchmarks](https://github.com/openai/grade-school-math), which is **24.8** points higher than the SOTA open-source LLM, and achieves  **22.7 pass@1** on the [MATH Benchmarks](https://github.com/hendrycks/math), which is **9.2** points higher than the SOTA open-source LLM.

<font size=4>
    
| Model | Checkpoint | Paper  | GSM8k | MATH  |Online Demo| License|
| ----- |------| ---- |------|-------| ----- | ----- |
| WizardMath-70B-V1.0 | πŸ€— <a href="https://huggingface.co./WizardLM/WizardMath-70B-V1.0" target="_blank">HF Link</a> |  πŸ“ƒ <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a>| **81.6**  |  **22.7**	|[Demo](http://47.103.63.15:50083/)| <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2  </a> |
| WizardMath-13B-V1.0 | πŸ€— <a href="https://huggingface.co./WizardLM/WizardMath-13B-V1.0" target="_blank">HF Link</a> |  πŸ“ƒ <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a>| **63.9**  |  **14.0** |[Demo](http://47.103.63.15:50082/)| <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 </a> |
| WizardMath-7B-V1.0 | πŸ€— <a href="https://huggingface.co./WizardLM/WizardMath-7B-V1.0" target="_blank">HF Link</a>  |  πŸ“ƒ <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a>| 	 **54.9**  |  **10.7** | [Demo ](http://47.103.63.15:50080/)|  <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2  </a>|
</font>


- [08/09/2023] We released the **WizardLM-70B-V1.0** model. The full model weights are available [here](https://huggingface.co./WizardLM/WizardLM-70B-V1.0).

<font size=4>
    
   
| <sup>Model</sup> | <sup>Checkpoint</sup> | <sup>Paper</sup> |<sup>MT-Bench</sup> | <sup>AlpacaEval</sup>  | <sup>GSM8k</sup> | <sup>HumanEval</sup>  | <sup>License</sup>|
| ----- |------| ---- |------|-------| ----- | ----- | ----- | 
| <sup>**WizardLM-70B-V1.0**</sup> | <sup>πŸ€— <a href="https://huggingface.co./WizardLM/WizardLM-70B-V1.0" target="_blank">HF Link</a> </sup>|<sup>πŸ“ƒ**Coming Soon**</sup>| <sup>**7.78**</sup> | <sup>**92.91%**</sup>	 |<sup>**77.6%**</sup>	 | <sup>   **50.6**</sup>|<sup> <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a></sup> |
| <sup>WizardLM-13B-V1.2</sup> | <sup>πŸ€— <a href="https://huggingface.co./WizardLM/WizardLM-13B-V1.2" target="_blank">HF Link</a> </sup>|  | <sup>7.06</sup> | <sup>89.17%</sup>	 |<sup>55.3%</sup>	 | <sup>36.6   </sup>|<sup> <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a></sup> |
| <sup>WizardLM-13B-V1.1</sup> |<sup> πŸ€— <a href="https://huggingface.co./WizardLM/WizardLM-13B-V1.1" target="_blank">HF Link</a> </sup> |  | <sup>6.76</sup>  |<sup>86.32%</sup>	 | 	 | <sup>25.0   </sup>| <sup>Non-commercial</sup>|
| <sup>WizardLM-30B-V1.0</sup> | <sup>πŸ€— <a href="https://huggingface.co./WizardLM/WizardLM-30B-V1.0" target="_blank">HF Link</a></sup>  | | <sup>7.01</sup> |                    | |  <sup>37.8  </sup>| <sup>Non-commercial</sup> |
| <sup>WizardLM-13B-V1.0</sup> | <sup>πŸ€— <a href="https://huggingface.co./WizardLM/WizardLM-13B-V1.0" target="_blank">HF Link</a> </sup> |  | <sup>6.35</sup> | <sup>75.31%</sup> |  | <sup> 24.0   </sup> | <sup>Non-commercial</sup>|
| <sup>WizardLM-7B-V1.0 </sup>|  <sup>πŸ€— <a href="https://huggingface.co./WizardLM/WizardLM-7B-V1.0" target="_blank">HF Link</a> </sup> |<sup> πŸ“ƒ <a href="https://arxiv.org/abs/2304.12244" target="_blank">[WizardLM]</a> </sup>|  |  |  |<sup>19.1 </sup>|<sup> Non-commercial</sup>|
</font>


## Comparing WizardCoder-Python-34B-V1.0 with Other LLMs.

πŸ”₯ The following figure shows that our **WizardCoder-Python-34B-V1.0 attains second place on this benchmark**, surpassing GPT4 (2023/03/15, 73.2 vs. 67.0), ChatGPT-3.5 (73.2 vs. 72.5), and Claude2 (73.2 vs. 71.2).

<p align="center" width="100%">
<a ><img src="https://raw.githubusercontent.com/nlpxucan/WizardLM/main/WizardCoder/imgs/compare_sota.png" alt="WizardCoder" style="width: 96%; min-width: 300px; display: block; margin: auto;"></a>
</p>

## Prompt Format
```
"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
```
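As a quick illustration, the template can be filled in Python as shown below; the instruction text here is purely an example of ours, not an official prompt.

```python
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

# Hypothetical instruction, used only to illustrate the format.
prompt = PROMPT_TEMPLATE.format(
    instruction="Write a Python function that returns the n-th Fibonacci number."
)
print(prompt)
```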

## Inference Demo Script

We provide the inference demo code [here](https://github.com/nlpxucan/WizardLM/tree/main/demo).
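For a quick local test, a minimal generation sketch with πŸ€— Transformers might look like the following. The generation settings (greedy decoding, a 512-token budget) and the example instruction are illustrative assumptions of ours, not the official demo configuration; refer to the linked demo code for the maintained version.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WizardLM/WizardCoder-Python-34B-V1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Fill the prompt template from the "Prompt Format" section above.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a Python function that checks whether a number is prime.\n\n"
    "### Response:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding and max_new_tokens=512 are illustrative choices, not official settings.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```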

## Citation

Please cite this repository if you use its data, methods, or code.

```
@article{luo2023wizardcoder,
  title={WizardCoder: Empowering Code Large Language Models with Evol-Instruct},
  author={Luo, Ziyang and Xu, Can and Zhao, Pu and Sun, Qingfeng and Geng, Xiubo and Hu, Wenxiang and Tao, Chongyang and Ma, Jing and Lin, Qingwei and Jiang, Daxin},
  journal={arXiv preprint arXiv:2306.08568},
  year={2023}
}
```
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/details_WizardLM__WizardCoder-Python-34B-V1.0).

| Metric                | Value                     |
|-----------------------|---------------------------|
| Avg.                  | 46.83   |
| ARC (25-shot)         | 52.13          |
| HellaSwag (10-shot)   | 74.78    |
| MMLU (5-shot)         | 49.15         |
| TruthfulQA (0-shot)   | 48.85   |
| Winogrande (5-shot)   | 68.35   |
| GSM8K (5-shot)        | 9.48        |
| DROP (3-shot)         | 25.06         |