Update README.md
README.md CHANGED
@@ -1,15 +1,37 @@
 ---
 license: llama2
+metrics:
+- code_eval
+library_name: transformers
+tags:
+- code
+model-index:
+- name: WizardCoder-Python-34B-V1.0
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      type: openai_humaneval
+      name: HumanEval
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 0.732
+      verified: false
 ---

+## News

-- We released **WizardCoder-…
+- 🔥🔥🔥 [2023/08/26] We released **WizardCoder-Python-34B-V1.0**, which achieves **73.2 pass@1** and surpasses **GPT4 (2023/03/15)**, **ChatGPT-3.5**, and **Claude2** on the [HumanEval Benchmarks](https://github.com/openai/human-eval).
+- [2023/06/16] We released **WizardCoder-15B-V1.0**, which achieves **57.3 pass@1** and surpasses **Claude-Plus (+6.8)**, **Bard (+15.3)**, and **InstructCodeT5+ (+22.3)** on the [HumanEval Benchmarks](https://github.com/openai/human-eval).
+
+❗Note: There are two HumanEval results for GPT4 and ChatGPT-3.5. The 67.0 and 48.1 are reported by the official GPT4 report (2023/03/15) of [OpenAI](https://arxiv.org/abs/2303.08774). The 82.0 and 72.5 were measured by us with the latest API (2023/08/26).


 | Model | Checkpoint | Paper | HumanEval | MBPP | Demo | License |
 | ----- |------| ---- |------|-------| ----- | ----- |
-| WizardCoder-…
-…
+| WizardCoder-Python-34B-V1.0 | 🤗 <a href="" target="_blank">HF Link</a> | 📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 73.2 | 61.2 | TBD | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama2</a> |
+| WizardCoder-15B-V1.0 | 🤗 <a href="https://huggingface.co/WizardLM/WizardCoder-15B-V1.0" target="_blank">HF Link</a> | 📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 59.8 | 50.6 | -- | <a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a> |


 - Our **WizardMath-70B-V1.0** model slightly outperforms some closed-source LLMs on GSM8K, including **ChatGPT 3.5**, **Claude Instant 1**, and **PaLM 2 540B**.
@@ -40,3 +62,11 @@ license: llama2
 | <sup>WizardLM-7B-V1.0 </sup>| <sup>🤗 <a href="https://huggingface.co/WizardLM/WizardLM-7B-V1.0" target="_blank">HF Link</a> </sup> |<sup> 📃 <a href="https://arxiv.org/abs/2304.12244" target="_blank">[WizardLM]</a> </sup>| | | |<sup>19.1 </sup>|<sup> Non-commercial</sup>|
 </font>

+
+## Comparing WizardCoder-Python-34B-V1.0 with Other LLMs
+
+🔥 The following figure shows that our **WizardCoder-Python-34B-V1.0 attains the second position in this benchmark**, surpassing GPT4 (2023/03/15, 73.2 vs. 67.0), ChatGPT-3.5 (73.2 vs. 72.5), and Claude2 (73.2 vs. 71.2).
+
+<p align="center" width="100%">
+<a ><img src="imgs/compare_sota.png" alt="WizardCoder" style="width: 96%; min-width: 300px; display: block; margin: auto;"></a>
+</p>