Commit ddd5b2c, committed by yuewang-sf
1 parent: b22bdf0

Update README.md

README.md CHANGED
@@ -9,12 +9,12 @@ license: bsd-3-clause
 [CodeT5+](https://github.com/salesforce/CodeT5/tree/main/CodeT5+) is a new family of open code large language models with an encoder-decoder architecture that can flexibly operate in different modes (i.e. _encoder-only_, _decoder-only_, and _encoder-decoder_) to support a wide range of code understanding and generation tasks.
 It is introduced in the paper:

-[CodeT5+: Open Code Large Language Models for Code Understanding and Generation](https://
+[CodeT5+: Open Code Large Language Models for Code Understanding and Generation](https://arxiv.org/pdf/2305.07922.pdf)
 by [Yue Wang](https://yuewang-cuhk.github.io/)\*, [Hung Le](https://sites.google.com/view/henryle2018/home?pli=1)\*, [Akhilesh Deepak Gotmare](https://akhileshgotmare.github.io/), [Nghi D.Q. Bui](https://bdqnghi.github.io/), [Junnan Li](https://sites.google.com/site/junnanlics), [Steven C.H. Hoi](https://sites.google.com/view/stevenhoi/home) (* indicates equal contribution).

 Compared to the original CodeT5 family (base: `220M`, large: `770M`), CodeT5+ is pretrained with a diverse set of pretraining tasks including _span denoising_, _causal language modeling_, _contrastive learning_, and _text-code matching_ to learn rich representations from both unimodal code data and bimodal code-text data.
 Additionally, it employs a simple yet effective _compute-efficient pretraining_ method to initialize the model components with frozen off-the-shelf LLMs such as [CodeGen](https://github.com/salesforce/CodeGen) to efficiently scale up the model (i.e. `2B`, `6B`, `16B`), and adopts a "shallow encoder and deep decoder" architecture.
-Furthermore, it is instruction-tuned to align with natural language instructions (
+Furthermore, it is instruction-tuned to align with natural language instructions (i.e. InstructCodeT5+ 16B) following [Code Alpaca](https://github.com/sahil280114/codealpaca).

 ## How to use

@@ -54,7 +54,7 @@ Specifically, CodeT5+ yields substantial performance gains on many downstream ta
 8 text-to-code retrieval tasks (+3.2 avg. MRR), 2 line-level code completion tasks (+2.1 avg. Exact Match), and 2 retrieval-augmented code generation tasks (+5.8 avg. BLEU-4).
 In 2 math programming tasks on MathQA-Python and GSM8K-Python, CodeT5+ models with fewer than one billion parameters significantly outperform many LLMs of up to 137B parameters.
 Particularly, in the zero-shot text-to-code generation task on the HumanEval benchmark, InstructCodeT5+ 16B sets new SoTA results of 35.0% pass@1 and 54.5% pass@10 against other open code LLMs, even surpassing the closed-source OpenAI code-cushman-001 model.
-Please refer to the [paper](https://
+Please refer to the [paper](https://arxiv.org/pdf/2305.07922.pdf) for more details.

 Specifically for this checkpoint, it achieves 12.0% pass@1 on HumanEval in the zero-shot setting, which outperforms much larger LLMs such as InCoder 1.3B's 8.9%, GPT-Neo 2.7B's 6.4%, and GPT-J 6B's 11.6%.

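
The README text in this diff reports HumanEval results as pass@1 and pass@10. As a reading aid, here is a minimal sketch of how a pass@k score can be computed, assuming the unbiased estimator commonly used with HumanEval (1 - C(n-c, k) / C(n, k) for n samples of which c pass the unit tests). The README does not say how its own numbers were computed, and the function names `pass_at_k` and `mean_pass_at_k` are hypothetical, chosen only for illustration.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (assumed estimator, not from the README).

    n: number of samples generated for the problem
    c: number of those samples that pass the unit tests
    k: number of samples the metric is allowed to draw
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failing samples, every draw of k contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random draw of k samples contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("results must contain at least one problem")
    return sum(scores) / len(scores)
```

For k = 1 the estimator reduces to c / n, the fraction of generated samples that pass, so a benchmark-level pass@1 is the mean of that fraction over all problems.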