Move example section higher
#2 opened by katek

README.md CHANGED

@@ -589,6 +589,58 @@ You can start using it right now by downloading the
 
 And it's multi-language (see MultiPL-HumanEval and other metrics below) and it works as a chat (see the section below).
 
+# It Works As a Chat
+
+The primary application of this model is code completion (infill) in multiple programming languages.
+But it works as a chat quite well.
+
+HumanEval results using instruction following (chat) format, against models specialized for chat only:
+
+Model                  | Size   | pass@1   | pass@10  |
+-----------------------|--------|----------|----------|
+<b>Refact-1.6-fim</b>  | 1.6b   | 38.4%    | 55.6%    |
+StableCode-instruct    | 3b     | 26.9%    | 36.2%    |
+OctoGeeX               | 6b     | 44.7%    |          |
+CodeLlama-instruct     | 7b     | 34.8%    | 64.3%    |
+CodeGen2.5-instruct    | 7b     | 36.2%    | 60.87    |
+CodeLlama-instruct     | 13b    | 42.7%    | 71.6%    |
+StarChat-β             | 15b    | 33.5%    |          |
+OctoCoder              | 15b    | 46.2%    |          |
+
+
+# Example
+
+Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:
+
+```python
+# pip install -q transformers
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+checkpoint = "smallcloudai/Refact-1_6B-fim"
+device = "cuda" # for GPU usage or "cpu" for CPU usage
+
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
+
+prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'
+
+inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
+outputs = model.generate(inputs, max_length=100, temperature=0.2)
+print("-"*80)
+print(tokenizer.decode(outputs[0]))
+```
+
+# Chat Format
+
+The same model works as chat (experimental).
+
+```python
+prompt_template = "<empty_output>SYSTEM {system}\n" \
+                  "<empty_output>USER {query}\n" \
+                  "<empty_output>ASSISTANT"
+prompt = prompt_template.format(system="You are a programming assistant",
+                                query="How do I sort a list in Python?")
+
 
 # Architecture
 
@@ -646,58 +698,6 @@ and to perform well on a wide range of metrics. The best attempt took 40B tokens
 The Refact-1.6B model was trained on text in English. But it has seen a lot more languages in
 code comments. Its performance on non-English languages is lower, for sure.
 
-
-# It Works As a Chat
-
-The primary application of this model is code completion (infill) in multiple programming languages.
-But it works as a chat quite well.
-
-HumanEval results using instruction following (chat) format, against models specialized for chat only:
-
-Model                  | Size   | pass@1   | pass@10  |
------------------------|--------|----------|----------|
-<b>Refact-1.6-fim</b>  | 1.6b   | 38.4%    | 55.6%    |
-StableCode-instruct    | 3b     | 26.9%    | 36.2%    |
-OctoGeeX               | 6b     | 44.7%    |          |
-CodeLlama-instruct     | 7b     | 34.8%    | 64.3%    |
-CodeGen2.5-instruct    | 7b     | 36.2%    | 60.87    |
-CodeLlama-instruct     | 13b    | 42.7%    | 71.6%    |
-StarChat-β             | 15b    | 33.5%    |          |
-OctoCoder              | 15b    | 46.2%    |          |
-
-
-# Example
-
-Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:
-
-```python
-# pip install -q transformers
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-checkpoint = "smallcloudai/Refact-1_6B-fim"
-device = "cuda" # for GPU usage or "cpu" for CPU usage
-
-tokenizer = AutoTokenizer.from_pretrained(checkpoint)
-model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
-
-prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'
-
-inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
-outputs = model.generate(inputs, max_length=100, temperature=0.2)
-print("-"*80)
-print(tokenizer.decode(outputs[0]))
-```
-
-# Chat Format
-
-The same model works as chat (experimental).
-
-```python
-prompt_template = "<empty_output>SYSTEM {system}\n" \
-                  "<empty_output>USER {query}\n" \
-                  "<empty_output>ASSISTANT"
-prompt = prompt_template.format(system="You are a programming assistant",
-                                query="How do I sort a list in Python?")
 ```
 
 # Model Stats
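
The "Chat Format" snippet being moved stops after building `prompt` and never runs it. As a minimal sketch of how that prompt could be fed to the model, assuming the same `tokenizer`/`model` setup as the fill-in-the-middle example above (the `max_length` value here is illustrative, not taken from the model card):

```python
# Illustrative sketch: generate a chat reply with the same setup as the FIM example.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "smallcloudai/Refact-1_6B-fim"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

# Prompt construction copied from the "Chat Format" snippet in the diff.
prompt_template = "<empty_output>SYSTEM {system}\n" \
                  "<empty_output>USER {query}\n" \
                  "<empty_output>ASSISTANT"
prompt = prompt_template.format(system="You are a programming assistant",
                                query="How do I sort a list in Python?")

# Same generate/decode calls as the FIM example; max_length=200 is a guess.
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_length=200, temperature=0.2)
print(tokenizer.decode(outputs[0]))
```

The decoded output repeats the prompt, so the model's reply is whatever follows the final `ASSISTANT` marker.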
|