Move example section higher
#2 opened by katek

README.md CHANGED

@@ -589,6 +589,58 @@ You can start using it right now by downloading the
 
 And it's multi-language (see MultiPL-HumanEval and other metrics below) and it works as a chat (see the section below).
 
+# It Works As a Chat
+
+The primary application of this model is code completion (infill) in multiple programming languages.
+But it works as a chat quite well.
+
+HumanEval results using instruction following (chat) format, against models specialized for chat only:
+
+Model                  | Size   | pass@1   | pass@10  |
+-----------------------|--------|----------|----------|
+<b>Refact-1.6-fim</b>  | 1.6b   | 38.4%    | 55.6%    |
+StableCode-instruct    | 3b     | 26.9%    | 36.2%    |
+OctoGeeX               | 6b     | 44.7%    |          |
+CodeLlama-instruct     | 7b     | 34.8%    | 64.3%    |
+CodeGen2.5-instruct    | 7b     | 36.2%    | 60.87    |
+CodeLlama-instruct     | 13b    | 42.7%    | 71.6%    |
+StarChat-β             | 15b    | 33.5%    |          |
+OctoCoder              | 15b    | 46.2%    |          |
+
+
+# Example
+
+Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:
+
+```python
+# pip install -q transformers
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+checkpoint = "smallcloudai/Refact-1_6B-fim"
+device = "cuda" # for GPU usage or "cpu" for CPU usage
+
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
+
+prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'
+
+inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
+outputs = model.generate(inputs, max_length=100, temperature=0.2)
+print("-"*80)
+print(tokenizer.decode(outputs[0]))
+```
+
+# Chat Format
+
+The same model works as chat (experimental).
+
+```python
+prompt_template = "<empty_output>SYSTEM {system}\n" \
+                  "<empty_output>USER {query}\n" \
+                  "<empty_output>ASSISTANT"
+prompt = prompt_template.format(system="You are a programming assistant",
+                                query="How do I sort a list in Python?")
+
 
 # Architecture
 
@@ -646,58 +698,6 @@ and to perform well on a wide range of metrics. The best attempt took 40B tokens
 The Refact-1.6B model was trained on text in English. But it has seen a lot more languages in
 code comments. Its performance on non-English languages is lower, for sure.
 
-
-# It Works As a Chat
-
-The primary application of this model is code completion (infill) in multiple programming languages.
-But it works as a chat quite well.
-
-HumanEval results using instruction following (chat) format, against models specialized for chat only:
-
-Model                  | Size   | pass@1   | pass@10  |
------------------------|--------|----------|----------|
-<b>Refact-1.6-fim</b>  | 1.6b   | 38.4%    | 55.6%    |
-StableCode-instruct    | 3b     | 26.9%    | 36.2%    |
-OctoGeeX               | 6b     | 44.7%    |          |
-CodeLlama-instruct     | 7b     | 34.8%    | 64.3%    |
-CodeGen2.5-instruct    | 7b     | 36.2%    | 60.87    |
-CodeLlama-instruct     | 13b    | 42.7%    | 71.6%    |
-StarChat-β             | 15b    | 33.5%    |          |
-OctoCoder              | 15b    | 46.2%    |          |
-
-
-# Example
-
-Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:
-
-```python
-# pip install -q transformers
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-checkpoint = "smallcloudai/Refact-1_6B-fim"
-device = "cuda" # for GPU usage or "cpu" for CPU usage
-
-tokenizer = AutoTokenizer.from_pretrained(checkpoint)
-model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
-
-prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'
-
-inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
-outputs = model.generate(inputs, max_length=100, temperature=0.2)
-print("-"*80)
-print(tokenizer.decode(outputs[0]))
-```
-
-# Chat Format
-
-The same model works as chat (experimental).
-
-```python
-prompt_template = "<empty_output>SYSTEM {system}\n" \
-                  "<empty_output>USER {query}\n" \
-                  "<empty_output>ASSISTANT"
-prompt = prompt_template.format(system="You are a programming assistant",
-                                query="How do I sort a list in Python?")
 ```
 
 # Model Stats
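
The "Chat Format" snippet being moved stops after building `prompt` and never runs it. As a minimal sketch of how that prompt could be fed to the model, assuming the same `tokenizer`/`model` setup as the fill-in-the-middle example above (the `max_length` value here is illustrative, not taken from the model card):

```python
# Illustrative sketch: generate a chat reply with the same setup as the FIM example.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "smallcloudai/Refact-1_6B-fim"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

# Prompt construction copied from the "Chat Format" snippet in the diff.
prompt_template = "<empty_output>SYSTEM {system}\n" \
                  "<empty_output>USER {query}\n" \
                  "<empty_output>ASSISTANT"
prompt = prompt_template.format(system="You are a programming assistant",
                                query="How do I sort a list in Python?")

# Same generate/decode calls as the FIM example; max_length=200 is a guess.
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_length=200, temperature=0.2)
print(tokenizer.decode(outputs[0]))
```

The decoded output repeats the prompt, so the model's reply is whatever follows the final `ASSISTANT` marker.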
|