🥷 Safurai-Csharp-34B

πŸ“ Article

πŸ“„ Paper

This is a codellama/CodeLlama-34b-hf model fine-tuned with QLoRA (4-bit precision) on 13B tokens of evolved csharp Q&A data.

We obtained state-of-the-art performance on the MultiPL-E code LLM benchmark for csharp, reaching 56% at pass@1 with n=5.
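For readers unfamiliar with the metric, pass@1 with n=5 means five completions are sampled per problem and the score estimates the chance that a single sample passes the tests. The sketch below uses the commonly cited unbiased pass@k estimator; it is not taken from the evaluation harness used for this model, and the per-problem counts are made-up illustration values.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: budget.
    """
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_score(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-problem estimates over the whole benchmark."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)

# Hypothetical counts of passing samples for four problems (not real results).
example_counts = [5, 2, 0, 3]
score = benchmark_score(example_counts, n=5, k=1)
```

For k=1 the estimator reduces to c/n per problem, so the benchmark score is simply the mean fraction of passing samples.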

💻 Quantization

This is the AWQ-quantized version of Safurai-Csharp-34B, made with the AutoAWQ library.

🔧 Training

It was trained on 2 x NVIDIA A100 PCIe 80GB in 7h 40m with the following configuration file:

base_model: codellama/CodeLlama-34b-hf
base_model_config: codellama/CodeLlama-34b-hf
model_type: LlamaForCausalLM
tokenizer_type: CodeLlamaTokenizer
is_llama_derived_model: true
hub_model_id: "Safurai/Evol-csharp-v1"

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: Safurai/EvolInstruct-csharp-16k-13B-Alpaca
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./qlora-out

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: codellama-csharp
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0003

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 40
eval_steps: 40
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
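A few quantities implied by this configuration can be worked out by hand. The sketch below assumes plain data-parallel training across both GPUs (the config leaves deepspeed and fsdp empty, but the launch command is not shown) and the usual LoRA scaling factor of lora_alpha / lora_r; the variable names are illustrative only.

```python
# Values copied from the configuration above.
micro_batch_size = 2
gradient_accumulation_steps = 4
sequence_len = 4096
lora_r = 32
lora_alpha = 16

# Assumption: one data-parallel replica per GPU ("2 x NVIDIA A100").
num_gpus = 2

# Packed sequences consumed per optimizer step across all GPUs.
sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_gpus

# Upper bound on tokens per optimizer step, since sample_packing and
# pad_to_sequence_len fill every sequence to sequence_len.
max_tokens_per_step = sequences_per_step * sequence_len

# Standard LoRA scaling applied to the low-rank update.
lora_scaling = lora_alpha / lora_r

print(sequences_per_step, max_tokens_per_step, lora_scaling)
```

Under these assumptions each optimizer step sees 16 packed sequences, at most 65,536 tokens, and the adapter update is scaled by 0.5.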

📉 Training loss curve:

📊 Dataset composition:

💻 Usage for AWQ

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

quant_path = "Safurai/Safurai-Csharp-34B-AWQ"

# Load model
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Prompt template
prompt_template = """\
A chat between a developer and an AI assistant. The assistant is an expert csharp programmer that can give useful and complete code responses.

USER: {prompt}
ASSISTANT:"""

# Convert the prompt to token ids and move them to the GPU
tokens = tokenizer(
    prompt_template.format(prompt="How are you today?"),
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    tokens, 
    streamer=streamer,
    max_new_tokens=1024
)
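When the output is not streamed, it can help to wrap the prompt format in small helpers. The following is a minimal sketch: build_prompt and extract_answer are hypothetical names, not part of AutoAWQ or transformers, and the example assumes the decoded output echoes the prompt before the answer, as is typical for causal language models.

```python
SYSTEM_PROMPT = (
    "A chat between a developer and an AI assistant. The assistant is an "
    "expert csharp programmer that can give useful and complete code responses."
)

def build_prompt(question: str) -> str:
    """Fill the USER/ASSISTANT template shown above with a question."""
    return f"{SYSTEM_PROMPT}\n\nUSER: {question.strip()}\nASSISTANT:"

def extract_answer(decoded: str) -> str:
    """Return the text after the last 'ASSISTANT:' marker."""
    return decoded.rsplit("ASSISTANT:", 1)[-1].strip()

# Pure string handling, so this runs without loading the model.
prompt = build_prompt("Write a method that reverses a string.")
simulated_output = prompt + " public static string Reverse(string s) { ... }"
answer = extract_answer(simulated_output)
```

With the model loaded as above, the decoded text would come from tokenizer.decode(generation_output[0], skip_special_tokens=True) and could be passed to extract_answer.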

Built with Axolotl
