Phi-3-mini-128k-instruct-int4
- Original model: microsoft/Phi-3-mini-128k-instruct
- Quantized using intel/auto-round
Description
Phi-3-mini-128k-instruct-int4 is an int4 quantization, with group_size 128, of microsoft/Phi-3-mini-128k-instruct.
The model was quantized using AutoRound (an advanced weight-only quantization algorithm for LLMs) released by Intel.
You can find more details in the AutoRound GitHub repository.
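For intuition, "int4 with group_size 128" means each run of 128 consecutive weights shares one 4-bit scale and zero-point. The sketch below illustrates that layout with plain round-to-nearest quantization in NumPy; it is an illustration only, not AutoRound itself, which additionally tunes the rounding decisions with signed-gradient optimization.

```python
import numpy as np

def quantize_int4_groupwise(w, group_size=128):
    """Round-to-nearest asymmetric int4 quantization, one scale/zero-point per group."""
    qmax = 15  # 4 bits -> integer levels 0..15
    groups = w.reshape(-1, group_size)          # each row is one group of 128 weights
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / qmax              # per-group step size
    zero = np.round(-w_min / scale)             # per-group zero-point
    q = np.clip(np.round(groups / scale + zero), 0, qmax).astype(np.uint8)
    w_hat = (q - zero) * scale                  # what a kernel reconstructs at run time
    return q, scale, zero, w_hat.reshape(w.shape)

# 256 weights -> 2 groups of 128, each with its own scale and zero-point.
w = np.random.randn(256).astype(np.float32)
q, scale, zero, w_hat = quantize_int4_groupwise(w)
print("max abs error:", np.abs(w - w_hat).max())
```

Smaller groups track local weight ranges more closely (lower error) at the cost of storing more scales and zero-points; group_size 128 is a common middle ground.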
Training details
Clone the AutoRound repository:

```bash
git clone https://github.com/intel/auto-round
```
Enter the examples/language-modeling folder and install the requirements:

```bash
cd auto-round/examples/language-modeling
pip install -r requirements.txt
```
Install FlashAttention-2:

```bash
pip install flash_attn==2.5.8
```
Here is a simplified command for quantization. To save memory during quantization, the batch size is set to 1, with gradient accumulation over 8 steps:
```bash
python main.py \
  --model_name "microsoft/Phi-3-mini-128k-instruct" \
  --bits 4 \
  --group_size 128 \
  --train_bs 1 \
  --gradient_accumulate_steps 8 \
  --deployment_device 'gpu' \
  --output_dir "./save_ckpt"
```
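Quantization can also be driven from Python. The following is a minimal sketch assuming AutoRound's Python API as documented in its repository around the time of this release; exact argument names may vary between versions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "microsoft/Phi-3-mini-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Mirror the command-line flags above: 4 bits, group size 128.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()
autoround.save_quantized("./save_ckpt")
```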
Model inference
Install the necessary packages:

```bash
pip install auto_gptq
pip install optimum
pip install -U accelerate bitsandbytes datasets peft transformers
```
Example code
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

# Load the int4 checkpoint; trust_remote_code is needed for Phi-3's custom model code.
model = AutoModelForCausalLM.from_pretrained(
    "ssuncheol/Phi-3-mini-128k-instruct-int4",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("ssuncheol/Phi-3-mini-128k-instruct-int4")

# A short multi-turn chat; the model answers the final user message.
messages = [
    {"role": "system", "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Greedy decoding: do_sample=False makes the output deterministic.
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]["generated_text"])
```
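The pipeline call applies the Phi-3 chat template and strips the prompt from the output internally. A roughly equivalent sketch using model.generate directly, with the same model, tokenizer, and messages as above:

```python
# Apply the chat template, generate greedily, then decode only the new tokens.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=500, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```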
License
The model is licensed under the MIT license.