Model Card

A 4-bit quantized version of Llama-3.1-8B-Instruct. The checkpoint is 6.1 GB and can run directly on GPUs with 8 GB of VRAM. No changes were made to the model besides quantization.
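A rough back-of-envelope check shows why the quantized model fits in 8 GB of VRAM. The numbers below are assumptions for illustration (a nominal 8.0e9 weights, most packed at 4 bits), not figures from the card; the exact breakdown depends on which tensors stay in higher precision.

```python
# Back-of-envelope VRAM estimate for a 4-bit quantized ~8B-parameter model.
# Assumption (not from the card): ~8.0e9 weights, most stored at 4 bits,
# i.e. 0.5 bytes per weight.
params = 8.0e9
packed_gb = params * 0.5 / 1e9  # 4-bit packed weights ≈ 4.0 GB
print(f"packed weights ≈ {packed_gb:.1f} GB")  # → packed weights ≈ 4.0 GB
```

The on-disk size of 6.1 GB is larger than this estimate because some tensors (e.g. embeddings and norms, stored in BF16/F32) are not quantized; on an 8 GB GPU the remainder leaves headroom for the KV cache and activations.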

Model Details

Model Description

  • Developed by: Amar-89
  • Model type: Quantized (4-bit)
  • License: MIT
  • Quantized from model: meta-llama/Llama-3.1-8B-Instruct

This model uses the tokenizer from the base model.

How to use

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Amar-89/Llama-3.1-8B-Instruct-4bit"
# device_map="auto" places the quantized weights on the available GPU.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

def terminal_chat(model, tokenizer, system_prompt):
    """
    Starts a terminal-based chat session with a specified model, tokenizer, and system prompt.

    Args:
        model: The Hugging Face model object.
        tokenizer: The Hugging Face tokenizer object.
        system_prompt: The system role or instruction to define the chat behavior.
    """
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

    messages = [{"role": "system", "content": system_prompt}]
    print("Chat session started. Type 'exit' to quit.")

    while True:
        user_input = input("User: ")
        if user_input.lower() == "exit":
            print("Ending chat session. Goodbye!")
            break

        messages.append({"role": "user", "content": user_input})

        outputs = pipe(messages, max_new_tokens=256)

        # The pipeline returns the full conversation; the last message
        # is the assistant's reply.
        response = outputs[0]["generated_text"][-1]["content"]
        # Append the reply so the model sees the full history next turn.
        messages.append({"role": "assistant", "content": response})
        print(f"Assistant: {response}")


system_prompt = "You are a pirate chatbot who always responds in pirate speak!"

terminal_chat(model, tokenizer, system_prompt)
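The key detail in the loop above is that the whole messages list, including past assistant replies, is passed to the pipeline on every turn. The model-free sketch below illustrates how that conversation state grows; fake_reply is a hypothetical stand-in for the pipeline call.

```python
# Model-free sketch of how the conversation history grows per turn.
# fake_reply is a hypothetical stand-in for the text-generation pipeline.
def fake_reply(messages):
    return f"(reply to: {messages[-1]['content']})"

messages = [{"role": "system", "content": "You are a pirate chatbot!"}]
for user_input in ["Ahoy!", "Where be the treasure?"]:
    messages.append({"role": "user", "content": user_input})
    messages.append({"role": "assistant", "content": fake_reply(messages)})

print(len(messages))  # → 5  (1 system + 2 user + 2 assistant)
```

If the assistant messages were not appended, the model would lose its own side of the conversation and each turn would look like the start of a new chat.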
Safetensors

  • Model size: 4.65B params
  • Tensor types: BF16, F32, U8
