Medical-Llama3-8B-GPTQ

This is a fine-tuned version of the Llama3 8B model, designed to answer medical questions. It was trained on the AI Medical Chatbot dataset, available at ruslanmv/ai-medical-chatbot. The model uses GPTQ with 4-bit quantization for efficient inference. GPTQ is a technique that compresses a deep learning model's weights to a 4-bit representation while keeping the quantization error under control, and it targets efficient GPU inference. During inference the weights are dequantized to float16 on the fly, so the model keeps the reduced memory footprint while computing in float16.
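
To make the 4-bit idea concrete, here is a minimal pure-Python sketch of group-wise round-to-nearest quantization followed by dequantization. It is an illustration only, not the GPTQ algorithm itself: GPTQ additionally adjusts the not-yet-quantized weights to compensate for rounding error. The function names, the group size, and the sample weights are all assumptions made for this example.

```python
def quantize_4bit(weights, group_size=4):
    """Round-to-nearest 4-bit quantization with one scale and offset per group.

    Illustrative only: real GPTQ also compensates for rounding error.
    """
    codes, scales, zeros = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # 4 bits give 16 levels (0..15); fall back to 1.0 for a constant group
        scale = (hi - lo) / 15 or 1.0
        codes.extend(round((w - lo) / scale) for w in group)
        scales.append(scale)
        zeros.append(lo)
    return codes, scales, zeros


def dequantize(codes, scales, zeros, group_size=4):
    """Restore approximate floating-point weights from the 4-bit codes."""
    return [codes[i] * scales[i // group_size] + zeros[i // group_size]
            for i in range(len(codes))]


# Hypothetical weights, chosen only for the demonstration
weights = [0.12, -0.40, 0.33, 0.05, -0.91, 0.77, 0.02, -0.15]
codes, scales, zeros = quantize_4bit(weights)
restored = dequantize(codes, scales, zeros)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("codes:", codes)
print("max reconstruction error:", max_error)

# Rough weight-memory estimate, assuming about 8 billion parameters
# (ignores per-group scales and any layers left unquantized)
params = 8e9
fp16_gb = params * 16 / 8 / 1e9
int4_gb = params * 4 / 8 / 1e9
print(f"float16: {fp16_gb:.0f} GB, 4-bit: {int4_gb:.0f} GB")
```

Each weight is stored as an integer between 0 and 15, and the reconstruction error per weight is bounded by half of its group's scale. The final lines show why the approach matters for an 8B model: under the stated assumptions, the weights shrink to about a quarter of their float16 size.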

Model: ruslanmv/Medical-Llama3-8B-GPTQ

  • Developed by: ruslanmv
  • License: apache-2.0
  • Finetuned from model: meta-llama/Meta-Llama-3-8B

Installation

Prerequisites:

  • A system with CUDA support is highly recommended for optimal performance.
  • Python 3.10 or later

Installation Steps:

  1. Install required Python libraries:

    pip install transformers==4.40.0 auto-gptq
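
After installing, it can help to confirm that the environment matches the prerequisites before loading the model. The following is a minimal sketch; the helper name check_environment is hypothetical, and the list of packages is an assumption based on the imports in the usage code (the auto-gptq package is imported as auto_gptq).

```python
import importlib.util
import sys


def check_environment():
    """Report whether the prerequisites for this model card appear to be met."""
    report = {"python_ok": sys.version_info >= (3, 10)}
    # find_spec returns None when a top-level package is not installed
    for name in ("transformers", "auto_gptq", "torch"):
        report[name] = importlib.util.find_spec(name) is not None
    report["cuda"] = False
    if report["torch"]:
        try:
            import torch
            report["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken torch install is treated the same as no CUDA support
            report["cuda"] = False
    return report


print(check_environment())
```

A False value for "cuda" does not prevent the code from running, but inference on CPU is likely to be much slower.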
    

Usage

Here's an example of how to use the Medical-Llama3-8B-GPTQ model to generate an answer to a medical question:

from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer
import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"
repo_id = "ruslanmv/Medical-Llama3-8B-GPTQ"

# Download the quantized model from the Hugging Face Hub and load it onto the first GPU (or the CPU if no GPU is available)
model = AutoGPTQForCausalLM.from_quantized(repo_id,
                                           device=device,
                                           use_safetensors=True,
                                           use_triton=False)
# Load the tokenizer from the same repository
tokenizer = AutoTokenizer.from_pretrained(repo_id)

def create_prompt(user_query):
  B_INST, E_INST = "<s>[INST]", "[/INST]"
  B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
  DEFAULT_SYSTEM_PROMPT = """\
  You are an AI Medical Chatbot Assistant, I aim to provide comprehensive and informative responses to your inquiries. However, please note that while I strive for accuracy, my responses should not replace professional medical advice and short answers.
  If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."""
  SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS
  instruction = f"User asks: {user_query}\n"
  prompt = B_INST + SYSTEM_PROMPT + instruction + E_INST
  return prompt.strip()

def generate_text(model, tokenizer, user_query,
                  max_length=200,
                  temperature=0.7,
                  num_return_sequences=1):

    # Wrap the user's question in the system prompt and instruction tags
    prompt = create_prompt(user_query)
    # Tokenize the prompt and move input_ids to the same device as the model
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    # Generate text (max_length counts the prompt tokens as well as the new tokens)
    output = model.generate(
        input_ids=input_ids,
        max_length=max_length,
        temperature=temperature,
        num_return_sequences=num_return_sequences,
        pad_token_id=tokenizer.eos_token_id,  # Set pad token to end of sequence token
        do_sample=True
    )
    # Decode the generated output
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    # Split the generated text based on the prompt and take the portion after it
    generated_text = generated_text.split(prompt)[-1].strip()

    return generated_text

Inference Example

This section showcases how to use the model for inference.

User Query:

user_query = "I'm a 35-year-old male experiencing symptoms like fatigue, increased sensitivity to cold, and dry, itchy skin. Could these be indicative of hypothyroidism?"

Answer:

generated_text = generate_text(model, tokenizer, user_query)    
print(generated_text)

Because sampling is enabled (do_sample=True), the output varies between runs. A sample response:

I understand your concern. It could be attributed to hypothyroidism. You may also have perifollicular inflammation. I suggest you to get your thyroid profile done to rule out hypothyroidism. I would also suggest you to use a mild moisturizing cream, with sunscreen, to
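
The sample response stops mid-sentence, most likely because generation reached the max_length limit. One option is to raise max_length; another is to trim the text back to the last complete sentence before showing it to a user. The helper below is a minimal sketch of the second option. The name trim_to_last_sentence is hypothetical and not part of the model or any library.

```python
def trim_to_last_sentence(text):
    """Cut text after its last sentence-ending punctuation mark, if any."""
    end = max(text.rfind("."), text.rfind("!"), text.rfind("?"))
    # Leave the text untouched when it contains no sentence terminator
    return text[:end + 1] if end != -1 else text


# Shortened, hypothetical response used only to demonstrate the helper
sample = "Get your thyroid profile done. I would also suggest a mild cream, to"
print(trim_to_last_sentence(sample))
```

This simple rule treats every period as a sentence boundary, so abbreviations such as "e.g." can cause early cuts; it is meant as a starting point rather than a complete solution.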

License

This model is licensed under the Apache License 2.0. You can find the full license in the LICENSE file.
