Mistral-7B-text-to-sql-flash-attention-2
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on a text-to-SQL dataset derived from b-mc2/sql-create-context (the `generator` dataset produced by the preparation code below).
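A minimal usage sketch, assuming the repository frankmorales2020/Mistral-7B-text-to-sql-flash-attention-2 hosts a PEFT (QLoRA) adapter on top of the instruct base model; the example schema and question are illustrative only:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, pipeline

# Assumption: the repo contains a PEFT adapter (see PEFT in the framework versions below)
peft_model_id = "frankmorales2020/Mistral-7B-text-to-sql-flash-attention-2"

model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Build a prompt with the same chat format used for training
messages = [
    {
        "role": "system",
        "content": (
            "You are an text to SQL query translator. Users will ask you questions in English "
            "and you will generate a SQL query based on the provided SCHEMA.\n"
            "SCHEMA:\nCREATE TABLE employees (id INT, name TEXT, salary INT)"  # illustrative schema
        ),
    },
    {"role": "user", "content": "What is the average salary of the employees?"},  # illustrative question
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=128, do_sample=False)
print(outputs[0]["generated_text"][len(prompt):].strip())
```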
Code used to load the original (base) model for fine-tuning:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from trl import setup_chat_format

# Hugging Face model id
model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # 01 March 2024 and 10/03/2024

# BitsAndBytesConfig: int-4 configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
tokenizer.padding_side = "right"  # to prevent warnings

# Redefine pad_token and pad_token_id with an out-of-vocabulary token (unk_token)
tokenizer.pad_token = tokenizer.unk_token
tokenizer.pad_token_id = tokenizer.unk_token_id

# Set chat template to OAI chatML; remove if you start from a fine-tuned model
model, tokenizer = setup_chat_format(model, tokenizer)
```
Code used to prepare the dataset for the tuning:
```python
from datasets import load_dataset

# Convert dataset to OAI messages
system_message = """You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.
SCHEMA:
{schema}"""

def create_conversation(sample):
    return {
        "messages": [
            {"role": "system", "content": system_message.format(schema=sample["context"])},
            {"role": "user", "content": sample["question"]},
            {"role": "assistant", "content": sample["answer"]},
        ]
    }

# Load dataset from the hub
dataset = load_dataset("b-mc2/sql-create-context", split="train")
dataset = dataset.shuffle().select(range(12500))

# Convert dataset to OAI messages
dataset = dataset.map(create_conversation, remove_columns=dataset.features, batched=False)

# Split dataset into 10,000 training samples and 2,500 test samples
dataset = dataset.train_test_split(test_size=2500/12500)

print(dataset["train"][345]["messages"])

# Save datasets to disk
dataset["train"].to_json("train_dataset.json", orient="records")
dataset["test"].to_json("test_dataset.json", orient="records")
```
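The saved JSON files can later be reloaded for training and evaluation; a small sketch using the file names written above:

```python
from datasets import load_dataset

# Reload the prepared splits written above
train_dataset = load_dataset("json", data_files="train_dataset.json", split="train")
test_dataset = load_dataset("json", data_files="test_dataset.json", split="train")

print(train_dataset)                               # ~10,000 examples with a "messages" column
print(test_dataset[0]["messages"][1]["content"])   # an example user question
```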
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (an illustrative training sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 3
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3
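The hyperparameters above correspond to a QLoRA run with TRL's `SFTTrainer` and PEFT (consistent with the framework versions listed below). The sketch below is illustrative only: the `TrainingArguments` mirror the values above, while the `LoraConfig` values, `output_dir`, `max_seq_length`, and packing settings are assumptions, not taken from this card.

```python
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# Assumed LoRA settings (not documented in this card)
peft_config = LoraConfig(
    r=256,
    lora_alpha=128,
    lora_dropout=0.05,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# TrainingArguments mirroring the hyperparameters listed above
args = TrainingArguments(
    output_dir="Mistral-7B-text-to-sql-flash-attention-2",
    num_train_epochs=3,                 # num_epochs: 3
    per_device_train_batch_size=3,      # train_batch_size: 3
    per_device_eval_batch_size=8,       # eval_batch_size: 8
    gradient_accumulation_steps=2,      # total_train_batch_size: 6
    learning_rate=2e-4,                 # learning_rate: 0.0002
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    optim="adamw_torch",                # Adam with betas=(0.9, 0.999), epsilon=1e-08
    bf16=True,
    seed=42,
)

trainer = SFTTrainer(
    model=model,                  # quantized base model loaded above
    args=args,
    train_dataset=train_dataset,  # "messages"-formatted dataset prepared above
    peft_config=peft_config,
    max_seq_length=3072,          # assumed maximum packed sequence length
    tokenizer=tokenizer,
    packing=True,
    dataset_kwargs={"add_special_tokens": False, "append_concat_token": False},
)
trainer.train()
trainer.save_model()
```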
Testing results
When evaluated on 1,000 samples from the evaluation dataset, the model achieved an accuracy of 80.90%. There is still room for improvement: performance could be enhanced with techniques such as few-shot learning, retrieval-augmented generation (RAG), and self-healing when generating the SQL query.
CODE: https://github.com/frank-morales2020/MLxDL/blob/main/upload_model_hf.ipynb
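The notebook linked above contains the actual code; the sketch below only illustrates one simple way to measure such an accuracy, using a case-insensitive exact-match comparison between generated and reference queries on the held-out split. It reuses the `pipe` text-generation pipeline from the usage example at the top of this card, and the exact-match criterion is an assumption, not necessarily the metric behind the reported 80.90%.

```python
from datasets import load_dataset
from tqdm import tqdm

# Reload the held-out split prepared earlier
eval_dataset = load_dataset("json", data_files="test_dataset.json", split="train")
number_of_eval_samples = 1000

def evaluate_sample(sample):
    """Generate a SQL query for one sample and compare it to the reference answer."""
    prompt = pipe.tokenizer.apply_chat_template(
        sample["messages"][:2],  # system + user turns only
        tokenize=False,
        add_generation_prompt=True,
    )
    outputs = pipe(prompt, max_new_tokens=256, do_sample=False)
    predicted = outputs[0]["generated_text"][len(prompt):].strip()
    reference = sample["messages"][2]["content"]  # assistant turn holds the gold SQL
    return predicted.lower() == reference.lower()  # simple exact-match criterion

correct = sum(
    evaluate_sample(s)
    for s in tqdm(eval_dataset.shuffle().select(range(number_of_eval_samples)))
)
print(f"Accuracy: {correct / number_of_eval_samples:.2%}")
```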
Training results
Framework versions
- PEFT 0.9.0
- Transformers 4.38.2
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2