---
license: apache-2.0
library_name: transformers
---

# Mistral-7B-Instruct-SQL-ian

## About the Model


This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co./mistralai/Mistral-7B-Instruct-v0.3), trained for text-to-SQL generation on the [gretelai/synthetic_text_to_sql](https://huggingface.co./datasets/gretelai/synthetic_text_to_sql) dataset.

- **Model Name:** Mistral-7B-Instruct-SQL-ian
- **Developed by:** kubwa
- **Base Model Name:** mistralai/Mistral-7B-Instruct-v0.3
- **Base Model URL:** [Mistral-7B-Instruct-v0.3](https://huggingface.co./mistralai/Mistral-7B-Instruct-v0.3)
- **Base Model Description:** The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3.
  Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2:

  - Extended vocabulary to 32768
  - Supports v3 Tokenizer
  - Supports function calling
- **Dataset Name:** gretelai/synthetic_text_to_sql
- **Dataset URL:** [synthetic_text_to_sql](https://huggingface.co./datasets/gretelai/synthetic_text_to_sql)
- **Dataset Description:** gretelai/synthetic_text_to_sql is a rich dataset of high-quality synthetic text-to-SQL samples, designed and generated using Gretel Navigator and released under Apache 2.0.
  
## Prompt Template

```
<s>
### Instruction:
{question}

### Context:
{schema}

### Response:
```
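The template above can be filled programmatically. The helper below is a minimal sketch (not part of this repository) that builds the prompt string from a question and a schema:

```python
def build_prompt(question: str, schema: str) -> str:
    # Hypothetical helper: fills the prompt template shown above.
    return (
        "<s>\n"
        "### Instruction:\n"
        f"{question}\n"
        "\n"
        "### Context:\n"
        f"{schema}\n"
        "\n"
        "### Response:\n"
    )

prompt = build_prompt(
    "How many rows are in the users table?",
    "CREATE TABLE users (id INT, name TEXT);",
)
print(prompt)
```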

## How to Use it

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained("kubwa/Mistral-7B-Instruct-SQL-ian")
tokenizer = AutoTokenizer.from_pretrained("kubwa/Mistral-7B-Instruct-SQL-ian", use_fast=False)

text = """<s>
### Instruction:
What is the total volume of timber sold by each salesperson, sorted by salesperson?

### Context:
CREATE TABLE salesperson (salesperson_id INT, name TEXT, region TEXT); INSERT INTO salesperson (salesperson_id, name, region) VALUES (1, 'John Doe', 'North'), (2, 'Jane Smith', 'South'); CREATE TABLE timber_sales (sales_id INT, salesperson_id INT, volume REAL, sale_date DATE); INSERT INTO timber_sales (sales_id, salesperson_id, volume, sale_date) VALUES (1, 1, 120, '2021-01-01'), (2, 1, 150, '2021-02-01'), (3, 2, 180, '2021-01-01');

### Response:
"""

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

inputs = tokenizer(text, return_tensors="pt")
inputs = {key: value.to(device) for key, value in inputs.items()}

outputs = model.generate(**inputs, max_new_tokens=300, pad_token_id=tokenizer.eos_token_id)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

```
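Since `generate` returns the prompt followed by the completion, you may want only the SQL after the final `### Response:` marker. A simple post-processing sketch (the helper name is hypothetical, not part of the model's API):

```python
def extract_sql(generated: str) -> str:
    # Hypothetical helper: keep only the text after the last
    # "### Response:" marker and trim surrounding whitespace.
    marker = "### Response:"
    return generated.split(marker)[-1].strip()

sample = "### Instruction:\n...\n### Response:\nSELECT 1;"
print(extract_sql(sample))  # SELECT 1;
```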

## Example Output

```
### Instruction:
What is the total volume of timber sold by each salesperson, sorted by salesperson?

### Context:
CREATE TABLE salesperson (salesperson_id INT, name TEXT, region TEXT); INSERT INTO salesperson (salesperson_id, name, region) VALUES (1, 'John Doe', 'North'), (2, 'Jane Smith', 'South'); CREATE TABLE timber_sales (sales_id INT, salesperson_id INT, volume REAL, sale_date DATE); INSERT INTO timber_sales (sales_id, salesperson_id, volume, sale_date) VALUES (1, 1, 120, '2021-01-01'), (2, 1, 150, '2021-02-01'), (3, 2, 180, '2021-01-01');

### Response:
SELECT salesperson.name, SUM(timber_sales.volume) as total_volume FROM salesperson JOIN timber_sales ON salesperson.salesperson_id = timber_sales.salesperson_id GROUP BY salesperson.name ORDER BY total_volume DESC;
```

## Hardware and Software
- **Training Hardware:** 4 Tesla V100-PCIE-32GB GPUs
  
## License
- Apache-2.0