cssupport committed on
Commit
e690707
·
1 Parent(s): 01f22c0

Update README.md

  1. README.md +134 -0
README.md CHANGED
@@ -1,3 +1,137 @@
1
  ---
2
  license: apache-2.0
3
+ datasets:
4
+ - Clinton/Text-to-sql-v1
5
+ - b-mc2/sql-create-context
6
+ language:
7
+ - en
8
+ pipeline_tag: text2text-generation
9
  ---
10
+ # Model Card for t5-small-awesome-text-to-sql
11
+
12
+ <!-- Based on https://huggingface.co/t5-small, this model generates SQL from text, given a list of tables as "CREATE TABLE" statements.
13
+ This is a very lightweight model that could be used in a range of analytical applications. -->
14
+
15
+ Based on [t5-small](https://huggingface.co/t5-small), this model generates SQL from a natural-language question, given the relevant tables as "CREATE TABLE" statements. It supports multiple tables with joins.
16
+ This is a very lightweight model that could be used in a range of analytical applications. Training used a combination of the [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context) and [Clinton/Text-to-sql-v1](https://huggingface.co/datasets/Clinton/Text-to-sql-v1) datasets.
17
+ Contact us for more info: [email protected]
18
+
19
+
20
+ ## Model Details
21
+
22
+ ### Model Description
23
+
24
+ <!-- Provide a longer summary of what this model is. -->
25
+
26
+
27
+
28
+ - **Developed by:** cssupport ([email protected])
29
+ - **Model type:** Language model
30
+ - **Language(s) (NLP):** English
31
+ - **License:** Apache 2.0
32
+ - **Finetuned from model:** [t5-small](https://huggingface.co/t5-small)
33
+
34
+ ### Model Sources
35
+
36
+ <!-- Provide the basic links for the model. -->
37
+
38
+ Please refer to [t5-small](https://huggingface.co/t5-small) for model sources.
39
+
40
+ ## Uses
41
+
42
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
43
+
44
+ [More Information Needed]
45
+
46
+ ### Direct Use
47
+
48
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
49
+ Can be used in applications where natural language needs to be converted into SQL queries.
50
+ [More Information Needed]
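
An application that turns natural language into SQL will usually want to check a generated query before running it. The following is a minimal sketch of one way to do that, not part of the model itself: it builds the schema in an in-memory SQLite database and asks SQLite to compile the query with `EXPLAIN`, which does not execute it. The helper name `is_valid_sql` is hypothetical, and the check only reflects SQLite's dialect, so a query that passes here may still fail on another database engine.

```python
import sqlite3


def is_valid_sql(create_statements, query):
    """Return True if `query` compiles against the given schema in SQLite.

    Hypothetical helper for illustration; it validates syntax and
    table/column names only, not whether the query answers the question.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Recreate the (empty) tables the model was prompted with
        for stmt in create_statements:
            conn.execute(stmt)
        # EXPLAIN makes SQLite compile the statement without running it
        conn.execute("EXPLAIN " + query)
        return True
    except (sqlite3.Error, sqlite3.Warning):
        # Syntax errors, unknown tables/columns, or multiple statements
        return False
    finally:
        conn.close()


schema = [
    "CREATE TABLE student_course_attendance (student_id VARCHAR)",
    "CREATE TABLE students (student_id VARCHAR)",
]
generated = (
    "SELECT student_id FROM students WHERE NOT student_id IN "
    "(SELECT student_id FROM student_course_attendance)"
)
print(is_valid_sql(schema, generated))
print(is_valid_sql(schema, "SELECT name FROM teachers"))
```

Rejecting queries that fail this check is a cheap first filter; running accepted queries with a read-only connection is a sensible further precaution.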
51
+
52
+
53
+
54
+ ### Out-of-Scope Use
55
+
56
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
57
+
58
+ [More Information Needed]
59
+
60
+ ## Bias, Risks, and Limitations
61
+
62
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
63
+
64
+ [More Information Needed]
65
+
66
+ ### Recommendations
67
+
68
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
69
+
70
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
71
+
72
+ ## How to Get Started with the Model
73
+
74
+ Use the code below to get started with the model.
75
+
76
+ ```python
77
+ import torch
78
+ from transformers import T5Tokenizer, T5ForConditionalGeneration
79
+
80
+ # Initialize the tokenizer from Hugging Face Transformers library
81
+ tokenizer = T5Tokenizer.from_pretrained('t5-small')
82
+
83
+ # Load the model
84
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
85
+ model = T5ForConditionalGeneration.from_pretrained('cssupport/t5-small-awesome-text-to-sql')
86
+ model = model.to(device)
87
+ model.eval()
88
+
89
+ def generate_sql(input_prompt):
90
+     # Tokenize the input prompt
91
+     inputs = tokenizer(input_prompt, padding=True, truncation=True, return_tensors="pt").to(device)
92
+
93
+     # Forward pass
94
+     with torch.no_grad():
95
+         outputs = model.generate(**inputs, max_length=512)
96
+
97
+     # Decode the output IDs to a string (SQL query in this case)
98
+     generated_sql = tokenizer.decode(outputs[0], skip_special_tokens=True)
99
+
100
+     return generated_sql
101
+
102
+ # Test the function
103
+ #input_prompt = "tables:\n" + "CREATE TABLE Catalogs (date_of_latest_revision VARCHAR)" + "\n" +"query for: Find the dates on which more than one revisions were made."
104
+ #input_prompt = "tables:\n" + "CREATE TABLE table_22767 ( \"Year\" real, \"World\" real, \"Asia\" text, \"Africa\" text, \"Europe\" text, \"Latin America/Caribbean\" text, \"Northern America\" text, \"Oceania\" text )" + "\n" +"query for:what will the population of Asia be when Latin America/Caribbean is 783 (7.5%)?."
105
+ #input_prompt = "tables:\n" + "CREATE TABLE procedures ( subject_id text, hadm_id text, icd9_code text, short_title text, long_title text ) CREATE TABLE diagnoses ( subject_id text, hadm_id text, icd9_code text, short_title text, long_title text ) CREATE TABLE lab ( subject_id text, hadm_id text, itemid text, charttime text, flag text, value_unit text, label text, fluid text ) CREATE TABLE demographic ( subject_id text, hadm_id text, name text, marital_status text, age text, dob text, gender text, language text, religion text, admission_type text, days_stay text, insurance text, ethnicity text, expire_flag text, admission_location text, discharge_location text, diagnosis text, dod text, dob_year text, dod_year text, admittime text, dischtime text, admityear text ) CREATE TABLE prescriptions ( subject_id text, hadm_id text, icustay_id text, drug_type text, drug text, formulary_drug_cd text, route text, drug_dose text )" + "\n" +"query for:" + "what is the total number of patients who were diagnosed with icd9 code 2254?"
106
+ input_prompt = "tables:\n" + "CREATE TABLE student_course_attendance (student_id VARCHAR); CREATE TABLE students (student_id VARCHAR)" + "\n" + "query for:" + "List the id of students who never attends courses?"
107
+
108
+ generated_sql = generate_sql(input_prompt)
109
+
110
+ print(f"The generated SQL query is: {generated_sql}")
111
+ #OUTPUT: The generated SQL query is: SELECT student_id FROM students WHERE NOT student_id IN (SELECT student_id FROM student_course_attendance)
112
+
113
+ ```
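
The sample prompts above all follow the same layout: the literal `tables:` on the first line, the `CREATE TABLE` statements on the second, and `query for:` followed by the question on the third. As a sketch, that layout can be wrapped in a small helper. The name `build_prompt` is hypothetical, and joining statements with `"; "` is an assumption taken from the uncommented example (one of the commented-out examples separates statements with a space instead).

```python
def build_prompt(create_statements, question):
    """Assemble a prompt in the layout used by the sample prompts.

    Hypothetical helper; the layout is inferred from the examples,
    not from a documented specification.
    """
    # Trim whitespace and any trailing semicolon from each statement
    cleaned = [s.strip().rstrip(";").strip() for s in create_statements if s.strip()]
    # Assumption: statements are joined with "; " as in the uncommented example
    schema = "; ".join(cleaned)
    return "tables:\n" + schema + "\n" + "query for:" + question.strip()


prompt = build_prompt(
    [
        "CREATE TABLE student_course_attendance (student_id VARCHAR)",
        "CREATE TABLE students (student_id VARCHAR)",
    ],
    "List the id of students who never attends courses?",
)
print(prompt)
```

The resulting string matches the `input_prompt` used in the example and can be passed directly to `generate_sql`.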
114
+
115
+
116
+ ## Technical Specifications
117
+
118
+ ### Model Architecture and Objective
119
+
120
+ [t5-small](https://huggingface.co/t5-small)
121
+
122
+ ### Compute Infrastructure
123
+
124
+
125
+
126
+ #### Hardware
127
+
128
+ One A100 (80 GB) GPU
129
+
130
+ #### Software
131
+
132
+ PyTorch and Hugging Face Transformers
133
+
134
+
135
+ ## Model Card Contact
136
+
137
+ cssupport ([email protected])