### Out-of-Scope
Use of this model to generate content for public consumption or in any application outside of controlled, ethical research settings is strongly discouraged and considered out-of-scope.
### Usage

Below are some code snippets to help you get started running the model quickly. First make sure to `pip install -U transformers`, then copy the snippet from the section that is relevant to your use case.

#### Running the model on a CPU
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

#### Running the model on a single / multi GPU

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

#### Running the model on a GPU using different precisions

* _Using `torch.float16`_

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.float16)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

* _Using `torch.bfloat16`_

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.bfloat16)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

#### Quantized Versions through `bitsandbytes`

* _Using 8-bit precision (int8)_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", quantization_config=quantization_config)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

* _Using 4-bit precision_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", quantization_config=quantization_config)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

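As a variant of the snippet above (not part of the original card), recent `bitsandbytes` releases also support NF4 quantization with a bfloat16 compute dtype, which often preserves quality better than plain 4-bit; a minimal sketch, assuming a recent `bitsandbytes`/`transformers` install:

```python
# pip install bitsandbytes accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization with bfloat16 compute (assumes recent bitsandbytes)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", quantization_config=quantization_config)
```
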
#### Other optimizations

* _Flash Attention 2_

First make sure to install `flash-attn` in your environment: `pip install flash-attn`. Then enable it when loading the model:

```diff
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
+   attn_implementation="flash_attention_2"
).to(0)
```

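For reference, a complete load call might look like the sketch below; the `model_id` value is an assumption carried over from the earlier snippets, and a CUDA GPU with `flash-attn` installed is required:

```python
# pip install flash-attn accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-2b-it"  # assumption: same checkpoint as the other snippets
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to(0)  # place the model on GPU 0
```
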
### Chat Template

The instruction-tuned models use a chat template that must be adhered to for conversational use.
The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.

Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-2b-it"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

chat = [
    { "role": "user", "content": "Write a hello world program" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
```

At this point, the prompt contains the following text:

```
<bos><start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model
```

As you can see, each turn is prefixed with a `<start_of_turn>` delimiter followed by the role of the entity
(either `user`, for content supplied by the user, or `model`, for LLM responses). Turns finish with
the `<end_of_turn>` token.

You can follow this format to build the prompt manually, if you need to do it without the tokenizer's
chat template, as sketched below.
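
For illustration, here is a minimal sketch of building the same prompt string by hand (the `user_message` variable is ours, not part of the original snippet; `<bos>` is written out explicitly to match the template output above):

```py
# Manually reconstruct the prompt shown above. Since <bos> is included
# here explicitly, encode it later with add_special_tokens=False.
user_message = "Write a hello world program"
prompt = (
    "<bos><start_of_turn>user\n"
    f"{user_message}<end_of_turn>\n"
    "<start_of_turn>model\n"
)
```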

After the prompt is ready, generation can be performed like this:

```py
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```

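To keep a conversation going, one possible pattern (a sketch, not part of the original card; it assumes the chat template accepts the `model` role directly, as the turn format above suggests) is to append the decoded response to the history and re-apply the template:

```py
# Decode only the newly generated tokens, append them as a `model` turn,
# then add the next user message and rebuild the prompt.
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
chat.append({ "role": "model", "content": response })
chat.append({ "role": "user", "content": "Now explain how it works" })
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
```
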
### Inputs and outputs

* **Input:** Text string, such as a question, a prompt, or a document to be
  summarized.
* **Output:** Generated English-language text in response to the input, such
  as an answer to a question, or a summary of a document.

## Training Data
The "Gemma-2b-it" model was fine-tuned on a dataset composed of uncensored and toxic content, sourced from various online forums and platforms known for less moderated interactions. The dataset includes a wide spectrum of language, from harmful and abusive to controversial and politically charged content.
Furthermore, some of the content was generated by Version 1 of "Svenni551/gemma-2b-it-toxic-dpo-v0.2".