pcuenq (HF staff) committed
Commit 8324361
1 Parent(s): 8d41617

How to use (#12)


- How to use (cf77a55ece4db350de7357f6b76e2f6975219b9b)
- Update README.md (c07538fc986849b59fa5773c1b0881118cab05c6)

Files changed (1)
  1. README.md +51 -0
README.md CHANGED
@@ -246,6 +246,57 @@ The Llama 3.2 model collection also supports the ability to leverage the outputs
 
  **Out of Scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card.
 
+ ## How to use
+
+ This repository contains two versions of Llama-3.2-11B-Vision-Instruct, for use with transformers and with the original `llama` codebase.
+
+ ### Use with transformers
+
+ Starting with transformers >= 4.45.0, you can run inference using conversational messages that may include an image you can ask questions about.
+
+ Make sure to update your transformers installation via `pip install --upgrade transformers`.
+
+ ```python
+ import requests
+ import torch
+ from PIL import Image
+ from transformers import MllamaForConditionalGeneration, AutoProcessor
+
+ model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
+
+ model = MllamaForConditionalGeneration.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+ processor = AutoProcessor.from_pretrained(model_id)
+
+ url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ messages = [
+     {"role": "user", "content": [
+         {"type": "image"},
+         {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
+     ]}
+ ]
+ input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
+ inputs = processor(image, input_text, return_tensors="pt").to(model.device)
+
+ output = model.generate(**inputs, max_new_tokens=30)
+ print(processor.decode(output[0]))
+ ```
+
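The snippet above decodes the whole output sequence, so the printed string also contains the prompt and the chat-template special tokens. As a minimal, illustrative follow-up (not part of this commit), you can slice off the prompt tokens before decoding; it reuses the `inputs`, `output`, and `processor` objects defined above:

```python
# Illustrative continuation: decode only the newly generated tokens.
# `inputs["input_ids"]` holds the tokenized prompt, so everything after its
# length in `output[0]` is text produced by the model.
prompt_length = inputs["input_ids"].shape[-1]
generated_only = output[0][prompt_length:]
print(processor.decode(generated_only, skip_special_tokens=True))
```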
+ ### Use with `llama`
+
+ Please follow the instructions in the [repository](https://github.com/meta-llama/llama).
+
+ To download the original checkpoints, you can use `huggingface-cli` as follows:
+
+ ```
+ huggingface-cli download meta-llama/Llama-3.2-11B-Vision-Instruct --include "original/*" --local-dir Llama-3.2-11B-Vision-Instruct
+ ```
+
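If you prefer to script the download, the same filter can be expressed with the `huggingface_hub` Python API. The sketch below is illustrative rather than part of this commit; it assumes `huggingface_hub` is installed and that you are authenticated with access to the gated repository (for example via `huggingface-cli login`):

```python
from huggingface_hub import snapshot_download

# Fetch only the original checkpoint files, mirroring the
# --include "original/*" filter used with huggingface-cli above.
snapshot_download(
    repo_id="meta-llama/Llama-3.2-11B-Vision-Instruct",
    allow_patterns=["original/*"],
    local_dir="Llama-3.2-11B-Vision-Instruct",
)
```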
  ## Hardware and Software
 
  **Training Factors:** We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure.