---
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
---

This repository is a pre-release checkpoint for Llama 3.2 11B Vision Instruct.

It contains two versions of the model, for use with `transformers` and with the original `llama3` codebase (under the `original` directory).


## Inference with transformers

Please install the in-progress development wheel from https://huggingface.co/nltpt/transformers/tree/main.

This is an example inference snippet (API subject to change):

```python
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "nltpt/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe image in two sentences"}
        ]
    }
]
text = processor.apply_chat_template(messages, add_generation_prompt=True)

url = "https://llava-vl.github.io/static/images/view.jpg"
raw_image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=text, images=raw_image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, do_sample=False, max_new_tokens=25)
print(processor.decode(output[0]))
```

Output:
```text
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Describe image in two sentences<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The image depicts a serene lake scene, featuring a long wooden dock extending into the calm water, with a dense forest of trees
```
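
The decoded output above repeats the prompt and its special tokens, because `generate` returns the prompt tokens followed by the newly generated ones. If only the reply is wanted, one option is to drop the prompt positions before decoding. The sketch below illustrates the slicing on plain Python lists so it runs without the model; `strip_prompt` is a hypothetical helper, not part of `transformers`, and the token ids are made up.

```python
from typing import List, Sequence


def strip_prompt(sequence: Sequence[int], prompt_length: int) -> List[int]:
    """Return only the tokens that follow the first `prompt_length` positions."""
    if not 0 <= prompt_length <= len(sequence):
        raise ValueError("prompt_length must be between 0 and len(sequence)")
    return list(sequence[prompt_length:])


# Toy token ids standing in for `output[0]`; the first 4 play the role of the prompt.
generated = [101, 7, 8, 9, 42, 43, 44]
reply_ids = strip_prompt(generated, prompt_length=4)
print(reply_ids)  # [42, 43, 44]

# With the objects from the snippet above, the same idea would look like this
# (assuming the processor forwards `skip_special_tokens` to its tokenizer):
#   prompt_length = inputs["input_ids"].shape[-1]
#   reply = processor.decode(output[0][prompt_length:], skip_special_tokens=True)
```

Slicing by prompt length avoids string matching on the decoded text, which can break if the prompt itself contains the assistant header.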

## Running the original checkpoints
Installing the `llama-models` package puts three executable scripts on your `PATH`:

1. `example_chat_completion`
2. `example_text_completion`
3. `multimodal_example_chat_completion`

Invoke them with `torchrun`, for example:
```shell
CHECKPOINT_DIR=~/.llama/checkpoints/Llama-3.2-11B-Vision-Instruct/

torchrun "$(which multimodal_example_chat_completion)" "$CHECKPOINT_DIR"
```
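To read the source of these scripts, locate the installed package directory:
```shell
# `pip show -f` also lists installed files, so match only the "Location:" header line.
PACKAGE_DIR=$(pip show -f llama-models | grep '^Location:' | awk '{ print $2 }')

echo "Scripts are in the directory: $PACKAGE_DIR/llama_models/scripts/"
```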
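
The same lookup can be done from Python, which avoids parsing `pip show` output. This is a minimal sketch: it assumes the distribution's import name is `llama_models` and that the scripts live in a `scripts` subdirectory of the package; `find_scripts_dir` is a hypothetical helper written for this example.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_scripts_dir(package: str = "llama_models") -> Optional[Path]:
    """Return <package dir>/scripts, or None if the package cannot be found."""
    spec = importlib.util.find_spec(package)
    # Plain modules (not packages) have no submodule search locations.
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    return package_dir / "scripts"


scripts_dir = find_scripts_dir()
if scripts_dir is None:
    print("llama_models is not installed in this environment")
else:
    # The path is only the expected location; it is not checked for existence.
    print(f"Scripts are expected in: {scripts_dir}")
```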