Update README.md
Browse files
README.md
CHANGED
@@ -43,27 +43,21 @@ This version of Cephalo, ```lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k`
|
|
43 |
The ```lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k``` is suiteable for one or more image inputs, wih prompts using the chat format as follows:
|
44 |
|
45 |
```raw
|
46 |
-
|
47 |
-
|
48 |
-
|
|
|
|
|
|
|
|
|
49 |
```
|
50 |
-
where the model generates the text after `Assistant:` . For multi-turn conversations, the prompt should be formatted as follows:
|
51 |
-
|
52 |
-
```raw
|
53 |
-
User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.
|
54 |
-
<image>What is shown in this image, and what is the relevance for materials design? Include a discussion of multi-agent AI.<end_of_utterance>
|
55 |
-
Assistant: The image depicts ants climbing a vertical surface using their legs and claws. This behavior is observed in nature and can inspire the design of multi-agent AI systems that mimic the coordinated movement of these insects. The relevance lies in the potential application of such systems in robotics and materials science, where efficient and adaptive movement is crucial.<end_of_utterance>
|
56 |
-
User: How could this be used to design a fracture resistant material?<end_of_utterance>
|
57 |
-
Assistant:
|
58 |
-
```
|
59 |
-
|
60 |
-
|
61 |
|
62 |
### Sample inference code
|
63 |
|
64 |
This code snippets show how to get quickly started on a GPU:
|
65 |
|
66 |
```python
|
|
|
67 |
model_id='lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k'
|
68 |
|
69 |
model = MllamaForConditionalGeneration.from_pretrained( model_id, torch_dtype=torch.bfloat16,
|
@@ -221,65 +215,6 @@ The image below shows reproductions of two representative pages of the scientifi
|
|
221 |
|
222 |
![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/qHURSBRWEDgHy4o56escN.png)
|
223 |
|
224 |
-
# Further model optimizations
|
225 |
-
|
226 |
-
If your GPU allows, load and run inference in half precision (`torch.float16` or `torch.bfloat16`).
|
227 |
-
|
228 |
-
```diff
|
229 |
-
model = AutoModelForVision2Seq.from_pretrained(
|
230 |
-
"lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
|
231 |
-
+ torch_dtype=torch.float16,
|
232 |
-
).to(DEVICE)
|
233 |
-
```
|
234 |
-
|
235 |
-
**Vision encoder efficiency**
|
236 |
-
|
237 |
-
Given the high resolution supported, the vision part of the model can be memory hungry depending on your configuration. If you are GPU-memory-constrained, you can:
|
238 |
-
- **deactivate the image splitting.** To do so, add `do_image_splitting=False` when initializing the processor (`AutoProcessor.from_pretrained`). There are no changes required on the model side. Note that only the sft model has been trained with image splitting.
|
239 |
-
- **decrease the maximum image resolution.** To do so, add `size= {"longest_edge": 448, "shortest_edge": 378}` when initializing the processor (`AutoProcessor.from_pretrained`). In particular, the `longest_edge` value can be adapted to fit the need (the default value is `980`). We recommend using values that are multiples of 14. There are no changes required on the model side.
|
240 |
-
|
241 |
-
`do_image_splitting=True` is especially needed to boost performance on complex tasks where a very large image is used as input. The model was fine-tuned with image splitting turned on. For simple tasks, this argument can be safely set to `False`.
|
242 |
-
|
243 |
-
**Using Flash-attention 2 to speed up generation**
|
244 |
-
|
245 |
-
<details><summary>Click to expand.</summary>
|
246 |
-
|
247 |
-
Mke sure to install `flash-attn`. Refer to the [original repository of Flash Attention](https://github.com/Dao-AILab/flash-attention) for the package installation. Simply change the snippet above with:
|
248 |
-
|
249 |
-
```diff
|
250 |
-
model = AutoModelForVision2Seq.from_pretrained(
|
251 |
-
"lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
|
252 |
-
+ torch_dtype=torch.bfloat16,
|
253 |
-
+ _attn_implementation="flash_attention_2",
|
254 |
-
).to(DEVICE)
|
255 |
-
```
|
256 |
-
|
257 |
-
</details>
|
258 |
-
|
259 |
-
**4 bit quantization with bitsandbytes**
|
260 |
-
|
261 |
-
<details><summary>Click to expand.</summary>
|
262 |
-
It is possible to load Idefics2 in 4bits with `bitsandbytes`. Make sure that you have `accelerate` and `bitsandbytes` installed.
|
263 |
-
|
264 |
-
```diff
|
265 |
-
+ from transformers import BitsAndBytesConfig
|
266 |
-
|
267 |
-
quantization_config = BitsAndBytesConfig(
|
268 |
-
load_in_4bit=True,
|
269 |
-
bnb_4bit_quant_type="nf4",
|
270 |
-
bnb_4bit_use_double_quant=True,
|
271 |
-
bnb_4bit_compute_dtype=torch.bfloat16
|
272 |
-
)
|
273 |
-
model = AutoModelForVision2Seq.from_pretrained(
|
274 |
-
"lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
|
275 |
-
+ torch_dtype=torch.bfloat16,
|
276 |
-
+ quantization_config=quantization_config,
|
277 |
-
).to(DEVICE)
|
278 |
-
```
|
279 |
-
|
280 |
-
</details>
|
281 |
-
|
282 |
-
|
283 |
## Citation
|
284 |
|
285 |
Please cite as:
|
|
|
43 |
The ```lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k``` is suiteable for one or more image inputs, wih prompts using the chat format as follows:
|
44 |
|
45 |
```raw
|
46 |
+
[{'role': 'user',
|
47 |
+
'content': [{'type': 'image'},
|
48 |
+
{'type': 'text',
|
49 |
+
'text': 'What is shown in this image, and what is the relevance for materials design? Include a discussion of multi-agent AI. \n\nFirst brainstorm, then organize your thoughts, then respond.'}]},
|
50 |
+
{'role': 'assistant',
|
51 |
+
'content': [{'type': 'text',
|
52 |
+
'text': 'The image shows a group of ants working together to move a large object. This scene illustrates the concept of swarm intelligence, where individual agents (ants) collectively achieve a complex task through decentralized, self-organized behavior. \n\nIn materials design, this concept can be applied to develop new materials and structures by mimicking the behavior of swarms. For instance, researchers have used swarm intelligence algorithms to optimize the design of composite materials, such as fiber-reinforced polymers, by simulating the behavior of ants or other swarming organisms. These algorithms can help identify the optimal arrangement of fibers to maximize strength and minimize weight.\n\nMulti-agent AI, which involves the coordination of multiple autonomous agents to achieve a common goal, can also be used in materials design. This approach can be applied to simulate the behavior of complex systems, such as biological tissues or nanomaterials, and optimize their properties through machine learning algorithms. By analyzing the behavior of individual agents and their interactions, researchers can develop new materials with improved performance and functionality.\n\nIn summary, the image of ants working together to move a large object serves as a metaphor for the potential of swarm intelligence and multi-agent AI in materials design. By mimicking the behavior of swarms, researchers can develop new materials and structures with improved properties and functionality.'}]}]
|
53 |
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
54 |
|
55 |
### Sample inference code
|
56 |
|
57 |
This code snippets show how to get quickly started on a GPU:
|
58 |
|
59 |
```python
|
60 |
+
DEVICE='cuda:0'
|
61 |
model_id='lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k'
|
62 |
|
63 |
model = MllamaForConditionalGeneration.from_pretrained( model_id, torch_dtype=torch.bfloat16,
|
|
|
215 |
|
216 |
![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/qHURSBRWEDgHy4o56escN.png)
|
217 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
218 |
## Citation
|
219 |
|
220 |
Please cite as:
|