Irena Gao committed 218fabb (parent: 2f28ef4): update README

README.md CHANGED
@@ -16,9 +16,30 @@ We follow the Flamingo modeling paradigm, outfitting the layers of a pretrained,

This model has cross-attention modules inserted in *every fourth* decoder block. It was trained using DistributedDataParallel across 64 A100 80GB GPUs at automatic BF16 mixed precision.

To use these MPT weights, OpenFlamingo must be initialized using revision `68e1a8e0ebb9b30f3c45c1ef6195980f29063ae2` of the MPT-7B modeling code. We suggest using [this copy of the model](https://huggingface.co/anas-awadalla/mpt-7b) to ensure the code is loaded at that commit.
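
Pinning works the same way as loading any Hugging Face model at a fixed commit. A minimal sketch, assuming the hash above is a commit of the upstream `mosaicml/mpt-7b` repository (if you use the suggested copy, `anas-awadalla/mpt-7b`, the code is already fixed at that commit):

``` python
# Sketch: loading the MPT-7B modeling code at a pinned revision via transformers.
# Assumes the hash refers to the upstream mosaicml/mpt-7b repo; the suggested
# copy (anas-awadalla/mpt-7b) already ships the code at this commit.
from transformers import AutoModelForCausalLM

lm = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    revision="68e1a8e0ebb9b30f3c45c1ef6195980f29063ae2",
    trust_remote_code=True,  # MPT uses custom modeling code from the repo
)
```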

## Uses
OpenFlamingo models process arbitrarily interleaved sequences of images and text to output text. This allows the models to accept in-context examples and undertake tasks like captioning, visual question answering, and image classification.
### Initialization

``` python
from open_flamingo import create_model_and_transforms

# Assemble OpenFlamingo: a CLIP ViT-L/14 vision encoder plus the MPT-7B language
# model, with cross-attention inserted every fourth decoder block to match this checkpoint
model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="anas-awadalla/mpt-7b",
    tokenizer_path="anas-awadalla/mpt-7b",
    cross_attn_every_n_layers=4
)

# grab model checkpoint from huggingface hub
from huggingface_hub import hf_hub_download
import torch

checkpoint_path = hf_hub_download("openflamingo/OpenFlamingo-9B-vitl-mpt7b", "checkpoint.pt")
# strict=False so loading succeeds even where checkpoint and model keys do not align exactly
model.load_state_dict(torch.load(checkpoint_path), strict=False)
```
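
Note that `cross_attn_every_n_layers=4` must match the *every fourth block* architecture described above: with a different value, the cross-attention keys in the checkpoint will not line up with the assembled model, and `strict=False` will silently skip them.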
### Generation example
Below is an example of generating text conditioned on interleaved images/text. In particular, let's try few-shot image captioning.
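
The example code itself falls outside this excerpt, so the following is a reconstruction sketched against the OpenFlamingo interface; the COCO image URLs, the `<image>`/`<|endofchunk|>` prompt tokens, and the generation settings are illustrative assumptions rather than the original example.

``` python
# Reconstruction sketch of few-shot captioning: two in-context (image, caption)
# pairs followed by a query image. Uses the `model`, `image_processor`, and
# `tokenizer` from the initialization snippet; image URLs are placeholders.
import requests
import torch
from PIL import Image

demo_image_one = Image.open(
    requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw
)
demo_image_two = Image.open(
    requests.get("http://images.cocodataset.org/test-stuff2017/000000028137.jpg", stream=True).raw
)
query_image = Image.open(
    requests.get("http://images.cocodataset.org/test-stuff2017/000000028352.jpg", stream=True).raw
)

# Stack preprocessed images into shape (batch, num_images, frames, channels, height, width)
vision_x = [image_processor(img).unsqueeze(0) for img in (demo_image_one, demo_image_two, query_image)]
vision_x = torch.cat(vision_x, dim=0).unsqueeze(1).unsqueeze(0)

# Interleave text with one <image> token per image; <|endofchunk|> closes each example
tokenizer.padding_side = "left"
lang_x = tokenizer(
    ["<image>An image of two cats.<|endofchunk|>"
     "<image>An image of a bathroom sink.<|endofchunk|>"
     "<image>An image of"],
    return_tensors="pt",
)

generated_text = model.generate(
    vision_x=vision_x,
    lang_x=lang_x["input_ids"],
    attention_mask=lang_x["attention_mask"],
    max_new_tokens=20,
    num_beams=3,
)
print("Generated text:", tokenizer.decode(generated_text[0]))
```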