Irena Gao committed 218fabb (parent: 2f28ef4): update README

README.md CHANGED
@@ -16,9 +16,30 @@ We follow the Flamingo modeling paradigm, outfitting the layers of a pretrained,

This model has cross-attention modules inserted in *every fourth* decoder block. It was trained using DistributedDataParallel across 64 A100 80GB GPUs at automatic BF16 mixed precision.

To use these MPT weights, OpenFlamingo must be initialized using revision `68e1a8e0ebb9b30f3c45c1ef6195980f29063ae2` of the MPT-7B modeling code. We suggest using [this copy of the model](https://huggingface.co/anas-awadalla/mpt-7b) to ensure the code is loaded at that commit.
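
Pinning works the same way as loading any Hugging Face model at a fixed commit. A minimal sketch, assuming the hash above is a commit of the upstream `mosaicml/mpt-7b` repository (if you use the suggested copy, `anas-awadalla/mpt-7b`, the code is already fixed at that commit):

``` python
# Sketch: loading the MPT-7B modeling code at a pinned revision via transformers.
# Assumes the hash refers to the upstream mosaicml/mpt-7b repo; the suggested
# copy (anas-awadalla/mpt-7b) already ships the code at this commit.
from transformers import AutoModelForCausalLM

lm = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    revision="68e1a8e0ebb9b30f3c45c1ef6195980f29063ae2",
    trust_remote_code=True,  # MPT uses custom modeling code from the repo
)
```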

## Uses
OpenFlamingo models process arbitrarily interleaved sequences of images and text to output text. This allows the models to accept in-context examples and undertake tasks like captioning, visual question answering, and image classification.
### Initialization

``` python
from open_flamingo import create_model_and_transforms

# Assemble OpenFlamingo: a CLIP ViT-L/14 vision encoder plus the MPT-7B language
# model, with cross-attention inserted every fourth decoder block to match this checkpoint
model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="anas-awadalla/mpt-7b",
    tokenizer_path="anas-awadalla/mpt-7b",
    cross_attn_every_n_layers=4
)

# grab model checkpoint from huggingface hub
from huggingface_hub import hf_hub_download
import torch

checkpoint_path = hf_hub_download("openflamingo/OpenFlamingo-9B-vitl-mpt7b", "checkpoint.pt")
# strict=False so loading succeeds even where checkpoint and model keys do not align exactly
model.load_state_dict(torch.load(checkpoint_path), strict=False)
```
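
Note that `cross_attn_every_n_layers=4` must match the *every fourth block* architecture described above: with a different value, the cross-attention keys in the checkpoint will not line up with the assembled model, and `strict=False` will silently skip them.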
### Generation example
Below is an example of generating text conditioned on interleaved images/text. In particular, let's try few-shot image captioning.
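
The example code itself falls outside this excerpt, so the following is a reconstruction sketched against the OpenFlamingo interface; the COCO image URLs, the `<image>`/`<|endofchunk|>` prompt tokens, and the generation settings are illustrative assumptions rather than the original example.

``` python
# Reconstruction sketch of few-shot captioning: two in-context (image, caption)
# pairs followed by a query image. Uses the `model`, `image_processor`, and
# `tokenizer` from the initialization snippet; image URLs are placeholders.
import requests
import torch
from PIL import Image

demo_image_one = Image.open(
    requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw
)
demo_image_two = Image.open(
    requests.get("http://images.cocodataset.org/test-stuff2017/000000028137.jpg", stream=True).raw
)
query_image = Image.open(
    requests.get("http://images.cocodataset.org/test-stuff2017/000000028352.jpg", stream=True).raw
)

# Stack preprocessed images into shape (batch, num_images, frames, channels, height, width)
vision_x = [image_processor(img).unsqueeze(0) for img in (demo_image_one, demo_image_two, query_image)]
vision_x = torch.cat(vision_x, dim=0).unsqueeze(1).unsqueeze(0)

# Interleave text with one <image> token per image; <|endofchunk|> closes each example
tokenizer.padding_side = "left"
lang_x = tokenizer(
    ["<image>An image of two cats.<|endofchunk|>"
     "<image>An image of a bathroom sink.<|endofchunk|>"
     "<image>An image of"],
    return_tensors="pt",
)

generated_text = model.generate(
    vision_x=vision_x,
    lang_x=lang_x["input_ids"],
    attention_mask=lang_x["attention_mask"],
    max_new_tokens=20,
    num_beams=3,
)
print("Generated text:", tokenizer.decode(generated_text[0]))
```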