---
license: cc-by-nc-4.0
language:
- en
- bn
- cs
- da
- de
- el
- ar
- es
- fa
- fi
- fr
- he
- hi
- hr
- hu
- id
- it
- ja
- ko
- mi
- nl
- 'no'
- pl
- pt
- qu
- ro
- ru
- sw
- sv
- te
- th
- tr
- uk
- vi
- zh
- ta
- bg
- ca
- et
- ur
- eu
- my
- ht
datasets:
- mOSCAR
---

# Multilingual OpenFlamingo

Multilingual OpenFlamingo is a multilingual version of [OpenFlamingo](https://arxiv.org/abs/2308.01390) trained on [mOSCAR](https://arxiv.org/abs/2406.08707) and a translated version of [LAION-400M](https://arxiv.org/abs/2111.02114). The model was trained on 43 languages and is based on `google/gemma-2b`.
Multilingual OpenFlamingo models process arbitrarily interleaved sequences of images and text and generate text in multiple languages. The model answers in the language of the prompt; no special token is required to specify the language (see the multilingual sketch after the generation example below).

## Installation
```bash
git clone https://github.com/MatthieuFP/open_flamingo
cd open_flamingo
pip install --editable ./
pip install numpy==1.26
```

### Initialization

```python
from open_flamingo import create_model_and_transforms

model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="google/gemma-2b",
    tokenizer_path="google/gemma-2b",
    cross_attn_every_n_layers=1,
)

# grab the model checkpoint from the Hugging Face Hub
from huggingface_hub import hf_hub_download
import torch

checkpoint_path = hf_hub_download("matthieufp/multilingual_open_flamingo", "checkpoint.pt")
_ = model.load_state_dict(torch.load(checkpoint_path), strict=False)
```
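Moving the model to a GPU and switching to eval mode before generating is optional but usually faster. A minimal sketch (device handling is an assumption, not part of the original instructions; if you use it, move the `vision_x` and `lang_x` tensors below to the same device):

```python
# Optional: pick a device and disable dropout for inference
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()
```
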
### Generation example
From [OpenFlamingo](https://huggingface.co/openflamingo/OpenFlamingo-9B-vitl-mpt7b):

Below is an example of generating text conditioned on interleaved images/text. In particular, let's try few-shot image captioning.

```python
from PIL import Image
import requests

"""
Step 1: Load images
"""
demo_image_one = Image.open(
    requests.get(
        "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True
    ).raw
)

demo_image_two = Image.open(
    requests.get(
        "http://images.cocodataset.org/test-stuff2017/000000028137.jpg",
        stream=True
    ).raw
)

query_image = Image.open(
    requests.get(
        "http://images.cocodataset.org/test-stuff2017/000000028352.jpg",
        stream=True
    ).raw
)


"""
Step 2: Preprocessing images
Details: For OpenFlamingo, we expect the image to be a torch tensor of shape
 batch_size x num_media x num_frames x channels x height x width.
 In this case batch_size = 1, num_media = 3, num_frames = 1,
 channels = 3, height = 224, width = 224.
"""
vision_x = [image_processor(demo_image_one).unsqueeze(0), image_processor(demo_image_two).unsqueeze(0), image_processor(query_image).unsqueeze(0)]
vision_x = torch.cat(vision_x, dim=0)
vision_x = vision_x.unsqueeze(1).unsqueeze(0)

"""
Step 3: Preprocessing text
Details: In the text we expect an <image> special token to indicate where an image is.
 We also expect an <|endofchunk|> special token to indicate the end of the text
 portion associated with an image.
"""
tokenizer.padding_side = "left"  # For generation, padding tokens should be on the left
lang_x = tokenizer(
    ["<image>An image of two cats.<|endofchunk|><image>An image of a bathroom sink.<|endofchunk|><image>An image of"],
    return_tensors="pt",
)


"""
Step 4: Generate text
"""
generated_text = model.generate(
    vision_x=vision_x,
    lang_x=lang_x["input_ids"],
    attention_mask=lang_x["attention_mask"],
    max_new_tokens=20,
    num_beams=3,
)

print("Generated text: ", tokenizer.decode(generated_text[0]))
```
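Because the model replies in the language of the prompt, the same few-shot pipeline can caption in other languages simply by writing the in-context examples in that language. A minimal sketch reusing `vision_x` from above (the French prompt text is illustrative and not taken from the original README):

```python
# Same images, but French in-context captions, so the model should continue in French
lang_x_fr = tokenizer(
    ["<image>Une image de deux chats.<|endofchunk|><image>Une image d'un lavabo.<|endofchunk|><image>Une image de"],
    return_tensors="pt",
)
generated_text = model.generate(
    vision_x=vision_x,
    lang_x=lang_x_fr["input_ids"],
    attention_mask=lang_x_fr["attention_mask"],
    max_new_tokens=20,
    num_beams=3,
)
print("Generated text: ", tokenizer.decode(generated_text[0]))
```
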

## Citations
If you use this model, please consider citing the following works:

```bibtex
@article{futeral2024moscar,
  title={mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus},
  author={Futeral, Matthieu and Zebaze, Armel and Suarez, Pedro Ortiz and Abadji, Julien and Lacroix, R{\'e}mi and Schmid, Cordelia and Bawden, Rachel and Sagot, Beno{\^\i}t},
  journal={arXiv preprint arXiv:2406.08707},
  year={2024}
}
```

```bibtex
@article{awadalla2023openflamingo,
  title={OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models},
  author={Anas Awadalla and Irena Gao and Josh Gardner and Jack Hessel and Yusuf Hanafy and Wanrong Zhu and Kalyani Marathe and Yonatan Bitton and Samir Gadre and Shiori Sagawa and Jenia Jitsev and Simon Kornblith and Pang Wei Koh and Gabriel Ilharco and Mitchell Wortsman and Ludwig Schmidt},
  journal={arXiv preprint arXiv:2308.01390},
  year={2023}
}
```

```bibtex
@software{anas_awadalla_2023_7733589,
  author    = {Awadalla, Anas and Gao, Irena and Gardner, Joshua and Hessel, Jack and Hanafy, Yusuf and Zhu, Wanrong and Marathe, Kalyani and Bitton, Yonatan and Gadre, Samir and Jitsev, Jenia and Kornblith, Simon and Koh, Pang Wei and Ilharco, Gabriel and Wortsman, Mitchell and Schmidt, Ludwig},
  title     = {OpenFlamingo},
  month     = mar,
  year      = 2023,
  publisher = {Zenodo},
  version   = {v0.1.1},
  doi       = {10.5281/zenodo.7733589},
  url       = {https://doi.org/10.5281/zenodo.7733589}
}
```