chendl committed on
Commit
e8cb232
1 Parent(s): de9d71d

update cap

multimodal/open_flamingo.egg-info/PKG-INFO DELETED
@@ -1,247 +0,0 @@
- Metadata-Version: 2.1
- Name: open-flamingo
- Version: 0.0.2
- Summary: An open-source framework for training large multimodal models
- License: MIT
- Keywords: machine learning
- Classifier: Development Status :: 4 - Beta
- Classifier: Intended Audience :: Developers
- Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Programming Language :: Python :: 3.9
- Description-Content-Type: text/markdown
- License-File: LICENSE
-
- # 🦩 OpenFlamingo
-
- [![PyPI version](https://badge.fury.io/py/open_flamingo.svg)](https://badge.fury.io/py/open_flamingo)
-
- [Blog post](https://laion.ai/blog/open-flamingo/) | Paper (coming soon)
-
- Welcome to our open-source version of DeepMind's [Flamingo](https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model) model! In this repository, we provide a PyTorch implementation for training and evaluating OpenFlamingo models. We also provide an initial [OpenFlamingo 9B model](https://huggingface.co/openflamingo/OpenFlamingo-9B) trained on a new Multimodal C4 dataset (coming soon). Please refer to our blog post for more details.
-
- This repo is still under development, and we hope to release better-performing and larger OpenFlamingo models soon. If you have any questions, please feel free to open an issue. We also welcome contributions!
-
- # Table of Contents
- - [Installation](#installation)
- - [Approach](#approach)
-   * [Model architecture](#model-architecture)
- - [Usage](#usage)
-   * [Initializing an OpenFlamingo model](#initializing-an-openflamingo-model)
-   * [Generating text](#generating-text)
- - [Training](#training)
-   * [Dataset](#dataset)
- - [Evaluation](#evaluation)
- - [Future plans](#future-plans)
- - [Team](#team)
- - [Acknowledgments](#acknowledgments)
- - [Citing](#citing)
-
- # Installation
-
- To install the package in an existing environment, run
- ```
- pip install open-flamingo
- ```
-
- or to create a conda environment for running OpenFlamingo, run
- ```
- conda env create -f environment.yml
- ```
-
- # Usage
- We provide an initial [OpenFlamingo 9B model](https://huggingface.co/openflamingo/OpenFlamingo-9B) using a CLIP ViT-Large vision encoder and a LLaMA-7B language model. In general, we support any [CLIP vision encoder](https://huggingface.co/models?search=clip). For the language model, we support [LLaMA](https://huggingface.co/models?search=llama), [OPT](https://huggingface.co/models?search=opt), [GPT-Neo](https://huggingface.co/models?search=gpt-neo), [GPT-J](https://huggingface.co/models?search=gptj), and [Pythia](https://huggingface.co/models?search=pythia) models.
-
- #### NOTE: To use LLaMA models, you will need to install the latest version of transformers via
- ```
- pip install git+https://github.com/huggingface/transformers
- ```
- Use this [script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py) for converting LLaMA weights to HuggingFace format.
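For reference, a typical invocation of that conversion script looks roughly like the following; the paths are placeholders and the exact flags can change between transformers releases, so check the script's `--help` before running it.
```
python convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama \
    --model_size 7B \
    --output_dir /path/to/llama-7b-hf
```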
-
- ## Initializing an OpenFlamingo model
- ``` python
- from open_flamingo import create_model_and_transforms
-
- model, image_processor, tokenizer = create_model_and_transforms(
-     clip_vision_encoder_path="ViT-L-14",
-     clip_vision_encoder_pretrained="openai",
-     lang_encoder_path="<path to llama weights in HuggingFace format>",
-     tokenizer_path="<path to llama tokenizer in HuggingFace format>",
-     cross_attn_every_n_layers=4
- )
-
- # grab model checkpoint from huggingface hub
- from huggingface_hub import hf_hub_download
- import torch
-
- checkpoint_path = hf_hub_download("openflamingo/OpenFlamingo-9B", "checkpoint.pt")
- model.load_state_dict(torch.load(checkpoint_path), strict=False)
- ```
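Since the checkpoint is loaded with `strict=False`, some parameter names are expected not to match, likely because the released checkpoint omits the frozen backbone weights. If you want to see what was and was not covered, `load_state_dict` returns the missing and unexpected keys; this is an optional variant of the last line above, not part of the original example.
```python
# Optional: with strict=False, load_state_dict returns (missing_keys, unexpected_keys).
missing, unexpected = model.load_state_dict(torch.load(checkpoint_path), strict=False)
print(f"{len(missing)} model parameters absent from the checkpoint, {len(unexpected)} unused checkpoint entries")
```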
-
- ## Generating text
- Here is an example of generating text conditioned on interleaved images/text; in this case, we do few-shot image captioning.
-
- ``` python
- from PIL import Image
- import requests
- import torch
-
- """
- Step 1: Load images
- """
- demo_image_one = Image.open(
-     requests.get(
-         "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True
-     ).raw
- )
-
- demo_image_two = Image.open(
-     requests.get(
-         "http://images.cocodataset.org/test-stuff2017/000000028137.jpg",
-         stream=True
-     ).raw
- )
-
- query_image = Image.open(
-     requests.get(
-         "http://images.cocodataset.org/test-stuff2017/000000028352.jpg",
-         stream=True
-     ).raw
- )
-
-
- """
- Step 2: Preprocessing images
- Details: For OpenFlamingo, we expect the image to be a torch tensor of shape
- batch_size x num_media x num_frames x channels x height x width.
- In this case batch_size = 1, num_media = 3, num_frames = 1
- (this will always be 1 except for video, which we don't support yet),
- channels = 3, height = 224, width = 224.
- """
- vision_x = [image_processor(demo_image_one).unsqueeze(0), image_processor(demo_image_two).unsqueeze(0), image_processor(query_image).unsqueeze(0)]
- vision_x = torch.cat(vision_x, dim=0)
- vision_x = vision_x.unsqueeze(1).unsqueeze(0)
-
- """
- Step 3: Preprocessing text
- Details: In the text we expect an <|#image#|> special token to indicate where an image is.
- We also expect an <|endofchunk|> special token to indicate the end of the text
- portion associated with an image.
- """
- tokenizer.padding_side = "left"  # For generation, padding tokens should be on the left
- lang_x = tokenizer(
-     ["<|#image#|>An image of two cats.<|endofchunk|><|#image#|>An image of a bathroom sink.<|endofchunk|><|#image#|>An image of"],
-     return_tensors="pt",
- )
-
-
- """
- Step 4: Generate text
- """
- generated_text = model.generate(
-     vision_x=vision_x,
-     lang_x=lang_x["input_ids"],
-     attention_mask=lang_x["attention_mask"],
-     max_new_tokens=20,
-     num_beams=3,
- )
-
- print("Generated text: ", tokenizer.decode(generated_text[0]))
- ```
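Two optional additions to the example above: a quick sanity check that `vision_x` has the shape described in Step 2, and decoding with `skip_special_tokens=True` if you prefer the output without the `<|#image#|>`/`<|endofchunk|>` markers.
```python
print(vision_x.shape)  # expected: torch.Size([1, 3, 1, 3, 224, 224])
print(tokenizer.decode(generated_text[0], skip_special_tokens=True))
```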
-
- # Approach
- OpenFlamingo is a multimodal language model that can be used for a variety of tasks. It is trained on a large multimodal dataset (e.g. Multimodal C4) and can be used to generate text conditioned on interleaved images/text. For example, OpenFlamingo can be used to generate a caption for an image, or to generate a question given an image and a text passage. The benefit of this approach is that we are able to rapidly adapt to new tasks using in-context learning.
-
- ## Model architecture
- OpenFlamingo seeks to fuse a pretrained vision encoder and a language model using cross-attention layers. The model architecture is shown below.
-
- ![OpenFlamingo architecture](docs/flamingo.png)
- Credit: [Flamingo](https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model)
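To give a rough idea of what fusing with cross-attention means, here is a heavily simplified sketch of a Flamingo-style gated cross-attention block: the language model's hidden states attend to visual features, and tanh gates initialized at zero let the visual signal be blended in gradually during training. This is an illustrative sketch only, not this repository's actual implementation (see `open_flamingo/src/` for that).
```python
import torch
import torch.nn as nn

class GatedCrossAttentionBlock(nn.Module):
    """Simplified sketch of Flamingo-style gated cross-attention (illustrative only)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        # Gates start at zero, so the block initially acts as an identity mapping
        # and the pretrained language model's behavior is preserved early in training.
        self.attn_gate = nn.Parameter(torch.zeros(1))
        self.ff_gate = nn.Parameter(torch.zeros(1))

    def forward(self, lang_hidden: torch.Tensor, visual_feats: torch.Tensor) -> torch.Tensor:
        # lang_hidden: (batch, text_len, dim); visual_feats: (batch, num_visual_tokens, dim)
        attn_out, _ = self.cross_attn(lang_hidden, visual_feats, visual_feats)
        lang_hidden = lang_hidden + torch.tanh(self.attn_gate) * attn_out
        lang_hidden = lang_hidden + torch.tanh(self.ff_gate) * self.ff(lang_hidden)
        return lang_hidden
```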
-
- # Training
- To train a model, modify the following example command, which uses OPT 1.3B as an example LM:
- ```
- torchrun --nnodes=1 --nproc_per_node=4 train.py \
-     --run_name flamingo3B \
-     --lm_path facebook/opt-1.3b \
-     --tokenizer_path facebook/opt-1.3b \
-     --dataset_resampled \
-     --laion_shards "/path/to/shards/shard-{0000..0999}.tar" \
-     --mmc4_shards "/path/to/shards/shard-{0000..0999}.tar" \
-     --batch_size_mmc4 4 \
-     --batch_size_laion 8 \
-     --train_num_samples_mmc4 125000 \
-     --train_num_samples_laion 250000 \
-     --loss_multiplier_laion 0.2 \
-     --workers=6 \
-     --num_epochs 250 \
-     --lr_scheduler constant \
-     --warmup_steps 5000 \
-     --use_media_placement_augmentation \
-     --mmc4_textsim_threshold 30
- ```
-
- ## Dataset
- We expect all our training datasets to be [WebDataset](https://github.com/webdataset/webdataset) shards.
- We train our models on the [LAION 2B](https://huggingface.co/datasets/laion/laion2B-en) and Multimodal C4 (coming soon) datasets. The LAION 2B dataset is in WebDataset format by default if it is downloaded using the [img2dataset tool](https://github.com/rom1504/img2dataset), and Multimodal C4 comes packaged in WebDataset format.
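For LAION-style data, [img2dataset](https://github.com/rom1504/img2dataset) can write WebDataset shards directly. The command below is only a rough sketch; the metadata files, column names, and sizing options depend on the copy of the LAION metadata you download, so consult the img2dataset documentation for the exact arguments.
```
img2dataset \
    --url_list /path/to/laion2B-en-metadata \
    --input_format parquet \
    --url_col URL \
    --caption_col TEXT \
    --output_format webdataset \
    --output_folder /path/to/shards \
    --image_size 256 \
    --processes_count 16
```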
-
-
- # Evaluation
- We currently support running evaluations on [COCO](https://cocodataset.org/#home), [VQAv2](https://visualqa.org/index.html), [OKVQA](https://okvqa.allenai.org), [Flickr30k](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset), and [ImageNet](https://image-net.org/index.php). Note that these evaluations are currently run in validation mode (as specified in the Flamingo paper). We will be adding support for running evaluations in test mode in the future.
-
- Before evaluating the model, you will need to install the COCO evaluation package by running the following command:
- ```
- pip install pycocoevalcap
- ```
-
- To run evaluations on OKVQA, you will also need to download the WordNet corpus by running the following in Python:
- ```
- import nltk
- nltk.download('wordnet')
- ```
-
- To evaluate the model, run the script at `open_flamingo/scripts/run_eval.sh`.
-
- # Future plans
- [ ] Add support for video input
- [ ] Release better-performing and larger OpenFlamingo models
- [ ] Expand our evaluation suite
- [ ] Add support for FSDP training
-
- # Team
-
- OpenFlamingo is developed by:
-
- [Anas Awadalla](https://anas-awadalla.streamlit.app/), [Irena Gao](https://i-gao.github.io/), [Joshua Gardner](https://homes.cs.washington.edu/~jpgard/), [Jack Hessel](https://jmhessel.com/), [Yusuf Hanafy](https://www.linkedin.com/in/yusufhanafy/), [Wanrong Zhu](https://wanrong-zhu.com/), [Kalyani Marathe](https://sites.google.com/uw.edu/kalyanimarathe/home?authuser=0), [Yonatan Bitton](https://yonatanbitton.github.io/), [Samir Gadre](https://sagadre.github.io/), [Jenia Jitsev](https://scholar.google.de/citations?user=p1FuAMkAAAAJ&hl=en), [Simon Kornblith](https://simonster.com/), [Pang Wei Koh](https://koh.pw/), [Gabriel Ilharco](https://gabrielilharco.com/), [Mitchell Wortsman](https://mitchellnw.github.io/), [Ludwig Schmidt](https://people.csail.mit.edu/ludwigs/).
-
- The team is primarily from the University of Washington, Stanford, AI2, UCSB, and Google.
-
- # Acknowledgments
- This code is based on Lucidrains' [flamingo implementation](https://github.com/lucidrains/flamingo-pytorch) and David Hansmair's [flamingo-mini repo](https://github.com/dhansmair/flamingo-mini). Thank you for making your code public! We also thank the [OpenCLIP](https://github.com/mlfoundations/open_clip) team, as we use their data loading code and take inspiration from their library design.
-
- We would also like to thank [Jean-Baptiste Alayrac](https://www.jbalayrac.com) and [Antoine Miech](https://antoine77340.github.io) for their advice, [Rohan Taori](https://www.rohantaori.com/), [Nicholas Schiefer](https://nicholasschiefer.com/), [Deep Ganguli](https://hai.stanford.edu/people/deep-ganguli), [Thomas Liao](https://thomasliao.com/), [Tatsunori Hashimoto](https://thashim.github.io/), and [Nicholas Carlini](https://nicholas.carlini.com/) for their help with assessing the safety risks of our release, and [Stability AI](https://stability.ai) for providing us with compute resources to train these models.
-
- # Citing
- If you found this repository useful, please consider citing:
-
- ```
- @software{anas_awadalla_2023_7733589,
-   author       = {Awadalla, Anas and Gao, Irena and Gardner, Joshua and Hessel, Jack and Hanafy, Yusuf and Zhu, Wanrong and Marathe, Kalyani and Bitton, Yonatan and Gadre, Samir and Jitsev, Jenia and Kornblith, Simon and Koh, Pang Wei and Ilharco, Gabriel and Wortsman, Mitchell and Schmidt, Ludwig},
-   title        = {OpenFlamingo},
-   month        = mar,
-   year         = 2023,
-   publisher    = {Zenodo},
-   version      = {v0.1.1},
-   doi          = {10.5281/zenodo.7733589},
-   url          = {https://doi.org/10.5281/zenodo.7733589}
- }
- ```
-
- ```
- @article{Alayrac2022FlamingoAV,
-   title={Flamingo: a Visual Language Model for Few-Shot Learning},
-   author={Jean-Baptiste Alayrac and Jeff Donahue and Pauline Luc and Antoine Miech and Iain Barr and Yana Hasson and Karel Lenc and Arthur Mensch and Katie Millican and Malcolm Reynolds and Roman Ring and Eliza Rutherford and Serkan Cabi and Tengda Han and Zhitao Gong and Sina Samangooei and Marianne Monteiro and Jacob Menick and Sebastian Borgeaud and Andy Brock and Aida Nematzadeh and Sahand Sharifzadeh and Mikolaj Binkowski and Ricardo Barreira and Oriol Vinyals and Andrew Zisserman and Karen Simonyan},
-   journal={ArXiv},
-   year={2022},
-   volume={abs/2204.14198}
- }
- ```
 
multimodal/open_flamingo.egg-info/SOURCES.txt DELETED
@@ -1,53 +0,0 @@
- LICENSE
- README.md
- setup.py
- open_flamingo/__init__.py
- open_flamingo.egg-info/PKG-INFO
- open_flamingo.egg-info/SOURCES.txt
- open_flamingo.egg-info/dependency_links.txt
- open_flamingo.egg-info/requires.txt
- open_flamingo.egg-info/top_level.txt
- open_flamingo/chat/__init__.py
- open_flamingo/chat/conversation.py
- open_flamingo/eval/__init__.py
- open_flamingo/eval/classification.py
- open_flamingo/eval/coco_metric.py
- open_flamingo/eval/eval_datasets.py
- open_flamingo/eval/evaluate.py
- open_flamingo/eval/evaluate_debug.py
- open_flamingo/eval/evaluate_find_showcase.py
- open_flamingo/eval/evaluate_temp.py
- open_flamingo/eval/imagenet_utils.py
- open_flamingo/eval/ok_vqa_utils.py
- open_flamingo/eval/vqa_metric.py
- open_flamingo/eval/dataset_zoo/__init__.py
- open_flamingo/eval/dataset_zoo/aro_datasets.py
- open_flamingo/eval/dataset_zoo/constants.py
- open_flamingo/eval/dataset_zoo/perturbations.py
- open_flamingo/eval/dataset_zoo/retrieval.py
- open_flamingo/eval/dataset_zoo/utils.py
- open_flamingo/eval/task/__init__.py
- open_flamingo/eval/task/caption.py
- open_flamingo/eval/task/caption_chat.py
- open_flamingo/eval/task/cola.py
- open_flamingo/eval/task/crepe.py
- open_flamingo/eval/task/gqa.py
- open_flamingo/eval/task/mmbench.py
- open_flamingo/eval/task/reg.py
- open_flamingo/eval/task/utils.py
- open_flamingo/eval/task/vl_checklist.py
- open_flamingo/src/__init__.py
- open_flamingo/src/attention.py
- open_flamingo/src/factory.py
- open_flamingo/src/flamingo.py
- open_flamingo/src/flamingo_lm.py
- open_flamingo/src/gcn.py
- open_flamingo/src/helpers.py
- open_flamingo/src/utils.py
- open_flamingo/train/__init__.py
- open_flamingo/train/data2.py
- open_flamingo/train/distributed.py
- open_flamingo/train/instruction_template.py
- open_flamingo/train/train.py
- open_flamingo/train/train_utils.py
- tests/test_flamingo_model.py
 
multimodal/open_flamingo.egg-info/dependency_links.txt DELETED
@@ -1 +0,0 @@
-
 
 
multimodal/open_flamingo.egg-info/requires.txt DELETED
@@ -1,17 +0,0 @@
- einops
- einops-exts
- transformers==4.31.0
- torch==1.12.1
- torchvision==0.13.1
- pillow==9.3.0
- more-itertools
- datasets==2.9.0
- braceexpand==0.1.7
- webdataset
- wandb==0.13.10
- nltk
- scipy
- inflection
- sentencepiece
- open_clip_torch==2.20.0
- opencv-python==4.7.0.68
 
multimodal/open_flamingo.egg-info/top_level.txt DELETED
@@ -1 +0,0 @@
- open_flamingo