Image-Text-to-Text
Transformers
Safetensors
English
MLLM
Inference Endpoints
liyang committed on
Commit
dc186b4
1 Parent(s): 5b5fe00

update readme

Files changed (1)
  1. README.md +38 -20
README.md CHANGED
@@ -10,30 +10,45 @@ language:
10
  - en
11
  ---
12
 
13
- # Model Card
14
- <!-- Provide a quick summary of what the model is/does. -->
15
- Parrot is a multi-language and multi-modal large language model capable of achieving excellent performance.
 
 
 
16
  For a comprehensive introduction, please refer to [Parrot Paper](https://arxiv.org/abs/2406.02539) and [Parrot GitHub](https://github.com/AIDC-AI/Parrot).
17
 
18
- # Model Details
19
- ![](https://github.com/AIDC-AI/Parrot/images/teaser.png)
20
 
21
- # Performance
22
- ![](https://github.com/AIDC-AI/Parrot/images/performance.png)
23
- ![](https://github.com/AIDC-AI/Parrot/images/performance_table.png)
24
- # Usage
25
 
26
- Below is a code snippet to run Parrot with multimodal inputs. For additional usage instructions, including inference wrapper and Gradio UI, please refer to [Parrot GitHub](https://github.com/AIDC-AI/Parrot).
27
- ```markdown
28
- pip install torch==2.1.2 transformers==4.43.2 pillow==10.3.0
29
- ```
30
- ```python
31
- import torch
32
- from PIL import Image
33
- from transformers import AutoModelForCausalLM
34
- ```
 
 
 
35
 
36
- # Citation
 
 
 
 
 
 
 
 
 
37
  If you find Parrot useful, please cite the paper
38
 
39
  ```markdown
@@ -45,5 +60,8 @@ If you find Parrot useful, please cite the paper
45
  }
46
  ```
47
 
48
- # License
49
  The project is licensed under the Apache License, Version 2.0, and is restricted to uses that comply with the license agreements of Qwen and CLIP.
 
 
 
 
10
  - en
11
  ---
12
 
13
+ # Parrot-7B
14
+
15
+ ## Introduction
16
+ Welcome to Parrot, a novel method that utilizes textual guidance to drive visual token alignment at the language level.
17
+ Parrot conditions the visual tokens on diverse language inputs and uses Mixture-of-Experts (MoE) to promote the alignment of multilingual tokens.
18
+ Moreover, given the current lack of benchmarks for evaluating multilingual capabilities, we collect and release the Massive Multilingual Multimodal Benchmark (MMMB), which includes 6 languages, 15 categories, and 12,000 questions.
19
  For a comprehensive introduction, please refer to [Parrot Paper](https://arxiv.org/abs/2406.02539) and [Parrot GitHub](https://github.com/AIDC-AI/Parrot).
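
The introduction above names the idea (text-conditioned visual tokens mixed through experts) but not the mechanism. The toy sketch below illustrates only the general pattern of a text-gated mixture of experts with a residual connection. The function names, the linear experts, the softmax router, and the residual sum are all assumptions made for illustration; this is not the Parrot implementation, which is described in the paper and repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def moe_align(visual_tokens, text_embedding, router, experts):
    """Mix expert outputs for each visual token, gated by the text embedding.

    Hypothetical shapes: router is (num_experts x text_dim); each expert is a
    square (dim x dim) linear map applied to one visual token.
    """
    # The text embedding alone decides how much each expert contributes.
    gate = softmax(matvec(router, text_embedding))
    aligned = []
    for token in visual_tokens:
        expert_outputs = [matvec(expert, token) for expert in experts]
        mixed = [
            sum(g * out[i] for g, out in zip(gate, expert_outputs))
            for i in range(len(token))
        ]
        # Residual connection: keep the original token and add the mixture.
        aligned.append([t + m for t, m in zip(token, mixed)])
    return gate, aligned
```

In this sketch a different text embedding changes only the gate weights, so the same visual tokens are transformed differently depending on the prompt.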
20
 
21
+ ## Model
22
+ Parrot is a multilingual multimodal large language model. We provide our fully fine-tuned models below:
23
 
24
+ | Model | Base LLM | Vision Encoder | Stage | Download |
25
+ | --- | --- | :---: | :---: | :---: |
26
+ | Parrot-7B | Qwen-1.5-7B-Chat | CLIP-ViT-Large-patch14-336 | SFT | [ckpt](https://huggingface.co/AIDC-AI/Parrot-7B) |
27
+ | Parrot-14B | Qwen-1.5-14B-Chat | CLIP-ViT-Large-patch14-336 | SFT | [ckpt](https://huggingface.co/AIDC-AI/Parrot-14B) |
28
 
29
+ <div align="center">
30
+ <img src="https://github.com/AIDC-AI/Parrot/images/teaser.png" width="600px" />
31
+ </div>
32
+
33
+
34
+ ## Performance
35
+ <div align="center">
36
+ <img src="https://github.com/AIDC-AI/Parrot/images/performance.png" width="100%" />
37
+ </div>
38
+ <div align="center">
39
+ <img src="https://github.com/AIDC-AI/Parrot/images/performance_table.png" width="100%" />
40
+ </div>
41
 
42
+
43
+ ## Quick Start
44
+ The [Parrot GitHub](https://github.com/AIDC-AI/Parrot) repository provides a quick-start demo that can serve as a template for running inference with Parrot.
45
+
46
+ 1. Before running the demo, download the [Parrot checkpoint](https://huggingface.co/AIDC-AI/Parrot-7B) and the [CLIP checkpoint](https://huggingface.co/openai/clip-vit-large-patch14-336).
47
+ 2. Replace the paths in `runner.py` with the locations of the downloaded checkpoints.
48
+ 3. Finally, run `runner.py` with Python.
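
The download in step 1 can be scripted. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function. The repository IDs come from the links above, while the helper names (`plan_downloads`, `download_all`) and the `checkpoints` directory are hypothetical choices for illustration, not part of the Parrot repository.

```python
from pathlib import Path

# Repository IDs taken from the checkpoint links in step 1.
CHECKPOINTS = {
    "parrot": "AIDC-AI/Parrot-7B",
    "clip": "openai/clip-vit-large-patch14-336",
}


def plan_downloads(root):
    """Map each checkpoint name to (repo_id, local target directory)."""
    root = Path(root)
    return {
        name: (repo_id, root / repo_id.split("/")[-1])
        for name, repo_id in CHECKPOINTS.items()
    }


def download_all(root):
    """Download both checkpoints and return their local paths."""
    # Imported lazily so plan_downloads works without huggingface_hub.
    from huggingface_hub import snapshot_download

    paths = {}
    for name, (repo_id, target) in plan_downloads(root).items():
        paths[name] = snapshot_download(repo_id=repo_id, local_dir=str(target))
    return paths
```

Calling `download_all("checkpoints")` returns the two local directories, which are the values to substitute for the paths in `runner.py` in step 2.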
49
+
50
+
51
+ ## Citation
52
  If you find Parrot useful, please cite the paper
53
 
54
  ```markdown
 
60
  }
61
  ```
62
 
63
+ ## License
64
  The project is licensed under the Apache License, Version 2.0, and is restricted to uses that comply with the license agreements of Qwen and CLIP.
65
+
66
+ ## Disclaimer
67
+ We used compliance-checking algorithms during training to ensure, to the best of our ability, that the trained model is compliant. Due to the complexity of the data and the diversity of language model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe the model infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.