BAAI

BoyaWu10 committed
Commit 7bf2ff1 · verified · 1 parent(s): 58c5e3e

Update README.md

Files changed (1): README.md (+4 -2)
README.md CHANGED
@@ -9,7 +9,7 @@ license: apache-2.0
   <img src="./icon.png" alt="Logo" width="350">
 </p>
 
-📖 [Technical report](https://arxiv.org/abs/2402.11530) | 🏠 [Code](https://github.com/BAAI-DCAI/Bunny) | 🐰 [Demo](https://wisemodel.cn/spaces/baai/Bunny)
+📖 [Technical report](https://arxiv.org/abs/2402.11530) | 🏠 [Code](https://github.com/BAAI-DCAI/Bunny) | 🐰 [Demo](http://bunny.dataoptim.org)
 
 Bunny is a family of lightweight but powerful multimodal models. It offers multiple plug-and-play vision encoders, such as EVA-CLIP and SigLIP, and language backbones, including Llama-3-8B, Phi-1.5, StableLM-2, Qwen1.5, MiniCPM and Phi-2. To compensate for the decrease in model size, we construct more informative training data through curated selection from a broader data source. Remarkably, our Bunny-v1.0-3B model, built upon SigLIP and Phi-2, outperforms state-of-the-art MLLMs not only of similar size but also of larger sizes (7B), and even performs on par with 13B models.
 
@@ -75,7 +75,9 @@ output_ids = model.generate(
     input_ids,
     images=image_tensor,
     max_new_tokens=100,
-    use_cache=True)[0]
+    use_cache=True,
+    repetition_penalty=1.0 # increase this to avoid chattering
+)[0]
 
 print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
 ```
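
The new `repetition_penalty` argument is standard `transformers` generation config: 1.0 is a no-op, and values above 1.0 down-weight tokens that have already been generated, which is what the added comment means by avoiding chattering. Below is a minimal sketch of the CTRL-style rule that the library's `RepetitionPenaltyLogitsProcessor` applies at each decoding step; `apply_repetition_penalty` is an illustrative name for this sketch, not part of the Bunny or `transformers` API.

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor,
                             generated_ids: torch.Tensor,
                             penalty: float) -> torch.Tensor:
    # Gather the logits of tokens that already appear in the output.
    score = logits.gather(-1, generated_ids)
    # Positive logits are divided and negative ones multiplied, so a
    # penalty > 1.0 always pushes repeated tokens toward lower probability.
    score = torch.where(score > 0, score / penalty, score * penalty)
    # Write the penalized scores back into the full logit row.
    return logits.scatter(-1, generated_ids, score)
```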
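A toy run shows why the README ships with 1.0 as the default: the penalty only changes the distribution when raised above 1.0, and the divide-or-multiply split ensures both positive and negative logits of seen tokens move downward.

```python
# Hypothetical usage on toy logits for a 3-token vocabulary:
logits = torch.tensor([[2.0, -1.0, 0.5]])
seen = torch.tensor([[0, 1]])  # tokens 0 and 1 were already generated

print(apply_repetition_penalty(logits, seen, 1.0))
# tensor([[ 2.0000, -1.0000,  0.5000])  -- penalty 1.0 changes nothing
print(apply_repetition_penalty(logits, seen, 1.5))
# tensor([[ 1.3333, -1.5000,  0.5000]]) -- both seen tokens are down-weighted
```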