parinzee committed · Commit 5a0a166 · verified · 1 Parent(s): 0b53cf3

Update README.md

Files changed (1): README.md (+16, -9)
README.md CHANGED
@@ -10,7 +10,7 @@ tags:
 license: llama3
 ---
 
-# **Typhoon-Vision Research Preview**
+# **Typhoon-Vision Preview**
 
 **llama-3-typhoon-v1.5-8b-vision-preview** is a 🇹🇭 Thai *vision-language* model. It supports both text and image input modalities natively while the output is text. This version (August 2024) is our first vision-language model as a part of our multimodal effort, and it is a research *preview* version. The base language model is our [llama-3-typhoon-v1.5-8b-instruct](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b-instruct).
 
@@ -111,16 +111,23 @@ output_ids = model.generate(
 print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
 ```
 
+# Evaluation Results
+| Model | MMBench (Dev) | Pope | GQA | GQA (Thai) |
+|:--|:--|:--|:--|:--|
+| Typhoon-Vision 8B Preview | 70.9 | 84.8 | 62.0 | 43.6 |
+| SeaLMMM 7B v0.1 | 64.8 | 86.3 | 61.4 | 25.3 |
+| Bunny Llama3 8B Vision | 76.0 | 86.9 | 64.8 | 24.0 |
+| GPT-4o Mini | 69.8 | 45.4 | 42.6 | 18.1 |
+
 # Intended Uses & Limitations
 This model is experimental and might not be fully evaluated for all use cases. Developers should assess risks in the context of their specific applications.
 
-# Follow us
-Twitter: https://twitter.com/opentyphoon
-
-# Support
-Discord: https://discord.gg/CqyBscMFpg
+# Follow Us & Support
+https://twitter.com/opentyphoon
+https://discord.gg/CqyBscMFpg
 
 # Acknowledgements
-In addition to common libraries and tools, we would like to thank the following projects for releasing model weights and code:
-- Training recipe: [Bunny](https://github.com/BAAI-DCAI/Bunny) from BAAI
-- Vision Encoder: [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) from Google
+We would like to thank the Bunny team for open-sourcing their code and data, and thanks to the Google Team for releasing the fine-tuned SigLIP that allowed us to adopt its encoder. Thanks to many other open-source projects for their useful knowledge sharing, data, code, and model weights.
+
+## Typhoon Team
+Parinthapat Pengpun, Potsawee Manakul, Sittipong Sripaisarnmongkol, Natapong Nitarach, Warit Sirichotedumrong, Adisai Na-Thalang, Phatrasek Jirabovonvisut, Pathomporn Chokchainant, Kasima Tharnpipitchai, Kunat Pipatanakul
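The `tokenizer.decode(output_ids[input_ids.shape[1]:], ...)` line in the second hunk slices the prompt tokens off the front of `output_ids` before decoding, because Hugging Face causal LMs return the prompt followed by the newly generated tokens. A minimal sketch of that slicing logic with plain lists; the token IDs below are illustrative assumptions, not real vocabulary entries:

```python
# Sketch: why the README's decode step slices output_ids.
# model.generate on a causal LM returns prompt tokens + generated tokens,
# so the prompt portion must be dropped before decoding the answer.

prompt_ids = [101, 2023, 2003]           # stands in for input_ids[0]
output_ids = prompt_ids + [7592, 2088]   # generate() echoes the prompt first

prompt_len = len(prompt_ids)             # input_ids.shape[1] in the README
new_tokens = output_ids[prompt_len:]     # keep only the generated part

print(new_tokens)                        # [7592, 2088]
```

Without the slice, the decoded string would repeat the user's prompt in front of the model's answer.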