parinzee committed · Commit 5a0a166 · verified · 1 Parent(s): 0b53cf3

Update README.md

Files changed (1): README.md (+16, -9)
README.md CHANGED
@@ -10,7 +10,7 @@ tags:
 license: llama3
 ---
 
-# **Typhoon-Vision Research Preview**
+# **Typhoon-Vision Preview**
 
 **llama-3-typhoon-v1.5-8b-vision-preview** is a 🇹🇭 Thai *vision-language* model. It supports both text and image input modalities natively while the output is text. This version (August 2024) is our first vision-language model as a part of our multimodal effort, and it is a research *preview* version. The base language model is our [llama-3-typhoon-v1.5-8b-instruct](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b-instruct).
 
@@ -111,16 +111,23 @@ output_ids = model.generate(
 print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
 ```
 
+# Evaluation Results
+| Model | MMBench (Dev) | Pope | GQA | GQA (Thai) |
+|:--|:--|:--|:--|:--|
+| Typhoon-Vision 8B Preview | 70.9 | 84.8 | 62.0 | 43.6 |
+| SeaLMMM 7B v0.1 | 64.8 | 86.3 | 61.4 | 25.3 |
+| Bunny Llama3 8B Vision | 76.0 | 86.9 | 64.8 | 24.0 |
+| GPT-4o Mini | 69.8 | 45.4 | 42.6 | 18.1 |
+
 # Intended Uses & Limitations
 This model is experimental and might not be fully evaluated for all use cases. Developers should assess risks in the context of their specific applications.
 
-# Follow us
-Twitter: https://twitter.com/opentyphoon
-
-# Support
-Discord: https://discord.gg/CqyBscMFpg
+# Follow Us & Support
+https://twitter.com/opentyphoon
+https://discord.gg/CqyBscMFpg
 
 # Acknowledgements
-In addition to common libraries and tools, we would like to thank the following projects for releasing model weights and code:
-- Training recipe: [Bunny](https://github.com/BAAI-DCAI/Bunny) from BAAI
-- Vision Encoder: [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) from Google
+We would like to thank the Bunny team for open-sourcing their code and data, and thanks to the Google Team for releasing the fine-tuned SigLIP that allowed us to adopt its encoder. Thanks to many other open-source projects for their useful knowledge sharing, data, code, and model weights.
+
+## Typhoon Team
+Parinthapat Pengpun, Potsawee Manakul, Sittipong Sripaisarnmongkol, Natapong Nitarach, Warit Sirichotedumrong, Adisai Na-Thalang, Phatrasek Jirabovonvisut, Pathomporn Chokchainant, Kasima Tharnpipitchai, Kunat Pipatanakul
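The `tokenizer.decode(output_ids[input_ids.shape[1]:], ...)` line in the second hunk slices the prompt tokens off the front of `output_ids` before decoding, because Hugging Face causal LMs return the prompt followed by the newly generated tokens. A minimal sketch of that slicing logic with plain lists; the token IDs below are illustrative assumptions, not real vocabulary entries:

```python
# Sketch: why the README's decode step slices output_ids.
# model.generate on a causal LM returns prompt tokens + generated tokens,
# so the prompt portion must be dropped before decoding the answer.

prompt_ids = [101, 2023, 2003]           # stands in for input_ids[0]
output_ids = prompt_ids + [7592, 2088]   # generate() echoes the prompt first

prompt_len = len(prompt_ids)             # input_ids.shape[1] in the README
new_tokens = output_ids[prompt_len:]     # keep only the generated part

print(new_tokens)                        # [7592, 2088]
```

Without the slice, the decoded string would repeat the user's prompt in front of the model's answer.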