kas

shing3232

shing3232's activity

New activity in tencent/Tencent-Hunyuan-Large about 2 months ago
New activity in Qwen/Qwen1.5-7B-Chat-GGUF 7 months ago
reacted to merve's post with 🚀 9 months ago
LLaVA-NeXT was recently merged into Hugging Face transformers, and it outperforms many closed-source models like Gemini on various benchmarks 🤩 Let's take a look!
Demo: merve/llava-next
Notebook: https://colab.research.google.com/drive/1afNudu72SNWZCYtCVrRlb9T9Vj9CFJEK?usp=sharing
LLaVA is essentially a vision-language model that consists of a ViT-based CLIP encoder, an MLP projection, and Vicuna as the decoder ✨
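To make the data flow concrete, here is a minimal pure-Python sketch of how patch features from the vision encoder could be projected into the decoder's embedding space and placed alongside text embeddings. It is only an illustration: the tiny dimensions, the random weights, the two-layer GELU projector, and the helper names (`make_linear`, `project_patches`) are all assumptions, not the actual LLaVA code.

```python
import math
import random


def gelu(x):
    # Exact GELU activation via the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def make_linear(in_dim, out_dim, rng):
    # Hypothetical helper: a randomly initialised dense layer (weight rows, bias).
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(vec, weight, bias):
    # One output value per weight row: dot(row, vec) + bias.
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def project_patches(patch_features, layer1, layer2):
    # Assumed projector shape: Linear -> GELU -> Linear, applied per patch.
    projected = []
    for feat in patch_features:
        hidden = [gelu(h) for h in linear(feat, *layer1)]
        projected.append(linear(hidden, *layer2))
    return projected


rng = random.Random(0)
VISION_DIM, LLM_DIM = 8, 16          # toy sizes, far smaller than real models
NUM_PATCHES, NUM_TEXT_TOKENS = 4, 3

# Stand-ins for the vision encoder output and the decoder's text embeddings.
patch_features = [[rng.gauss(0, 1) for _ in range(VISION_DIM)] for _ in range(NUM_PATCHES)]
text_embeddings = [[rng.gauss(0, 1) for _ in range(LLM_DIM)] for _ in range(NUM_TEXT_TOKENS)]

layer1 = make_linear(VISION_DIM, LLM_DIM, rng)
layer2 = make_linear(LLM_DIM, LLM_DIM, rng)

image_tokens = project_patches(patch_features, layer1, layer2)
decoder_input = image_tokens + text_embeddings  # image tokens placed before the text

print(len(decoder_input), len(decoder_input[0]))
```

The point of the sketch is the shape bookkeeping: after projection, every image patch has the same width as a text embedding, so the decoder can treat both as one sequence.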
LLaVA 1.5 was released with Vicuna, but LLaVA-NeXT (1.6) was released with four different LLMs:
- Nous-Hermes-Yi-34B
- Mistral-7B
- Vicuna 7B & 13B
The Mistral and Nous-Hermes-Yi-34B variants perform better and are better suited to commercial use.
Moreover, according to the authors' findings, the improvements come from a more diverse, higher-quality data mixture and dynamic high resolution.
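As a rough illustration of what "dynamic high resolution" can mean in practice, the sketch below picks, from a small set of candidate canvas sizes, the one that preserves the most image detail with the least padding. The function name, the candidate list, and the 336-pixel tile size are assumptions made for this example, not details taken from the post.

```python
def select_best_resolution(image_size, candidate_resolutions):
    """Return the candidate (width, height) that keeps the most image pixels
    after an aspect-preserving resize, breaking ties by least unused area."""
    orig_w, orig_h = image_size
    best, best_effective, best_wasted = None, -1, float("inf")
    for cand_w, cand_h in candidate_resolutions:
        # Largest uniform scale at which the image still fits in the candidate canvas.
        scale = min(cand_w / orig_w, cand_h / orig_h)
        fit_w, fit_h = int(orig_w * scale), int(orig_h * scale)
        # Upscaling adds no real detail, so cap at the original pixel count.
        effective = min(fit_w * fit_h, orig_w * orig_h)
        wasted = cand_w * cand_h - effective
        if effective > best_effective or (
            effective == best_effective and wasted < best_wasted
        ):
            best, best_effective, best_wasted = (cand_w, cand_h), effective, wasted
    return best


# Assumed candidate canvases built from 336x336 tiles (illustrative only).
CANDIDATES = [(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]

print(select_best_resolution((672, 336), CANDIDATES))   # wide image
print(select_best_resolution((336, 1008), CANDIDATES))  # tall image
```

With this kind of selection, a wide image is mapped to a wide canvas and a tall image to a tall one, so fewer pixels are spent on padding before the image is split into tiles for the encoder.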
The LLaVA variant based on Nous-Hermes-Yi-34B outperforms many other models, including Gemini, on various multimodal understanding and generation benchmarks 😊