We are reproducing the full DeepSeek R1 data and training pipeline so that everyone can use their recipe. Instead of doing it in secret, we can do it together in the open!
🧪 Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1.
🧠 Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.
🔥 Step 3: show we can go from base model -> SFT -> RL via multi-stage training.
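The data curation in Step 1 is often done by rejection sampling: generate reasoning traces from the teacher and keep only the ones whose final answer checks out. A minimal sketch of that filtering idea (the trace format and helper names are illustrative assumptions, not Open R1's actual code):

```python
# Hypothetical sketch of distillation-data curation via rejection sampling:
# keep only teacher traces whose final answer matches a known reference.
# The "Answer:" trace format is an assumption for illustration.

def extract_answer(trace: str) -> str:
    """Pull the final answer from a trace ending in 'Answer: <value>'."""
    marker = "Answer:"
    return trace.rsplit(marker, 1)[-1].strip() if marker in trace else ""

def filter_traces(samples, reference_answers):
    """Keep (problem, trace) pairs whose extracted answer matches the reference."""
    kept = []
    for problem, trace in samples:
        if extract_answer(trace) == reference_answers.get(problem):
            kept.append({"prompt": problem, "completion": trace})
    return kept

# Two sampled traces for the same problem; only the correct one survives.
samples = [
    ("2+2?", "Think: 2 plus 2 is 4. Answer: 4"),
    ("2+2?", "Think: maybe 5. Answer: 5"),
]
dataset = filter_traces(samples, {"2+2?": "4"})
```

The surviving pairs would then serve as SFT data for the student model.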
✨ Launched All-Scenario Reasoning Model (language, visual, and search reasoning capabilities), with medical expertise as one of its key highlights. https://ying.baichuan-ai.com/chat
✨ Released Baichuan-M1-14B Medical LLM on the Hub. Available in both Base and Instruct versions, supporting English & Chinese.
UI-TARS 🔥 a series of native GUI agent models (2B/7B/72B) released by ByteDance, combining perception, reasoning, grounding, and memory into one system.
What happened yesterday in the Chinese AI community?
T2A-01-HD https://hailuo.ai/audio MiniMax's Text-to-Audio model, now in Hailuo AI, offers 300+ voices in 17+ languages and instant emotional voice cloning.
Trae https://www.trae.ai/ A new coding tool by ByteDance for professional developers, supporting English & Chinese with free access to Claude 3.5 and GPT-4 for a limited time.
Kimi k1.5 https://github.com/MoonshotAI/Kimi-k1.5 | https://kimi.ai/ An o1-level multi-modal model by Moonshot AI, using reinforcement learning with long and short chain-of-thought and supporting up to 128k tokens.
And today…
Hunyuan3D-2.0 tencent/Hunyuan3D-2 A SoTA 3D synthesis system for high-res textured assets by Tencent Hunyuan, with open weights and code!
✨ MIT License: enabling distillation for custom models
✨ 32B & 70B models match OpenAI o1-mini in multiple capabilities
✨ API live now! Access Chain-of-Thought reasoning with model='deepseek-reasoner'
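Calling the reasoner is a standard chat-completions request with the model name from the announcement. A minimal sketch of building that request (the endpoint URL and response shape are assumptions to verify against DeepSeek's official API docs):

```python
# Sketch of a chat-completions request for the reasoner model named in the
# post. The base URL below is an assumption; check the official docs.
import json

BASE_URL = "https://api.deepseek.com/chat/completions"  # assumed endpoint

def build_request(question: str) -> dict:
    """Build the JSON payload for a chain-of-thought completion."""
    return {
        "model": "deepseek-reasoner",  # model name from the announcement
        "messages": [{"role": "user", "content": question}],
    }

payload = build_request("How many primes are below 20?")
body = json.dumps(payload)  # send with any HTTP client, plus your API key header
```

The API is OpenAI-compatible, so the same payload works through the usual OpenAI-style client libraries pointed at DeepSeek's base URL.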
InternLM3-8B-Instruct 🔥 Trained on just 4T tokens, it outperforms Llama3.1-8B and Qwen2.5-7B in reasoning tasks, at 75% lower cost! internlm/internlm3-67875827c377690c01a9131d
✨ MiniMax-Text-01:
- 456B parameters with 45.9B activated per token
- Combines Lightning Attention, Softmax Attention, and MoE for optimal performance
- Training context up to 1M tokens, inference handles 4M tokens
✨ MiniMax-VL-01:
- ViT-MLP-LLM framework (non-transformer)
- Handles image inputs from 336×336 to 2016×2016
- 694M image-caption pairs + 512B tokens processed across 4 stages
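The gap between 456B total and 45.9B activated parameters comes from MoE routing: a gate scores the experts for each token and only the top-k actually run. A toy sketch of that routing idea (expert counts, scores, and the gating function here are illustrative, not MiniMax's actual design):

```python
# Toy sketch of MoE top-k routing: why only a fraction of a sparse model's
# parameters is active per token. All numbers are illustrative.
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(gate_scores, k=2):
    """Return the indices of the top-k experts and their renormalized weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# 8 experts available, but only 2 are activated for this token:
chosen = route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
```

Each token's output is then the weighted sum of just those k experts' outputs, so compute per token scales with k, not with the total expert count.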
MiniCPM-o 2.6 🔥 an end-side multimodal LLM released by OpenBMB from the Chinese community. Model: openbmb/MiniCPM-o-2_6
✨ Real-time English/Chinese conversation, emotion control and ASR/STT
✨ Real-time video/audio understanding
✨ Processes up to 1.8M pixels, leads OCRBench & supports 30+ languages
💫 ...And we're live! 💫 Seasonal newsletter from ethicsy folks at Hugging Face, exploring the ethics of "AI Agents" https://huggingface.co/blog/ethics-soc-7
Our analyses found:
- There's a spectrum of "agent"-ness
- *Safety* is a key issue, leading to many other value-based concerns
Read for details & what to do next! With @evijit, @giadap, and @sasha