Hugging Face Smol Cluster
HFSmolCluster's activity
Post
Very interesting security section by @yjernite, @lvwerra, @reach-vb, @dvilasuero & the team replicating R1. Broadly applicable to most open-source models, & some of it to APIs (but APIs come with a lot more additional risk because you're not in control of the underlying system):
https://huggingface.co/blog/open-r1/update-4#is-it-safe
Post
A repository is created every ~15 seconds on Hugging Face, so @kramp added a "Getting Started" guide and a model release checklist to make things easier: https://huggingface.co/docs/hub/model-release-checklist
What are you uploading today?
Post
Nice new space to see how fast your personal or organization's followers are growing on HF:
julien-c/follow-history
As you can see, I still have more followers than @julien-c, even though he's trying to change that by building such cool spaces!

edbeeching authored a paper 16 days ago
lewtun authored a paper 16 days ago
Post
We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.
And even we were mind-blown by the results we got with this latest model we're releasing: OlympicCoder (open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B).
It's beating Claude 3.7 on (competitive) programming, a domain Anthropic has historically been really strong at, and it's getting close to o1-mini/R1 on olympiad-level coding with just 7B parameters!
And the best part is that we're open-sourcing everything: the training dataset, the new IOI benchmark, and more, in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3
Datasets we are releasing (a quick loading sketch follows the list):
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions
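Here's a quick sketch of pulling one of these with the datasets library (the split name is an assumption; check each dataset card for the exact configs and splits):

```python
# Minimal sketch: load one of the released datasets and peek at a record.
# The "train" split is an assumption; see the dataset card for configs/splits.
from datasets import load_dataset

ds = load_dataset("open-r1/codeforces-cots", split="train")
print(ds[0])
```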

clefourrier posted an update 17 days ago
Post
The Gemma3 family is out! Reading the tech report, this section was really interesting to me from a methods/scientific-fairness point of view.
Instead of making over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but people usually don't say.)
For a tech report, it makes a lot of sense to report model performance when used optimally!
On leaderboards, on the other hand, comparisons will be apples to apples, but potentially suboptimal for a given model family (just as some users interact suboptimally with models).
It also contains a cool section (6) on training data memorization rates! Important to check whether your model will output its training data verbatim: always an issue for privacy/copyright/... but also very much for evaluation!
Because if your model knows its evals by heart, you're not testing for generalization.
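To make that last point concrete, here's a toy sketch (my own illustration, not the report's section-6 methodology) of the kind of n-gram overlap check commonly used to flag eval contamination:

```python
# Toy contamination check: flag an eval item if it shares any 8-gram with a
# training document. Crude but a common first-pass heuristic; this is not
# Gemma3's actual memorization measurement.
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    tokens = text.split()
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}

def looks_contaminated(train_doc: str, eval_item: str, n: int = 8) -> bool:
    return bool(ngrams(train_doc, n) & ngrams(eval_item, n))
```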
Post
Introducing OlympicCoder: a series of open reasoning models that can solve olympiad-level programming problems.
- 7B open-r1/OlympicCoder-7B
- 32B open-r1/OlympicCoder-32B
We find that OlympicCoder models outperform Claude 3.7 Sonnet, as well as models over 100x larger.
Together with the models, we are releasing:
- CodeForces-CoTs: a new dataset of code problems from the most popular competitive coding platform, with R1 traces in C++ and Python: open-r1/codeforces-cots
- IOI'2024: a new benchmark of VERY hard programming problems where even frontier models struggle to match human performance: open-r1/ioi
For links to the models and datasets, check out our latest progress report from Open R1: https://huggingface.co/blog/open-r1/update-3
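If you want to poke at the 7B model locally, here's a minimal sketch with transformers (the prompt is a placeholder, and a GPU with enough memory is assumed):

```python
# Minimal sketch: chat with OlympicCoder-7B via the transformers pipeline.
# Assumes a sufficiently large GPU; the problem statement is a placeholder.
from transformers import pipeline

pipe = pipeline("text-generation", model="open-r1/OlympicCoder-7B", device_map="auto")
messages = [{"role": "user", "content": "Write a C++ solution to this problem: ..."}]
out = pipe(messages, max_new_tokens=1024)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```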
Post
I was chatting with @peakji, one of the cofounders of Manus AI, who told me he was on Hugging Face (very cool!).
He shared an interesting insight, which is that agentic capabilities might be more of an alignment problem than a foundational capability issue. Similar to the difference between GPT-3 and InstructGPT, some open-source foundation models are simply trained to 'answer everything in one response regardless of the complexity of the question'; after all, that's the user preference in chatbot use cases. Just a bit of post-training on agentic trajectories can make an immediate and dramatic difference.
As a thank-you to the community, he shared 100 invite codes, first-come first-served; just use "HUGGINGFACE" to get access!
Post
10,000+ models based on DeepSeek R1 have been publicly shared on Hugging Face! Which ones are your favorites: https://huggingface.co/models?sort=trending&search=r1. Truly a game-changer!
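Here's a rough sketch of running the same search programmatically with huggingface_hub (sorting by downloads here rather than the website's trending order):

```python
# Minimal sketch: list Hub models matching "r1", most-downloaded first.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(search="r1", sort="downloads", direction=-1, limit=10):
    print(model.id, model.downloads)
```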
Post
Super happy to welcome Nvidia as our latest enterprise hub customer. They have almost 2,000 team members using Hugging Face, and close to 20,000 followers of their org. Can't wait to see what they'll open-source for all of us in the coming months!
Nvidia's org: https://huggingface.co/nvidia
Enterprise hub: https://huggingface.co/enterprise
Post
What are the best organizations to follow on @huggingface?
Off the top of my head:
- DeepSeek (35,000 followers): https://huggingface.co/deepseek-ai
- Meta Llama (27,000 followers): https://huggingface.co/meta-llama
- Black Forest Labs (11,000 followers): https://huggingface.co/black-forest-labs
- OpenAI (5,000 followers): https://huggingface.co/openai
- Nvidia (16,000 followers): https://huggingface.co/nvidia
- Microsoft (9,000 followers): https://huggingface.co/microsoft
- AllenAI (2,000 followers): https://huggingface.co/allenai
- Mistral (5,000 followers): https://huggingface.co/mistralai
- xAI (600 followers): https://huggingface.co/xai-org
- Stability AI (16,000 followers): https://huggingface.co/stabilityai
- Qwen (16,000 followers): https://huggingface.co/Qwen
- GoogleAI (8,000 followers): https://huggingface.co/google
- Unsloth (3,000 followers): https://huggingface.co/unsloth
- Bria AI (4,000 followers): https://huggingface.co/briaai
- NousResearch (1,300 followers): https://huggingface.co/NousResearch
Bonus, the agent course org with 17,000 followers: https://huggingface.co/agents-course
Post
We crossed 1B+ tokens routed to our inference provider partners on HF, a feature we released just a few days ago.
Just getting started, of course, but early users seem to like it, and we're always happy to partner with cool startups in the ecosystem.
Have you been using any of the integrations, and how can we make them better?
https://huggingface.co/blog/inference-providers
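Here's a minimal sketch of routing a request through a provider via the Hub (the provider and model names are illustrative, and it assumes a recent huggingface_hub with a valid HF token):

```python
# Minimal sketch: route a chat completion through an inference provider via HF.
# Provider/model are illustrative; auth and billing go through your HF account.
from huggingface_hub import InferenceClient

client = InferenceClient(provider="together")
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Say hello!"}],
)
print(response.choices[0].message.content)
```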
Post
Introducing OpenR1-Math-220k!
open-r1/OpenR1-Math-220k
The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch.
What's new compared to existing reasoning datasets?
- Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.
- 800k R1 reasoning traces: we generate two answers for 400k problems using DeepSeek R1. The filtered dataset contains 220k problems with correct reasoning traces.
- 512 H100s running locally: instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, generating 180k reasoning traces per day.
- Automated filtering: we apply Math Verify to retain only problems with at least one correct answer. We also leverage Llama3.3-70B-Instruct as a judge to retrieve more correct examples (e.g., for cases with malformed answers that can't be verified with a rules-based parser).
- We match the performance of DeepSeek-Distill-Qwen-7B by finetuning Qwen-7B-Math-Instruct on our dataset.
- Read our blog post for all the nitty-gritty details: https://huggingface.co/blog/open-r1/update-2
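The filtering step is easy to picture: here's a tiny sketch with the Math Verify library (simplified; the real pipeline runs this check over every generated trace):

```python
# Tiny sketch of the Math Verify idea: keep a trace only if its final answer
# is mathematically equivalent to the gold answer. Simplified from the pipeline.
from math_verify import parse, verify

gold = parse("$\\frac{1}{2}$")
answer = parse("$0.5$")
keep_trace = verify(gold, answer)  # True: 0.5 and 1/2 are equivalent
```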

hlarcher authored a paper about 2 months ago
lewtun authored a paper about 2 months ago
loubnabnl authored a paper about 2 months ago