Training Transformers Together

non-profit

https://training-transformers-together.github.io

training-transformers-together

Activity Feed

AI & ML interests

Training DALL-E with volunteers from all over the Internet using hivemind and dalle-pytorch (NeurIPS 2021 demo)

Recent Activity

Mrinal authored a paper 29 days ago

Multilingual State Space Models for Structured Question Answering in Indic Languages

thomwolf authored a paper about 2 months ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

mehdidc authored a paper 2 months ago

A Comparative Study on Generative Models for High Resolution Solar Observation Imaging

View all activity

IlyasMoutawwakil

posted an update 13 days ago

Post

3296

🚀 Optimum: The Last v1 Release 🚀
Optimum v1.27 marks the final major release in the v1 series. As we close this chapter, we're laying the groundwork for a more modular and community-driven future:
- Optimum v2: A lightweight core package for porting Transformers, Diffusers, or Sentence-Transformers to specialized AI hardware/software/accelerators..
- Optimum‑ONNX: A dedicated package where the ONNX/ONNX Runtime ecosystem lives and evolves, faster-moving and decoupled from the Optimum core.

🎯 Why this matters:
- A clearer governance path for ONNX, fostering stronger community collaboration and improved developer experience..
- Enable innovation at a faster pace in a more modular, open-source environment.

💡 What this means:
- More transparency, broader participation, and faster development driven by the community and key actors in the ONNX ecosystem (PyTorch, Microsoft, Joshua Lochner 👀, ...)
- A cleaner, more maintainable core Optimum, focused on extending HF libraries to special AI hardware/software/accelerators tooling and used by our partners (Intel Corporation, Amazon Web Services (AWS), AMD, NVIDIA, FuriosaAI, ...)

🛠️ Major updates I worked on in this release:
✅ Added support for Transformers v4.53 and SmolLM3 in ONNX/ONNXRuntime.
✅ Solved batched inference/generation for all supported decoder model architectures (LLMs).

✨ Big shoutout to @echarlaix for leading the refactoring work that cleanly separated ONNX exporter logic and enabled the creation of Optimum‑ONNX.

📝 Release Notes: https://lnkd.in/gXtE_qji
📦 Optimum : https://lnkd.in/ecAezNT6
🎁 Optimum-ONNX: https://lnkd.in/gzjyAjSi
#Optimum #ONNX #OpenSource #HuggingFace #Transformers #Diffusers

Mrinal

authored a paper 29 days ago

Multilingual State Space Models for Structured Question Answering in Indic Languages

Paper • 2502.01673 • Published Feb 1 • 2

thomwolf

authored a paper about 2 months ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 65

mehdidc

authored 5 papers 2 months ago

A Comparative Study on Generative Models for High Resolution Solar Observation Imaging

Paper • 2304.07169 • Published Apr 14, 2023

DataComp: In search of the next generation of multimodal datasets

Paper • 2304.14108 • Published Apr 27, 2023 • 2

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

Paper • 2406.02061 • Published Jun 4, 2024 • 2

A Practitioner's Guide to Continual Multimodal Pretraining

Paper • 2408.14471 • Published Aug 26, 2024

Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets

Paper • 2506.04598 • Published Jun 5 • 6

thomwolf

authored a paper 2 months ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2 • 127

julien-c

posted an update 4 months ago

Post

6328

BOOOOM: Today I'm dropping TINY AGENTS

the 50 lines of code Agent in Javascript 🔥

I spent the last few weeks working on this, so I hope you will like it.

I've been diving into MCP (Model Context Protocol) to understand what the hype was all about.

It is fairly simple, but still quite powerful: MCP is a standard API to expose sets of Tools that can be hooked to LLMs.

But while doing that, came my second realization:

Once you have a MCP Client, an Agent is literally just a while loop on top of it. 🤯

➡️ read it exclusively on the official HF blog: https://huggingface.co/blog/tiny-agents

1 reply

thomwolf

posted an update 4 months ago

Post

6268

If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Face—in robotics and across all AI fields—we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website and the Pollen community and people here on the hub at

pollen-robotics

We're so excited to build and share more open-source robots with the world in the coming months!

1 reply

thomwolf

authored 2 papers 4 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 197

YourBench: Easy Custom Evaluation Sets for Everyone

Paper • 2504.01833 • Published Apr 2 • 22

awacke1

posted an update 4 months ago

Post

2137

AI Vision & SFT Titans 🌟 Turns PDFs into text, snaps pics, and births AI art.

https://huggingface.co/spaces/awacke1/TorchTransformers-Diffusion-CV-SFT

1. OCR a grocery list or train a titan while sipping coffee? ☕
2. Camera Snap 📷: Capture life’s chaos—your cat’s face or that weird receipt. Proof you’re a spy!
3. OCR 🔍: PDFs beg for mercy as GPT-4o extracts text.
4. Image Gen 🎨: Prompt “neon superhero me”
5. PDF 📄: Double-page OCR Single-page sniping

Build Titans 🌱: Train tiny AI models. 💪Characters🧑‍🎨: Craft quirky heroes.
🎥

thomwolf

posted an update 5 months ago

Post

3718

The new DeepSite space is really insane for vibe-coders
enzostvs/deepsite

With the wave of vibe-coding-optimized LLMs like the latest open-source DeepSeek model (version V3-0324), you can basically prompt out-of-the-box and create any app and game in one-shot.

It feels so powerful to me, no more complex framework or under-the-hood prompt engineering to have a working text-to-app tool.

AI is eating the world and *open-source* AI is eating AI itself!

PS: and even more meta is that the DeepSite app and DeepSeek model are both fully open-source code => time to start recursively improve?

PPS: you still need some inference hosting unless you're running the 600B param model at home, so check the very nice list of HF Inference Providers for this model: deepseek-ai/DeepSeek-V3-0324

1 reply

osanseviero

authored a paper 5 months ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 53

thomwolf

posted an update 5 months ago

Post

3064

We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder ( open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming –a domain Anthropic has been historically really strong at– and it's getting close to o1-mini/R1 on olympiad level coding with just 7B parameters!

And the best part is that we're open-sourcing all about its training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets are are releasing:
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions

julien-c

posted an update 5 months ago

Post

4083

Important notice 🚨

For Inference Providers who have built support for our Billing API (currently: Fal, Novita, HF-Inference – with more coming soon), we've started enabling Pay as you go (=PAYG)

What this means is that you can use those Inference Providers beyond the free included credits, and they're charged to your HF account.

You can see it on this view: any provider that does not have a "Billing disabled" badge, is PAYG-compatible.

9 replies

awacke1

posted an update 5 months ago

Post

2336

I introduce MIT license

ML Model Specialize Fine Tuner app "SFT Tiny Titans" 🚀

Demo video with source.

Download, train, SFT, and test your models, easy as 1-2-3!
URL: https://huggingface.co/spaces/awacke1/TorchTransformers-NLP-CV-SFT

2 replies

awacke1

posted an update 6 months ago

Post

2499

🚀 Blast into the future with ZaxxonGalaxian – a thrilling 3D action game where you navigate epic battles through towering 3D cityscapes! Face off against relentless swarm bots, climb the leaderboard, and dominate the skies. awacke1/ZaxxoGalaxian

AI & ML interests

Recent Activity

Team members 117

training-transformers-together's activity