We're thrilled to share SmolVLM (256M & 500M), the smallest Visual Language Models ever built. Think: running on <1GB of GPU memory. You can fine-tune it on your laptop and run it on your toaster!
Why It's Game-Changing:
- Outperforms Larger Models: Even the 256M model surpasses our SOTA 80B-parameter model from just 17 months ago. Over 300x reduction!
- Mighty Efficiency: The 256M version delivers 80% of our 2.2B model's performance, and the 500M version hits 90%.
- Lightning-Fast Search: SmolVLM integrates with ColPali for state-of-the-art retrieval speeds, on par with models 10x bigger. That means cheaper, faster indexing and real-world impact.
What's New Under the Hood:
- New Vision Encoder: Smaller overall size (400M -> 93M), but with higher resolution.
- Higher Pixels/Token: 4096 vs. 1820, for more efficient image processing.
- Smart Tokenization: Faster training and a performance boost.
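The ratios quoted in this post can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures stated above (80B vs. 256M parameters, a 400M vs. 93M vision encoder, 4096 vs. 1820 pixels per token); the helper name `ratio` is introduced here purely for illustration, and nothing is measured.

```python
def ratio(larger: float, smaller: float) -> float:
    """Return how many times `larger` exceeds `smaller`."""
    return larger / smaller


# Figures quoted in the post; these are stated numbers, not measurements.
param_reduction = ratio(80e9, 256e6)        # 80B model vs. 256M model
encoder_reduction = ratio(400e6, 93e6)      # old vs. new vision encoder size
pixels_per_token_gain = ratio(4096, 1820)   # new vs. old pixels per token

print(f"Parameter reduction:   {param_reduction:.1f}x")
print(f"Vision encoder shrink: {encoder_reduction:.1f}x")
print(f"Pixels per token gain: {pixels_per_token_gain:.2f}x")
```

The first ratio comes out at 312.5, which is where the "over 300x" claim comes from.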
What happened yesterday in the Chinese AI community?
T2A-01-HD: https://hailuo.ai/audio
MiniMax's Text-to-Audio model, now in Hailuo AI, offers 300+ voices in 17+ languages and instant emotional voice cloning.
Trae: https://www.trae.ai/
A new coding tool by ByteDance for professional developers, supporting English & Chinese, with free access to Claude 3.5 and GPT-4 for a limited time.
Kimi k1.5: https://github.com/MoonshotAI/Kimi-k1.5 | https://kimi.ai/
An o1-level multimodal model by Moonshot AI, using reinforcement learning with long and short chain-of-thought and supporting up to 128k tokens.
And today...
Hunyuan 3D-2.0: tencent/Hunyuan3D-2
A SoTA 3D synthesis system for high-res textured assets by Tencent Hunyuan, with open weights and code!
Coming back to Paris Friday to open our new Hugging Face office!
We're at capacity for the party, but add your name to the waiting list: we're trying to book out the Passage du Caire for extra space for robots.
In the past seven days, the Diffusers team has shipped:
1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements
Coffee on me if someone can guess 1 - 4 correctly.
Reacted to merve's post, about 1 month ago:
Multimodal
> Google shipped PaliGemma 2, a new iteration of PaliGemma with more sizes: 3B, 10B and 28B, with pre-trained and captioning variants
> OpenGVLab released InternVL2, seven new vision LMs in different sizes, with a SOTA checkpoint under the MIT license
> The Qwen team at Alibaba released the base models of Qwen2VL, with 2B, 7B and 72B checkpoints
LLMs
> Meta released a new iteration of Llama 70B, Llama3.2-70B, trained further
> EuroLLM-9B-Instruct is a new multilingual LLM for European languages with an Apache 2.0 license
> Dataset: CohereForAI released GlobalMMLU, a multilingual version of MMLU covering 42 languages, with an Apache 2.0 license
> Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models
> Dataset: FineWeb2 just landed with a multilinguality update! Nearly 8TB of pretraining data in many languages!
Image/Video Generation
> Tencent released HunyuanVideo, a new photorealistic video generation model
> OminiControl is a new editing/control framework for image generation models like Flux
Audio
> Indic-Parler-TTS is a new text-to-speech model made by the community
Reacted to fdaudens's post, about 2 months ago:
Just dropped: a visualization mapping Hugging Face's most liked & downloaded models from 2022 to now. Small models are clearly on the rise, a fascinating shift in both like and download patterns.
Official launch of the OpenLLM French Leaderboard: an open-source initiative to benchmark French-language LLMs
After a lot of effort and sweat with Alexandre Lavallee, we are delighted to announce that the OpenLLMFrenchLeaderboard is live on Hugging Face (space url: le-leadboard/OpenLLMFrenchLeaderboard), the very first platform dedicated to evaluating large language models (LLMs) in French.
This long-running project is first and foremost a labor of love, but above all an absolute necessity. It is becoming urgent and vital to work toward more transparency in the strategic field of so-called multilingual LLMs. The first building block is therefore a systematic and systemic evaluation of current and future models.
Is your French AI model ready to stand out? Submit it in our space and see how it compares with other models.
How it works: Submit your French LLM for evaluation, and we will test it on reference benchmarks specifically adapted for the French language. Our benchmark suite includes:
The process is still manual, but we are working on automating it, with the support of the Hugging Face community.
@clem, shall we get ready for a space upgrade?
This is not just about numbers: it is about creating AI that truly reflects our language, our culture, and our values. OpenLLMFrenchLeaderboard is our personal contribution to shaping the future of LLMs in France.
Reacted to clem's post, about 2 months ago:
Six predictions for AI in 2025 (and a review of how my 2024 predictions turned out):
- There will be the first major public protest related to AI.
- A big company will see its market cap divided by two or more because of AI.
- At least 100,000 personal AI robots will be pre-ordered.
- China will start to lead the AI race (as a consequence of leading the open-source AI race).
- There will be big breakthroughs in AI for biology and chemistry.
- We will begin to see the economic and employment growth potential of AI, with 15M AI builders on Hugging Face.
How my predictions for 2024 turned out:
- A hyped AI company will go bankrupt or get acquired for a ridiculously low price ✅ (Inflection, Adept AI, ...)
- Open-source LLMs will reach the level of the best closed-source LLMs ✅ with QwQ and dozens of others
- Big breakthroughs in AI for video, time-series, biology and chemistry: ✅ for video, 🔴 for time-series, biology and chemistry
- We will talk much more about the cost (monetary and environmental) of AI: ✅ monetary, 🔴 environmental
- A popular media product will be mostly AI-generated ✅ with NotebookLM by Google
- 10 million AI builders on Hugging Face, leading to no increase in unemployment: currently 7M AI builders on Hugging Face
Black Forest Labs Flux Dev vs. Stability AI Stable Diffusion 3.5 Large
Together with the data-is-better-together community, we've worked on an Apache 2.0 licensed open image preference dataset based on the fal ai imgsys prompts dataset. Thanks to the awesome community, we managed to get 5K preference pairs in less than 2 days. The agreement among annotators is great too.
Aashish Kumar won a month of Hugging Face Pro by making the most contributions! Congrats from the entire team.
The best thing?! We are not done yet! Let's keep the annotations coming for 5K more in the second part of the sprint (with more prizes to go around).
SmolVLM speeding locally on a laptop thanks to mlx-vlm and @Gradio! Try it with two lines:
pip install git+https://github.com/andimarafioti/mlx-vlm.git@stream-generate-fix
python -m mlx_vlm.chat_ui --model mlx-community/SmolVLM-Instruct-8bit
Gotta love the MLX community! Big thanks to @pcuenq and @prince_canuma!
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.
- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL!
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a MacBook!
- SmolVLM can be fine-tuned on a Google Colab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos!
@victor @not-lain There has been a sudden and unusual outbreak of spam postings on the HF Forum that seem to be aimed at relaying online videos and commenting on them. For some reason, it also spans multiple languages. I've flagged it too, but I'm not sure the staff will be able to keep up with manual measures in the future.
We are launching distilabel DataCraft: get started with synthetic data using clicks and natural language!
Workflow
- Write down your custom GenAI use case
- Automatically generate system prompts
- Create sample datasets for quick iteration
- Produce full-scale datasets with customizable parameters
- Push generated datasets directly to the Hugging Face Hub
Powered by Argilla's distilabel and open source LLMs
Uses Free Serverless HF Inference Endpoints
Use Cases:
- Fine-tuning language models for specific domains
- Creating diverse datasets for robust model training
- Rapid prototyping of AI applications
- Generating synthetic data for privacy-sensitive projects
I was reading through an abstract and found myself wondering how much LLM performance is being left on the table due to insufficient curation of training datasets: "Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning" by Kaur, Park, Goyal, Arora. https://arxiv.org/abs/2408.14774 In particular, the observation that "Introducing low quality answers ("shirkers") in 20% of Instruct-SkillMix examples causes performance to plummet..." had me wondering how many ostensibly good datasets out there are in fact populated with a significant number of "shirkers".
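One rough way to start probing that question on your own data is to flag suspiciously short answers and see what share of the dataset they make up. The sketch below is an illustration under loud assumptions, not the paper's method: the `response` field name, the `min_words` threshold, and the idea that brevity alone signals a possible "shirker" are all hypothetical choices made for this example.

```python
from typing import Dict, List


def flag_possible_shirkers(
    examples: List[Dict[str, str]], min_words: int = 20
) -> List[Dict[str, str]]:
    """Return examples whose response has fewer than `min_words` words.

    Assumption: a very short response is treated as a *candidate*
    low-effort answer. This is a crude proxy, not a quality judgment.
    """
    return [ex for ex in examples if len(ex["response"].split()) < min_words]


def shirker_fraction(examples: List[Dict[str, str]], min_words: int = 20) -> float:
    """Share of examples flagged by the length heuristic (0.0 if empty)."""
    if not examples:
        return 0.0
    return len(flag_possible_shirkers(examples, min_words)) / len(examples)


# Toy data: four long answers and one very short one (hypothetical records).
long_answer = " ".join(["word"] * 30)
sample = [
    {"instruction": f"Task {i}", "response": long_answer} for i in range(4)
]
sample.append({"instruction": "Explain gradient descent.",
               "response": "It minimizes a loss."})

print(f"Flagged share: {shirker_fraction(sample):.0%}")
```

Anything flagged this way would still need manual review, since a short answer can be perfectly good and a long one can be poor.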