Andres Marafioti (andito)

AI & ML interests

Multimodal models, VLM and TTS

Recent Activity

liked a dataset 1 day ago
Aria-UI/Aria-UI_Data
liked a model 2 days ago
HuggingFaceTB/SmolVLM-500M-Instruct
liked a model 2 days ago
HuggingFaceTB/SmolVLM-256M-Instruct

Organizations

Hugging Face, HuggingFaceM4, Huggingface Projects, Hugging Face H4, Hugging Face TB Research, MLX Community, Distillation Hugs, Argilla Warehouse, Hugging Face FineVideo, smol-explorers, Hugging Face Science

andito's activity

posted an update 3 days ago
๐—œ๐—ป๐˜๐—ฟ๐—ผ๐—ฑ๐˜‚๐—ฐ๐—ถ๐—ป๐—ด ๐˜๐—ต๐—ฒ ๐˜„๐—ผ๐—ฟ๐—น๐—ฑ'๐˜€ ๐˜€๐—บ๐—ฎ๐—น๐—น๐—ฒ๐˜€๐˜ ๐˜ƒ๐—ถ๐˜€๐—ถ๐—ผ๐—ป ๐—น๐—ฎ๐—ป๐—ด๐˜‚๐—ฎ๐—ด๐—ฒ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น!

We're thrilled to share SmolVLM (256M & 500M), the smallest Visual Language Models ever built. Think: running on <1GB of GPU memory. You can fine-tune it on your laptop and run it on your toaster!

Why It's Game-Changing:
- Outperforms Larger Models: Even the 256M model surpasses our SOTA 80B-parameter model from just 17 months ago. Over a 300x reduction!
- Mighty Efficiency: The 256M version delivers 80% of our 2.2B model's performance, and the 500M version hits 90%.
- Lightning-Fast Search: SmolVLM integrates with ColPali for state-of-the-art retrieval speeds, on par with models 10x bigger. That means cheaper, faster indexing and real-world impact.

What's New Under the Hood:
- New Vision Encoder: smaller overall size (400M -> 93M), but with higher resolution.
- Higher Pixels/Token: 4096 vs. 1820, for more efficient image processing.
- Smart Tokenization: faster training and a performance boost.

Check our blog: https://huggingface.co/blog/smolervlm
The models: HuggingFaceTB/smolvlm-256m-and-500m-6791fafc5bb0ab8acc960fb0
The demo: HuggingFaceTB/SmolVLM-256M-Demo
reacted to AdinaY's post with 🚀 4 days ago
What happened yesterday in the Chinese AI community? 🚀

T2A-01-HD 👉 https://hailuo.ai/audio
MiniMax's text-to-audio model, now in Hailuo AI, offers 300+ voices in 17+ languages and instant emotional voice cloning.

Trae 👉 https://www.trae.ai/
A new coding tool by ByteDance for professional developers, supporting English & Chinese, with free access to Claude 3.5 and GPT-4 for a limited time.

DeepSeek-R1 Series 👉 deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
Open-source reasoning models with an MIT license by DeepSeek.

Kimi k1.5 👉 https://github.com/MoonshotAI/Kimi-k1.5 | https://kimi.ai/
An o1-level multimodal model by Moonshot AI, using reinforcement learning with long and short chain-of-thought and supporting up to 128k tokens.

And today…

Hunyuan3D 2.0 👉 tencent/Hunyuan3D-2
A SOTA 3D synthesis system for high-resolution textured assets by Tencent Hunyuan, with open weights and code!

Stay tuned for more updates 👉 https://huggingface.co/zh-ai-community
reacted to clem's post with 🚀 about 1 month ago
Coming back to Paris Friday to open our new Hugging Face office!

We're at capacity for the party, but add your name to the waiting list, as we're trying to privatize the Passage du Caire for extra space for robots 🤖🦾🦿

https://t.co/enkFXjWndJ
reacted to sayakpaul's post with 🚀 about 1 month ago
In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1–4 correctly.
reacted to merve's post with 🔥 about 1 month ago
Aya by Cohere For AI can now see! 👀

The C4AI community has built Maya 8B, a new open-source multilingual VLM built on SigLIP and Aya 8B 🌱 It works in 8 languages! 🗣️

The authors extend the LLaVA dataset using Aya's translation capabilities, with 558k examples!
Try it here: kkr5155/maya_demo

Dataset: maya-multimodal/pretrain

Model: maya-multimodal/maya 👍
Kudos to @nahidalam and team
reacted to merve's post with ❤️ about 2 months ago
This week in open-source AI was insane 🤠 A small recap 🕺🏻 merve/dec-6-releases-67545caebe9fc4776faac0a3

Multimodal 🖼️
> Google shipped PaliGemma 2, a new iteration of PaliGemma in more sizes: 3B, 10B, and 28B, with pre-trained and captioning variants 👍
> OpenGVLab released InternVL2.5, seven new vision LMs in different sizes, with a SOTA checkpoint under the MIT license ✨
> The Qwen team at Alibaba released the base models of Qwen2-VL with 2B, 7B, and 72B checkpoints

LLMs 💬
> Meta released a new iteration of Llama 70B, Llama 3.3-70B, trained further
> EuroLLM-9B-Instruct is a new multilingual LLM for European languages with an Apache 2.0 license 🔥
> Dataset: CohereForAI released GlobalMMLU, a multilingual version of MMLU covering 42 languages, with an Apache 2.0 license
> Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models
> Dataset: FineWeb2 just landed with a multilinguality update! 🔥 Nearly 8TB of pretraining data in many languages!

Image/Video Generation 🖼️
> Tencent released HunyuanVideo, a new photorealistic video generation model
> OminiControl is a new editing/control framework for image generation models like Flux

Audio 🔊
> Indic Parler-TTS is a new text-to-speech model built by the community
reacted to fdaudens's post with ❤️ about 2 months ago
📈👀 Just dropped: a visualization mapping Hugging Face's most liked & downloaded models from 2022 to now. Small models are clearly on the rise; a fascinating shift in both like and download patterns.

Check it out: huggingface/open-source-ai-year-in-review-2024
reacted to malhajar's post with 🔥 about 2 months ago
🇫🇷 Official launch of the OpenLLM French Leaderboard: an open-source initiative to benchmark the evaluation of French-language LLMs

After much effort and sweat with Alexandre Lavallee, we are delighted to announce that the OpenLLMFrenchLeaderboard is live on Hugging Face (space url: le-leadboard/OpenLLMFrenchLeaderboard), the very first platform dedicated to evaluating large language models (LLMs) in French. 🇫🇷✨

This long-haul project is above all a labor of passion, but more importantly an absolute necessity. It is becoming urgent and vital to work toward more transparency in the strategic domain of so-called multilingual LLMs. The first building block is therefore a systematic and systemic evaluation of current and future models.

Is your French AI model ready to stand out? Submit it in our space and see how you compare against other models.

❓ How it works:
Submit your French LLM for evaluation, and we will test it on reference benchmarks specifically adapted for the French language. Our benchmark suite includes:

- BBH-fr: complex reasoning
- IFEval-fr: instruction following
- GPQA-fr: advanced knowledge
- MUSR-fr: narrative reasoning
- MATH_LVL5-fr: mathematical abilities
- MMMLU-fr: multitask understanding

The process is still manual, but we are working on automating it, with the support of the Hugging Face community.

@clem, shall we get ready for a space upgrade? 😁👀

This is not just about numbers; it is about building AI that truly reflects our language, our culture, and our values. OpenLLMFrenchLeaderboard is our personal contribution to shaping the future of LLMs in France.
reacted to clem's post with 🚀 about 2 months ago
Six predictions for AI in 2025 (and a review of how my 2024 predictions turned out):

- There will be the first major public protest related to AI
- A big company will see its market cap divided by two or more because of AI
- At least 100,000 personal AI robots will be pre-ordered
- China will start to lead the AI race (as a consequence of leading the open-source AI race)
- There will be big breakthroughs in AI for biology and chemistry
- We will begin to see the economic and employment growth potential of AI, with 15M AI builders on Hugging Face

How my predictions for 2024 turned out:

- A hyped AI company will go bankrupt or get acquired for a ridiculously low price
✅ (Inflection, Adept, ...)

- Open-source LLMs will reach the level of the best closed-source LLMs
✅ with QwQ and dozens of others

- Big breakthroughs in AI for video, time-series, biology and chemistry
✅ for video 🔴 for time-series, biology and chemistry

- We will talk much more about the cost (monetary and environmental) of AI
✅ monetary 🔴 environmental (😢)

- A popular media will be mostly AI-generated
✅ with NotebookLM by Google

- 10 million AI builders on Hugging Face, leading to no increase in unemployment
🔜 currently 7M AI builders on Hugging Face
reacted to davidberenstein1957's post with 🔥 about 2 months ago
🔥 Dataset Drop: Open Image Preferences

Black Forest Labs Flux Dev vs. Stability AI Stable Diffusion 3.5 Large

Together with the data-is-better-together community, we've worked on an Apache 2.0-licensed open image preference dataset based on the fal.ai imgsys prompts dataset. Thanks to the awesome community, we managed to collect 5K preference pairs in less than 2 days. The annotation alignment among annotators is great, too.

Aashish Kumar won a month of Hugging Face Pro by making the most contributions! Congrats from the entire team 🥇

The best thing? We are not done yet! Let's keep the annotations coming for 5K more in the second part of the sprint (with more prizes to go around).

Dataset: https://huggingface.co/datasets/data-is-better-together/image-preferences-results
posted an update about 2 months ago
SmolVLM running at speed locally on a laptop thanks to mlx-vlm and @Gradio! Try it with two lines:

pip install git+https://github.com/andimarafioti/mlx-vlm.git@stream-generate-fix
python -m mlx_vlm.chat_ui --model mlx-community/SmolVLM-Instruct-8bit

Gotta love the MLX community! Big thanks to @pcuenq and @prince_canuma!
reacted to their post with 🔥 about 2 months ago
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.

- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a MacBook! 🚀
- SmolVLM can be fine-tuned on a Google Colab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not being trained on videos!

Check out more!
Demo: HuggingFaceTB/SmolVLM
Blog: https://huggingface.co/blog/smolvlm
Model: HuggingFaceTB/SmolVLM-Instruct
Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
reacted to merve's post with 🔥 about 2 months ago
The authors of ColPali trained a retrieval model based on SmolVLM 🤠 vidore/colsmolvlm-alpha

TL;DR:

- ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks

- ColSmolVLM is more memory efficient than ColQwen2 💗
reacted to THUdyh's post with 🔥 3 months ago
reacted to clem's post with 🚀 3 months ago
Who's going to get to the most-liked model on Hugging Face first: Stability AI, Meta, Black Forest Labs, or someone else? The race is on!
reacted to John6666's post with 👀 4 months ago
@victor @not-lain There has been a sudden and unusual outbreak of spam posts on the HF Forum that seem to be aimed at relaying online videos and commenting on them. It also spans multiple languages for some reason. I've flagged it too, but I'm not sure the staff will be able to keep up with manual measures in the future.
reacted to davidberenstein1957's post with 🚀 4 months ago
🧶 We are launching distilabel DataCraft: get started with synthetic data using clicks and natural language!

🌊 Workflow
- Write down your custom GenAI use case
- Automatically generate system prompts
- Create sample datasets for quick iteration
- Produce full-scale datasets with customizable parameters
- Push generated datasets directly to the Hugging Face Hub

⚡️ Powered by Argilla's distilabel and open-source LLMs
🆓 Uses free serverless HF Inference Endpoints

💡 Use cases:
- Fine-tuning language models for specific domains
- Creating diverse datasets for robust model training
- Rapid prototyping of AI applications
- Generating synthetic data for privacy-sensitive projects

🚀 Start crafting your custom datasets today, and do it quicker, easier, and more privately with distilabel DataCraft!
https://huggingface.co/spaces/argilla/distilabel-datacraft
reacted to julien-c's post with ❤️ 4 months ago
Hey, it was good meeting you yesterday @MaziyarPanahi 🔥

Thanks @mishig for setting this up.

Let's make the Hub as useful as possible for the community ❤️
reacted to grimjim's post with 👀 4 months ago
I was reading through an abstract and found myself wondering how much LLM performance is being left on the table due to insufficient curation of training datasets: "Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning" by Kaur, Park, Goyal, and Arora.
https://arxiv.org/abs/2408.14774
In particular, the observation that "Introducing low quality answers ("shirkers") in 20% of Instruct-SkillMix examples causes performance to plummet..." had me wondering how many ostensibly good datasets out there are in fact populated with a significant number of "shirkers".