Hugging Face Fellows

non-profit

AI & ML interests

The Fellowship is a network of exceptional people from different backgrounds who contribute to open-source machine learning πŸ§™β€β™‚οΈπŸ¦Έβ€β™€οΈπŸ¦ΉπŸ§β€β™‚οΈ

Recent Activity

hugging-fellows's activity

merveΒ 
posted an update 1 day ago
view post
Post
1893
What a beginning to this year in open ML 🀠
Let's unwrap! merve/jan-10-releases-677fe34177759de0edfc9714

Multimodal πŸ–ΌοΈ
> ByteDance released SA2VA: a family of vision LMs that can take image, video, text and visual prompts
> moondream2 is out with new capabilities like outputting structured data and gaze detection!
> Dataset: Alibaba DAMO lab released multimodal textbook β€” 22k hours worth of samples from instruction videos 🀯
> Dataset: SciCap captioning on scientific documents benchmark dataset is released along with the challenge!

LLMs πŸ’¬
> Microsoft released Phi-4, sota open-source 14B language model πŸ”₯
> Dolphin is back with Dolphin 3.0 Llama 3.1 8B 🐬🐬
> Prime-RL released Eurus-2-7B-PRIME a new language model trained using PRIME alignment
> SmallThinker-3B is a new small reasoning LM based on Owen2.5-3B-Instruct πŸ’­
> Dataset: QWQ-LONGCOT-500K is the dataset used to train SmallThinker, generated using QwQ-32B-preview πŸ“•
> Dataset: @cfahlgren1 released React Code Instructions: a dataset of code instruction-code pairs πŸ“•
> Dataset: Qwen team is on the roll, they just released CodeElo, a dataset of code preferences πŸ‘©πŸ»β€πŸ’»

Embeddings πŸ”–
> @MoritzLaurer released zero-shot version of ModernBERT large πŸ‘
> KaLM is a new family of performant multilingual embedding models with MIT license built using Qwen2-0.5B

Image/Video Generation ⏯️
> NVIDIA released Cosmos, a new family of diffusion/autoregressive World Foundation Models generating worlds from images, videos and texts πŸ”₯
> Adobe released TransPixar: a new text-to-video model that can generate assets with transparent backgrounds (a first!)
> Dataset: fal released cosmos-openvid-1m Cosmos-tokenized OpenVid-1M with samples from OpenVid-1M

Others
> Prior Labs released TabPFNv2, the best tabular transformer is out for classification and regression
> Metagene-1 is a new RNA language model that can be used for pathogen detection, zero-shot embedding and genome understanding
merveΒ 
posted an update 3 days ago
view post
Post
1541
ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2 with MIT license πŸ’— ByteDance/sa2va-model-zoo-677e3084d71b5f108d00e093

> The models are capable of tasks involving vision-language understanding and visual referrals (referring segmentation) both for images and videos ⏯️

> The models come in 1B, 4B and 8B and are based on InternVL2.5 for base architecture and Qwen2, Qwen2.5 and InternLM2 for language model part (depending on the checkpoint)

> The model is very interesting, it has different encoders for different modalities each (visual prompt, text prompt, image and video) then it concatenates these to feed into LLM πŸ’¬

the output segmentation tokens are passed to SAM2, to sort of match text (captions or semantic classes) to masks ‡️

> Their annotation pipeline is also interesting, they seems to use two open large vision LMs to refine the annotations, and have different levels of descriptions to provide consistency.
  • 1 reply
Β·
clemΒ 
posted an update 9 days ago
view post
Post
3844
Cool to see @ylecun joining the top 10 of most followed on HF!

(and leaderboard by @mvaloatto is here: mvaloatto/TCTF)
  • 2 replies
Β·
tomaarsenΒ 
posted an update 11 days ago
view post
Post
2621
That didn't take long! Nomic AI has finetuned the new ModernBERT-base encoder model into a strong embedding model for search, classification, clustering and more!

Details:
πŸ€– Based on ModernBERT-base with 149M parameters.
πŸ“Š Outperforms both nomic-embed-text-v1 and nomic-embed-text-v1.5 on MTEB!
🏎️ Immediate FA2 and unpacking support for super efficient inference.
πŸͺ† Trained with Matryoshka support, i.e. 2 valid output dimensionalities: 768 and 256.
➑️ Maximum sequence length of 8192 tokens!
2️⃣ Trained in 2 stages: unsupervised contrastive data -> high quality labeled datasets.
βž• Integrated in Sentence Transformers, Transformers, LangChain, LlamaIndex, Haystack, etc.
πŸ›οΈ Apache 2.0 licensed: fully commercially permissible

Try it out here: nomic-ai/modernbert-embed-base

Very nice work by Zach Nussbaum and colleagues at Nomic AI.
merveΒ 
posted an update 11 days ago
view post
Post
4716
supercharge your LLM apps with smolagents πŸ”₯

however cool your LLM is, without being agentic it can only go so far

enter smolagents: a new agent library by Hugging Face to make the LLM write code, do analysis and automate boring stuff!

Here's our blog for you to get started https://huggingface.co/blog/smolagents
merveΒ 
posted an update 18 days ago
merveΒ 
posted an update 25 days ago
view post
Post
2787
Aya by Cohere For AI can now see! πŸ‘€

C4AI community has built Maya 8B, a new open-source multilingual VLM built on SigLIP and Aya 8B 🌱 works on 8 languages! πŸ—£οΈ

The authors extend Llava dataset using Aya's translation capabilities with 558k examples!
ry it here kkr5155/maya_demo

Dataset maya-multimodal/pretrain

Model maya-multimodal/maya πŸ‘
kudos @nahidalam and team
  • 1 reply
Β·
clemΒ 
posted an update 25 days ago
view post
Post
1800
Coming back to Paris Friday to open our new Hugging Face office!

We're at capacity for the party but add your name in the waiting list as we're trying to privatize the passage du Caire for extra space for robots πŸ€–πŸ¦ΎπŸ¦Ώ

https://t.co/enkFXjWndJ
  • 1 reply
Β·
merveΒ 
posted an update 25 days ago
view post
Post
3297
Apollo is a new family of open-source video language models by Meta, where 3B model outperforms most 7B models and 7B outperforms most 30B models 🧢

✨ the models come in 1.5B https://huggingface.co/Apollo-LMMs/Apollo-1_5B-t32, 3B https://huggingface.co/Apollo-LMMs/Apollo-3B-t32 and 7B https://huggingface.co/Apollo-LMMs/Apollo-7B-t32 with A2.0 license, based on Qwen1.5 & Qwen2
✨ the authors also release a benchmark dataset https://huggingface.co/spaces/Apollo-LMMs/ApolloBench

The paper has a lot of experiments (they trained 84 models!) about what makes the video LMs work ⏯️

Try the demo for best setup here https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B
they evaluate sampling strategies, scaling laws for models and datasets, video representation and more!
> The authors find out that whatever design decision was applied to small models also scale properly when the model and dataset are scaled πŸ“ˆ scaling dataset has diminishing returns for smaller models
> They evaluate frame sampling strategies, and find that FPS sampling is better than uniform sampling, and they find 8-32 tokens per frame optimal
> They also compare image encoders, they try a variation of models from shape optimized SigLIP to DINOv2
they find google/siglip-so400m-patch14-384 to be most powerful πŸ”₯
> they also compare freezing different parts of models, training all stages with some frozen parts give the best yield

They eventually release three models, where Apollo-3B outperforms most 7B models and Apollo 7B outperforms 30B models πŸ”₯
Β·
merveΒ 
posted an update about 1 month ago
view post
Post
1766
A complete RAG pipeline includes a reranker, which ranks the documents to find the best document πŸ““
Same goes for multimodal RAG, multimodal rerankers which we can integrate to multimodal RAG pipelines!
Learn how to build a complete multimodal RAG pipeline with vidore/colqwen2-v1.0 as retriever, lightonai/MonoQwen2-VL-v0.1 as reranker, Qwen/Qwen2-VL-7B-Instruct as VLM in this notebook that runs on a GPU as small as L4 πŸ”₯ https://huggingface.co/learn/cookbook/multimodal_rag_using_document_retrieval_and_reranker_and_vlms
christopherΒ 
posted an update about 1 month ago
view post
Post
1599
The folks at Foursquare released a dataset of 104.5 million places of interest ( foursquare/fsq-os-places) and here's all of them on a plot
Β·
merveΒ 
posted an update about 1 month ago
view post
Post
5598
This week in open-source AI was insane 🀠 A small recapπŸ•ΊπŸ» merve/dec-6-releases-67545caebe9fc4776faac0a3

Multimodal πŸ–ΌοΈ
> Google shipped a PaliGemma 2, new iteration of PaliGemma with more sizes: 3B, 10B and 28B, with pre-trained and captioning variants πŸ‘
> OpenGVLab released InternVL2, seven new vision LMs in different sizes, with sota checkpoint with MIT license ✨
> Qwen team at Alibaba released the base models of Qwen2VL models with 2B, 7B and 72B ckpts

LLMs πŸ’¬
> Meta released a new iteration of Llama 70B, Llama3.2-70B trained further
> EuroLLM-9B-Instruct is a new multilingual LLM for European languages with Apache 2.0 license πŸ”₯
> Dataset: CohereForAI released GlobalMMLU, multilingual version of MMLU with 42 languages with Apache 2.0 license
> Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models
> Dataset: FineWeb2 just landed with multilinguality update! πŸ”₯ nearly 8TB pretraining data in many languages!

Image/Video Generation πŸ–ΌοΈ
> Tencent released HunyuanVideo, a new photorealistic video generation model
> OminiControl is a new editing/control framework for image generation models like Flux

Audio πŸ”Š
> Indic-Parler-TTS is a new text2speech model made by community
merveΒ 
posted an update about 1 month ago
view post
Post
1552
New InternVL drop with a state-of-the-art 78B vision language model with MIT license πŸ”₯ https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c
The release comes with seven new vision LMs based on InternViT 300M/6B and Qwen2.5 (0.5B, 3B, 32B, 72B) and InternLM2 (8B, 7B, 20B) in different sizes
78B model is of InternViT 6B and Qwen2.5-72B Instruct, can accomplish variety of tasks πŸ‘ Try here OpenGVLab/InternVL
christopherΒ 
posted an update about 1 month ago
BramVanroyΒ 
posted an update about 1 month ago
view post
Post
506
In the spirit of "Better late than never", I've finally written a brief overview paper for GEITje 7B Ultra. Initially released 10 months ago (oops), but still reaching around 1300 monthly downloads across the HF ecosystem (not including ollama).

GEITje 7B Ultra: A Conversational Model for Dutch (2412.04092)

While the paper discusses the model a little bit, I especially wanted to write about the datasets, which to this day seem an important asset for Dutch LLM training (SFT and preference tuning). We have a long way to go for Dutch, but publishing transparent and reproducible artefacts seems an important step to me, alongside having open discussions about data, bias, architectures.

In that spirit, thanks are in order for the creation of GEITje 7B Ultra and all related datasets:

- Michiel Buisman and UWV for providing the means to create the datasets
- Flemish Supercomputer Center (VSC) for the compute
- The Hugging Face Fellows and rest of the team for their discussions and insights
- The Dutch NLP community, notably @Rijgersberg for building the base GEITje model and the fruitful discussions we've had

More to come, step by step!

BramVanroy/geitje-7b-ultra-65c1ee010ad80fd1f6a8f208