smol-explorers

Recent Activity

thomwolf posted an update 2 days ago
We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder (open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming, a domain where Anthropic has historically been really strong, and it's getting close to o1-mini/R1 on olympiad-level coding with just 7B parameters!

And the best part is that we're open-sourcing everything: its training dataset, the new IOI benchmark, and more, in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets we are releasing (see the loading sketch after this list):
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions
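
If you want to poke at the data, here is a minimal sketch using the 🤗 Datasets library; the default config and "train" split are assumptions on my part, so check each dataset card for the actual configs and columns.

```python
# Minimal sketch: load one of the released datasets with 🤗 Datasets.
# The default config and "train" split are assumptions; see the dataset
# card for the real configs, splits, and column names.
from datasets import load_dataset

codeforces = load_dataset("open-r1/codeforces", split="train")
print(codeforces)      # number of rows and column names
print(codeforces[0])   # inspect the first problem record
```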
eliebak posted an update 2 days ago
Google just dropped an exciting technical report for the brand-new Gemma 3 model! 🚀 Here are my personal notes highlighting the most intriguing architectural innovations, design choices, and insights from this release:

1) Architecture choices (a small QK-Norm sketch follows this list):
> No more soft-capping, replaced by QK-Norm
> Both pre AND post norm
> Wider MLP than Qwen2.5, ~same depth
> SWA with a 5:1 local:global ratio and a 1024-token window (very small, and a cool ablation in the paper!)
> No MLA to save KV cache, SWA does the job!
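
A minimal sketch of what QK-Norm looks like in practice, replacing attention-logit soft-capping: queries and keys are RMS-normalised per head before the dot product. Shapes and details here are my own assumptions, not the exact Gemma 3 implementation.

```python
# Sketch: QK-Norm instead of soft-capping. Queries/keys are RMS-normalised
# over the head dimension before attention, which keeps logits bounded
# without a tanh soft-cap. Assumed shapes, not the Gemma 3 code.
import torch
import torch.nn.functional as F

def rms_norm(x, weight, eps=1e-6):
    # normalise over the last (head) dimension, then apply a learned scale
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight

def qk_norm_attention(q, k, v, q_weight, k_weight):
    # q, k, v: (batch, heads, seq, head_dim); *_weight: (head_dim,)
    q = rms_norm(q, q_weight)
    k = rms_norm(k, k_weight)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```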

2) Long context (per-layer RoPE sketch below)
> Only increase the RoPE base frequency in the global layers (to 1M)
> Confirmation that long context is harder for smol models: no 128k for the 1B
> Pretrained with 32k context? Seems very high
> No YaRN nor Llama-3-style RoPE extension
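
Here is how I read the per-layer RoPE choice, as a tiny sketch: local (SWA) layers keep the usual base, while global layers get the 1M base for long context. The 5:1 interleaving pattern and values come from the notes above; the real layer layout may differ.

```python
# Sketch of the per-layer RoPE base: 5 local (SWA) layers per global layer,
# with only the global layers getting the 1M base frequency.
# Interleaving pattern and values are assumptions from the notes above.
def rope_base_for_layer(layer_idx, local_per_global=5,
                        local_base=10_000.0, global_base=1_000_000.0):
    is_global = (layer_idx + 1) % (local_per_global + 1) == 0
    return global_base if is_global else local_base

print([rope_base_for_layer(i) for i in range(12)])
# -> global base at layers 5 and 11, local base everywhere else
```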

3) Distillation (a truncated-logits distillation sketch follows this list)
> Only keep the first 256 logits from the teacher
> Ablation on the teacher gap (tl;dr: you need some "patience" to see that using a smaller teacher is better)
> On-policy distillation, yeahh (by @agarwl_ et al.), not sure if the teacher gap behaves the same here, curious if someone has more info?
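
To make the truncated-logits idea concrete, here is a small sketch of a distillation loss that only uses 256 teacher logits per token (assumed here to be the teacher's top-256, renormalised); this is an illustration, not the exact Gemma 3 recipe.

```python
# Sketch: distillation on a truncated teacher distribution. Only k=256
# teacher logits per token are kept (assumed top-k here), renormalised,
# and matched by the student via cross-entropy on that support.
import torch.nn.functional as F

def truncated_distill_loss(student_logits, teacher_logits, k=256):
    # student_logits, teacher_logits: (batch, seq, vocab)
    top_vals, top_idx = teacher_logits.topk(k, dim=-1)
    teacher_probs = F.softmax(top_vals, dim=-1)          # renormalise over the kept logits
    student_logprob = F.log_softmax(student_logits, dim=-1)
    student_top = student_logprob.gather(-1, top_idx)    # student log-probs at the kept token ids
    return -(teacher_probs * student_top).sum(-1).mean()
```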

4) Others
> Checkpoints with QAT, that's very cool
> RL using an improved version of BOND; WARM/WARP are a good excuse to look at @ramealexandre's papers
> Only uses ZeRO-3, no TP/PP if I understand correctly?
> Training budget relatively similar to Gemma 2
andito posted an update 9 days ago
Extremely bullish on @CohereForAI's Aya Vision (8B & 32B) - new SOTA open-weight VLMs

- 8B wins up to 81% of the time in its class, better than Gemini Flash
- 32B beats Llama 3.2 90B!
- Covers 23 languages, excels in image captioning, VQA & more
- Integrated in transformers from Day 0 (quick-start sketch below)!

Efficient multimodal models are here to stay!!🔥
Check out their blog! https://huggingface.co/blog/aya-vision
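
If you want to try it, here is a minimal quick-start sketch via transformers; the model id and message format are my assumptions from the release, so check the model card for the canonical snippet.

```python
# Sketch: trying Aya Vision 8B through the image-text-to-text pipeline.
# Model id and chat message format are assumptions; see the model card.
import torch
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="CohereForAI/aya-vision-8b",
                torch_dtype=torch.bfloat16)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]
print(pipe(text=messages, max_new_tokens=64))
```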
andito posted an update about 2 months ago
Introducing the world's smallest vision language model!

We're thrilled to share SmolVLM (256M & 500M), the smallest Visual Language Models ever built. Think: running on <1GB of GPU memory; you can fine-tune it on your laptop and run it on your toaster!

Why It's Game-Changing:
- Outperforms Larger Models: Even the 256M model surpasses our SOTA 80B-parameter model from just 17 months ago. Over 300x reduction!
- Mighty Efficiency: The 256M version delivers 80% of our 2.2B model's performance, and the 500M version hits 90%.
- Lightning-Fast Search: SmolVLM integrates with ColPali for state-of-the-art retrieval speeds, on par with models 10x bigger. That means cheaper, faster indexing and real-world impact.

What's New Under the Hood:
- New Vision Encoder: Smaller overall size (400M -> 93M), but with higher resolution.
- Higher Pixels/Token: 4096 vs. 1820, more efficient image processing.
- Smart Tokenization: Faster training and a performance boost.

Check our blog: https://huggingface.co/blog/smolervlm
The models: HuggingFaceTB/smolvlm-256m-and-500m-6791fafc5bb0ab8acc960fb0
The demo: HuggingFaceTB/SmolVLM-256M-Demo
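
For anyone who wants a starting point, here is a minimal loading sketch with transformers; the checkpoint name (HuggingFaceTB/SmolVLM-256M-Instruct) and classes are my assumptions, the model collection linked above has the canonical version.

```python
# Sketch: loading the 256M instruct model. Checkpoint name and API are
# assumptions; follow the model card for the exact snippet.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
# inputs = processor(text=prompt, images=[your_pil_image], return_tensors="pt")
# out = model.generate(**inputs, max_new_tokens=64)
# print(processor.decode(out[0], skip_special_tokens=True))
```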