Steffen Röcker PRO

sroecker

https://x.com/sroecker

AI & ML interests

Local models

Recent Activity

upvoted a collection about 2 hours ago

reranking series v2

upvoted a collection about 2 hours ago

updated a model 3 days ago

sroecker/Qwen2.5-0.5B-Instruct-FP8-Dynamic

sroecker's activity

upvoted 2 collections about 2 hours ago

reranking series v2

V2 crispy rerank series • 2 items • Updated about 21 hours ago • 8

DeepHermes

Preview models of hybrid reasoner Hermes series • 6 items • Updated about 6 hours ago • 13

upvoted an article 4 days ago

Article

Introducing EuroBERT: A High-Performance Multilingual Encoder Model

By

and 3 others •

4 days ago

• 115

upvoted a collection 8 days ago

Q-Filters

Pre-computed Q-Filters for efficient KV cache compression. • 15 items • Updated 10 days ago • 6

upvoted a collection 13 days ago

Granite 3.2 Language Models

3 items • Updated 15 days ago • 14

upvoted a collection 14 days ago

DeepSeek-R1-Distill Quantized

18 items • Updated Feb 7 • 12

upvoted a collection 16 days ago

olmOCR

olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 3 items • Updated about 7 hours ago • 92

upvoted a collection 19 days ago

SigLIP 2

OpenCLIP and timm SigLIP 2 models • 45 items • Updated 20 days ago • 11

upvoted a collection 22 days ago

ModernGLiClass

GLiClass with ModernBERT backbone • 4 items • Updated 6 days ago • 8

upvoted an article 28 days ago

Article

DABStep: Data Agent Benchmark for Multi-step Reasoning

Feb 4

• 61

upvoted a paper about 1 month ago

On Teacher Hacking in Language Model Distillation

Paper • 2502.02671 • Published Feb 4 • 18

upvoted a collection about 1 month ago

EuroLLM

4 items • Updated 20 days ago • 30

upvoted a paper about 1 month ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 108

upvoted an article about 1 month ago

Article

Replicating DeepSeek R1 for Information Extraction

Jan 31

• 38

upvoted a collection about 1 month ago

R1 Multilingual

5 items • Updated Jan 31 • 10

upvoted a paper about 1 month ago

WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training

Paper • 2501.18511 • Published Jan 30 • 19

upvoted a collection about 1 month ago

Tulu 3 Models

All models released with Tulu 3 -- state of the art open post-training recipes. • 11 items • Updated about 7 hours ago • 93

upvoted an article about 1 month ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

Jan 28

• 803

upvoted 2 collections about 2 months ago

Quantized DeepSeek R1 Distill

3 items • Updated Jan 22 • 3

DeepSeek-R1-abliterated

7 items • Updated Jan 31 • 93