Mishig Davaadorj

mishig

AI & ML interests

NP-completeness, grammars, universality

Recent Activity

upvoted an article 3 days ago

Train 400x faster Static Embedding Models with Sentence Transformers

upvoted an article 3 days ago

❤️ a love letter to the Open AI inference client

updated a Space 4 days ago

nanotron/ultrascale-playbook

mishig's activity

upvoted 2 articles 3 days ago

Train 400x faster Static Embedding Models with Sentence Transformers

Article • Jan 15 • 153

❤️ a love letter to the Open AI inference client

Article • 3 days ago • 8

upvoted an article 7 days ago

Remote VAEs for decoding with HF endpoints 🤗

Article • 8 days ago • 30

upvoted a paper 11 days ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 12 days ago • 153

upvoted a paper 12 days ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published 15 days ago • 138

upvoted a collection 19 days ago

SYNTHETIC-1

A collection of tasks & verifiers for reasoning datasets • 9 items • Updated 11 days ago • 49

upvoted an article 19 days ago

State of open video generation models in Diffusers

Article • Jan 27 • 50

upvoted a paper 24 days ago

DynVFX: Augmenting Real Videos with Dynamic Content

Paper • 2502.03621 • Published 26 days ago • 28

upvoted a collection 25 days ago

Hibiki fr-en

Hibiki is a model for streaming speech translation, which can run on device! See https://github.com/kyutai-labs/hibiki. • 5 items • Updated 25 days ago • 50

upvoted a paper 25 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 27 days ago • 196

upvoted a paper 28 days ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published about 1 month ago • 108

upvoted 2 articles about 1 month ago

Replicating DeepSeek R1 for Information Extraction

Article • about 1 month ago • 36

KV Caching Explained: Optimizing Transformer Inference Efficiency

Article • Jan 30 • 34

upvoted a paper about 1 month ago

Deep Learning Scaling is Predictable, Empirically

Paper • 1712.00409 • Published Dec 1, 2017 • 1

upvoted a collection about 2 months ago

Cosmos

The collection of Cosmos models • 31 items • Updated Jan 17 • 266

upvoted a collection 3 months ago

Hymba

A series of Hybrid Small Language Models. • 2 items • Updated Jan 17 • 28

upvoted an article 4 months ago

Releasing the largest multilingual open pretraining dataset

Article • 3 authors • Nov 13, 2024 • 99

upvoted a paper 4 months ago

Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

Paper • 2410.22366 • Published Oct 28, 2024 • 78

upvoted a collection 4 months ago

SmolLM2

State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated 11 days ago • 242

upvoted an article 4 months ago

VLM Art Analysis

Article • Oct 4, 2024 • 11