
Shyam Sunder Kumar

theainerd

AI & ML interests

Natural Language Processing

Recent Activity

liked a Space about 9 hours ago

amd/gpt-oss-120b-chatbot

upvoted an article 1 day ago

Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training

upvoted an article 1 day ago

SmolLM - blazingly fast and remarkably powerful



liked a Space about 9 hours ago

215

GPT-OSS-120B on AMD MI300X

💻

gpt-oss-120b model running on AMD MI300 infrastructure.

upvoted 3 articles 1 day ago

Article

Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training

and 4 others • 10 days ago • 45

Article

SmolLM - blazingly fast and remarkably powerful

and 2 others • Jul 16, 2024 • 410

Article

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

and 2 others • Mar 20, 2024 • 100

liked a model 1 day ago

Qwen/Qwen3-Coder-480B-A35B-Instruct

Text Generation • 480B • Updated 10 days ago • 58.7k • 1.1k

upvoted a collection 2 days ago

Awesome SFT datasets

Collection

A curated list of interesting datasets to fine-tune language models with. • 43 items • Updated Apr 12, 2024 • 139

liked a model 4 days ago

HuggingFaceTB/SmolLM3-3B

Text Generation • 3B • Updated 3 days ago • 727k • 662

upvoted an article 4 days ago

Article

SmolLM3: smol, multilingual, long-context reasoner

and 22 others • Jul 8 • 626

upvoted an article 5 days ago

Article

The Missing Semester of AI for Organizations #1: LLM Security

11 days ago • 8

reacted to danielhanchen's post with ❤️ 11 days ago

Post

4194

Run OpenAI's new gpt-oss models locally with Unsloth GGUFs! 🔥🦥
20b GGUF: unsloth/gpt-oss-20b-GGUF
120b GGUF: unsloth/gpt-oss-120b-GGUF

The 20b model runs on 14GB of RAM and the 120b model on 66GB.

2 replies

upvoted an article 11 days ago

Article

Welcome GPT OSS, the new open-source model family from OpenAI!

and 11 others • 13 days ago • 459

upvoted a collection 12 days ago

gpt-oss

Collection

Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated 11 days ago • 292

liked 3 models 12 days ago

liked 4 models 17 days ago

Qwen/Qwen3-Coder-30B-A3B-Instruct

Text Generation • 31B • Updated 10 days ago • 196k • 471

tencent/HunyuanWorld-1

Image-to-3D • Updated 17 days ago • 17.9k • 552

zai-org/GLM-4.5

Text Generation • 358B • Updated 6 days ago • 23.9k • 1.21k

HuggingFaceTB/SmolLM-135M-Instruct

Text Generation • 0.1B • Updated Sep 4, 2024 • 11.8k • 120

reacted to AdinaY's post with 🔥 17 days ago

Post

3519

Qwen3-30B-A3B-Thinking-2507 🔥 latest step in scaling thinking capabilities from Alibaba Qwen team.

Qwen/Qwen3-30B-A3B-Thinking-2507-FP8

✨ 30B total / 3B active - Apache 2.0
✨ Native 256K context
✨ SOTA coding, alignment, agentic reasoning
