Blog, Articles, and discussions

KV Cache from scratch in nanoVLM

By June 4, 2025 • 34

Community Articles

Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H

and 1 other • 2 days ago • 56

Context Is Gold to Find the Gold Passage: Evaluating and Training Contextual Document Embeddings

and 1 other • 3 days ago • 22

System Prompt Learning: Teaching LLMs to Learn Problem-Solving Strategies from Experience

3 days ago • 9

Daily Robotics June #1 - SmolVLA discovery and thoughts

1 day ago • 9

xLSTM-based time series model TiRex significantly outperforms competing models in forecasting accuracy

about 19 hours ago • 9

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Feb 7 • 147

Bigger isn't always better: how to choose the most efficient model for context-specific tasks 🌱🧑🏼‍💻

8 days ago • 18

🌙 Introducing Moon: Storytelling Generator Model

and 1 other • 6 days ago • 6

Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth

Jul 29, 2024 • 328

Common AI Model Formats

Feb 27 • 42

PipelineRL

and 3 others • Apr 25 • 26

Code a simple RAG from scratch

Oct 29, 2024 • 85

Decoding Strategies in Large Language Models

Oct 29, 2024 • 66

KV Caching Explained: Optimizing Transformer Inference Efficiency

Jan 30 • 72

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

Apr 16 • 17

SetFitABSA: Few-Shot Aspect Based Sentiment Analysis using SetFit

By December 6, 2023 guest • 10

Open LLM Leaderboard: DROP deep dive

By December 1, 2023 • 8

Introducing Prodigy-HF: a direct integration with Hugging Face

By November 7, 2023 • 1

Comparing the Performance of LLMs: A Deep Dive into Roberta, Llama 2, and Mistral for Disaster Tweets Analysis with Lora

By November 7, 2023 • 11

Personal Copilot: Train Your Own Coding Assistant

By October 27, 2023 • 60

Chat Templates: An End to the Silent Performance Killer

By October 3, 2023 • 22

Non-engineers guide: Train a LLaMA 2 chatbot

By September 28, 2023 • 6

Optimizing your LLM in production

By September 15, 2023 • 18

Fine-tuning Llama 2 70B using PyTorch FSDP

By September 13, 2023 • 24

Spread Your Wings: Falcon 180B is here

By September 6, 2023 • 7

Code Llama: Llama 2 learns to code

By August 25, 2023 • 9

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

By August 22, 2023 • 33

Fine-tune Llama 2 with DPO

By August 8, 2023 • 54

Llama 2 is here - get it on Hugging Face

By July 18, 2023 • 28

Community Articles

Uncensor any LLM with abliteration

Jun 13, 2024 • 603

Explore, Build, and Innovate AI Reasoning with NVIDIA’s Open Models and Recipes

and 2 others • about 17 hours ago • 13

Interactive Tools for machine learning, deep learning, and math

10 days ago • 40

AI Policy @🤗: Response to the 2025 National AI R&D Strategic Plan

and 2 others • 3 days ago • 12

🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It?

Mar 17 • 281
