Custom Vibe Coding Quest Part 1: The Quest Begins 🧙

Community Article · Published March 26, 2025

I’m starting a new quest to build my own custom coding models. I’m inspired by the whole vibe coding thing, but I think it’s fundamentally limited by one-size-fits-all models. With multiple distinct models, each matched to your given vibe, you could generate code that actually suits its application.

In these blog posts, I’m going to experiment with fine-tuning models and share my work. I’ll use open source tools and datasets, and wherever possible I’ll rely on free, cheap, and local solutions.

If I get this right, you’ll need to adapt my set-up to your own work, but that should be achievable if you follow along.

Here are some other resources to get you started:

  • We’re adding new chapters to the LLM course that focus on fine-tuning LLMs
  • If you haven’t worked with transformers and the Hugging Face ecosystem before, you should start the LLM course from scratch.

The Goal

If we take an IDE like Cursor or VS Code’s Copilot, we can see that they use models for a range of core tasks. For example, they:

  • Retrieve the relevant parts of your file or codebase
  • Generate code completions while typing (see the FIM sketch below)
  • Respond to queries in a chat window
  • Apply changes to one or more code files based on instructions
  • Reason about challenging problems
  • Use tools like documentation retrieval to generate solutions

These tasks likely use a small set of models in multiple ways, each with its own prompts, and we won’t be able to build a complete application around them. At first, I want most of the prompting taken care of for me so I can focus on fine-tuning. Fortunately, there are some cool open source tools like Continue.dev that do exactly this.

Phew… I’m not building a new IDE.
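
Still, it helps to see what one of these tasks looks like at the model level. Here’s a minimal fill-in-the-middle (FIM) completion sketch with transformers, assuming a StarCoder-family model; the checkpoint and FIM token names below are assumptions, so check your model card if you use a different family:

```python
# A minimal fill-in-the-middle (FIM) completion sketch.
# Assumption: a StarCoder-family checkpoint whose tokenizer defines the
# <fim_prefix>/<fim_suffix>/<fim_middle> special tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"  # placeholder; any FIM-capable model works
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# The editor sends the code before and after the cursor...
prefix = "def fibonacci(n):\n    "
suffix = "\n    return a"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# ...and the model generates the "middle" that gets inserted at the cursor.
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```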

The Development Environment

We’ll use a set of easy tools to get our IDE environment running locally: VS Code, Zed, LM Studio, and Continue.dev. If you want a detailed guide on setting this up, see this blog post.

We’ll use local inference that runs on my laptop, mainly because that makes it easy for me to test and demo; in practice, that means an M3 Mac with 64GB of RAM. Along the way, I’ll share options in case you have more or fewer compute resources than me.
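
As a quick sanity check that local inference is working, you can hit LM Studio’s OpenAI-compatible server from Python. This is just a sketch: the server address below is LM Studio’s default, and the model name and API key are placeholders that LM Studio doesn’t actually validate, so adjust to your setup:

```python
# Smoke test for LM Studio's local, OpenAI-compatible server.
# Assumptions: a model is loaded in LM Studio and its server is running
# on the default address (http://localhost:1234/v1).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves whatever is loaded
    messages=[
        {"role": "user", "content": "Write a Python one-liner to reverse a string."}
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```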

The Training Setup

We’ll rely entirely on post-training techniques because they have the biggest impact for the budget, and we’ll use libraries like Unsloth, TRL, and Accelerate to keep things simple and efficient. If you want to read up on this material, check out this unit in the LLM course.
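
To make that concrete, here’s a minimal supervised fine-tuning sketch with TRL’s SFTTrainer (assuming a recent TRL version). The model checkpoint and dataset are placeholders I’ve picked for illustration, not settled choices for this quest:

```python
# A minimal supervised fine-tuning (SFT) sketch with TRL.
# Assumptions: a recent TRL version; the model and dataset are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# A small instruction-following coding dataset, used here for illustration.
dataset = load_dataset("HuggingFaceH4/CodeAlpaca_20K", split="train")

training_args = SFTConfig(
    output_dir="my-custom-coder",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # trades speed for memory on a laptop
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-0.5B-Instruct",  # small enough to train locally
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```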

The Evaluation Setup

The aim here is to set up a usable custom coding environment, not to squeeze extra decimal points out of a benchmark, so we’re going to rely on coding problems, shared recordings, and qualitative reviews of performance.

That said, we’ll need some benchmarks to guide our decision making when training models, so we’ll use HumanEval and LiveCodeBench to compare models. I’ve chosen HumanEval because it’s fast, and LiveCodeBench because it’s robust and contamination-free.
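
If you want a feel for how HumanEval-style scoring works, here’s a toy sketch using the code_eval metric from the evaluate library, which implements the same pass@k functional-correctness idea. The toy problem below is mine, not from HumanEval:

```python
# Toy pass@k scoring with the `evaluate` library's code_eval metric.
# code_eval executes model-generated code, so you must opt in explicitly.
import os

os.environ["HF_ALLOW_CODE_EVAL"] = "1"

from evaluate import load

code_eval = load("code_eval")

# One test case per problem, and one list of candidate solutions per problem.
test_cases = ["assert add(2, 3) == 5"]
candidates = [["def add(a, b):\n    return a + b"]]

pass_at_k, results = code_eval.compute(
    references=test_cases, predictions=candidates, k=[1]
)
print(pass_at_k)  # {'pass@1': 1.0}
```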
