Dataset

#1
by titan087 - opened

Hey,

What dataset did you use to fine-tune this model? I was looking for one to fine-tune CodeLlama 34B and haven't found one that looked good.

Thanks!

Same here. So I chose a benchmark dataset.
https://huggingface.co/datasets/codeparrot/xlcost-text-to-code

The JavaScript subsection has about 10K rows, which I felt was enough for a fine-tune. Let me know your thoughts as well.
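
For reference, here is a rough sketch of pulling that split with the `datasets` library. The `Javascript-program-level` config name and the `text`/`code` column names are my reading of the dataset card, so double-check them before relying on this:

```python
from datasets import load_dataset

# Config name assumed from the dataset card; there are also *-snippet-level configs.
ds = load_dataset("codeparrot/xlcost-text-to-code", "Javascript-program-level", split="train")

print(len(ds))        # roughly 10K rows for the JavaScript subsection
print(ds[0]["text"])  # natural-language description (column name assumed)
print(ds[0]["code"])  # target JavaScript code (column name assumed)
```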

It's worth a shot. For a basic test I can try training Gemma, Llama 3, or possibly Phi-3 to start with. If that works well enough, then scale it up to one of the 34B coding models.
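
If it helps, here's a minimal fine-tuning sketch along those lines using plain `transformers`, not any particular recipe from this thread. The model name is just a stand-in (swap in Gemma / Llama 3 / Phi-3), and the prompt format plus the dataset config and column names are assumptions:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "microsoft/phi-2"  # stand-in; replace with the base model you want to test
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # causal LMs often ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Config and column names assumed, as in the snippet above.
ds = load_dataset("codeparrot/xlcost-text-to-code", "Javascript-program-level", split="train")

def to_example(row):
    # Pair the natural-language description with the target code in one training string.
    text = f"### Description:\n{row['text']}\n### Code:\n{row['code']}"
    return tokenizer(text, truncation=True, max_length=512)

tokenized = ds.map(to_example, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="xlcost-js-ft",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        fp16=True,  # assumes a GPU; drop on CPU
    ),
    train_dataset=tokenized,
    # mlm=False gives standard causal-LM labels (shifted input_ids)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```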
