Update: after a lot of fun experiments, I doubt this method can really have a positive impact on the outcome without resources similar to those used to train llama in the first place. Leaving it up for the sake of mad science, but moving on. Not recommended for general use.
Llama 2 Chronos 13b x Llama 1 Chronos 33b x Alpaca
This is a frankenllama model based on the technique in https://huggingface.co/chargoddard/llama2-22b
I built my base 22b model by using https://huggingface.co/Oniichat/llama2-base-chronos-13b-merge as a base, and https://huggingface.co/elinas/chronos-33b as a donor.
I then trained a qlora on the Alpaca dataset with the default peft configuration from https://github.com/facebookresearch/llama-recipes/blob/main/quickstart.ipynb
This is the result of baking in that adapter.
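For anyone unfamiliar with what "baking in" means here: the low-rank update gets folded into the base weights, so the adapter is no longer needed at inference time. With peft this is normally a single call to merge_and_unload() on the loaded adapter model. The sketch below only shows the underlying arithmetic on toy matrices, W + (lora_alpha / r) * B @ A. The helper names and the lora_alpha value are made up for illustration and are not taken from my training run.

```python
# Toy illustration of merging a LoRA adapter into a base weight matrix.
# Helper names, matrix values, and lora_alpha are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A), the merged weight."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


W = [[1.0, 0.0], [0.0, 1.0]]  # base weight, d_out x d_in
A = [[1.0, 2.0]]              # LoRA A, r x d_in with r = 1
B = [[0.5], [-0.5]]           # LoRA B, d_out x r

merged = merge_lora(W, A, B, lora_alpha=2, r=1)
print(merged)
```

After merging, the model has the same shapes as the base model, which is why the result can be shipped as an ordinary checkpoint.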
This configuration only targets q_proj and v_proj and uses r=8. I was expecting to need to add more targets and increase r to get significant improvements, but I was surprised by the quality of its context awareness, and I'm starting to think that maybe a 32 MB lora is all it takes to get decent results in 22b.
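As a rough sanity check on why the adapter is so small: a LoRA pair on a d_in x d_out linear layer adds r * (d_in + d_out) parameters. The sketch below plugs in r=8 and the two targets from above, but the hidden size and layer count are the stock Llama 2 13B values, used only as placeholders. The 22b model's projection shapes may differ, so treat the result as a ballpark and not as a measurement of my adapter.

```python
# Back-of-the-envelope LoRA adapter size.
# HIDDEN_SIZE and NUM_LAYERS are assumptions (stock Llama 2 13B shapes),
# not the actual shapes of the 22b model.

def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def adapter_size_bytes(hidden, n_layers, targets, r, bytes_per_param=4):
    """Total adapter size, assuming square hidden x hidden target projections."""
    per_module = lora_param_count(hidden, hidden, r)
    return per_module * len(targets) * n_layers * bytes_per_param


TARGETS = ("q_proj", "v_proj")  # from the config described above
R = 8                           # from the config described above
HIDDEN_SIZE = 5120              # assumption
NUM_LAYERS = 40                 # assumption

size = adapter_size_bytes(HIDDEN_SIZE, NUM_LAYERS, TARGETS, R)
print(f"{size / 2**20:.1f} MiB")  # prints "25.0 MiB" with fp32 weights
```

Doubling r or adding more target modules scales the size linearly, which is the knob I expected to have to turn.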
I will keep playing with other peft configurations and see where that gets me next.
If anyone wants the chronos 22b base model (requires fine tuning) or the adapter, lmk in community discussions.