This model is a LLaMA model fine-tuned on the en<>yo MENYO-20k dataset. The new Llama-2 tokenizer is used.
The wandb logs, covering 1 epoch of training on bidirectional data, can be found here: .
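For reference, a minimal inference sketch using the `transformers` library is shown below. The repository id and the prompt format are assumptions, since neither is specified in this card; replace them with the actual values for this checkpoint.

```python
# Minimal usage sketch. The repo id below is hypothetical; substitute the
# actual Hub repository for this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/llama-menyo-20k-en-yo"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The model was trained on bidirectional en<>yo data; this simple
# translation-style prompt is an assumption, not the documented format.
prompt = "Translate English to Yoruba: Good morning."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```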