Swallow v0.2 as a Possible Improvement Over Youko
#2 opened by Casual-Autopsy in tokyotech-llm/Llama-3.1-Swallow-8B-v0.2
According to their evals for the pretrained version, both the en-ja and ja-en WMT20 scores are better than Youko's pretrained model, as are the average scores across both English and Japanese tasks.
Found it while looking for models to merge, so I thought it might be useful as a base for what you're trying to achieve.
Also, according to the research paper linked in the model card, the org highlights their findings on building a higher-quality Japanese text corpus.
I was actually looking for another base model to try, so thanks! I was considering using Qwen2.5 7B, but Swallow-8B-v0.2 seems to have a better WMT20 score. I guess I'll give it a spin and see how it performs on the VNTL leaderboard.
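For a quick sanity check before a full leaderboard run, something like the sketch below can give a ballpark ja-en BLEU. To be clear, this is just my rough smoke test, not the official WMT20 harness or the setup the Swallow team used: the few-shot prompt format, the example sentence pair, and the direct sacreBLEU call are all my own assumptions; only the model id comes from this thread.

```python
# Rough smoke test for ja->en translation quality (NOT the WMT20 pipeline).
# Assumptions: base checkpoint name from this thread, an ad-hoc few-shot
# prompt format, and a single made-up sentence pair scored with sacreBLEU.
import sacrebleu
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tokyotech-llm/Llama-3.1-Swallow-8B-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Tiny few-shot prompt; a real eval would use the full WMT20 test set.
prompt = (
    "Japanese: 猫が窓の外を見ている。\nEnglish: The cat is looking out the window.\n"
    "Japanese: 彼は毎朝コーヒーを飲む。\nEnglish:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=32,
    do_sample=False,  # greedy decoding for a reproducible check
    pad_token_id=tokenizer.eos_token_id,
)
# Keep only the newly generated tokens, up to the first newline.
generated = output[0][inputs["input_ids"].shape[1]:]
hypothesis = tokenizer.decode(generated, skip_special_tokens=True).split("\n")[0].strip()

reference = "He drinks coffee every morning."
score = sacrebleu.corpus_bleu([hypothesis], [[reference]])
print(f"{hypothesis!r} -> BLEU {score.score:.1f}")
```

A single sentence pair obviously says very little on its own; it's just a cheap way to confirm the base model follows the translation prompt format at all before spending time on the VNTL benchmark.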