Swallow v0.2 as a Possible Improvement Over Youko

#2
by Casual-Autopsy - opened

tokyotech-llm/Llama-3.1-Swallow-8B-v0.2

According to their evals for the pretrained version, both the en-ja and ja-en WMT20 scores are better than Youko's pretrained base, as are the average scores on both the English and Japanese tasks.
I found it while looking for models to merge, so I thought it might be useful as a base for what you're trying to achieve.

Also, in the research paper linked from the model card, the team describes their approach to building a higher-quality Japanese text corpus.
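Since this came up while searching for merge candidates, here is a minimal sketch of what a mergekit recipe with Swallow as the base might look like. The donor model, merge method, and weights are illustrative assumptions only, not a tested recipe:

```yaml
# Hypothetical mergekit config: TIES merge on top of Swallow-8B-v0.2.
# The donor model and its weight/density values are placeholders --
# swap in whatever fine-tune you actually want to merge.
models:
  - model: tokyotech-llm/Llama-3.1-Swallow-8B-v0.2
  - model: rinna/llama-3-youko-8b  # example donor, purely illustrative
    parameters:
      weight: 0.5
      density: 0.5
merge_method: ties
base_model: tokyotech-llm/Llama-3.1-Swallow-8B-v0.2
dtype: bfloat16
```

Whether Swallow works better as the `base_model` than Youko would depend on how much of its improved Japanese corpus actually survives the merge.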

I was actually looking for another base model to try, so thanks! I was considering Qwen2.5 7B, but Swallow-8B-v0.2 seems to have a better WMT20 score. I guess I'll give it a spin and see how it performs on the VNTL leaderboard.

Wow, it got a score close to Qwen2.5 14B, that's great.
