Why is your 8B model faster than the original Llama-3-8B-Instruct?

by HBD007

I understand that the faster inference of the 70B model comes from the Speculative Decoding mentioned in your blog post.
However, why is elyza/Llama-3-ELYZA-JP-8B faster than the original Llama-3-8B-Instruct?
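
For context, here is a minimal sketch of speculative (assisted) decoding using the `assistant_model` option in Hugging Face transformers. The model names and generation settings are assumptions for illustration, not necessarily what ELYZA uses: a small draft model proposes several tokens per step, and the large target model verifies them in a single forward pass, so the output matches plain decoding from the large model but arrives faster.

```python
# Sketch of speculative (assisted) decoding with Hugging Face transformers.
# Model names below are assumptions for illustration, not ELYZA's actual setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_name = "meta-llama/Meta-Llama-3-70B-Instruct"  # large target model (assumed)
draft_name = "meta-llama/Meta-Llama-3-8B-Instruct"    # small draft model (assumed)

# Both models must share a tokenizer, which Llama 3 8B and 70B do.
tokenizer = AutoTokenizer.from_pretrained(target_name)
target = AutoModelForCausalLM.from_pretrained(
    target_name, torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_name, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Explain speculative decoding briefly.", return_tensors="pt").to(target.device)

# The draft model drafts candidate tokens; the target model verifies them
# in one forward pass and accepts the longest matching prefix.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that this technique accelerates the large (70B) model; it would not by itself explain an 8B model outperforming the original 8B, which is exactly what the question asks about.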
