Unexpectedly Large Memory Usage of ibm-fms/llama3-8b-accelerator in vLLM

#4 by baizhuoyan

When I use MLPSpeculator in vLLM, I noticed that the ibm-fms/llama-13b-accelerator checkpoint is only about 1.5665 GB, while ibm-fms/llama3-8b-accelerator is 4.4649 GB, which seems surprisingly large for the speculator paired with an 8B model, especially compared to the 13B one. Why is this the case?
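For reference, here is a minimal sketch of how I compared the two checkpoint sizes by summing tensor sizes from the safetensors files. The filename "model.safetensors" is an assumption on my part; adjust it if a repo shards or names its weights differently.

```python
# Sketch: compare on-disk parameter footprints of the two accelerator repos.
# Assumption: each repo stores its weights in a single "model.safetensors" file.
from huggingface_hub import hf_hub_download
from safetensors import safe_open


def checkpoint_size_gb(repo_id: str, filename: str = "model.safetensors") -> float:
    """Download the weights file and sum numel * element_size over all tensors."""
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    total_bytes = 0
    with safe_open(path, framework="pt") as f:
        for name in f.keys():
            tensor = f.get_tensor(name)
            # Printing name and tensor.shape here also shows which layers dominate.
            total_bytes += tensor.numel() * tensor.element_size()
    return total_bytes / 1e9


for repo in ("ibm-fms/llama-13b-accelerator", "ibm-fms/llama3-8b-accelerator"):
    print(repo, f"{checkpoint_size_gb(repo):.4f} GB")
```

The GB figures in my question come from this kind of tally of the checkpoint files themselves, before vLLM loads anything onto the GPU.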
