Unexpectedly Large Memory Usage of ibm-fms/llama3-8b-accelerator in vLLM
#4 by baizhuoyan - opened
When I use the MLPSpeculator in vLLM, I noticed that the ibm-fms/llama-13b-accelerator checkpoint is only about 1.5665 GB, while ibm-fms/llama3-8b-accelerator is about 4.4649 GB, which seems quite large given that it targets an 8B base model. Why is this the case?
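
For context, here is roughly how I load the speculator (a minimal sketch of my setup; the base model name, `num_speculative_tokens` value, and exact argument names reflect the vLLM version I'm on and may differ in newer releases that use `speculative_config`):

```python
from vllm import LLM, SamplingParams

# Sketch: base Llama 3 8B with the MLPSpeculator draft model attached.
# Argument names (speculative_model, num_speculative_tokens) follow the
# vLLM version I'm using; newer releases may expose these differently.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",        # assumed base model
    speculative_model="ibm-fms/llama3-8b-accelerator",  # the ~4.46 GB checkpoint in question
    num_speculative_tokens=4,                           # example value
    use_v2_block_manager=True,
)

outputs = llm.generate(
    ["Explain speculative decoding in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

The sizes I quoted are simply the total weight file sizes of the two accelerator repos on the Hub.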