Unexpectedly Large Memory Usage of ibm-fms/llama3-8b-accelerator in vLLM
#4 opened about 10 hours ago by baizhuoyan
llama3.1 version
1
#3 opened 4 months ago by amgadhasan
ValueError: Unsupported model type mlp_speculator using TGI server
2
#2 opened 7 months ago by rishabh-simpplr
shard 0 never ready when given the speculator option?
7
#1 opened 8 months ago by mhill4980