Marlin kernel in vLLM - new checkpoint?
#10 opened 8 months ago by zoltan-fedor

Based on llama-2? (1)
#9 opened 9 months ago by rdewolff

[AUTOMATED] Model Memory Requirements
#8 opened 10 months ago by muellerzr

How to set up the generation_config properly?
#7 opened 10 months ago by KIlian42

The inference API is too slow. (1)
#6 opened 11 months ago by YernazarBis

How did you create AWQ-quantized weights? (4)
#5 opened 11 months ago by nightdude

Encountered error when loading model (7)
#4 opened 11 months ago by zhouzr