Downscaling the `W_q` and `W_k` matrices for repeated layers in franken-merges
14
#4 opened 11 months ago by jukofyork

Guidance on GPU VRAM Split?
5
#3 opened about 1 year ago by nmitchko

Performance
13
#2 opened about 1 year ago by KnutJaegersberg
