GigaChat-20B-A3B

#553
by nicoboss - opened

https://huggingface.co/ai-sage/GigaChat-20B-A3B-base
https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct

I completely missed that support for the GigaChat architecture was implemented and merged into llama.cpp.

GigaChat is a Russian MoE model with 20 billion parameters, of which 3 billion are active at a time. This is quite cool, as it means the model fits into the memory of a laptop, and thanks to MoE it still runs at reasonable speed. It supports up to 131,000 tokens of context. Let's hope they didn't train it exclusively on Russian datasets, but given that they used English benchmarks to evaluate it, I'm quite sure it's multilingual. In that case it would make for quite a special model, as I would imagine their training data differs significantly from that of the English-focused or Chinese-focused models we've seen so far.
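For a rough sense of why it fits on a laptop, here's a back-of-envelope sketch. The bits-per-weight figures are my own approximations for common GGUF quantization levels, not exact llama.cpp numbers, and the parameter counts are just read off the model name:

```python
# Rough memory estimate for GigaChat-20B-A3B at assumed
# GGUF quantization levels (bits-per-weight are approximations).
TOTAL_PARAMS = 20e9   # total parameters across all experts
ACTIVE_PARAMS = 3e9   # parameters active per token via MoE routing

QUANT_BPW = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8, "Q2_K": 2.6}

for name, bpw in QUANT_BPW.items():
    size_gib = TOTAL_PARAMS * bpw / 8 / 1024**3
    print(f"{name:7s} ~{size_gib:5.1f} GiB of weights")

# Only the active parameters need to be read per token,
# which is what keeps generation speed reasonable on CPU.
active_gib = ACTIVE_PARAMS * QUANT_BPW["Q4_K_M"] / 8 / 1024**3
print(f"~{active_gib:.1f} GiB of weights read per token at Q4_K_M")
```

At around Q4 that comes out near 11 GiB of weights, so it fits in a 16 GB laptop, while each token only touches roughly 1.7 GiB of active weights.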

blindly queued as well :)

mradermacher changed discussion status to closed
