GigaChat-20B-A3B

#553
by nicoboss - opened

https://huggingface.co/ai-sage/GigaChat-20B-A3B-base
https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct

I completely missed that support for the GigaChat architecture was implemented and merged into llama.cpp.

GigaChat is a Russian MoE model with 20 billion parameters, of which 3 billion are active at a time. This is quite cool, as it means the model fits into the memory of a laptop, and thanks to MoE it still runs at reasonable speed. It supports up to 131,000 tokens of context. Let's hope they didn't train it exclusively on Russian datasets, but given that they used English benchmarks to evaluate it, I'm quite sure it's multilingual. In that case it would make for quite a special model, as I would imagine their training data differs significantly from that of the English-focused or Chinese-focused models we've seen so far.
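For a rough sense of why it fits on a laptop, here's a back-of-envelope sketch. The bits-per-weight figures are my own approximations for common GGUF quantization levels, not exact llama.cpp numbers, and the parameter counts are just read off the model name:

```python
# Rough memory estimate for GigaChat-20B-A3B at assumed
# GGUF quantization levels (bits-per-weight are approximations).
TOTAL_PARAMS = 20e9   # total parameters across all experts
ACTIVE_PARAMS = 3e9   # parameters active per token via MoE routing

QUANT_BPW = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8, "Q2_K": 2.6}

for name, bpw in QUANT_BPW.items():
    size_gib = TOTAL_PARAMS * bpw / 8 / 1024**3
    print(f"{name:7s} ~{size_gib:5.1f} GiB of weights")

# Only the active parameters need to be read per token,
# which is what keeps generation speed reasonable on CPU.
active_gib = ACTIVE_PARAMS * QUANT_BPW["Q4_K_M"] / 8 / 1024**3
print(f"~{active_gib:.1f} GiB of weights read per token at Q4_K_M")
```

At around Q4 that comes out near 11 GiB of weights, so it fits in a 16 GB laptop, while each token only touches roughly 1.7 GiB of active weights.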

blindly queued as well :)

mradermacher changed discussion status to closed
