---
base_model: google/gemma-2-2b-jpn-it
language:
- multilingual
datasets:
- TFMC/imatrix-dataset-for-japanese-llm
library_name: transformers
license: gemma
license_link: https://ai.google.dev/gemma/terms
---

## Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Split | ELYZA-tasks-100 | Description |
| -------- | ---------- | --------- | ----- | --------------- | ----------- |
| [gemma-2-2b-jpn-it.f16.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.f16.gguf) | f16 | 5.24GB | false | | Full F16 weights. |
| [gemma-2-2b-jpn-it.Q8_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q8_0.gguf) | Q8_0 | 2.78GB | false | | Extremely high quality, *recommended*. |
| [gemma-2-2b-jpn-it-imatrix.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0.gguf) | Q4_0 | 2.78GB | false | | Good quality, *recommended for edge devices with <8GB RAM*. |
| [gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf) | Q4_0_8_8 | 2.78GB | false | | Good quality, *recommended for edge devices with <8GB RAM*. |
| [gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf) | Q4_0_4_8 | 2.78GB | false | | Good quality, *recommended for edge devices with <8GB RAM*. |
| [gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf) | Q4_0_4_4 | 2.78GB | false | | Good quality, *recommended for edge devices with <8GB RAM*. |
| [gemma-2-2b-jpn-it.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0.gguf) | Q4_0 | 2.78GB | false | | Poor quality, *not recommended*. |
| [gemma-2-2b-jpn-it.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_8_8.gguf) | Q4_0_8_8 | 2.78GB | false | | Poor quality, *not recommended*. |
| [gemma-2-2b-jpn-it.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_8.gguf) | Q4_0_4_8 | 2.78GB | false | | Poor quality, *not recommended*. |
| [gemma-2-2b-jpn-it.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_4.gguf) | Q4_0_4_4 | 2.78GB | false | | Poor quality, *not recommended*. |

## How to check i8mm and sve support for ARM devices

ARM i8mm support is necessary to take advantage of the Q4_0_4_8 gguf. All ARM architectures >= ARMv8.6-A support i8mm.

ARM sve support is necessary to take advantage of the Q4_0_8_8 gguf. sve is an optional feature available from ARMv8.2-A onward, but the majority of ARM chips do not implement it.

For ARM devices without either feature, it is recommended to use Q4_0_4_4.

For Apple devices,

```
sysctl hw
```
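
The full `sysctl hw` output is long; to check just the flag of interest you can filter it (a minimal sketch, assuming the capability is reported under an `hw.optional` key containing "i8mm", as on recent Apple Silicon macOS):

```
# Keep only the i8mm-related entries; a value of 1 means the feature is present
sysctl hw | grep -i i8mm
```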

For ARM devices (i.e., most Android devices),

```
cat /proc/cpuinfo
```

There are also Android apps that can display /proc/cpuinfo.
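
If the full cpuinfo dump is too noisy, the two flags can be pulled out directly (a minimal sketch; the kernel lists them in lowercase on the `Features` line):

```
# Prints i8mm and/or sve if the CPU reports them; no output means neither is supported
grep -o -w -E 'i8mm|sve' /proc/cpuinfo | sort -u
```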

## Which Q4_0 model to use for ARM devices

| Brand | Series | Model | i8mm | sve | Quant Type |
| ----- | ------ | ----- | ---- | --- | ---------- |
| Qualcomm | Snapdragon | >= 7 Gen 1 | Yes | Yes | Q4_0_8_8 |
| Qualcomm | Snapdragon | others | No | No | Q4_0_4_4 |
| Apple | M | M1 | No | No | Q4_0_4_4 |
| Apple | M | M2/M3/M4 | Yes | No | Q4_0_4_8 |
| Apple | A | A4 to A14 | No | No | Q4_0_4_4 |
| Apple | A | A15 to A18 | Yes | No | Q4_0_4_8 |

## Convert safetensors to f16 gguf

Make sure you have llama.cpp git cloned:

```
python3 convert_hf_to_gguf.py gemma-2-2b-jpn-it/ --outfile gemma-2-2b-jpn-it.f16.gguf --outtype f16
```
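
If llama.cpp is not cloned yet, a minimal setup looks like this (upstream repository URL; `convert_hf_to_gguf.py` and its Python requirements live in the repository root):

```
# Clone llama.cpp and install the converter script's Python dependencies
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt
```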

## Convert f16 gguf to Q8_0 gguf without imatrix

Make sure you have llama.cpp compiled:

```
./llama-quantize gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it.Q8_0.gguf q8_0
```
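
If it is not compiled yet, a typical CMake build looks roughly like this; with this layout the tools end up in `build/bin/`, so the commands in this card would be run as `./build/bin/llama-quantize ...` unless you copy the binaries next to the gguf files:

```
# Build the llama.cpp tools; llama-quantize and llama-imatrix land in build/bin/
cmake -B build
cmake --build build --config Release
```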

## Convert f16 gguf to other gguf with imatrix

First, prepare an imatrix from the f16 gguf and c4_en_ja_imatrix.txt:

```
./llama-imatrix -m gemma-2-2b-jpn-it.f16.gguf -f c4_en_ja_imatrix.txt -o gemma-2-2b-jpn-it.imatrix --chunks 32
```

Then, convert the f16 gguf with the imatrix to create an imatrix gguf:

```
./llama-quantize --imatrix gemma-2-2b-jpn-it.imatrix gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf q4_0_8_8
```
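
To sanity-check the quantized file, it can be loaded directly with the llama.cpp CLI (binary name in recent builds; the prompt and token count are only examples):

```
# Quick smoke test: load the imatrix Q4_0_8_8 quant and generate a short reply
./llama-cli -m gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf -p "こんにちは。自己紹介をしてください。" -n 128
```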
## Downloading using huggingface-cli
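
A typical invocation to grab a single file (the `--include` pattern and target directory are examples; any filename from the table above works):

```
# Install the CLI, then download only the Q8_0 quant into the current directory
pip install -U "huggingface_hub[cli]"
huggingface-cli download ymcki/gemma-2-2b-jpn-it-GGUF --include "gemma-2-2b-jpn-it.Q8_0.gguf" --local-dir ./
```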
## Credits
Thank you bartowski for providing a README.md to get me started.
|