Update README.md
README.md CHANGED
@@ -247,6 +247,7 @@ pip install -r requirements.txt
// python setup_env.py --hf-repo your_hf_username/Falcon3-10B-Instruct-1.58bit -q i2_s // You can skip this one
// move the model to the models folder, then
python run_inference.py -m models/Falcon3-10B-Instruct-1.58bit/ggml-model-i2_s.gguf -p "What is 1.58-bit quantization in LLMs, and why is it interesting for GPU-poor people?" -cnv
+# 1.58-bit quantization stores each neural-network weight as one of three values {-1, 0, +1}, i.e. log2(3) ≈ 1.58 bits per weight instead of 16 or 32 bits. This sharply reduces memory usage and can make inference practical on modest (GPU-poor) hardware.
```

## Evaluation
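For context on what the added comment describes, here is a minimal NumPy sketch of the absmean ternary quantization behind BitNet-style 1.58-bit models. It is illustrative only: the helper name `quantize_ternary` and the toy matrix are assumptions, not part of this repo's code.

```python
import numpy as np

def quantize_ternary(weights: np.ndarray):
    """Quantize a float weight matrix to ternary codes {-1, 0, +1}.

    Sketch of the absmean scheme used by BitNet b1.58: scale the
    weights by their mean absolute value, then round each entry to
    the nearest of {-1, 0, +1}. Dequantize with w ≈ scale * codes.
    """
    scale = np.abs(weights).mean() + 1e-8          # guard against all-zero input
    codes = np.clip(np.round(weights / scale), -1, 1).astype(np.int8)
    return codes, scale

# Toy usage: quantize a small random matrix and check the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
codes, scale = quantize_ternary(w)
print(codes)                                       # entries are only -1, 0, or +1
print(f"mean abs error: {np.abs(w - scale * codes).mean():.3f}")
```

Each code carries log2(3) ≈ 1.58 bits of information, which is where the "1.58-bit" name comes from.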