@@ -209,8 +209,96 @@ extra_gated_button_content: Submit
 ---
 # SalmanFaroz/Llama-3.2-1B-Instruct-Q3_K_M-GGUF
-This model was converted to GGUF format from [`meta-llama/Llama-3.2-1B-Instruct`](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
-Refer to the [original model card](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) for more details on the model.
 ## Use with llama.cpp
 Install llama.cpp through brew (works on Mac and Linux)

 ---
 # SalmanFaroz/Llama-3.2-1B-Instruct-Q3_K_M-GGUF
+### Install the package
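+Run one of the following commands, according to your system. The flag names below follow older llama-cpp-python releases; newer releases use `GGML_`-prefixed CMake flags (for example `-DGGML_CUDA=on`), so check the project README for your installed version: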
+```shell
+# Base llama-cpp-python with no GPU acceleration
+pip install llama-cpp-python
+# With NVIDIA CUDA acceleration
+CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
+# Or with OpenBLAS acceleration
+CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
+# Or with CLBlast acceleration
+CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
+# Or with AMD ROCm GPU acceleration (Linux only)
+CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
+# Or with Metal GPU acceleration for macOS systems only
+CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
+# On Windows, set CMAKE_ARGS in PowerShell using this format, e.g. for NVIDIA CUDA:
+$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
+pip install llama-cpp-python
+```
+## For Inference
+### Download
+```python
+from huggingface_hub import hf_hub_download
+REPO_ID = "SalmanFaroz/Llama-3.2-1B-Instruct-Q3_K_M-GGUF"
+# File name assumed to match this 1B repo; confirm it against the repo's file list.
+FILENAME = "llama-3.2-1b-instruct-q3_k_m.gguf"
+hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir="./")
+```
+### Code:
+```python
+from llama_cpp import Llama
+# Set n_gpu_layers to the number of layers to offload to GPU. Set it to 0 if no GPU acceleration is available on your system.
+llm = Llama(
+  model_path="./llama-3.2-1b-instruct-q3_k_m.gguf",  # Download the model file first
+  n_ctx=4096,
+  n_threads=8,            # Number of CPU threads to use; tune for your system and the resulting performance
+  n_gpu_layers=35         # Number of layers to offload to GPU, if you have GPU acceleration available
+)
+prompt = """
+<|start_header_id|>system<|end_header_id|>
+You are an expert in composing functions. You are given a question and a set of possible functions.
+Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
+If none of the functions can be used, point it out. If the given question lacks the parameters required by the function, also point it out. You should only return the function call in tools call sections.
+If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)]
+You SHOULD NOT include any other text in the response.
+Here is a list of functions in JSON format that you can invoke.[
+    {
+        "name": "get_user_info",
+        "description": "Retrieve details for a specific user by their unique identifier. Note that the provided function is in Python 3 syntax.",
+        "parameters": {
+            "type": "dict",
+            "required": [
+                "user_id"
+            ],
+            "properties": {
+                "user_id": {
+                    "type": "integer",
+                    "description": "The unique identifier of the user. It is used to fetch the specific user details from the database."
+                },
+                "special": {
+                    "type": "string",
+                    "description": "Any special information or parameters that need to be considered while fetching user details.",
+                    "default": "none"
+                }
+            }
+        }
+    }
+]
+<|eot_id|><|start_header_id|>user<|end_header_id|>
+Can you retrieve the details for the user with the ID 7890, who has black as their special request?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+"""
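+output = llm(
+  prompt,
+  max_tokens=100,
+  temperature=0.001  # Near-zero temperature for almost deterministic output
+)
+# The completion text is under choices[0]["text"] in the returned dict
+print(output["choices"][0]["text"])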
+```
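The system prompt above asks the model to reply in the form `[func_name(param=value, ...)]`. A minimal sketch of turning such a reply into function names and keyword arguments, using only the standard-library `ast` module, might look like this. The helper name `parse_tool_calls` and the sample reply string are hypothetical: they are written by hand for illustration and are not part of llama-cpp-python or actual model output.

```python
import ast


def parse_tool_calls(text):
    """Parse a reply like "[f(a=1), g(b='x')]" into a list of (name, kwargs) pairs."""
    tree = ast.parse(text.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a bracketed list of calls")
    calls = []
    for node in tree.body.elts:
        # Accept only plain calls such as get_user_info(...), not attribute calls.
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            raise ValueError("expected a plain function call")
        if node.args:
            raise ValueError("only keyword arguments are supported")
        kwargs = {}
        for kw in node.keywords:
            if kw.arg is None:
                raise ValueError("**kwargs expansion is not supported")
            # literal_eval only accepts literals, so no code from the reply is executed.
            kwargs[kw.arg] = ast.literal_eval(kw.value)
        calls.append((node.func.id, kwargs))
    return calls


# Hypothetical reply, written by hand for illustration (not real model output).
sample_reply = "[get_user_info(user_id=7890, special='black')]"
calls = parse_tool_calls(sample_reply)
print(calls)
```

Parsing with `ast` rather than `eval` means the reply is never executed, and a reply that does not match the requested format raises an error (`ValueError`, or `SyntaxError` from `ast.parse`) that the caller can handle, for example by retrying the request.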
 ## Use with llama.cpp
 Install llama.cpp through brew (works on Mac and Linux)