SalmanFaroz committed on
Commit f22afea
1 Parent(s): 23442d1

Update README.md

Files changed (1)
  1. README.md +90 -2
README.md CHANGED
@@ -209,8 +209,96 @@ extra_gated_button_content: Submit
 ---
 
 # SalmanFaroz/Llama-3.2-1B-Instruct-Q3_K_M-GGUF
- This model was converted to GGUF format from [`meta-llama/Llama-3.2-1B-Instruct`](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
- Refer to the [original model card](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) for more details on the model.
+
+
+ ### Install the package
+
+ Run one of the following commands, according to your system:
+
+ ```shell
+ # Base llama-cpp-python with no GPU acceleration
+ pip install llama-cpp-python
+ # With NVIDIA CUDA acceleration
+ CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
+ # Or with OpenBLAS acceleration
+ CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
+ # Or with CLBlast acceleration
+ CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
+ # Or with AMD ROCm GPU acceleration (Linux only)
+ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
+ # Or with Metal GPU acceleration (macOS only)
+ CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
+
+ # On Windows, set CMAKE_ARGS in PowerShell before installing; e.g. for NVIDIA CUDA:
+ $env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
+ pip install llama-cpp-python
+ ```
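+
+ After installing, a quick sanity check is to import the package and print the installed version (a minimal sketch, not specific to this model):
+
+ ```python
+ # Import fails here if the wheel did not build for your platform/backend.
+ import llama_cpp
+ from importlib.metadata import version
+
+ print("llama-cpp-python version:", version("llama-cpp-python"))
+ ```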
+
+ ## For Inference
+
+ ### Download
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ REPO_ID = "SalmanFaroz/Llama-3.2-1B-Instruct-Q3_K_M-GGUF"
+ FILENAME = "llama-3.2-1b-instruct-q3_k_m.gguf"
+
+ hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir="./")
+ ```
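+
+ If you are unsure of the exact `.gguf` filename in the repo, you can list the repository files first and copy the name from the output. A small sketch using `huggingface_hub.list_repo_files` (the filename above is assumed to match the file actually uploaded to this repo):
+
+ ```python
+ from huggingface_hub import list_repo_files
+
+ # Print every file in the repo so the exact quant filename can be copied.
+ for name in list_repo_files("SalmanFaroz/Llama-3.2-1B-Instruct-Q3_K_M-GGUF"):
+     print(name)
+ ```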
+
+ ### Code:
+
+ ```python
+ from llama_cpp import Llama
+
+ # Set n_gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
+ llm = Llama(
+     model_path="./llama-3.2-1b-instruct-q3_k_m.gguf",  # Download the model file first
+     n_ctx=4096,  # Context window size
+     n_threads=8,  # The number of CPU threads to use, tailor to your system and the resulting performance
+     n_gpu_layers=35  # The number of layers to offload to GPU, if you have GPU acceleration available
+ )
+
+ prompt = """
+ <|start_header_id|>system<|end_header_id|>
+ You are an expert in composing functions. You are given a question and a set of possible functions.
+ Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
+ If none of the functions can be used, point it out. If the given question lacks the parameters required by the function, also point it out. You should only return the function call in tools call sections.
+ If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)]
+ You SHOULD NOT include any other text in the response.
+ Here is a list of functions in JSON format that you can invoke.
+ [
+     {
+         "name": "get_user_info",
+         "description": "Retrieve details for a specific user by their unique identifier. Note that the provided function is in Python 3 syntax.",
+         "parameters": {
+             "type": "dict",
+             "required": [
+                 "user_id"
+             ],
+             "properties": {
+                 "user_id": {
+                     "type": "integer",
+                     "description": "The unique identifier of the user. It is used to fetch the specific user details from the database."
+                 },
+                 "special": {
+                     "type": "string",
+                     "description": "Any special information or parameters that need to be considered while fetching user details.",
+                     "default": "none"
+                 }
+             }
+         }
+     }
+ ]
+ <|eot_id|><|start_header_id|>user<|end_header_id|>
+ Can you retrieve the details for the user with the ID 7890, who has black as their special request?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+ """
+
+ output = llm(
+     prompt,
+     max_tokens=100,  # Maximum number of tokens to generate
+     temperature=0.001  # Near-deterministic sampling
+ )
+
+ # The generated text is in output["choices"][0]["text"]
+ print(output["choices"][0]["text"])
+ ```
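+
+ Alternatively, instead of hand-writing the Llama 3 special tokens, llama-cpp-python can apply the model's own chat template via `create_chat_completion`. A minimal sketch, assuming the same local file downloaded above (the prompts here are shortened for illustration):
+
+ ```python
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="./llama-3.2-1b-instruct-q3_k_m.gguf",  # assumed path from the download step
+     n_ctx=4096,
+     n_threads=8,
+ )
+
+ # The chat template stored in the GGUF metadata is applied automatically.
+ response = llm.create_chat_completion(
+     messages=[
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "Can you retrieve the details for the user with the ID 7890?"},
+     ],
+     max_tokens=100,
+     temperature=0.001,
+ )
+
+ print(response["choices"][0]["message"]["content"])
+ ```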
+
 
 ## Use with llama.cpp
 Install llama.cpp through brew (works on Mac and Linux)