tinybiggames committed
Commit 08b8fd9
1 Parent(s): 60149f7

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md (+31, -47)
README.md CHANGED
@@ -1,15 +1,15 @@
  ---
  language:
  - en
  license: mit
  tags:
  - nlp
  - code
  - llama-cpp
  - gguf-my-repo
- - LMEngine
- license_link: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/LICENSE
- pipeline_tag: text-generation
  widget:
  - messages:
  - role: user
@@ -19,59 +19,43 @@ widget:
  # tinybiggames/Phi-3-mini-128k-instruct-Q4_K_M-GGUF
  This model was converted to GGUF format from [`microsoft/Phi-3-mini-128k-instruct`](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) for more details on the model.

- ## Use with tinyBigGAMES's [Inference](https://github.com/tinyBigGAMES) Libraries.

- How to configure LMEngine:

- ```Delphi
- InitConfig(
-   'C:/LLM/gguf', // path to model files
-   -1             // number of GPU layers; -1 to use all available layers
- );
  ```

- How to define a model:

- ```Delphi
- DefineModel('phi-3-mini-128k-instruct.Q4_K_M.gguf',
-   'phi-3-mini-128k-instruct.Q4_K_M', 8000,
-   '<|{role}|>{content}<|end|>',
-   '<|assistant|>');
  ```

- How to add a message:

- ```Delphi
- AddMessage(
-   ROLE_USER,     // role
-   'What is AI?'  // content
- );
  ```

- `{role}` - will be substituted with the message "role"
- `{content}` - will be substituted with the message "content"
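For example, the `AddMessage` call above, expanded through the template defined earlier, would produce something like the following prompt (exact whitespace depends on the library), with the `<|assistant|>` tag appended to cue the model's reply:

```
<|user|>What is AI?<|end|>
<|assistant|>
```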
- How to do inference:

- ```Delphi
- var
-   LTokenOutputSpeed: Single;
-   LInputTokens: Int32;
-   LOutputTokens: Int32;
-   LTotalTokens: Int32;
-
- if RunInference('phi-3-mini-128k-instruct.Q4_K_M', 1024) then
- begin
-   GetInferenceStats(nil, @LTokenOutputSpeed, @LInputTokens, @LOutputTokens,
-     @LTotalTokens);
-   PrintLn('', FG_WHITE);
-   PrintLn('Tokens :: Input: %d, Output: %d, Total: %d, Speed: %3.1f t/s',
-     FG_BRIGHTYELLOW, LInputTokens, LOutputTokens, LTotalTokens, LTokenOutputSpeed);
- end
- else
- begin
-   PrintLn('', FG_WHITE);
-   PrintLn('Error: %s', FG_RED, GetError());
- end;
- ```
 
  ---
+ base_model: microsoft/Phi-3-mini-128k-instruct
  language:
  - en
  license: mit
+ license_link: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/LICENSE
+ pipeline_tag: text-generation
  tags:
  - nlp
  - code
  - llama-cpp
  - gguf-my-repo
  widget:
  - messages:
  - role: user
 
  # tinybiggames/Phi-3-mini-128k-instruct-Q4_K_M-GGUF
  This model was converted to GGUF format from [`microsoft/Phi-3-mini-128k-instruct`](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) for more details on the model.

+ ## Use with llama.cpp
+ Install llama.cpp through brew (works on Mac and Linux):

+ ```bash
+ brew install llama.cpp
  ```

+ Invoke the llama.cpp server or the CLI.

+ ### CLI:
+ ```bash
+ llama-cli --hf-repo tinybiggames/Phi-3-mini-128k-instruct-Q4_K_M-GGUF --hf-file phi-3-mini-128k-instruct-q4_k_m.gguf -p "The meaning to life and the universe is"
+ ```
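The usual llama.cpp generation options can be appended to the same command. A minimal sketch with illustrative values: `-n` caps the number of generated tokens and `-ngl` offloads model layers to the GPU when one is available.

```bash
# Example flags: generate at most 128 tokens and offload all layers to the GPU
llama-cli --hf-repo tinybiggames/Phi-3-mini-128k-instruct-Q4_K_M-GGUF \
  --hf-file phi-3-mini-128k-instruct-q4_k_m.gguf \
  -p "The meaning to life and the universe is" -n 128 -ngl 99
```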
+ ### Server:
+ ```bash
+ llama-server --hf-repo tinybiggames/Phi-3-mini-128k-instruct-Q4_K_M-GGUF --hf-file phi-3-mini-128k-instruct-q4_k_m.gguf -c 2048
  ```

+ Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

+ Step 1: Clone llama.cpp from GitHub.
+ ```
+ git clone https://github.com/ggerganov/llama.cpp
  ```

+ Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
+ ```
+ cd llama.cpp && LLAMA_CURL=1 make
+ ```
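For instance, combining the flags named in this step, a GPU build on a Linux machine with an NVIDIA card would look like:

```bash
# LLAMA_CURL=1 enables the libcurl support used by --hf-repo downloads; -j builds in parallel
cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make -j
```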
+ Step 3: Run inference through the main binary.
+ ```
+ ./llama-cli --hf-repo tinybiggames/Phi-3-mini-128k-instruct-Q4_K_M-GGUF --hf-file phi-3-mini-128k-instruct-q4_k_m.gguf -p "The meaning to life and the universe is"
+ ```
+ or
+ ```
+ ./llama-server --hf-repo tinybiggames/Phi-3-mini-128k-instruct-Q4_K_M-GGUF --hf-file phi-3-mini-128k-instruct-q4_k_m.gguf -c 2048
+ ```
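Once `llama-server` is running, you can query it over HTTP from any client. A minimal sketch, assuming the default listen address of `127.0.0.1:8080` and the server's OpenAI-compatible chat endpoint:

```bash
# Send a single chat request to the running llama-server instance
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "What is AI?"}],
        "max_tokens": 128
      }'
```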