---
license: llama2
language:
- ko
pipeline_tag: text-generation
tags:
- llama
- facebook
- meta
- llama-2
- kollama
- llama-2-ko
- llama-2-ko-chat
- text-generation-inference
---

# 💻 macOS Compatible 💻

# Llama 2 ko 7B - GGUF
- Model creator: [Meta](https://huggingface.co/meta-llama)
- Original model: [Llama 2 7B Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat)
- Original Llama-2-Ko-Chat model: [Llama 2 ko 7B Chat](https://huggingface.co/kfkas/Llama-2-ko-7b-Chat)
- Reference: [Llama 2 7B GGUF](https://huggingface.co/TheBloke/Llama-2-7B-GGUF)

<!-- description start -->
## Download

First, install the `huggingface-hub` command-line tool:

```shell
pip3 install 'huggingface-hub>=0.17.1'
```

Then you can download any individual model file to the current directory, at high speed, with a command like this:

```shell
huggingface-cli download 24bean/Llama-2-ko-7B-Chat-GGUF llama-2-ko-7b-chat-q8-0.gguf --local-dir . --local-dir-use-symlinks False
```

Or you can download `llama-2-ko-7b-chat.gguf`, the non-quantized model, with:

```shell
huggingface-cli download 24bean/Llama-2-ko-7B-Chat-GGUF llama-2-ko-7b-chat.gguf --local-dir . --local-dir-use-symlinks False
```
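
If you prefer to stay in Python, the same file can be fetched with `hf_hub_download` from the `huggingface_hub` package installed above. A minimal sketch, mirroring the repo and file names from the CLI commands:

```python
from huggingface_hub import hf_hub_download

# Fetch the 8-bit quantized file into the current directory
# (same repo_id and filename as the CLI example above).
model_path = hf_hub_download(
    repo_id="24bean/Llama-2-ko-7B-Chat-GGUF",
    filename="llama-2-ko-7b-chat-q8-0.gguf",
    local_dir=".",
)
print(model_path)
```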

## Example `llama.cpp` command

Make sure you are using `llama.cpp` from commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later. Change `-ngl 32` to the number of layers to offload to GPU (remove it if you have no GPU acceleration), and `-c 4096` to the desired context length.

```shell
./main -ngl 32 -m llama-2-ko-7b-chat-q8-0.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}"
```

# How to run from Python code

You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries.

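The section below walks through ctransformers; as a complement, here is a minimal, untested sketch of the llama-cpp-python route (install it with `pip install llama-cpp-python`; the file name assumes the `llama-2-ko-7b-chat-q8-0.gguf` download from above):

```python
from llama_cpp import Llama

# Point model_path at the GGUF file downloaded earlier.
# Set n_gpu_layers to 0 if no GPU acceleration is available.
llm = Llama(
    model_path="./llama-2-ko-7b-chat-q8-0.gguf",
    n_ctx=4096,
    n_gpu_layers=32,
)

output = llm("인공지능은", max_tokens=128, temperature=0.7)
print(output["choices"][0]["text"])
```
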
## How to load this model from Python using ctransformers

### First install the package

```bash
# Base ctransformers with no GPU acceleration
pip install 'ctransformers>=0.2.24'
# Or with CUDA GPU acceleration
pip install 'ctransformers[cuda]>=0.2.24'
# Or with ROCm GPU acceleration
CT_HIPBLAS=1 pip install 'ctransformers>=0.2.24' --no-binary ctransformers
# Or with Metal GPU acceleration for macOS systems
CT_METAL=1 pip install 'ctransformers>=0.2.24' --no-binary ctransformers
```

### Simple example code to load one of these GGUF models

```python
from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained("24bean/Llama-2-ko-7B-Chat-GGUF", model_file="llama-2-ko-7b-chat-q8-0.gguf", model_type="llama", gpu_layers=50)

print(llm("인공지능은"))
```
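
`ctransformers` also accepts generation parameters on the call and can stream tokens as they are produced. A short sketch reusing the `llm` object above, with settings roughly matching the `llama.cpp` command earlier:

```python
# Stream the completion token by token; the parameter values are illustrative.
for text in llm("인공지능은", max_new_tokens=256, temperature=0.7, repetition_penalty=1.1, stream=True):
    print(text, end="", flush=True)
```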

## How to use with LangChain

Here are guides on using llama-cpp-python or ctransformers with LangChain; a short CTransformers sketch follows the links:

* [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp)
* [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers)
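
As a minimal, untested sketch of the LangChain + ctransformers route (the import path depends on your LangChain version):

```python
from langchain_community.llms import CTransformers  # on older LangChain: from langchain.llms import CTransformers

# Load the quantized GGUF file through the ctransformers backend.
llm = CTransformers(
    model="24bean/Llama-2-ko-7B-Chat-GGUF",
    model_file="llama-2-ko-7b-chat-q8-0.gguf",
    model_type="llama",
    config={"max_new_tokens": 256, "temperature": 0.7},
)

print(llm.invoke("인공지능은"))
```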

<!-- README_GGUF.md-how-to-run end -->