---
license: llama2
language:
- ko
pipeline_tag: text-generation
tags:
- llama
- facebook
- meta
- llama-2
- kollama
- llama-2-ko
- llama-2-ko-chat
- text-generation-inference
---

# 💻 macOS Compatible 💻

# Llama 2 ko 7B - GGUF
- Model creator: [Meta](https://huggingface.co/meta-llama)
- Original model: [Llama 2 7B Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat)
- Original Llama-2-Ko-Chat model: [Llama 2 ko 7B Chat](https://huggingface.co/kfkas/Llama-2-ko-7b-Chat)
- Reference: [Llama 2 7B GGUF](https://huggingface.co/TheBloke/Llama-2-7B-GGUF)

<!-- description start -->
## Download

First, install the `huggingface-hub` command-line tool:

```shell
pip3 install 'huggingface-hub>=0.17.1'
```

Then you can download any individual model file to the current directory, at high speed, with a command like this:

```shell
huggingface-cli download 24bean/Llama-2-ko-7B-Chat-GGUF llama-2-ko-7b-chat-q8-0.gguf --local-dir . --local-dir-use-symlinks False
```

Or you can download `llama-2-ko-7b-chat.gguf`, the non-quantized model, with:

```shell
huggingface-cli download 24bean/Llama-2-ko-7B-Chat-GGUF llama-2-ko-7b-chat.gguf --local-dir . --local-dir-use-symlinks False
```
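
If you prefer to stay in Python, the same file can be fetched with `hf_hub_download` from the `huggingface_hub` package installed above. A minimal sketch, mirroring the repo and file names from the CLI commands:

```python
from huggingface_hub import hf_hub_download

# Fetch the 8-bit quantized file into the current directory
# (same repo_id and filename as the CLI example above).
model_path = hf_hub_download(
    repo_id="24bean/Llama-2-ko-7B-Chat-GGUF",
    filename="llama-2-ko-7b-chat-q8-0.gguf",
    local_dir=".",
)
print(model_path)
```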

## Example `llama.cpp` command

Make sure you are using `llama.cpp` from commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later. Change `-ngl 32` to the number of layers to offload to GPU (remove it if you have no GPU acceleration), and `-c 4096` to the desired context length.

```shell
./main -ngl 32 -m llama-2-ko-7b-chat-q8-0.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}"
```

# How to run from Python code

You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries.

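The section below walks through ctransformers; as a complement, here is a minimal, untested sketch of the llama-cpp-python route (install it with `pip install llama-cpp-python`; the file name assumes the `llama-2-ko-7b-chat-q8-0.gguf` download from above):

```python
from llama_cpp import Llama

# Point model_path at the GGUF file downloaded earlier.
# Set n_gpu_layers to 0 if no GPU acceleration is available.
llm = Llama(
    model_path="./llama-2-ko-7b-chat-q8-0.gguf",
    n_ctx=4096,
    n_gpu_layers=32,
)

output = llm("인공지능은", max_tokens=128, temperature=0.7)
print(output["choices"][0]["text"])
```
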
## How to load this model from Python using ctransformers

### First install the package

```bash
# Base ctransformers with no GPU acceleration
pip install 'ctransformers>=0.2.24'
# Or with CUDA GPU acceleration
pip install 'ctransformers[cuda]>=0.2.24'
# Or with ROCm GPU acceleration
CT_HIPBLAS=1 pip install 'ctransformers>=0.2.24' --no-binary ctransformers
# Or with Metal GPU acceleration for macOS systems
CT_METAL=1 pip install 'ctransformers>=0.2.24' --no-binary ctransformers
```

### Simple example code to load one of these GGUF models

```python
from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained("24bean/Llama-2-ko-7B-Chat-GGUF", model_file="llama-2-ko-7b-chat-q8-0.gguf", model_type="llama", gpu_layers=50)

print(llm("인공지능은"))
```
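
`ctransformers` also accepts generation parameters on the call and can stream tokens as they are produced. A short sketch reusing the `llm` object above, with settings roughly matching the `llama.cpp` command earlier:

```python
# Stream the completion token by token; the parameter values are illustrative.
for text in llm("인공지능은", max_new_tokens=256, temperature=0.7, repetition_penalty=1.1, stream=True):
    print(text, end="", flush=True)
```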

## How to use with LangChain

Here are guides on using llama-cpp-python or ctransformers with LangChain; a short CTransformers sketch follows the links:

* [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp)
* [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers)
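
As a minimal, untested sketch of the LangChain + ctransformers route (the import path depends on your LangChain version):

```python
from langchain_community.llms import CTransformers  # on older LangChain: from langchain.llms import CTransformers

# Load the quantized GGUF file through the ctransformers backend.
llm = CTransformers(
    model="24bean/Llama-2-ko-7B-Chat-GGUF",
    model_file="llama-2-ko-7b-chat-q8-0.gguf",
    model_type="llama",
    config={"max_new_tokens": 256, "temperature": 0.7},
)

print(llm.invoke("인공지능은"))
```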

<!-- README_GGUF.md-how-to-run end -->