---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: conversational
tags:
- fair-use
- llama2
- ggml
---
# Llama2 Movie Character Finetuned 7F Quantized
Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp).

Use either with llama.cpp directly or with a Python wrapper such as ctransformers.

## ctransformers example

```sh
pip install ctransformers
```
```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "llama2-7f-fp16-ggml-q4.bin",
    model_type="llama",
    gpu_layers=100,      # use fewer (e.g. 20) if you have less GPU RAM
    max_new_tokens=50,
    stop=["###", "##"],
    threads=4,           # limit to 4 CPU threads
)

# The loaded model is callable; pass it a prompt string to generate text:
# print(llm("Your prompt here"))
```