---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: conversational
tags:
- fair-use
- llama2
- ggml
---

# Llama2 Movie Character Finetuned 7F Quantized 

Quantized with llama.cpp (https://github.com/ggerganov/llama.cpp).
Use it with llama.cpp directly or through one of its Python wrappers, such as ctransformers.

ctransformers example:

```
pip install ctransformers
```

```
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("llama2-7f-fp16-ggml-q4.bin",
                                           model_type='llama',
                                           gpu_layers=100,    # lower this (e.g. 20) if you have less GPU RAM
                                           max_new_tokens=50,
                                           stop=["###", "##"],
                                           threads=4)         # limit CPU usage to 4 threads

# The loaded model is callable: pass a prompt string, get generated text back.
print(llm("Write a line of movie dialogue: "))
```
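The `stop` argument ends generation as soon as either marker appears in the output, and the returned text is cut before the marker. A minimal sketch of that truncation behavior in plain Python (the helper `truncate_at_stop` is hypothetical, written here only to illustrate the effect):

```
def truncate_at_stop(text, stop_sequences):
    # Cut the text at the earliest occurrence of any stop sequence;
    # if none occur, return the text unchanged.
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop("She smiled and walked away.### END", ["###", "##"]))
# -> "She smiled and walked away."
```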