---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: conversational
tags:
- fair-use
- llama2
- ggml
---
# Llama2 Movie Character Finetuned 7F Quantized
Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp). Use it either with llama.cpp directly or through one of its Python wrappers (e.g. `ctransformers` or `llama-cpp-python`).
## ctransformers example

```shell
pip install ctransformers
```

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "llama2-7f-fp16-ggml-q4.bin",
    model_type="llama",
    gpu_layers=100,       # lower this (e.g. 20) if you have less GPU RAM
    max_new_tokens=50,
    stop=["###", "##"],
    threads=4,            # number of CPU threads to use
)
```
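Once loaded, the model object is callable for text generation. A minimal sketch (the prompt below is only illustrative; generation parameters not set at load time can also be passed per call):

```python
# Generate a completion for a prompt; the loaded `llm` object is callable.
prompt = "### Instruction: Introduce yourself in character.\n### Response:"
response = llm(prompt, temperature=0.8)
print(response)

# Streaming is also supported: pass stream=True to iterate over tokens
# as they are produced instead of waiting for the full completion.
for token in llm(prompt, stream=True):
    print(token, end="", flush=True)
```

The `stop` sequences supplied at load time (`"###"`, `"##"`) cut generation off before the model starts writing the next turn itself.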