---
license: cc-by-nc-4.0
language:
  - en
pipeline_tag: conversational
tags:
  - fair-use
  - llama2
  - ggml
---

# Llama2 Movie Character Finetuned 7F Quantized

Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp). Use it with llama.cpp directly, or through a llama.cpp Python wrapper such as ctransformers.

## ctransformers example

```shell
pip install ctransformers
```

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("llama2-7f-fp16-ggml-q4.bin",
                                           model_type="llama",
                                           gpu_layers=100,   # use fewer (e.g. 20) if you have less GPU RAM
                                           max_new_tokens=50,
                                           stop=["###", "##"],
                                           threads=4)        # number of CPU threads to use
```
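The `stop=["###", "##"]` setting above suggests the model expects `###`-delimited conversation turns. A minimal sketch of building such a prompt and generating a reply; note the exact template below is an assumption for illustration, not something stated in this card:

```python
# Hypothetical prompt template: the "###"-delimited turn format is an
# assumption inferred from the stop tokens passed at load time.
def build_prompt(character: str, user_message: str) -> str:
    return (f"### Context: You are {character} from a movie.\n"
            f"### User: {user_message}\n"
            f"### {character}:")

prompt = build_prompt("Gandalf", "What do you see ahead?")

# With the model loaded as `llm` (see above), generation is a plain call.
# Uncomment once the GGML file is downloaded locally:
# reply = llm(prompt)
# print(reply)
```

Because `###` is a stop token, generation should end when the model starts a new turn, keeping replies to a single in-character response.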