YokaiKoibito/llama2_70b_chat_uncensored-fp16

This is an fp16 copy of jarradh/llama2_70b_chat_uncensored. It downloads faster and takes about half the disk space of the fp32 original. I simply loaded the model on the CPU with torch_dtype=torch.float16 and then saved it again. I also added a chat_template entry, derived from the model card, to the tokenizer_config.json file, which previously didn't have one. All credit for the model goes to jarradh.
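The conversion described above can be sketched with the Hugging Face transformers API. The script below is only a sketch: the helper names `checkpoint_size_gb` and `convert_to_fp16` are hypothetical, and the author's actual script was not published. The size helper shows where the disk savings come from. fp32 stores 4 bytes per parameter and fp16 stores 2, so for a nominal 70B parameters the raw weights drop from roughly 280 GB to roughly 140 GB.

```python
# Hypothetical helpers illustrating the fp32 -> fp16 re-save; not the author's script.

def checkpoint_size_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate raw weight size in GB (1 GB = 1e9 bytes), ignoring file metadata."""
    return n_params * bytes_per_param / 1e9


def convert_to_fp16(src: str, dst: str) -> None:
    """Load a checkpoint on CPU as float16 and save it again to `dst`."""
    # Imported lazily so the size helper works without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # from_pretrained loads onto the CPU unless a device_map is given;
    # torch_dtype=torch.float16 casts the fp32 weights to fp16 as they load.
    model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.float16)
    model.save_pretrained(dst)

    # Save the tokenizer files next to the converted weights.
    tokenizer = AutoTokenizer.from_pretrained(src)
    tokenizer.save_pretrained(dst)


if __name__ == "__main__":
    # Nominal 70B parameters: 4 bytes each in fp32, 2 bytes each in fp16.
    print(f"fp32 ~ {checkpoint_size_gb(70e9, 4):.0f} GB")
    print(f"fp16 ~ {checkpoint_size_gb(70e9, 2):.0f} GB")
```

Calling `convert_to_fp16("jarradh/llama2_70b_chat_uncensored", "llama2_70b_chat_uncensored-fp16")` would perform the full conversion. It needs enough CPU RAM to hold the fp16 weights, which is on the order of the fp16 estimate above.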

Arguably a better name for this model would be something like Llama-2-70B_Wizard-Vicuna-Uncensored-fp16, but to avoid confusion I'm sticking with jarradh's naming scheme.

Repositories available

GPTQ models for GPU inference, with multiple quantisation parameter options.
2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference
2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference, plus fp16 GGUF for requantizing
Jarrad Hope's unquantised model in fp16 pytorch format, for GPU inference and further conversions
Jarrad Hope's original unquantised fp32 model in pytorch format, for further conversions

Prompt template: Human-Response

### HUMAN:
{prompt}

### RESPONSE:
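A minimal sketch of filling in this template for a single user turn in plain Python follows. The function name `build_prompt` is hypothetical. The exact whitespace is an assumption read off the layout above: one newline after each header, a blank line between the prompt and `### RESPONSE:`, and a trailing newline so the model's answer starts on its own line. The chat_template added to tokenizer_config.json is the authoritative version if the two differ.

```python
def build_prompt(user_message: str) -> str:
    """Wrap one user turn in the Human-Response template (assumed whitespace)."""
    # "### HUMAN:" header, the prompt, a blank line, then the "### RESPONSE:" header
    # left open so the model continues with its answer.
    return f"### HUMAN:\n{user_message}\n\n### RESPONSE:\n"


if __name__ == "__main__":
    print(build_prompt("What is the capital of France?"))
```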
