---
base_model: abacusai/Giraffe-v2-13b-32k
inference: false
language:
- en
library_name: transformers
license: llama2
model_creator: Abacus.AI
model_name: Giraffe v2 13B 32K GGUF
model_type: llama2
quantized_by: claysauruswrecks
---

# Giraffe v2 13B 32K - GGUF

- Model creator: [Abacus.AI](https://huggingface.co/abacusai/)
- Original model: [Giraffe v2 13B 32K](https://huggingface.co/abacusai/Giraffe-v2-13b-32k)

## Description

This repo contains GGUF format model files for [Abacus.AI's Giraffe v2 13B 32K](https://huggingface.co/abacusai/Giraffe-v2-13b-32k).

These files were quantized on an Intel i9-9980HK with 32 GB of RAM.

### About GGUF

GGUF is a new format introduced by the llama.cpp team on August 21st, 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.

Here is an incomplete list of clients and libraries that are known to support GGUF:

- [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option.
- [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
- [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with GPU acceleration across all platforms and GPU architectures. Especially good for storytelling.
- [LM Studio](https://lmstudio.ai/), an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration.
- [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with many interesting and unique features, including a full model library for easy model selection.
- [Faraday.dev](https://faraday.dev/), an attractive and easy-to-use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration.
- [ctransformers](https://github.com/marella/ctransformers), a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible AI server.
- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
- [candle](https://github.com/huggingface/candle), a Rust ML framework with a focus on performance, including GPU support, and ease of use.

## Compatibility

These quantized GGUFv2 files are compatible with llama.cpp from November 1st, 2023 onwards, as of commit [c43c2da](https://github.com/ggerganov/llama.cpp/commit/c43c2da8afacaddfe51c09b21dbd9922cd0ea46b).

They are also compatible with many third-party UIs and libraries - please see the list at the top of this README.

## Usage

I have only tested these files in `text-generation-webui` with `ctx=2048` and `compress_pos_emb` from `4` through `8`. A hedged sketch of loading one of these files with `llama-cpp-python` follows below.

TODO: Make sure the longer context is actually functional.
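As a rough, untested sketch of usage outside `text-generation-webui`: with `llama-cpp-python` (listed above), linear RoPE interpolation equivalent to `compress_pos_emb=8` is expressed as `rope_freq_scale=0.125` (the reciprocal). The file path and prompt below are placeholders, and I have not verified 32K generation with this model:

```python
# Minimal sketch, assuming llama-cpp-python is installed
# (pip install llama-cpp-python). rope_freq_scale = 1 / compress_pos_emb,
# so 0.125 mirrors the compress_pos_emb=8 setting mentioned above.
from llama_cpp import Llama

llm = Llama(
    model_path="giraffe-v2-13b-32k.Q4_K_M.gguf",  # placeholder: any file from the table below
    n_ctx=32768,            # request the full 32K context window
    rope_freq_scale=0.125,  # linear RoPE interpolation factor (1/8)
)

output = llm("Giraffe necks are long because", max_tokens=64)
print(output["choices"][0]["text"])
```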
## Provided files

| Name | Quant method | Bits | Size | Use case |
| ---- | ---- | ---- | ---- | ---- |
| [giraffe-v2-13b-32k.Q4_K_M.gguf](https://huggingface.co/claysauruswrecks/Giraffe-v2-13b-32k-GGUF/blob/main/giraffe-v2-13b-32k.Q4_K_M.gguf) | Q4_K_M | 4 | 7.4 GB | medium, balanced quality - recommended |
| [giraffe-v2-13b-32k.Q5_K_M.gguf](https://huggingface.co/claysauruswrecks/Giraffe-v2-13b-32k-GGUF/blob/main/giraffe-v2-13b-32k.Q5_K_M.gguf) | Q5_K_M | 5 | 8.6 GB | large, very low quality loss - recommended |
| [giraffe-v2-13b-32k.Q8_0.gguf](https://huggingface.co/claysauruswrecks/Giraffe-v2-13b-32k-GGUF/blob/main/giraffe-v2-13b-32k.Q8_0.gguf) | Q8_0 | 8 | 13.0 GB | very large, extremely low quality loss - not recommended unless as a treat |

Thanks to [TheBloke](https://huggingface.co/TheBloke) for the README.md template.

## Original model card: Abacus.AI's Giraffe v2 13B 32K

### Model Card: Giraffe-v2-13b-32k

#### Model Details

#### Model Description

We have followed up on our previous training runs related to extending the context length of Llama models. The associated GitHub repository, https://github.com/abacusai/long-context, has some basic details on our approach and metrics. We have also published a paper on arXiv that covers our experiments and analysis much more comprehensively: http://arxiv.org/abs/2308.10882

- **Developed by:** [Abacus.AI](https://abacus.ai)
- **Model type:** Transformer-based autoregressive causal language model
- **License:** Llama 2 Community License: https://github.com/facebookresearch/llama/blob/main/LICENSE
- **Finetuned from model:** Llama V2 13B

### Usage

To use this model at longer lengths, it must be patched to interpolate over the longer context; it will not work if simply loaded with the `AutoModel` framework of `transformers`. For full details and usage see: https://github.com/abacusai/Long-Context

The evaluation section there has detailed code for how to load and patch the model for inference (or further fine-tuning). Note in particular that `max_position_embeddings` is not relevant, since the patched module dynamically reallocates the position buffers as required.

The tokenizer corresponding to this model is https://huggingface.co/abacusai/Giraffe-v1-Tokenizer.

Using the code in the repository, you can load this model with the following code:

```python
from models import load_model, load_tokenizer

tokenizer = load_tokenizer()
model = load_model('abacusai/Giraffe-v2-13b-32k', scale=8)
```
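As a hedged follow-up (not from the original card): assuming `load_model` and `load_tokenizer` return ordinary `transformers` objects, generation would then follow the standard pattern below. The prompt and generation settings are illustrative only:

```python
import torch

# Continues from the snippet above; assumes `model` exposes the standard
# transformers .generate() API and `tokenizer` is a Hugging Face tokenizer.
prompt = "Explain why positional interpolation extends usable context length."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```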