|
--- |
|
base_model: abacusai/Giraffe-v2-13b-32k |
|
inference: false |
|
language: |
|
- en |
|
library_name: transformers |
|
license: llama2 |
|
model_creator: Abacus.AI |
|
model_name: Giraffe v2 13B 32K GGUF
|
model_type: llama2 |
|
quantized_by: claysauruswrecks |
|
--- |
|
|
|
# Giraffe v2 13B 32K - GGUF
|
|
|
- Model creator: [Abacus.AI](https://huggingface.co/abacusai/) |
|
- Original model: [Giraffe v2 13B 32K](https://huggingface.co/abacusai/Giraffe-v2-13b-32k)
|
|
|
<!-- description start --> |
|
## Description |
|
|
|
This repo contains GGUF format model files for [Abacus.AI's Giraffe v2 13B 32K](https://huggingface.co/abacusai/Giraffe-v2-13b-32k).
|
|
|
These files were quantized on an Intel i9-9980HK with 32GB RAM. |
|
|
|
<!-- description end --> |
|
<!-- README_GGUF.md-about-gguf start --> |
|
### About GGUF |
|
|
|
GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. |
|
|
|
Here is an incomplete list of clients and libraries that are known to support GGUF: |
|
|
|
- [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option. |
|
- [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration. |
|
- [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for storytelling.
|
- [LM Studio](https://lmstudio.ai/), an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. |
|
- [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with many interesting and unique features, including a full model library for easy model selection. |
|
- [Faraday.dev](https://faraday.dev/), an attractive and easy-to-use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration.
|
- [ctransformers](https://github.com/marella/ctransformers), a Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
|
- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. |
|
- [candle](https://github.com/huggingface/candle), a Rust ML framework with a focus on performance, including GPU support, and ease of use. |
|
<!-- README_GGUF.md-about-gguf end --> |
|
|
|
<!-- compatibility_gguf start --> |
|
## Compatibility |
|
|
|
These quantized GGUFv2 files are compatible with llama.cpp from November 1st, 2023 onwards, as of commit [c43c2da](https://github.com/ggerganov/llama.cpp/commit/c43c2da8afacaddfe51c09b21dbd9922cd0ea46b).
|
|
|
They are also compatible with many third-party UIs and libraries - please see the list at the top of this README.
|
<!-- compatibility_gguf end --> |
|
|
|
## Usage |
|
|
|
I have only tested these files in `text-generation-webui` with `ctx=2048` and `compress_pos_emb` values from `4` through `8`.
|
|
|
TODO: Make sure the longer context is actually functional. |
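
If you want to try the extended context outside `text-generation-webui`, here is a minimal, untested sketch using `llama-cpp-python`. The file name comes from the table below; the `n_ctx` value and `rope_freq_scale=0.125` (linear scaling by 1/8, intended to mirror `compress_pos_emb=8`) are my assumptions, and defaults vary across versions:

```python
# Untested sketch: load the Q4_K_M file with llama-cpp-python.
# rope_freq_scale=0.125 is assumed to mirror compress_pos_emb=8 (linear interpolation).
from llama_cpp import Llama

llm = Llama(
    model_path="giraffe-v2-13b-32k.Q4_K_M.gguf",
    n_ctx=16384,            # extended window; raise toward 32768 if RAM allows
    rope_freq_scale=0.125,  # 1/8 position scaling
)
print(llm("Giraffes are", max_tokens=32)["choices"][0]["text"])
```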
|
|
|
<!-- README_GGUF.md-provided-files start --> |
|
## Provided files |
|
|
|
| Name | Quant method | Bits | Size | Use case |
| ---- | ---- | ---- | ---- | ---- |
| [giraffe-v2-13b-32k.Q4_K_M.gguf](https://huggingface.co/claysauruswrecks/Giraffe-v2-13b-32k-GGUF/blob/main/giraffe-v2-13b-32k.Q4_K_M.gguf) | Q4_K_M | 4 | 7.4 GB | medium, balanced quality - recommended |
| [giraffe-v2-13b-32k.Q5_K_M.gguf](https://huggingface.co/claysauruswrecks/Giraffe-v2-13b-32k-GGUF/blob/main/giraffe-v2-13b-32k.Q5_K_M.gguf) | Q5_K_M | 5 | 8.6 GB | large, very low quality loss - recommended |
| [giraffe-v2-13b-32k.Q8_0.gguf](https://huggingface.co/claysauruswrecks/Giraffe-v2-13b-32k-GGUF/blob/main/giraffe-v2-13b-32k.Q8_0.gguf) | Q8_0 | 8 | 13.0 GB | very large, extremely low quality loss - not recommended unless as a treat |
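
To fetch a single file programmatically rather than through the web UI, one option is `huggingface_hub` (a small sketch; any file name from the table above works):

```python
# Sketch: download one quant file via huggingface_hub (pip install huggingface-hub).
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="claysauruswrecks/Giraffe-v2-13b-32k-GGUF",
    filename="giraffe-v2-13b-32k.Q4_K_M.gguf",
)
print(local_path)  # absolute path to the cached file
```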
|
|
|
<!-- README_GGUF.md-provided-files end --> |
|
|
|
<!-- footer start --> |
|
Thanks to [TheBloke](https://huggingface.co/TheBloke) for the README.md template. |
|
<!-- footer end --> |
|
|
|
<!-- original-model-card start --> |
|
## Original model card: Abacus.AI's Giraffe v2 13B 32K
|
|
|
### Model Card: Giraffe-v2-13b-32k |
|
|
|
#### Model Details |
|
|
|
#### Model Description |
|
|
|
We have followed up on our previous training runs related to extending the context length |
|
of Llama models. The associated GitHub repository
|
|
|
https://github.com/abacusai/long-context |
|
|
|
has some basic details on our approach and metrics. We have also published a paper on arXiv |
|
that covers our experiments and analysis much more comprehensively.
|
|
|
http://arxiv.org/abs/2308.10882 |
|
|
|
- **Developed by:** [Abacus.AI](https://abacus.ai) |
|
- **Model type:** Transformer-based autoregressive causal language model
|
- **License:** Llama 2 Community License: https://github.com/facebookresearch/llama/blob/main/LICENSE |
|
- **Finetuned from model:** Llama V2 13B |
|
|
|
### Usage |
|
|
|
To use this model at longer context lengths, it must be patched to interpolate over the extended positions; simply loading it with the `AutoModel` framework of `transformers` will not work.
|
For full details and usage see: |
|
|
|
https://github.com/abacusai/Long-Context |
|
|
|
The evaluation section has detailed code for how to load and patch the model for inference (or further fine-tuning). |
|
Note in particular that `max_position_embeddings` is not relevant, since the patched module dynamically reallocates the position buffers as required.
|
|
|
The tokenizer corresponding to this model is https://huggingface.co/abacusai/Giraffe-v1-Tokenizer. |
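
If you only need the tokenizer, it should also be loadable directly with `transformers` (a sketch, untested with this model):

```python
# Sketch: load the Giraffe v1 tokenizer directly from the Hub.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("abacusai/Giraffe-v1-Tokenizer")
```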
|
|
|
Using the code in the repository, you can load this model with the following code:
|
```python |
|
# These helpers live in the abacusai/Long-Context repository; run from a clone of it.
from models import load_model, load_tokenizer

tokenizer = load_tokenizer()
# scale=8 sets the position interpolation factor (8 * 4096 = 32768-token context).
model = load_model('abacusai/Giraffe-v2-13b-32k', scale=8)
|
``` |
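
Continuing from the snippet above, a quick generation check might look like this (assuming `load_model` returns a standard `transformers` causal LM, which is an assumption on my part):

```python
# Hypothetical smoke test; assumes `model` behaves like a transformers causal LM.
inputs = tokenizer("The giraffe is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```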
|
<!-- original-model-card end --> |
|
|