---
base_model: Writer/Palmyra-Fin-70B-32K
extra_gated_fields:
  Email: text
  I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Writer: checkbox
  Name: text
  Organization or Affiliation: text
  Receive email updates and promotions on Writer products, services, and research?:
    options:
    - 'Yes'
    - 'No'
    type: select
extra_gated_prompt: By clicking "Agree", you agree to the [License Agreement](https://writer.com/legal/open-model-license/)
  and acknowledge Writer's [Privacy Policy](https://writer.com/legal/acceptable-use/).
inference: false
language:
- en
library_name: gguf
license: other
license_link: https://writer.com/legal/open-model-license/
license_name: writer-open-model-license
model-index:
- name: Palmyra-Fin-70B-32k
  results: []
pipeline_tag: text-generation
quantized_by: legraphista
tags:
- instruct
- pytorch
- finance
- stock market
- candlesticks
- FinGPT
- option trading
- future stock prediction
- trends prediction
- Enterprise LLM
- Enterprise
- Enterprise ready
- Banks
- Wealth Management
- quantized
- GGUF
- quantization
- imat
- imatrix
- static
- 32bit
- 16bit
- 8bit
- 6bit
- 5bit
- 4bit
- 3bit
- 2bit
- 1bit
---

# Palmyra-Fin-70B-32K-IMat-GGUF
_Llama.cpp imatrix quantization of Writer/Palmyra-Fin-70B-32K_

Original Model: [Writer/Palmyra-Fin-70B-32K](https://huggingface.co/Writer/Palmyra-Fin-70B-32K)
Original dtype: `BF16` (`bfloat16`)
Quantized by: llama.cpp [b3504](https://github.com/ggerganov/llama.cpp/releases/tag/b3504)
IMatrix dataset: [here](https://gist.githubusercontent.com/bartowski1182/eb213dccb3571f863da82e99418f81e8/raw/b2869d80f5c16fd7082594248e80144677736635/calibration_datav3.txt)

- [Files](#files)
    - [IMatrix](#imatrix)
    - [Common Quants](#common-quants)
    - [All Quants](#all-quants)
- [Downloading using huggingface-cli](#downloading-using-huggingface-cli)
- [Inference](#inference)
    - [Simple chat template](#simple-chat-template)
    - [Chat template with system prompt](#chat-template-with-system-prompt)
    - [Llama.cpp](#llama-cpp)
- [FAQ](#faq)
    - [Why is the IMatrix not applied everywhere?](#why-is-the-imatrix-not-applied-everywhere)
    - [How do I merge a split GGUF?](#how-do-i-merge-a-split-gguf)

---

## Files

### IMatrix
Status: ✅ Available
Link: [here](https://huggingface.co/legraphista/Palmyra-Fin-70B-32K-IMat-GGUF/blob/main/imatrix.dat)

### Common Quants
| Filename | Quant type | File Size | Status | Uses IMatrix | Is Split |
| -------- | ---------- | --------- | ------ | ------------ | -------- |
| [Palmyra-Fin-70B-32K.Q8_0/*](https://huggingface.co/legraphista/Palmyra-Fin-70B-32K-IMat-GGUF/tree/main/Palmyra-Fin-70B-32K.Q8_0) | Q8_0 | 74.98GB | ✅ Available | ⚪ Static | ✂ Yes |
| [Palmyra-Fin-70B-32K.Q6_K/*](https://huggingface.co/legraphista/Palmyra-Fin-70B-32K-IMat-GGUF/tree/main/Palmyra-Fin-70B-32K.Q6_K) | Q6_K | 57.89GB | ✅ Available | ⚪ Static | ✂ Yes |
| Palmyra-Fin-70B-32K.Q4_K | Q4_K | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.Q3_K | Q3_K | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.Q2_K | Q2_K | - | ⏳ Processing | 🟢 IMatrix | - |

### All Quants
| Filename | Quant type | File Size | Status | Uses IMatrix | Is Split |
| -------- | ---------- | --------- | ------ | ------------ | -------- |
| Palmyra-Fin-70B-32K.F32 | F32 | - | ⏳ Processing | ⚪ Static | - |
| Palmyra-Fin-70B-32K.BF16 | BF16 | - | ⏳ Processing | ⚪ Static | - |
| Palmyra-Fin-70B-32K.FP16 | F16 | - | ⏳ Processing | ⚪ Static | - |
| [Palmyra-Fin-70B-32K.Q8_0/*](https://huggingface.co/legraphista/Palmyra-Fin-70B-32K-IMat-GGUF/tree/main/Palmyra-Fin-70B-32K.Q8_0) | Q8_0 | 74.98GB | ✅ Available | ⚪ Static | ✂ Yes |
| [Palmyra-Fin-70B-32K.Q6_K/*](https://huggingface.co/legraphista/Palmyra-Fin-70B-32K-IMat-GGUF/tree/main/Palmyra-Fin-70B-32K.Q6_K) | Q6_K | 57.89GB | ✅ Available | ⚪ Static | ✂ Yes |
| Palmyra-Fin-70B-32K.Q5_K | Q5_K | - | ⏳ Processing | ⚪ Static | - |
| Palmyra-Fin-70B-32K.Q5_K_S | Q5_K_S | - | ⏳ Processing | ⚪ Static | - |
| Palmyra-Fin-70B-32K.Q4_K | Q4_K | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.Q4_K_S | Q4_K_S | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.IQ4_NL | IQ4_NL | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.IQ4_XS | IQ4_XS | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.Q3_K | Q3_K | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.Q3_K_L | Q3_K_L | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.Q3_K_S | Q3_K_S | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.IQ3_M | IQ3_M | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.IQ3_S | IQ3_S | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.IQ3_XS | IQ3_XS | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.IQ3_XXS | IQ3_XXS | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.Q2_K | Q2_K | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.Q2_K_S | Q2_K_S | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.IQ2_M | IQ2_M | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.IQ2_S | IQ2_S | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.IQ2_XS | IQ2_XS | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.IQ2_XXS | IQ2_XXS | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.IQ1_M | IQ1_M | - | ⏳ Processing | 🟢 IMatrix | - |
| Palmyra-Fin-70B-32K.IQ1_S | IQ1_S | - | ⏳ Processing | 🟢 IMatrix | - |

## Downloading using huggingface-cli
If you do not have `huggingface-cli` installed:
```
pip install -U "huggingface_hub[cli]"
```
Download the specific file you want (for quants that are stored as a single file):
```
huggingface-cli download legraphista/Palmyra-Fin-70B-32K-IMat-GGUF --include "Palmyra-Fin-70B-32K.Q8_0.gguf" --local-dir ./
```
Large model files are split into multiple chunks (see the "Is Split" column in the tables above). To download all chunks of a quant to a local folder, run:
```
huggingface-cli download legraphista/Palmyra-Fin-70B-32K-IMat-GGUF --include "Palmyra-Fin-70B-32K.Q8_0/*" --local-dir ./
# see the FAQ for merging split GGUFs
```
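
The same download can be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package installed above and its `snapshot_download` function; the helper names `quant_patterns` and `download_quant` are hypothetical and exist only for this illustration. It requests both the single-file and the split-folder pattern so that one call covers either layout.

```python
REPO_ID = "legraphista/Palmyra-Fin-70B-32K-IMat-GGUF"
MODEL_NAME = "Palmyra-Fin-70B-32K"


def quant_patterns(quant):
    """Return glob patterns matching one quant as a single file or as a folder of chunks."""
    stem = f"{MODEL_NAME}.{quant}"
    return [f"{stem}.gguf", f"{stem}/*"]


def download_quant(quant, local_dir="./"):
    """Download every file of one quant type into local_dir (hypothetical helper)."""
    # Imported lazily so quant_patterns() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=quant_patterns(quant),
        local_dir=local_dir,
    )
```

Calling `download_quant("Q8_0")` should fetch the same files as the split-folder command above. If access to the repository is gated, you will likely need to authenticate first (for example with `huggingface-cli login`).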
---
## Inference

### Simple chat template
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{assistant_response}<|eot_id|><|start_header_id|>user<|end_header_id|>

{next_user_prompt}<|eot_id|>
```

### Chat template with system prompt
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{assistant_response}<|eot_id|><|start_header_id|>user<|end_header_id|>

{next_user_prompt}<|eot_id|>
```
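
The templates above can also be assembled programmatically. Below is a minimal sketch, not the official implementation: `build_prompt` is a hypothetical helper, and the blank line it puts after each role header (`header_sep`) is an assumption taken from the Llama 3 prompt format. It also optionally opens a final assistant turn, which is what you would typically want when asking the model for a reply.

```python
def build_prompt(messages, add_generation_prompt=True, header_sep="\n\n"):
    """Render a list of {"role": ..., "content": ...} dicts into a single prompt string."""
    # header_sep separates a role header from its content; "\n\n" is an
    # assumption based on the Llama 3 prompt format.
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>"
            f"{header_sep}{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model writes the next reply.
        parts.append(f"<|start_header_id|>assistant<|end_header_id|>{header_sep}")
    return "".join(parts)
```

For example, `build_prompt([{"role": "system", "content": "You are a financial analyst."}, {"role": "user", "content": "Summarize this filing."}])` yields the system-prompt template with an open assistant turn at the end. Omit the system message to get the simple template.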

### Llama.cpp
```
# recent llama.cpp builds name this binary llama-cli instead of main
llama.cpp/main -m Palmyra-Fin-70B-32K.Q8_0.gguf --color -i -p "prompt here (according to the chat template)"
```

---
## FAQ

### Why is the IMatrix not applied everywhere?
According to [this investigation](https://www.reddit.com/r/LocalLLaMA/comments/1993iro/ggufs_quants_can_punch_above_their_weights_now/), only the lower-bit quantizations appear to benefit from the imatrix input (as per HellaSwag results), so the higher-bit quants are produced statically.

### How do I merge a split GGUF?
1. Make sure you have `gguf-split` available
    - To get hold of `gguf-split`, navigate to https://github.com/ggerganov/llama.cpp/releases
    - Download the appropriate zip for your system from the latest release
    - Unzip the archive and you should be able to find `gguf-split` (recent releases may name it `llama-gguf-split`)
2. Locate your GGUF chunks folder (ex: `Palmyra-Fin-70B-32K.Q8_0`)
3. Run `gguf-split --merge Palmyra-Fin-70B-32K.Q8_0/Palmyra-Fin-70B-32K.Q8_0-00001-of-XXXXX.gguf Palmyra-Fin-70B-32K.Q8_0.gguf`
    - Make sure to point `gguf-split` to the first chunk of the split.
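
The merge steps above can be sketched in Python so that the first chunk is located automatically. This is an illustrative sketch only: `merge_command` is a hypothetical helper, the `-00001-of-NNNNN.gguf` suffix with five digits is assumed from the file name pattern shown in step 3, and the tool name defaults to `gguf-split` (pass `tool="llama-gguf-split"` if your build uses that name). The function only builds the command; it does not run it.

```python
import re
from pathlib import Path

# Assumed chunk naming, based on step 3: <name>-00001-of-NNNNN.gguf
FIRST_CHUNK = re.compile(r"-00001-of-\d{5}\.gguf$")


def merge_command(chunks_dir, tool="gguf-split"):
    """Build the argument list that merges the chunks in chunks_dir into one GGUF file."""
    chunks_dir = Path(chunks_dir)
    first = sorted(p for p in chunks_dir.glob("*.gguf") if FIRST_CHUNK.search(p.name))
    if len(first) != 1:
        raise FileNotFoundError(
            f"expected exactly one first chunk in {chunks_dir}, found {len(first)}"
        )
    # Write the merged file next to the chunks folder, named after the folder.
    output = chunks_dir.parent / f"{chunks_dir.name}.gguf"
    return [tool, "--merge", str(first[0]), str(output)]
```

To actually perform the merge, pass the result to `subprocess.run(merge_command("Palmyra-Fin-70B-32K.Q8_0"), check=True)`.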

---

Got a suggestion? Ping me [@legraphista](https://x.com/legraphista)!