---
license: apache-2.0
language:
- de
- fr
- en
- ro
base_model:
- google/flan-t5-xxl
library_name: llama.cpp
tags:
- llama.cpp
---

# flan-t5-xxl-gguf

## This is a quantized version of [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl/)

![Google Original Model Architecture](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/flan2_architecture.jpg)

## Usage/Examples

```sh
./llama-cli -m /path/to/file.gguf --prompt "your prompt" --n-gpu-layers nn
```

`nn` is the number of layers to offload to the GPU. A fuller, scripted example is given at the end of this card.

## Quants

Bits | Type |
--------|------------- |
Q2 | Q2_K |
Q3 | Q3_K, Q3_K_L, Q3_K_M, Q3_K_S |
Q4 | Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S |
Q5 | Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S |
Q6 | Q6_K |
Q8 | Q8_0 |

#### Additional:

Bits | Float type |
--------|------------- |
16 | f16 |
32 | f32 |

## Disclaimer

I don't claim any rights to this model. All rights belong to Google.

## Acknowledgements

- [Original model](https://huggingface.co/google/flan-t5-xxl/)
- [Original README](https://huggingface.co/google/flan-t5-xxl/blob/main/README.md)
- [Original license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md)
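
## Extended usage example

A minimal end-to-end sketch of downloading one quant and running it. The repo id placeholder `<user>/flan-t5-xxl-gguf` and the file name `flan-t5-xxl-Q4_K_M.gguf` are assumptions; check this repository's file list for the exact names before running.

```sh
# Download a single quant file (repo id and file name are assumptions,
# substitute the actual values from this repository's file list).
huggingface-cli download <user>/flan-t5-xxl-gguf flan-t5-xxl-Q4_K_M.gguf --local-dir .

# Run the model. --n-gpu-layers 99 offloads as many layers as the model
# has to the GPU; omit it (or set it to 0) for CPU-only inference.
# -n caps the number of tokens generated.
./llama-cli -m ./flan-t5-xxl-Q4_K_M.gguf \
  --prompt "Translate to German: How old are you?" \
  --n-gpu-layers 99 \
  -n 128
```

As a rule of thumb, the lower-bit quants (Q2/Q3) trade output quality for smaller files and memory use, while Q5/Q6/Q8 stay closer to the original f16 weights at a larger size.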