|
--- |
|
license: other |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
inference: false |
|
tags: |
|
- transformers |
|
- gguf |
|
- imatrix |
|
- granite-3.1-8b-instruct |
|
--- |
|
Quantizations of https://huggingface.co/ibm-granite/granite-3.1-8b-instruct |
|
|
|
### Inference Clients/UIs |
|
* [llama.cpp](https://github.com/ggerganov/llama.cpp) |
|
* [KoboldCPP](https://github.com/LostRuins/koboldcpp) |
|
* [ollama](https://github.com/ollama/ollama) |
|
* [jan](https://github.com/janhq/jan) |
|
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui) |
|
* [GPT4All](https://github.com/nomic-ai/gpt4all) |
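As a quick start, a quant from this repo can be run directly with llama.cpp. A minimal sketch, assuming you have built llama.cpp and downloaded one of the `.gguf` files (the exact filename below is illustrative):

```shell
# interactive chat; substitute whichever quant file you downloaded
./llama-cli -m granite-3.1-8b-instruct.Q4_K_M.gguf -cnv -p "You are a helpful assistant."
```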
|
--- |
|
|
|
# From original readme |
|
|
|
Granite-3.1-8B-Instruct is an 8B-parameter long-context instruct model finetuned from Granite-3.1-8B-Base using a combination of permissively licensed open-source instruction datasets and internally collected synthetic datasets tailored for solving long-context problems. The model was developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging.
|
|
|
- **Developers:** Granite Team, IBM |
|
- **GitHub Repository:** [ibm-granite/granite-3.1-language-models](https://github.com/ibm-granite/granite-3.1-language-models) |
|
- **Website**: [Granite Docs](https://www.ibm.com/granite/docs/) |
|
- **Paper:** [Granite 3.1 Language Models (coming soon)](https://huggingface.co/collections/ibm-granite/granite-31-language-models-6751dbbf2f3389bec5c6f02d) |
|
- **Release Date**: December 18th, 2024 |
|
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
|
|
|
**Supported Languages:** |
|
English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.1 models for languages beyond these 12.
|
|
|
**Intended Use:** |
|
The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications. |
|
|
|
*Capabilities* |
|
* Summarization |
|
* Text classification |
|
* Text extraction |
|
* Question-answering |
|
* Retrieval Augmented Generation (RAG) |
|
* Code related tasks |
|
* Function-calling tasks (see the sketch after this list)
|
* Multilingual dialog use cases |
|
* Long-context tasks including long document/meeting summarization, long document QA, etc. |
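Since function calling is listed among the capabilities, here is a minimal sketch of how a tool can be advertised to the model through the bundled chat template. It assumes a recent `transformers` version whose `apply_chat_template` accepts a `tools` argument; the `get_current_weather` schema is purely illustrative:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.1-8b-instruct")

# purely illustrative tool definition in the JSON-schema style transformers expects
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

chat = [{"role": "user", "content": "What is the weather like in Boston?"}]
# render a prompt that exposes the tool schema to the model
prompt = tokenizer.apply_chat_template(
    chat, tools=tools, tokenize=False, add_generation_prompt=True
)
print(prompt)
```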
|
|
|
**Generation:** |
|
This is a simple example of how to use the Granite-3.1-8B-Instruct model.
|
|
|
Install the following libraries: |
|
|
|
```shell
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```
|
Then, copy the snippet from the section that is relevant for your use case. |
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.1-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()
# change input text as desired
chat = [
    { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# tokenize the text and move it to the model's device
input_tokens = tokenizer(chat, return_tensors="pt").to(model.device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output)
```
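Note that `batch_decode` returns the full sequence, prompt included, along with the chat template's special tokens. For a cleaner string, `skip_special_tokens` drops the template markers (the prompt text itself still remains in the output):

```python
# decode without the chat template's special tokens
text = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
print(text)
```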