---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- granite-3.2
- llama-cpp
- gguf-my-repo
base_model: ibm-granite/granite-3.2-8b-instruct
---

# Triangle104/granite-3.2-8b-instruct-Q6_K-GGUF
This model was converted to GGUF format from [`ibm-granite/granite-3.2-8b-instruct`](https://huggingface.co/ibm-granite/granite-3.2-8b-instruct) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/ibm-granite/granite-3.2-8b-instruct) for more details on the model.

---
**Model Summary:**
Granite-3.2-8B-Instruct is an 8-billion-parameter, long-context AI model fine-tuned for thinking capabilities. Built on top of Granite-3.1-8B-Instruct, it has been trained using a mix of permissively licensed open-source datasets and internally generated synthetic data designed for reasoning tasks. The model allows controllability of its thinking capability, ensuring it is applied only when required.

- **Developers:** Granite Team, IBM
- **Website:** Granite Docs
- **Release Date:** February 26th, 2025
- **License:** Apache 2.0

**Supported Languages:**
English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. However, users may finetune this Granite model for languages beyond these 12.

**Intended Use:**
This model is designed to handle general instruction-following tasks and can be integrated into AI assistants across various domains, including business applications.

**Capabilities:**
- Thinking
- Summarization
- Text classification
- Text extraction
- Question-answering
- Retrieval Augmented Generation (RAG)
- Code related tasks
- Function-calling tasks
- Multilingual dialog use cases
- Long-context tasks including long document/meeting summarization, long document QA, etc.

**Generation:**
This is a simple example of how to use the Granite-3.2-8B-Instruct model.

Install the following libraries:

```bash
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```

Then, copy the snippet from the section that is relevant for your use case.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
import torch

model_path = "ibm-granite/granite-3.2-8b-instruct"
device = "cuda"

# Load the model in bfloat16 on the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map=device,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

conv = [{"role": "user", "content": "You have 10 liters of a 30% acid solution. How many liters of a 70% acid solution must be added to achieve a 50% acid mixture?"}]

# thinking=True enables the model's extended reasoning mode
input_ids = tokenizer.apply_chat_template(conv, return_tensors="pt", thinking=True, return_dict=True, add_generation_prompt=True).to(device)

set_seed(42)
output = model.generate(
    **input_ids,
    max_new_tokens=8192,
)

# Decode only the newly generated tokens, skipping the prompt
prediction = tokenizer.decode(output[0, input_ids["input_ids"].shape[1]:], skip_special_tokens=True)
print(prediction)
```
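
To turn the reasoning mode off, pass `thinking=False` to the same chat template call; a minimal sketch of the toggle (everything else in the snippet above stays the same):

```python
# Assumes `tokenizer`, `conv`, and `device` from the snippet above.
# With thinking=False the model answers directly, without the
# "Here is my thought process" preamble shown in the examples below.
input_ids = tokenizer.apply_chat_template(
    conv,
    return_tensors="pt",
    thinking=False,
    return_dict=True,
    add_generation_prompt=True,
).to(device)
```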

**Example Outputs:**

`thinking=True`

Here is my thought process:

1. We know we start with 10 liters of a 30% acid solution, so there's 3 liters of pure acid in it (10 liters * 0.3).
2. We need to determine how much of the 70% acid solution to add such that the final mixture is 50% acid.
3. Let's say x liters of the 70% acid solution are added. This will contribute 0.7x liters of pure acid.
4. The total volume of the new mixture will be 10 + x liters, and the total amount of pure acid will be 3 + 0.7x liters.
5. We need to set up an equation where (3 + 0.7x) / (10 + x) = 0.5 (since we want a 50% acid mixture).
6. Solving this equation for x will give us the volume of 70% solution needed.

Here is my response:

To solve this, we need to set up an equation based on the concentration of acid before and after mixing.

1. **Initial Acid Content**:
   - We start with 10 liters of a 30% acid solution.
   - Pure acid in this = 10 * 0.3 = 3 liters.

2. **Variable for Addition**:
   - Let x be the liters of 70% acid solution to be added.
   - Pure acid in x liters of 70% solution = 0.7x liters.

3. **Final Mixture Concentration**:
   - Total volume of mixture = 10 + x liters.
   - Total pure acid in the mixture = 3 (from initial solution) + 0.7x (from added solution).

4. **Setting up the equation for 50% acid concentration**:
   - We require that the ratio of pure acid to total volume is 0.5 (50%).
   - Therefore, the equation is:

\[
\frac{3 + 0.7x}{10 + x} = 0.5
\]

5. **Solving the equation**:

Multiply both sides by (10 + x) to clear the fraction:

\[
3 + 0.7x = 0.5(10 + x)
\]

Expanding the right side:

\[
3 + 0.7x = 5 + 0.5x
\]

Subtract 0.5x from both sides:

\[
3 + 0.2x = 5
\]

Subtract 3 from both sides:

\[
0.2x = 2
\]

Divide by 0.2:

\[
x = 10
\]

So, 10 liters of the 70% acid solution must be added to the initial 10 liters of 30% solution to get a 50% acid mixture.

Let's verify:

- Initial acid: 3 liters
- Acid from added solution: 0.7 * 10 = 7 liters
- Total acid: 3 + 7 = 10 liters
- Total volume: 10 + 10 = 20 liters
- Acid percentage: \(\frac{10}{20} = 0.5 = 50\%\)

This confirms our solution.

Final answer: 10 liters of 70% acid solution should be added.

`thinking=False`

Let's denote the amount of 70% acid solution to be added as x liters.

We start with 10 liters of a 30% acid solution, which means there are 0.3 * 10 = 3 liters of pure acid in it.

We want to end up with a 50% acid solution after adding x liters of 70% acid solution. The total volume of the mixture will be 10 + x liters, and we want half of this volume to be pure acid.

So, the amount of pure acid in the final mixture should be 0.5 * (10 + x) liters.

The amount of pure acid in the final mixture comes from two sources: the original 30% solution and the added 70% solution. So, we can set up the following equation:

3 (from the original solution) + 0.7x (from the added solution) = 0.5 * (10 + x)

Now, let's solve for x:

3 + 0.7x = 5 + 0.5x
0.7x - 0.5x = 5 - 3
0.2x = 2
x = 2 / 0.2
x = 10

So, you need to add 10 liters of a 70% acid solution to the 10 liters of a 30% acid solution to get a 50% acid mixture.

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.

### CLI:
```bash
llama-cli --hf-repo Triangle104/granite-3.2-8b-instruct-Q6_K-GGUF --hf-file granite-3.2-8b-instruct-q6_k.gguf -p "The meaning to life and the universe is"
```
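
For an interactive chat session instead of a one-shot prompt, recent llama.cpp builds also offer a conversation mode; a hedged aside, since the flag and its default behavior depend on your llama.cpp version:

```bash
# -cnv starts an interactive chat using the model's built-in chat template
llama-cli --hf-repo Triangle104/granite-3.2-8b-instruct-Q6_K-GGUF --hf-file granite-3.2-8b-instruct-q6_k.gguf -cnv
```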

### Server:
```bash
llama-server --hf-repo Triangle104/granite-3.2-8b-instruct-Q6_K-GGUF --hf-file granite-3.2-8b-instruct-q6_k.gguf -c 2048
```
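
Once running, llama-server exposes an OpenAI-compatible HTTP API (on port 8080 by default), so you can query it with a plain `curl` call; a minimal sketch:

```bash
# Send a chat request to the local llama-server instance
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Summarize the benefits of long-context models in two sentences."}
    ],
    "max_tokens": 256
  }'
```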

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```bash
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux), as in the sketch after this step.
```bash
cd llama.cpp && LLAMA_CURL=1 make
```
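
For instance, a CUDA-enabled build on Linux would combine the two flags mentioned above (hedged: these flag names follow the Makefile build referenced here and may differ in newer CMake-based versions of llama.cpp):

```bash
# Build with CURL support (needed for --hf-repo downloads) and CUDA acceleration
cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make
```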

Step 3: Run inference through the main binary.
```bash
./llama-cli --hf-repo Triangle104/granite-3.2-8b-instruct-Q6_K-GGUF --hf-file granite-3.2-8b-instruct-q6_k.gguf -p "The meaning to life and the universe is"
```
or
```bash
./llama-server --hf-repo Triangle104/granite-3.2-8b-instruct-Q6_K-GGUF --hf-file granite-3.2-8b-instruct-q6_k.gguf -c 2048
```