	---
	license: apache-2.0
	---

	# DeciLM-7b-instruct GGUF checkpoints
	This repository includes DeciLM-7b-instruct checkpoints in the GGUF format.<br>
	DeciLM performs well on commodity CPUs using the llama.cpp codebase.

	## 1. Clone and build llama.cpp (1 minute)
	```
	git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j
	```

	## 2. Download the GGUF checkpoint
	- Go to "Files"
	- Click on "decilm-7b-uniform-gqa-q8_0.gguf"
	- Click on the "Download" button
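
The same file can probably be fetched from the command line instead of the browser. The sketch below is a minimal example, assuming the repository id is `Deci/DeciLM-7B-instruct-GGUF`, that the file lives on the `main` branch, and that `curl` is installed; `download_gguf` is a hypothetical helper name, not part of any tool.

```shell
# Assumptions: repository id and branch are not stated in the steps above.
REPO="Deci/DeciLM-7B-instruct-GGUF"
FILE="decilm-7b-uniform-gqa-q8_0.gguf"
URL="https://huggingface.co/${REPO}/resolve/main/${FILE}"
DEST="$HOME/Downloads/${FILE}"

# Hypothetical helper: download the checkpoint unless it is already present.
download_gguf() {
  [ -f "$DEST" ] || curl -L --fail -o "$DEST" "$URL"
}

# Print the URL that would be fetched.
echo "$URL"
```

Call `download_gguf` after checking the printed URL; the file then lands at the path used in the next step.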

	## 3. Generate outputs
	- Pass a prompt in the chat template to the INT8-quantized (q8_0) DeciLM-7b-instruct checkpoint.
	```text
	./main -m ~/Downloads/decilm-7b-uniform-gqa-q8_0.gguf -p """
	### System:
	You are an AI assistant that follows instruction extremely well. Help as much as you can.
	### User:
	How do I make the most delicious pancakes the world has ever tasted?
	### Assistant:
	"""
	```
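
To reuse the chat template with other messages, the prompt can be assembled by a small shell function. This is a sketch rather than an official script: `build_prompt` is a hypothetical helper, the model path is the one from the command above, and newer llama.cpp checkouts may name the binary differently from `./main`.

```shell
# Hypothetical helper: print the chat template for a system and a user message.
build_prompt() {
  printf '### System:\n%s\n### User:\n%s\n### Assistant:\n' "$1" "$2"
}

SYSTEM="You are an AI assistant that follows instruction extremely well. Help as much as you can."
USER_MSG="How do I make the most delicious pancakes the world has ever tasted?"

# Note: command substitution drops the trailing newline after "### Assistant:".
PROMPT="$(build_prompt "$SYSTEM" "$USER_MSG")"

# Path from the command above; adjust it to where the file was saved.
MODEL="$HOME/Downloads/decilm-7b-uniform-gqa-q8_0.gguf"

# Run only when the llama.cpp binary and the checkpoint are both present.
if [ -x ./main ] && [ -f "$MODEL" ]; then
  ./main -m "$MODEL" -p "$PROMPT"
fi
```

Keeping the template in one function makes it harder to mistype a `###` header when changing the question.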

	- Output when running on a MacBook M2 Pro (32 GB):
	```
	### System:
	You are an AI assistant that follows instruction extremely well. Help as much as you can.
	### User:
	How do I make the most delicious pancakes the world has ever tasted?
	### Assistant:
	To make the most delicious pancakes the world ever tasted, follow these steps:

	1. In a mixing bowl, combine 2 cups of all-purpose flour, 4 tablespoons of sugar, and 3 teaspoon of baking powder with 1/2 teaspoon salt, mix well.
	2. Make a hole in the center and pour in 4 eggs and 1 cup of milk, whisk well mix it until smooth. Add 3 table spoon of oil and a tables of melted butter.
	3. Heat your frying pan with little bit butter or oil and ladle batter onto the pan, spread it with 1/2 inch width. Wait for small bubbles to form in the surface and flip over to brown other side until golden.
	4. Enjoy your delicious pancakes [end of text]

	llama_print_timings: load time = 343.16 ms
	llama_print_timings: sample time = 14.69 ms / 172 runs ( 0.09 ms per token, 11712.63 tokens per second)
	llama_print_timings: prompt eval time = 239.48 ms / 52 tokens ( 4.61 ms per token, 217.14 tokens per second)
	llama_print_timings: eval time = 7767.20 ms / 171 runs ( 45.42 ms per token, 22.02 tokens per second)
	llama_print_timings: total time = 8045.89 ms
	ggml_metal_free: deallocating
	Log end
	```
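
The tokens-per-second figures in the timing lines follow from the elapsed time and the token count on the same line. The snippet below is a small sketch of that arithmetic using `awk`; `tokens_per_second` is a hypothetical helper, and the inputs are copied from the log above.

```shell
# Hypothetical helper: tokens per second from elapsed milliseconds and a token count.
tokens_per_second() {
  awk -v ms="$1" -v n="$2" 'BEGIN { printf "%.2f\n", n * 1000 / ms }'
}

tokens_per_second 7767.20 171   # eval time line        -> 22.02
tokens_per_second 239.48 52     # prompt eval time line -> 217.14
```

The same function can be used to compare runs on other machines or with other quantization levels.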