README.md · ModelCloud/Qwen2.5-Coder-32B-Instruct-gptqmodel-4bit-vortex-v1 at main

Qubitium

Update README.md

d5c6a3e verified about 22 hours ago

	---
	license: apache-2.0
	license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE
	language:
	- en
	base_model:
	- Qwen/Qwen2.5-Coder-32B
	pipeline_tag: text-generation
	tags:
	- gptqmodel
	- modelcloud
	- code
	- codeqwen
	- chat
	- qwen
	- qwen-coder
	- instruct
	- int4
	- gptq
	- 4bit
	---

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/641c13e7999935676ec7bc03/-n_0DiARmihJh8GH96YaX.png)

	This model has been quantized using [GPTQModel](https://github.com/ModelCloud/GPTQModel) with the following configuration:

	- bits: 4
	- dynamic: null
	- group_size: 32
	- desc_act: true
	- static_groups: false
	- sym: true
	- lm_head: false
	- true_sequential: true
	- quant_method: "gptq"
	- checkpoint_format: "gptq"
	- meta:
	  - quantizer: gptqmodel:1.2.1
	  - uri: https://github.com/modelcloud/gptqmodel
	  - damp_percent: 0.1
	  - damp_auto_increment: 0.0015
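
To give a rough feel for what `bits: 4`, `group_size: 32`, and `sym: true` mean, here is a minimal pure-Python sketch of group-wise symmetric quantization. It is an illustration under stated assumptions, not GPTQModel's implementation: it uses plain round-to-nearest, whereas GPTQ also compensates for rounding error using calibration data. It does not model `desc_act`, `damp_percent`, or the packed checkpoint layout. The function names are hypothetical.

```python
# Simplified illustration only: plain round-to-nearest quantization.
# This is NOT the GPTQ algorithm and NOT GPTQModel's storage format.

BITS = 4         # "bits" from the configuration above
GROUP_SIZE = 32  # "group_size" from the configuration above


def quantize_group_sym(weights, bits=BITS):
    """Quantize one group symmetrically: one shared scale, range centered on zero."""
    qmax = 2 ** (bits - 1) - 1   # largest signed code (7 for 4 bits)
    qmin = -(2 ** (bits - 1))    # smallest signed code (-8 for 4 bits)
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def quantize_row(row, group_size=GROUP_SIZE, bits=BITS):
    """Split a weight row into consecutive groups and quantize each one separately."""
    return [
        quantize_group_sym(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


# Hypothetical toy weights: 64 values, so two groups of 32.
row = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]
groups = quantize_row(row)
restored = [w for codes, scale in groups for w in dequantize_group(codes, scale)]
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups: {len(groups)}, max reconstruction error: {max_error:.4f}")
```

A smaller `group_size` gives each scale fewer weights to cover, which generally lowers reconstruction error at the cost of storing more scales.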


	## Example:
	```python
	from transformers import AutoTokenizer
	from gptqmodel import GPTQModel

	model_name = "ModelCloud/Qwen2.5-Coder-32B-Instruct-gptqmodel-4bit-vortex-v1"

	# Load the tokenizer and the quantized checkpoint.
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = GPTQModel.load(model_name)

	messages = [
	    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
	    {"role": "user", "content": "How can I design a data structure in C++ to store the top 5 largest integer numbers?"},
	]
	# Apply the chat template; add_generation_prompt=True opens the assistant turn.
	input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

	# Generate, then decode only the new tokens by slicing off the prompt.
	outputs = model.generate(input_ids=input_tensor.to(model.device), max_new_tokens=512)
	result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)

	print(result)
	```