---
license: apache-2.0
datasets:
- TigerResearch/pretrain_zh
language:
- zh
base_model:
- Qwen/Qwen2.5-3B
tags:
- qwen2.5
- text-generation-inference
- Text Generation
- Character
---

**Qwen2.5-3B-Character**

**Introduction:**

**Qwen2.5-3B-Character** is a character-level version of the [Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B) model. It is designed specifically for character-to-character transformation and generation tasks.

**Core Contributions:**

1. **Modified Token Vocabulary:** The original model's vocabulary has been revised to remove tokens that represent phrases or other multi-character sequences. This focuses the model on processing individual characters.

2. **Continued Pre-training:** With the modified vocabulary in place, the model has undergone continued pre-training to improve its performance on, and adapt it to, character-level tasks.
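
The vocabulary change in item 1 can be illustrated with a toy filter. This is a hypothetical sketch, not the authors' script. It assumes that a token is removed when its text contains two or more CJK ideographs, that non-Chinese tokens (English subwords, special tokens) are kept, and that the vocabulary is available as already-decoded strings. Qwen2.5 actually uses a byte-level BPE, so real vocabulary entries would first have to be decoded back to text, and only the two most common CJK Unicode blocks are checked here.

```python
from __future__ import annotations


def count_cjk(text: str) -> int:
    """Count CJK Unified Ideographs (basic block and Extension A) in text."""
    return sum(
        1 for ch in text
        if 0x4E00 <= ord(ch) <= 0x9FFF or 0x3400 <= ord(ch) <= 0x4DBF
    )


def filter_vocab(vocab: dict[str, int]) -> tuple[dict[str, int], list[int]]:
    """Hypothetical filter: drop tokens spanning 2+ CJK characters, re-index the rest.

    Returns the new token -> id mapping and the kept *old* ids in order, so that
    row i of a shrunken embedding matrix would come from row kept_ids[i] of the old one.
    """
    kept = sorted(
        (old_id, token) for token, old_id in vocab.items() if count_cjk(token) <= 1
    )
    new_vocab = {token: new_id for new_id, (_, token) in enumerate(kept)}
    kept_ids = [old_id for old_id, _ in kept]
    return new_vocab, kept_ids


# Toy vocabulary (illustrative only): multi-character words are removed.
toy_vocab = {"大": 0, "模型": 1, "a": 2, "hello": 3, "语言模型": 4, "的": 5}
new_vocab, kept_ids = filter_vocab(toy_vocab)
print(new_vocab)  # {'大': 0, 'a': 1, 'hello': 2, '的': 3}
print(kept_ids)   # [0, 2, 3, 5]
```

The returned `kept_ids` could then be used to select the surviving rows of the embedding and output-projection matrices (for example `weight[kept_ids]` on a PyTorch tensor), which is one plausible way to shrink a model to the new vocabulary before continued pre-training.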
   

**Training Dataset:**

The model has been trained using the `TigerResearch/pretrain_zh` dataset, a comprehensive Chinese pre-training dataset provided by **TigerResearch**. For more information about the dataset, please visit: [TigerResearch/pretrain_zh](https://huggingface.co/datasets/TigerResearch/pretrain_zh).


**Training Code:**

The model was trained with **LLaMA-Factory**, an open-source framework for training and fine-tuning language models. The codebase is available at: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).


**Results:**

To assess Qwen2.5-3B-Character, we evaluated it on three widely used benchmarks: C-Eval, CMMLU, and MMLU. The results are as follows:

| Model                | C-Eval | CMMLU | MMLU  |
| :---                 | :---:  | :---: | :---: |
| Qwen2.5-3B           | 74.37  | 74.94 | 65.87 |
| Qwen2.5-3B-filter    | 70.43  | 69.69 | 65.53 |
| Qwen2.5-3B-Character | 71.97  | 71.94 | 65.18 |

For a clearer comparison, the table also reports results for the original Qwen2.5-3B and for the token-modified Qwen2.5-3B (Qwen2.5-3B-filter).
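
The trade-offs in the results table can be summarized with simple arithmetic. The sketch below recomputes per-benchmark score differences between the three models; the unweighted three-benchmark average is an illustrative summary of our own, not a metric reported by the authors.

```python
# Scores copied from the results table (C-Eval, CMMLU, MMLU).
SCORES = {
    "Qwen2.5-3B": {"C-Eval": 74.37, "CMMLU": 74.94, "MMLU": 65.87},
    "Qwen2.5-3B-filter": {"C-Eval": 70.43, "CMMLU": 69.69, "MMLU": 65.53},
    "Qwen2.5-3B-Character": {"C-Eval": 71.97, "CMMLU": 71.94, "MMLU": 65.18},
}


def average(model: str) -> float:
    """Unweighted mean over the three benchmarks (illustrative summary only)."""
    values = list(SCORES[model].values())
    return sum(values) / len(values)


def delta(model: str, reference: str) -> dict:
    """Per-benchmark score difference (model minus reference), rounded to 2 decimals."""
    return {
        bench: round(SCORES[model][bench] - SCORES[reference][bench], 2)
        for bench in SCORES[model]
    }


for name in SCORES:
    print(f"{name:<22} avg = {average(name):.2f}")
print("Character vs. original:", delta("Qwen2.5-3B-Character", "Qwen2.5-3B"))
print("Character vs. filter:  ", delta("Qwen2.5-3B-Character", "Qwen2.5-3B-filter"))
```

On these numbers, Qwen2.5-3B-Character scores 1.54 and 2.25 points higher than Qwen2.5-3B-filter on C-Eval and CMMLU, but remains 2.40 and 3.00 points below the original Qwen2.5-3B; MMLU stays within 0.69 points across all three models.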


**Quickstart:**

We recommend the latest version of `transformers` (4.37.0 or later). The following snippet shows how to load the model and generate a response with `transformers`:

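```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'Henry94/Qwen2.5-3B-Character'

# Load the tokenizer (with the modified character-level vocabulary) and the model.
# torch_dtype="auto" uses the dtype stored in the checkpoint; device_map="auto"
# places the weights on the available GPU(s)/CPU (requires the `accelerate` package).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Prompt: "Please briefly introduce large language models."
prompt = "请简单介绍一下大型语言模型."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the tokenizer's chat template and append the assistant turn header.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated continuation is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)
```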
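
Because the main change in this model is its vocabulary, a quick sanity check is to tokenize some Chinese text and see how many CJK characters each token spans. The helpers below are a hypothetical sketch; they rely only on the standard `encode` and `decode` methods of a Hugging Face tokenizer. Tokens that hold only part of a character's UTF-8 bytes decode to a replacement character and therefore count as zero.

```python
from __future__ import annotations


def cjk_count(text: str) -> int:
    """Count CJK Unified Ideographs (basic block and Extension A) in text."""
    return sum(
        1 for ch in text
        if 0x4E00 <= ord(ch) <= 0x9FFF or 0x3400 <= ord(ch) <= 0x4DBF
    )


def tokens_per_piece(tokenizer, text: str) -> list[tuple[str, int]]:
    """Decode each token of `text` on its own and report how many CJK characters it covers."""
    ids = tokenizer.encode(text, add_special_tokens=False)
    pieces = [tokenizer.decode([token_id]) for token_id in ids]
    return [(piece, cjk_count(piece)) for piece in pieces]


def is_character_level(tokenizer, text: str) -> bool:
    """True if no single token covers more than one CJK character."""
    return all(n <= 1 for _, n in tokens_per_piece(tokenizer, text))
```

With the Quickstart `tokenizer` loaded, `is_character_level(tokenizer, prompt)` would be expected to return `True` if the vocabulary filtering behaves as described, whereas the original Qwen2.5-3B tokenizer typically merges common multi-character words into single tokens.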