---
license: apache-2.0
---

## Introduction

SmallThinker is a family of **on-device native** Mixture-of-Experts (MoE) language models specially designed for local deployment, co-developed by the **IPADS and School of AI at Shanghai Jiao Tong University** and **Zenergize AI**. Built from the ground up for resource-constrained environments, SmallThinker brings powerful, private, and low-latency AI directly to your personal devices, without relying on the cloud.

## Performance

For the MMLU evaluation, we use a 0-shot chain-of-thought (CoT) setting.
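Concretely, a 0-shot CoT query gives the model the question and its answer options with no worked examples, and asks it to reason before committing to an answer. One hypothetical way to format such a prompt (the exact template used in the evaluation is not specified here):

```python
# A hypothetical 0-shot CoT MMLU prompt format; the evaluation's exact template may differ.
question = "Which gas makes up most of Earth's atmosphere?"
choices = ["A. Oxygen", "B. Nitrogen", "C. Carbon dioxide", "D. Argon"]

prompt = (
    f"{question}\n" + "\n".join(choices) +
    "\nLet's think step by step, then give the final answer as a single letter."
)
print(prompt)
```
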
## Model Card

<div align="center">

| **Architecture** | Mixture-of-Experts (MoE) |
|:---:|:---:|
| **Total Parameters** | 21B |
| **Activated Parameters** | 3B |
| **Number of Layers** | 52 |
| **Attention Hidden Dimension** | 2560 |
| **MoE Hidden Dimension (per Expert)** | 768 |
| **Number of Attention Heads** | 28 |
| **Number of KV Heads** | 4 |
| **Number of Experts** | 64 |
| **Selected Experts per Token** | 6 |
| **Vocabulary Size** | 151,936 |
| **Context Length** | 16K |
| **Attention Mechanism** | GQA |
| **Activation Function** | ReGLU |

</div>
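To make the attention-grouping and sparsity numbers above concrete, here is a small illustrative calculation (our own sketch, not code from the model repository):

```python
# Illustrative sketch based on the model card above; not code from the repository.
n_heads, n_kv_heads = 28, 4
n_experts, experts_per_token = 64, 6

# GQA: each KV head is shared by a group of query heads.
print(n_heads // n_kv_heads)                   # 7 query heads per KV head

# Sparse MoE routing: only a small fraction of experts runs per token.
print(f"{experts_per_token / n_experts:.1%}")  # 9.4% of experts active per token
```

This per-token sparsity is what lets the 21B-parameter model activate only about 3B parameters per token.
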
## How to Run

### Transformers

The latest version of `transformers` is recommended; at a minimum, `transformers>=4.53.3` is required.
The following code snippet illustrates how to use the model to generate content from a given input.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "PowerInfer/SmallThinker-21BA3B-Instruct"
device = "cuda"

# trust_remote_code is required because the model uses a custom architecture.
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)

messages = [
    {"role": "user", "content": "Give me a short introduction to large language models."},
]

# Apply the chat template and move the prompt token ids to the target device.
model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(device)

model_outputs = model.generate(
    model_inputs,
    do_sample=True,
    max_new_tokens=1024
)

# Strip the prompt so that only the newly generated tokens are decoded.
output_token_ids = [
    model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
]

response = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
print(response)
```
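
For interactive use, you may prefer to stream tokens as they are generated rather than waiting for the full completion. `transformers` provides `TextStreamer` for this; a minimal variation on the snippet above:

```python
from transformers import TextStreamer

# Print decoded text to stdout as tokens are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(model_inputs, do_sample=True, max_new_tokens=1024, streamer=streamer)
```
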
### ModelScope

`ModelScope` provides a Python API similar to (though not entirely identical to) that of `transformers`. For basic usage, simply replace the first import line of the code above with:

```python
from modelscope import AutoModelForCausalLM, AutoTokenizer
```