Improve model card: Add `library_name`, explicit paper and code links (#3)
Commit 6526c2455757fa91b7aae2c0aaa2783d627be85e
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,9 +1,16 @@
 ---
-license: apache-2.0
 language:
 - en
+license: apache-2.0
 pipeline_tag: text-generation
+library_name: transformers
 ---
+
+# SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment
+
+**Paper**: [SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment](https://huggingface.co/papers/2507.20984)
+**Code**: [https://github.com/SJTU-IPADS/SmallThinker](https://github.com/SJTU-IPADS/SmallThinker)
+
 ## Introduction
 
 <p align="center">
@@ -42,7 +49,7 @@ All models are evaluated in non-thinking mode.
 | Qwen3 0.6B | 0.6 | 148.56 | 94.91 | 45.93 | 15.29 | 27.44 | 13.32 | 9.76 |
 | Qwen3 1.7B | 1.3 | 62.24 | 41.00 | 20.29 | 6.09 | 11.08 | 6.35 | 4.15 |
 | Qwen3 1.7B+limited memory | limit 1G | 2.66 | 1.09 | 1.00 | 0.47 | - | - | 0.11 |
-| Gemma3n E2B | 1G, theoretically | 36.88 | 27.06 | 12.50 | 3.80 | 6.66 | 3.
+| Gemma3n E2B | 1G, theoretically | 36.88 | 27.06 | 12.50 | 3.80 | 6.66 | 3.80 | 2.45 |
 
 Note: i9 14900, 1+13 8ge4 use 4 threads, others use the number of threads that can achieve the maximum speed. All models here have been quantized to q4_0.
 
@@ -115,5 +122,4 @@ from modelscope import AutoModelForCausalLM, AutoTokenizer
 ## Statement
 - Due to the constraints of its model size and the limitations of its training data, its responses may contain factual inaccuracies, biases, or outdated information.
 - Users bear full responsibility for independently evaluating and verifying the accuracy and appropriateness of all generated content.
 - SmallThinker does not possess genuine comprehension or consciousness and cannot express personal opinions or value judgments.
-
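The benchmark note in the second hunk states that all compared models were quantized to q4_0. A minimal sketch of the resulting weight footprint, assuming the ggml/llama.cpp q4_0 layout (blocks of 32 weights stored as one fp16 scale, 2 bytes, plus 16 bytes of packed 4-bit values, i.e. 18 bytes per block; the model names and parameter counts below are taken from the table, everything else is illustrative):

```python
def q4_0_bytes(n_weights: int) -> int:
    """Estimate storage for n_weights in ggml's q4_0 format.

    q4_0 packs weights in blocks of 32: a 2-byte fp16 scale plus
    32 four-bit values (16 bytes) -> 18 bytes per 32-weight block.
    """
    blocks = (n_weights + 31) // 32  # round up to whole blocks
    return blocks * 18

# Rough weight-only footprints (excludes KV cache, activations, buffers):
for name, params in [("Qwen3 0.6B", 600_000_000), ("Qwen3 1.7B", 1_700_000_000)]:
    gib = q4_0_bytes(params) / 2**30
    print(f"{name}: ~{gib:.2f} GiB of q4_0 weights")
```

At roughly 4.5 bits per weight, a 1.7B-parameter model's q4_0 weights alone approach 1 GiB, which is consistent with the table's "limit 1G" rows showing sharply reduced throughput once memory is capped. This is only an estimate: real llama.cpp memory use also depends on which tensors stay at higher precision and on runtime buffers.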
|