Update README.md
README.md
CHANGED
@@ -14,43 +14,21 @@ license_link: LICENSE
 <a href="https://github.com/Tencent/Hunyuan-A13B"><b>GITHUB</b></a>
 
-- **KV Cache Compression**: Grouped Query Attention (GQA) and Cross-Layer Attention (CLA) significantly reduce the memory usage and computational overhead of the KV cache, improving inference throughput.
-- **Expert-Specific Learning Rate Scaling**: Different learning rates are assigned to different experts, ensuring that each sub-model learns effectively from the data and contributes to overall performance.
-- **Long-Context Processing Capability**: Both the pre-trained model and the instruction-tuned model support text sequences of up to 256K tokens, significantly enhancing the ability to handle long-context tasks.
-- **Extensive Benchmarking**: Extensive experiments across multiple languages and tasks have validated the practical effectiveness and safety of Hunyuan-A13B.
-- **Hybrid Reasoning Capability**: Supports both fast-thinking and slow-thinking inference modes.
-
-**Architecture**
-
-Hunyuan-A13B adopts a fine-grained Mixture-of-Experts (MoE) architecture, comprising 80 billion total parameters with 13 billion active parameters. The model has been trained on over 20 trillion tokens and supports a context length of up to 256K tokens. Detailed specifications of the model architecture:
-
-- **Total Parameters**: 80B
-- **Active Parameters**: 13B
-- **Number of Layers**: 32
-- **Attention Heads**: 32
-- **Number of Shared Experts**: 1
-- **Number of Non-Shared Experts**: 64
-- **Routing Strategy**: Top-8
-- **Activation Function**: SwiGLU
-- **Hidden Layer Dimension**: 4096
-- **Expert Hidden Layer Dimension**: 3072
+Welcome to the official repository of **Hunyuan-A13B**, an innovative and open-source large language model (LLM) built on a fine-grained Mixture-of-Experts (MoE) architecture. Designed for efficiency and scalability, Hunyuan-A13B delivers cutting-edge performance with minimal computational overhead, making it an ideal choice for advanced reasoning and general-purpose applications, especially in resource-constrained environments.
+
+## Key Features and Highlights
+
+- **High Performance with Fewer Parameters**: With only 13B active parameters (out of a total of 80B), Hunyuan-A13B achieves competitive results compared to much larger models across diverse benchmark tasks.
+- **Robust Pre-Training and Optimization**: Trained on over 20 trillion tokens of high-quality data, the model benefits from structured supervised fine-tuning and reinforcement learning strategies that enhance its reasoning, language comprehension, and general knowledge.
+- **Dual-Mode Chain-of-Thought (CoT) Framework**: This unique feature allows dynamic adjustment of reasoning depth, balancing computational efficiency with accuracy: concise responses for simple tasks, in-depth reasoning for complex challenges.
+- **Exceptional Long-Context Understanding**: Hunyuan-A13B natively supports a 256K context window, maintaining robust performance on long-text tasks.
+- **Advanced Agent-Oriented Capabilities**: Tailored optimizations enable effective handling of complex decision-making, with leading performance on agent benchmarks such as BFCL-v3 and τ-Bench.
+- **Superior Inference Efficiency**: Architectural innovations, including Grouped Query Attention (GQA) and support for multiple quantization formats, deliver exceptional inference speed.
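As a rough illustration of why GQA matters at a 256K context, the per-sequence KV cache can be sized with back-of-envelope arithmetic. The 8 KV heads below are an assumption for illustration, not a published figure; the 128-dim heads follow from 4096 hidden / 32 attention heads.

```python
# KV-cache sizing sketch: GQA shrinks the cache by storing keys/values for a
# small number of KV-head groups instead of one K/V pair per attention head.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for keys and values; fp16/bf16 uses 2 bytes per element.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

LAYERS, HEADS, HEAD_DIM = 32, 32, 128   # 32 layers, 32 attention heads (spec table)
SEQ = 256 * 1024                        # the full 256K-token context window

mha = kv_cache_bytes(LAYERS, HEADS, HEAD_DIM, SEQ)  # one KV pair per head (no GQA)
gqa = kv_cache_bytes(LAYERS, 8, HEAD_DIM, SEQ)      # assumed 8 KV-head groups

print(f"MHA: {mha / 2**30:.0f} GiB vs GQA: {gqa / 2**30:.0f} GiB "
      f"({mha // gqa}x smaller)")
```

With these assumptions a full-context cache drops from 128 GiB to 32 GiB, a 4x reduction before any quantization or CLA-style cross-layer sharing is applied.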
+
+## Why Choose Hunyuan-A13B?
+
+Hunyuan-A13B stands out as a powerful, scalable, and computationally efficient LLM, well suited to researchers and developers seeking high performance without excessive resource demands. Whether you're working on academic research, building cost-effective AI solutions, or exploring novel applications, Hunyuan-A13B provides a versatile foundation to build upon.