manaestras committed (verified)
Commit 58198a2 · 1 Parent(s): 44ad127

Update README.md

Files changed (1)
  1. README.md +10 -32
README.md CHANGED
@@ -14,43 +14,21 @@ license_link: LICENSE
  &nbsp<a href="https://github.com/Tencent/Hunyuan-A13B"><b>GITHUB</b></a>&nbsp&nbsp


- ## Model Introduction
-
- The A13B models released by Tencent Hunyuan this time: [Tencent-Hunyuan-A13B-Pretrain](https://huggingface.co/tencent/Hunyuan-A13B-Pretrain) , [Tencent-Hunyuan-A13B-Instruct](https://huggingface.co/tencent/Hunyuan-A13B-Instruct) and [Tencent-Hunyuan-A13B-Instruct-FP8](https://huggingface.co/tencent/Tencent-Hunyuan-A13B-Instruct-FP8), use better data allocation and training, have strong performance, and have achieved a good balance between computing and performance. It stands out from many large-scale language models and is currently one of the strongest Chinese Mixture of Experts (MoE) models, featuring a total of 80 billion parameters and 13 billion active parameters.
-
- ### Introduction to Technical Advantages
-
- **Model**
-
- - **High-Quality Synthetic Data**: By enhancing training with synthetic data, Hunyuan-A13B is able to learn richer representations, handle long-context inputs, and generalize better to unseen data.
-
- - **KV Cache Compression**: Utilizing Grouped Query Attention (GQA) and Cross-Layer Attention (CLA) strategies, it significantly reduces memory usage and computational overhead of the KV cache, thereby improving inference throughput.
-
- - **Expert-Specific Learning Rate Scaling**: Different learning rates are assigned to different experts, ensuring that each sub-model can effectively learn from the data and contribute to overall performance.
-
- - **Long-Context Processing Capability**: Both the pre-trained model and the instruction-tuned model support text sequences of up to 256K tokens, significantly enhancing the ability to handle long-context tasks.
-
- - **Extensive Benchmarking**: Extensive experiments across multiple languages and tasks have validated the practical effectiveness and safety of Hunyuan-A13B.
-
- - **Hybrid Reasoning Capability**: It supports both fast thinking and slow thinking inference modes.
-
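To make the KV-cache claim in the list above concrete, here is a rough, editorial calculation (not taken from the README) of how sharing key/value heads under GQA shrinks the cache at long context. The KV-head count and per-head dimension are assumptions for the sake of the example, and CLA's cross-layer sharing, which would reduce the footprint further, is not modeled.

```python
# Illustrative KV-cache sizing for a 32-layer, 32-head model at 256K context.
# n_kv_heads and head_dim are ASSUMED values for this sketch, not README specs.

n_layers   = 32        # number of layers from the architecture list below
n_q_heads  = 32        # attention heads from the architecture list below
n_kv_heads = 8         # ASSUMED number of shared K/V head groups under GQA
head_dim   = 128       # ASSUMED per-head dimension (hidden 4096 / 32 heads)
seq_len    = 256_000   # the advertised 256K context window
bytes_el   = 2         # bf16/fp16 cache entries

def kv_cache_gib(n_heads: int) -> float:
    # Factor 2 covers the K and V tensors, per layer, per position, per head dim.
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_el / 2**30

print(f"full MHA cache: {kv_cache_gib(n_q_heads):.0f} GiB")   # ~125 GiB
print(f"GQA cache:      {kv_cache_gib(n_kv_heads):.0f} GiB")  # ~31 GiB, 4x smaller
```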
- **Architecture**
-
- Hunyuan-A13B adopts a Fine-grained Mixture of Experts (Fine-grained MoE) architecture, comprising a total of 80 billion parameters with 13 billion active parameters. The model has been trained on over 20 trillion tokens. It supports a context length of up to 256K tokens. The following are the detailed specifications of the model architecture:
-
- - **Total Parameters**: 80B
- - **Active Parameters**: 13B
- - **Number of Layers**: 32
- - **Attention Heads**: 32
- - **Number of Shared Experts**: 1
- - **Number of Non-Shared Experts**: 64
- - **Routing Strategy**: Top-8
- - **Activation Function**: SwiGLU
- - **Hidden Layer Dimension**: 4096
- - **Expert Hidden Layer Dimension**: 3072
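As a sanity check on the specifications above, the back-of-envelope estimate below (an editorial illustration, not an official breakdown) shows how the listed dimensions are roughly consistent with 80B total and 13B active parameters. The vocabulary size and the simplified attention/embedding accounting are assumptions.

```python
# Back-of-envelope parameter estimate from the architecture list above.
# Vocabulary size and the attention/embedding accounting are ASSUMPTIONS;
# the point is only that the listed dimensions line up with 80B/13B.

n_layers, d_model, d_expert = 32, 4096, 3072
n_shared, n_routed, top_k   = 1, 64, 8
vocab_size                  = 128_000    # assumed, not stated in the README

expert = 3 * d_model * d_expert          # SwiGLU expert: gate, up and down projections
attn   = 4 * d_model * d_model           # Q, K, V, O (upper bound; GQA shrinks K/V)
embed  = 2 * vocab_size * d_model        # input embedding + LM head

total  = n_layers * ((n_shared + n_routed) * expert + attn) + embed
active = n_layers * ((n_shared + top_k)   * expert + attn) + embed

print(f"total  ~ {total / 1e9:.0f}B")    # ~82B, close to the advertised 80B
print(f"active ~ {active / 1e9:.0f}B")   # ~14B, in the ballpark of the 13B figure
```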
 
 
+ Welcome to the official repository of **Hunyuan-A13B**, an innovative and open-source large language model (LLM) built on a fine-grained Mixture-of-Experts (MoE) architecture. Designed for efficiency and scalability, Hunyuan-A13B delivers cutting-edge performance with minimal computational overhead, making it an ideal choice for advanced reasoning and general-purpose applications, especially in resource-constrained environments.

+ ## Key Features and Highlights

+ - **High Performance with Fewer Parameters**: With only 13B active parameters (out of a total of 80B), Hunyuan-A13B achieves competitive results compared to much larger models across diverse benchmark tasks.
+ - **Robust Pre-Training and Optimization**: Trained on a massive 20-trillion-token high-quality dataset, the model benefits from structured supervised fine-tuning and reinforcement learning strategies to enhance its reasoning, language comprehension, and general knowledge capabilities.
+ - **Dual-Mode Chain-of-Thought (CoT) Framework**: This unique feature allows dynamic adjustment of reasoning depth, balancing computational efficiency with accuracy. It supports both concise responses for simple tasks and in-depth reasoning for complex challenges.
+ - **Exceptional Long-Context Understanding**: Hunyuan-A13B natively supports a 256K context window, maintaining robust performance in long-text tasks.

+ - **Advanced Agent-Oriented Capabilities**: Tailored optimizations enable effective handling of complex decision-making, with leading performance on agent benchmarks such as BFCL-v3 and τ-Bench.
+ - **Superior Inference Efficiency**: Architectural innovations, including Grouped Query Attention (GQA) and support for multiple quantization formats, result in exceptional inference speed.

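The dual-mode (fast vs. slow thinking) feature above is easiest to picture from a call site. The sketch below is hypothetical: it assumes the chat template accepts an `enable_thinking`-style flag; the actual switch, whether a template flag or a prompt prefix, is documented in the model card's usage section, not in this excerpt.

```python
# Hypothetical sketch of toggling fast (concise) vs. slow (deep-reasoning) mode.
# The `enable_thinking` template flag is an ASSUMPTION; check the model card
# for the actual switch before relying on this.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Hunyuan-A13B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
)

def chat(prompt: str, deep_reasoning: bool) -> str:
    messages = [{"role": "user", "content": prompt}]
    # Extra kwargs to apply_chat_template are forwarded to the Jinja template;
    # whether the template honors `enable_thinking` is assumed here.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=deep_reasoning,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

print(chat("What is 17 * 24?", deep_reasoning=False))                  # fast mode
print(chat("Prove that sqrt(2) is irrational.", deep_reasoning=True))  # slow mode
```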
+ ## Why Choose Hunyuan-A13B?

+ Hunyuan-A13B stands out as a powerful, scalable, and computationally efficient LLM, perfectly suited for researchers and developers seeking high performance without the burden of excessive resource demands. Whether you're working on academic research, building cost-effective AI solutions, or exploring novel applications, Hunyuan-A13B provides a versatile foundation to build upon.
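To make the efficiency and quantization point concrete, here is a minimal serving sketch using vLLM's offline chat API against the FP8 checkpoint linked in the old text above. This is an editorial example, not official deployment guidance: the repo id, GPU count, and context length are assumptions to adapt to your setup, and model support should be verified against the official deployment instructions.

```python
# Minimal sketch: serving the FP8 checkpoint with vLLM's offline chat API.
# Repo id, tensor_parallel_size and max_model_len are ASSUMPTIONS to adjust.
from vllm import LLM, SamplingParams

llm = LLM(
    model="tencent/Hunyuan-A13B-Instruct-FP8",  # check the FP8 link above for the exact repo id
    trust_remote_code=True,
    tensor_parallel_size=2,    # adjust to your GPU count
    max_model_len=32768,       # well below the 256K maximum to keep the KV cache small
)
params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)
outputs = llm.chat(
    [{"role": "user", "content": "Summarize the benefits of fine-grained MoE."}],
    params,
)
print(outputs[0].outputs[0].text)
```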

  &nbsp;