manaestras committed (verified)
Commit 58198a2 · 1 Parent(s): 44ad127

Update README.md

Files changed (1)
  1. README.md +10 -32
README.md CHANGED
@@ -14,43 +14,21 @@ license_link: LICENSE
  &nbsp<a href="https://github.com/Tencent/Hunyuan-A13B"><b>GITHUB</b></a>&nbsp&nbsp


- ## Model Introduction
-
- The A13B models released by Tencent Hunyuan this time: [Tencent-Hunyuan-A13B-Pretrain](https://huggingface.co/tencent/Hunyuan-A13B-Pretrain) , [Tencent-Hunyuan-A13B-Instruct](https://huggingface.co/tencent/Hunyuan-A13B-Instruct) and [Tencent-Hunyuan-A13B-Instruct-FP8](https://huggingface.co/tencent/Tencent-Hunyuan-A13B-Instruct-FP8), use better data allocation and training, have strong performance, and have achieved a good balance between computing and performance. It stands out from many large-scale language models and is currently one of the strongest Chinese Mixture of Experts (MoE) models, featuring a total of 80 billion parameters and 13 billion active parameters.
-
- ### Introduction to Technical Advantages
-
- **Model**
-
- - **High-Quality Synthetic Data**: By enhancing training with synthetic data, Hunyuan-A13B is able to learn richer representations, handle long-context inputs, and generalize better to unseen data.
-
- - **KV Cache Compression**: Utilizing Grouped Query Attention (GQA) and Cross-Layer Attention (CLA) strategies, it significantly reduces memory usage and computational overhead of the KV cache, thereby improving inference throughput.
-
- - **Expert-Specific Learning Rate Scaling**: Different learning rates are assigned to different experts, ensuring that each sub-model can effectively learn from the data and contribute to overall performance.
-
- - **Long-Context Processing Capability**: Both the pre-trained model and the instruction-tuned model support text sequences of up to 256K tokens, significantly enhancing the ability to handle long-context tasks.
-
- - **Extensive Benchmarking**: Extensive experiments across multiple languages and tasks have validated the practical effectiveness and safety of Hunyuan-A13B.
-
- - **Hybrid Reasoning Capability**: It supports both fast thinking and slow thinking inference modes.
-
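To make the KV-cache claim in the list above concrete, here is a rough, editorial calculation (not taken from the README) of how sharing key/value heads under GQA shrinks the cache at long context. The KV-head count and per-head dimension are assumptions for the sake of the example, and CLA's cross-layer sharing, which would reduce the footprint further, is not modeled.

```python
# Illustrative KV-cache sizing for a 32-layer, 32-head model at 256K context.
# n_kv_heads and head_dim are ASSUMED values for this sketch, not README specs.

n_layers   = 32        # number of layers from the architecture list below
n_q_heads  = 32        # attention heads from the architecture list below
n_kv_heads = 8         # ASSUMED number of shared K/V head groups under GQA
head_dim   = 128       # ASSUMED per-head dimension (hidden 4096 / 32 heads)
seq_len    = 256_000   # the advertised 256K context window
bytes_el   = 2         # bf16/fp16 cache entries

def kv_cache_gib(n_heads: int) -> float:
    # Factor 2 covers the K and V tensors, per layer, per position, per head dim.
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_el / 2**30

print(f"full MHA cache: {kv_cache_gib(n_q_heads):.0f} GiB")   # ~125 GiB
print(f"GQA cache:      {kv_cache_gib(n_kv_heads):.0f} GiB")  # ~31 GiB, 4x smaller
```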
- **Architecture**
-
- Hunyuan-A13B adopts a Fine-grained Mixture of Experts (Fine-grained MoE) architecture, comprising a total of 80 billion parameters with 13 billion active parameters. The model has been trained on over 20 trillion tokens. It supports a context length of up to 256K tokens. The following are the detailed specifications of the model architecture:
-
- - **Total Parameters**: 80B
- - **Active Parameters**: 13B
- - **Number of Layers**: 32
- - **Attention Heads**: 32
- - **Number of Shared Experts**: 1
- - **Number of Non-Shared Experts**: 64
- - **Routing Strategy**: Top-8
- - **Activation Function**: SwiGLU
- - **Hidden Layer Dimension**: 4096
- - **Expert Hidden Layer Dimension**: 3072
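As a sanity check on the specifications above, the back-of-envelope estimate below (an editorial illustration, not an official breakdown) shows how the listed dimensions are roughly consistent with 80B total and 13B active parameters. The vocabulary size and the simplified attention/embedding accounting are assumptions.

```python
# Back-of-envelope parameter estimate from the architecture list above.
# Vocabulary size and the attention/embedding accounting are ASSUMPTIONS;
# the point is only that the listed dimensions line up with 80B/13B.

n_layers, d_model, d_expert = 32, 4096, 3072
n_shared, n_routed, top_k   = 1, 64, 8
vocab_size                  = 128_000    # assumed, not stated in the README

expert = 3 * d_model * d_expert          # SwiGLU expert: gate, up and down projections
attn   = 4 * d_model * d_model           # Q, K, V, O (upper bound; GQA shrinks K/V)
embed  = 2 * vocab_size * d_model        # input embedding + LM head

total  = n_layers * ((n_shared + n_routed) * expert + attn) + embed
active = n_layers * ((n_shared + top_k)   * expert + attn) + embed

print(f"total  ~ {total / 1e9:.0f}B")    # ~82B, close to the advertised 80B
print(f"active ~ {active / 1e9:.0f}B")   # ~14B, in the ballpark of the 13B figure
```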
 
 
+ Welcome to the official repository of **Hunyuan-A13B**, an innovative and open-source large language model (LLM) built on a fine-grained Mixture-of-Experts (MoE) architecture. Designed for efficiency and scalability, Hunyuan-A13B delivers cutting-edge performance with minimal computational overhead, making it an ideal choice for advanced reasoning and general-purpose applications, especially in resource-constrained environments.

+ ## Key Features and Highlights

+ - **High Performance with Fewer Parameters**: With only 13B active parameters (out of a total of 80B), Hunyuan-A13B achieves competitive results compared to much larger models across diverse benchmark tasks.
+ - **Robust Pre-Training and Optimization**: Trained on a massive 20-trillion-token high-quality dataset, the model benefits from structured supervised fine-tuning and reinforcement learning strategies to enhance its reasoning, language comprehension, and general knowledge capabilities.
+ - **Dual-Mode Chain-of-Thought (CoT) Framework**: This unique feature allows dynamic adjustment of reasoning depth, balancing computational efficiency with accuracy. It supports both concise responses for simple tasks and in-depth reasoning for complex challenges.
+ - **Exceptional Long-Context Understanding**: Hunyuan-A13B natively supports a 256K context window, maintaining robust performance in long-text tasks.

+ - **Advanced Agent-Oriented Capabilities**: Tailored optimizations enable effective handling of complex decision-making, with leading performance on agent benchmarks such as BFCL-v3 and τ-Bench.
+ - **Superior Inference Efficiency**: Architectural innovations, including Grouped Query Attention (GQA) and support for multiple quantization formats, result in exceptional inference speed.

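The dual-mode (fast vs. slow thinking) feature above is easiest to picture from a call site. The sketch below is hypothetical: it assumes the chat template accepts an `enable_thinking`-style flag; the actual switch, whether a template flag or a prompt prefix, is documented in the model card's usage section, not in this excerpt.

```python
# Hypothetical sketch of toggling fast (concise) vs. slow (deep-reasoning) mode.
# The `enable_thinking` template flag is an ASSUMPTION; check the model card
# for the actual switch before relying on this.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Hunyuan-A13B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
)

def chat(prompt: str, deep_reasoning: bool) -> str:
    messages = [{"role": "user", "content": prompt}]
    # Extra kwargs to apply_chat_template are forwarded to the Jinja template;
    # whether the template honors `enable_thinking` is assumed here.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=deep_reasoning,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

print(chat("What is 17 * 24?", deep_reasoning=False))                  # fast mode
print(chat("Prove that sqrt(2) is irrational.", deep_reasoning=True))  # slow mode
```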
+ ## Why Choose Hunyuan-A13B?

+ Hunyuan-A13B stands out as a powerful, scalable, and computationally efficient LLM, perfectly suited for researchers and developers seeking high performance without the burden of excessive resource demands. Whether you're working on academic research, building cost-effective AI solutions, or exploring novel applications, Hunyuan-A13B provides a versatile foundation to build upon.
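To make the efficiency and quantization point concrete, here is a minimal serving sketch using vLLM's offline chat API against the FP8 checkpoint linked in the old text above. This is an editorial example, not official deployment guidance: the repo id, GPU count, and context length are assumptions to adapt to your setup, and model support should be verified against the official deployment instructions.

```python
# Minimal sketch: serving the FP8 checkpoint with vLLM's offline chat API.
# Repo id, tensor_parallel_size and max_model_len are ASSUMPTIONS to adjust.
from vllm import LLM, SamplingParams

llm = LLM(
    model="tencent/Hunyuan-A13B-Instruct-FP8",  # check the FP8 link above for the exact repo id
    trust_remote_code=True,
    tensor_parallel_size=2,    # adjust to your GPU count
    max_model_len=32768,       # well below the 256K maximum to keep the KV cache small
)
params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)
outputs = llm.chat(
    [{"role": "user", "content": "Summarize the benefits of fine-grained MoE."}],
    params,
)
print(outputs[0].outputs[0].text)
```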

  &nbsp;