arxiv:2502.05602

UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA

Published on Feb 8

Authors:

Jiale Dong ,

Abstract

Compared to traditional Vision Transformers (ViT), Mixture-of-Experts Vision Transformers (MoE-ViT) are introduced to scale model size without a proportional increase in computational complexity, making them a new research focus. Given the high performance and reconfigurability, FPGA-based accelerators for MoE-ViT emerge, delivering substantial gains over general-purpose processors. However, existing accelerators often fall short of fully exploring the design space, leading to suboptimal trade-offs between resource utilization and performance. To overcome this problem, we introduce UbiMoE, a novel end-to-end FPGA accelerator tailored for MoE-ViT. Leveraging the unique computational and memory access patterns of MoE-ViTs, we develop a latency-optimized streaming attention kernel and a resource-efficient reusable linear kernel, effectively balancing performance and resource consumption. To further enhance design efficiency, we propose a two-stage heuristic search algorithm that optimally tunes hardware parameters for various FPGA resource constraints. Compared to state-of-the-art (SOTA) FPGA designs, UbiMoE achieves 1.34x and 3.35x throughput improvements for MoE-ViT on Xilinx ZCU102 and Alveo U280 platforms, respectively, while enhancing energy efficiency by 1.75x and 1.54x. Our implementation is available at https://github.com/DJ000011/UbiMoE.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2502.05602 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2502.05602 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2502.05602 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.