arxiv:2408.11997

Floating-Point Multiply-Add with Approximate Normalization for Low-Cost Matrix Engines

Published on Aug 21, 2024

Authors:

Abstract

The widespread adoption of machine learning algorithms necessitates hardware acceleration to ensure efficient performance. This acceleration relies on custom matrix engines that operate on full or reduced-precision floating-point arithmetic. However, conventional floating-point implementations can be power hungry. This paper proposes a method to improve the energy efficiency of the matrix engines used in machine learning algorithm acceleration. Our approach leverages approximate normalization within the floating-point multiply-add units as a means to reduce their hardware complexity, without sacrificing overall machine-learning model accuracy. Hardware synthesis results show that this technique reduces area and power consumption roughly by 16% and 13% on average for Bfloat16 format. Also, the error introduced in transformer model accuracy is 1% on average, for the most efficient configuration of the proposed approach.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2408.11997 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2408.11997 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2408.11997 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.