Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
m-ricย 
posted an update Oct 8, 2024
Post
2279
๐Ÿ’ฅ ๐‹-๐Œ๐ฎ๐ฅ: ๐€๐๐๐ข๐ญ๐ข๐จ๐ง-๐Ž๐ง๐ฅ๐ฒ ๐Œ๐ฎ๐ฅ๐ญ๐ข๐ฉ๐ฅ๐ข๐œ๐š๐ญ๐ข๐จ๐ง ๐œ๐š๐ง ๐ฌ๐ฅ๐š๐ฌ๐ก ๐œ๐จ๐ฆ๐ฉ๐ฎ๐ญ๐š๐ญ๐ข๐จ๐ง๐š๐ฅ ๐œ๐จ๐ฌ๐ญ๐ฌ ๐›๐ฒ ๐Ÿ–๐ŸŽ%!

Microsoft researchers dropped a groundbreaking technique that could slash the energy use in transformer computations : their novel "linear-complexity multiplication" (L-Mul) algorithm approximates floating-point multiplication using energy-efficient integer addition instead of costly multiplications.

๐Ÿ’ก Quick reminder on how floats are coded on 8 bits (FP8):
In the e4m3 FP8 standard, you encode a number as:
Sign (1 bit) | Exponent (4 bits) | Mantissa (3 bits)
Example: 0 (positive) | 1000 (8) | 101 (1/2 + 1/8 = 0.625)
Calculation: you add one to the mantissa, and multiply it by 2 power (the exponent - a bias term which is 7 for e4m3):

โžก๏ธย You get (1 + 0.625) ร— 2^(8-7) = 3.25

Now back to the paper. ๐—ž๐—ฒ๐˜† ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€:

โšก๏ธ Multiplication is extremely energy-intensive compared to addition. For 32-bit operations, multiplication (3.7 pJ) uses 37x more energy than addition (0.1 pJ)!

๐Ÿงฎ Traditional floating-point multiplication go like (noting xm the mantissa and xe the exponent): Mul(x,y) = (1 + xm) ยท 2^xe ยท (1 + ym) ยท 2^ye = (1 + xm + ym + xm ยท ym) ยท 2^(xe+ye)

๐Ÿ’ก L-Mul cleverly approximates this as: L-Mul(x,y) = (1 + xm + ym + 2^-l(m)) ยท 2^(xe+ye), eliminating the costly xm ยท ym term

๐Ÿ”ง l(m) term is adaptively set based on mantissa size for optimal accuracy

๐Ÿ“Š Benchmarks on the Llama-3.1-8B-Instruct model show L-Mul preserves precision across various NLP tasks, with performance nearly identical to full BFloat16 precision

๐Ÿ’ฌ Authors claim: "We can achieve the same model inference performance while reducing the energy cost of attention computations by 80%."

This breakthrough is still theoretical and would need implementation on dedicated hardware to confirm real-world gains, but itโ€™s a really exciting path for more sustainable AI! ๐ŸŒฑ

Read the paper here ๐Ÿ‘‰ย  Addition is All You Need for Energy-efficient Language Models (2410.00907)
In this post