Nauman Mustafa

naxautify

AI & ML interests

None yet

Recent Activity

liked a model 2 months ago
stepfun-ai/GOT-OCR2_0
liked a model 3 months ago
rain1011/pyramid-flow-sd3
reacted to m-ric's post with ❤️ 3 months ago
💥 𝐋-𝐌𝐮𝐥: 𝐀𝐝𝐝𝐢𝐭𝐢𝐨𝐧-𝐎𝐧𝐥𝐲 𝐌𝐮𝐥𝐭𝐢𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐜𝐚𝐧 𝐬𝐥𝐚𝐬𝐡 𝐜𝐨𝐦𝐩𝐮𝐭𝐚𝐭𝐢𝐨𝐧𝐚𝐥 𝐜𝐨𝐬𝐭𝐬 𝐛𝐲 𝟖𝟎%! Microsoft researchers dropped a groundbreaking technique that could slash the energy use in transformer computations : their novel "linear-complexity multiplication" (L-Mul) algorithm approximates floating-point multiplication using energy-efficient integer addition instead of costly multiplications. 💡 Quick reminder on how floats are coded on 8 bits (FP8): In the e4m3 FP8 standard, you encode a number as: Sign (1 bit) | Exponent (4 bits) | Mantissa (3 bits) Example: 0 (positive) | 1000 (8) | 101 (1/2 + 1/8 = 0.625) Calculation: you add one to the mantissa, and multiply it by 2 power (the exponent - a bias term which is 7 for e4m3): ➡️ You get (1 + 0.625) × 2^(8-7) = 3.25 Now back to the paper. 𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀: ⚡️ Multiplication is extremely energy-intensive compared to addition. For 32-bit operations, multiplication (3.7 pJ) uses 37x more energy than addition (0.1 pJ)! 🧮 Traditional floating-point multiplication go like (noting xm the mantissa and xe the exponent): Mul(x,y) = (1 + xm) · 2^xe · (1 + ym) · 2^ye = (1 + xm + ym + xm · ym) · 2^(xe+ye) 💡 L-Mul cleverly approximates this as: L-Mul(x,y) = (1 + xm + ym + 2^-l(m)) · 2^(xe+ye), eliminating the costly xm · ym term 🔧 l(m) term is adaptively set based on mantissa size for optimal accuracy 📊 Benchmarks on the Llama-3.1-8B-Instruct model show L-Mul preserves precision across various NLP tasks, with performance nearly identical to full BFloat16 precision 💬 Authors claim: "We can achieve the same model inference performance while reducing the energy cost of attention computations by 80%." This breakthrough is still theoretical and would need implementation on dedicated hardware to confirm real-world gains, but it’s a really exciting path for more sustainable AI! 🌱 Read the paper here 👉 https://huggingface.co/papers/2410.00907
View all activity

Organizations

Autify's profile picture

naxautify's activity

reacted to m-ric's post with ❤️ 3 months ago
view post
Post
2279
💥 𝐋-𝐌𝐮𝐥: 𝐀𝐝𝐝𝐢𝐭𝐢𝐨𝐧-𝐎𝐧𝐥𝐲 𝐌𝐮𝐥𝐭𝐢𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐜𝐚𝐧 𝐬𝐥𝐚𝐬𝐡 𝐜𝐨𝐦𝐩𝐮𝐭𝐚𝐭𝐢𝐨𝐧𝐚𝐥 𝐜𝐨𝐬𝐭𝐬 𝐛𝐲 𝟖𝟎%!

Microsoft researchers dropped a groundbreaking technique that could slash the energy use in transformer computations : their novel "linear-complexity multiplication" (L-Mul) algorithm approximates floating-point multiplication using energy-efficient integer addition instead of costly multiplications.

💡 Quick reminder on how floats are coded on 8 bits (FP8):
In the e4m3 FP8 standard, you encode a number as:
Sign (1 bit) | Exponent (4 bits) | Mantissa (3 bits)
Example: 0 (positive) | 1000 (8) | 101 (1/2 + 1/8 = 0.625)
Calculation: you add one to the mantissa, and multiply it by 2 power (the exponent - a bias term which is 7 for e4m3):

➡️ You get (1 + 0.625) × 2^(8-7) = 3.25

Now back to the paper. 𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀:

⚡️ Multiplication is extremely energy-intensive compared to addition. For 32-bit operations, multiplication (3.7 pJ) uses 37x more energy than addition (0.1 pJ)!

🧮 Traditional floating-point multiplication go like (noting xm the mantissa and xe the exponent): Mul(x,y) = (1 + xm) · 2^xe · (1 + ym) · 2^ye = (1 + xm + ym + xm · ym) · 2^(xe+ye)

💡 L-Mul cleverly approximates this as: L-Mul(x,y) = (1 + xm + ym + 2^-l(m)) · 2^(xe+ye), eliminating the costly xm · ym term

🔧 l(m) term is adaptively set based on mantissa size for optimal accuracy

📊 Benchmarks on the Llama-3.1-8B-Instruct model show L-Mul preserves precision across various NLP tasks, with performance nearly identical to full BFloat16 precision

💬 Authors claim: "We can achieve the same model inference performance while reducing the energy cost of attention computations by 80%."

This breakthrough is still theoretical and would need implementation on dedicated hardware to confirm real-world gains, but it’s a really exciting path for more sustainable AI! 🌱

Read the paper here 👉  Addition is All You Need for Energy-efficient Language Models (2410.00907)