csabakecskemeti 
posted an update about 20 hours ago
Testing training on AMD/ROCm for the first time!

I've got my hands on an AMD Instinct MI100. Used, it's about the same price as a V100, but on paper it has more TOPS (V100 14 TOPS vs MI100 23 TOPS), and the HBM has a faster clock, so the memory bandwidth is 1.2 TB/s.
For quantized inference it's a beast (the MI50 was also surprisingly fast).

For LoRA training in this quick test I could not make the bitsandbytes (bnb) config work, so I'm running the fine-tune on the full-size model.
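For context, a minimal sketch of the LoRA idea itself (this is illustrative math, not the actual training script from the post): instead of updating the full weight matrix W, you train a low-rank pair B (d×r) and A (r×k), and the effective weight is W + (alpha/r)·BA. The names d, k, r, and alpha here are generic LoRA hyperparameters, not values from this test.

```python
# Illustrative LoRA low-rank update in plain Python (no ROCm/bnb
# dependencies). All dimensions and values are made up for the demo.

def matmul(X, Y):
    """Plain-Python matrix multiply for small nested-list matrices."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(inner))
             for j in range(cols)] for i in range(len(X))]

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), the LoRA-adapted weight."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

# Tiny example: d = k = 2, rank r = 1.
W = [[1.0, 0.0],
     [0.0, 1.0]]
B = [[0.0],           # B is initialized to zero, so at init W_eff == W
     [0.0]]
A = [[0.5, 0.5]]

W_eff = lora_effective_weight(W, A, B, alpha=2, r=1)
print(W_eff)  # identical to W until B is trained away from zero
```

The point of the zero-initialized B is that the adapter starts as a no-op, so training only ever has to learn the (low-rank) correction, which is why LoRA is so much cheaper than full fine-tuning.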

Will share everything I've learned about the install, setup, and settings in a blog post, together with the cooling-shroud 3D design.

FWIW, the MI100 was released after the A100, three years after the V100... that says something :) Also, it's the matrix/tensor-core mixed- or reduced-precision FLOPS that are of interest, not the FP32 FLOPS, which is what the 14 and 23 numbers are.


All correct!
What I called out: a used MI100 is in the same price range as a used V100 PCIe, which is why I'm comparing against that card.

And yes, you're right that FP16 performance would be more useful to mention (AFAIK V100 112 TFLOPS, MI100 184 TFLOPS), but for the comparison it shows the same 164% (claimed) performance advantage for the MI100.
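The "same 164%" claim checks out arithmetically: the FP32 and FP16 spec numbers quoted in this thread happen to give the same MI100/V100 ratio. A quick sanity check (spec figures are the peak theoretical numbers from the posts above):

```python
# Verify that both precision tiers give the same claimed speedup.
fp32 = {"V100": 14, "MI100": 23}    # FP32, TFLOPS (from the post)
fp16 = {"V100": 112, "MI100": 184}  # FP16, TFLOPS (from the reply)

for name, specs in (("FP32", fp32), ("FP16", fp16)):
    ratio = specs["MI100"] / specs["V100"]
    print(f"{name}: MI100 is {ratio:.0%} of V100")  # 164% both times
```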

Please note I'm building my hobby AI infra at home for myself, so I'm mainly constrained by TOPS/$ :D