Bhargav Solanki

solankibhargav

AI & ML interests

None yet

Recent Activity

reacted to m-ric's post with 👍 6 days ago
š‡š®š š š¢š§š  š…šššœšž š«šžš„šžššš¬šžš¬ šš¢šœšØš­š«šØš§, šš š¦š¢šœš«šØš¬šœšØš©š¢šœ š„š¢š› š­š”ššš­ š¬šØš„šÆšžš¬ š‹š‹šŒ š­š«ššš¢š§š¢š§š  šŸ’šƒ š©ššš«ššš„š„šžš„š¢š³ššš­š¢šØš§ šŸ„³ šŸ•°ļø Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years. šŸ‘“šŸ» If they had needed all this time, we would have GPU stories from the time of Pharaoh š“‚€: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons " šŸ› ļø But instead, they just parallelized the training on 24k H100s, which made it take just a few months. This required parallelizing across 4 dimensions: data, tensor, context, pipeline. And it is infamously hard to do, making for bloated code repos that hold together only by magic. šŸ¤ š—•š˜‚š˜ š—»š—¼š˜„ š˜„š—² š—±š—¼š—»'š˜ š—»š—²š—²š—± š—µš˜‚š—“š—² š—暝—²š—½š—¼š˜€ š—®š—»š˜†š—ŗš—¼š—暝—²! Instead of building mega-training codes, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. A team has built Nanotron, already widely used in industry. And now a team releases Picotron, a radical approach to code 4D Parallelism in just a few hundred lines of code, a real engineering prowess, making it much easier to understand what's actually happening! āš” š—œš˜'š˜€ š˜š—¶š—»š˜†, š˜†š—²š˜ š—½š—¼š˜„š—²š—暝—³š˜‚š—¹: Counting in MFU (Model FLOPs Utilization, how much the model actually uses all the compute potential), this lib reaches ~50% on SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is leading further benchmarks to verify this) Go take a look šŸ‘‰ https://github.com/huggingface/picotron/tree/main/picotron

Organizations

Abacus Research AG

solankibhargav's activity

New activity in llava-hf/vip-llava-13b-hf 23 days ago
New activity in THUDM/glm-edge-v-5b 23 days ago
New activity in stepfun-ai/GOT-OCR2_0 3 months ago
New activity in AbacusResearch/Jallabi-34B 4 months ago
New activity in cognitivecomputations/dolphin-vision-72b 6 months ago

Steps to fine tune? (#5, opened 6 months ago by solankibhargav, 1 comment)
New activity in mistralai/Codestral-22B-v0.1 7 months ago
New activity in AbacusResearch/haLLawa4-7b 10 months ago
New activity in AbacusResearch/jaLLAbi 10 months ago
New activity in AbacusResearch/jaLLAbi2-7b 10 months ago
New activity in AbacusResearch/haLLAwa3 10 months ago
New activity in AbacusResearch/Jallabi-34B 10 months ago
New activity in AbacusResearch/haLLAwa2 11 months ago