t.d.a.g.

sequelbox

AI & ML interests

open source, infinite games. (they/them)

Recent Activity

liked a dataset about 17 hours ago
KingNish/reasoning-base-20k
liked a dataset about 18 hours ago
qingy2024/QwQ-LongCoT-Verified-130K
liked a dataset about 18 hours ago
amphora/QwQ-LongCoT-130K

Organizations

Valiant Labs

sequelbox's activity

reacted to m-ric's post with 👀 4 days ago
๐‡๐ฎ๐ ๐ ๐ข๐ง๐  ๐…๐š๐œ๐ž ๐ซ๐ž๐ฅ๐ž๐š๐ฌ๐ž๐ฌ ๐๐ข๐œ๐จ๐ญ๐ซ๐จ๐ง, ๐š ๐ฆ๐ข๐œ๐ซ๐จ๐ฌ๐œ๐จ๐ฉ๐ข๐œ ๐ฅ๐ข๐› ๐ญ๐ก๐š๐ญ ๐ฌ๐จ๐ฅ๐ฏ๐ž๐ฌ ๐‹๐‹๐Œ ๐ญ๐ซ๐š๐ข๐ง๐ข๐ง๐  ๐Ÿ’๐ƒ ๐ฉ๐š๐ซ๐š๐ฅ๐ฅ๐ž๐ฅ๐ข๐ณ๐š๐ญ๐ข๐จ๐ง ๐Ÿฅณ

๐Ÿ•ฐ๏ธ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years.

๐Ÿ‘ด๐Ÿป If they had needed all this time, we would have GPU stories from the time of Pharaoh ๐“‚€: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons "

๐Ÿ› ๏ธ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is infamously hard to do, making for bloated code repos that hold together only by magic.
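For a concrete picture of what those four axes look like in code, here is a minimal sketch (an illustration with assumed axis sizes built on PyTorch's DeviceMesh, not Picotron's actual API):

```python
# Minimal sketch of a 4D-parallel layout, assuming PyTorch >= 2.3 and 16 GPUs
# launched via torchrun -- illustrative only, not Picotron's actual code.
from torch.distributed.device_mesh import init_device_mesh

# Split 16 GPUs across the four axes (sizes here are made-up assumptions):
# 2-way data x 2-way tensor x 2-way context x 2-way pipeline = 16 ranks.
mesh = init_device_mesh(
    "cuda",
    mesh_shape=(2, 2, 2, 2),
    mesh_dim_names=("dp", "tp", "cp", "pp"),
)

# Each named axis yields a process group: gradients are all-reduced over "dp",
# weight shards are gathered over "tp", long sequences are split over "cp",
# and activations flow to the next stage over "pp".
dp_group = mesh["dp"].get_group()
tp_group = mesh["tp"].get_group()
cp_group = mesh["cp"].get_group()
pp_group = mesh["pp"].get_group()
```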

๐Ÿค ๐—•๐˜‚๐˜ ๐—ป๐—ผ๐˜„ ๐˜„๐—ฒ ๐—ฑ๐—ผ๐—ป'๐˜ ๐—ป๐—ฒ๐—ฒ๐—ฑ ๐—ต๐˜‚๐—ด๐—ฒ ๐—ฟ๐—ฒ๐—ฝ๐—ผ๐˜€ ๐—ฎ๐—ป๐˜†๐—บ๐—ผ๐—ฟ๐—ฒ! Instead of building mega-training codes, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. A team has built Nanotron, already widely used in industry.
And now a team releases Picotron, a radical approach to code 4D Parallelism in just a few hundred lines of code, a real engineering prowess, making it much easier to understand what's actually happening!

โšก ๐—œ๐˜'๐˜€ ๐˜๐—ถ๐—ป๐˜†, ๐˜†๐—ฒ๐˜ ๐—ฝ๐—ผ๐˜„๐—ฒ๐—ฟ๐—ณ๐˜‚๐—น:
Counting in MFU (Model FLOPs Utilization, how much the model actually uses all the compute potential), this lib reaches ~50% on SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is leading further benchmarks to verify this)
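For context, MFU is just achieved FLOPs divided by the hardware's peak FLOPs. A back-of-the-envelope version of that calculation, with a made-up token throughput chosen to land near the quoted ~50%, might look like this:

```python
# Back-of-the-envelope MFU estimate. The throughput below is a made-up number
# chosen to illustrate what ~50% MFU corresponds to -- it is not a measurement.
num_params = 1.7e9            # SmolLM-1.7B
tokens_per_second = 388_000   # hypothetical aggregate throughput on 8 GPUs
num_gpus = 8
peak_flops_per_gpu = 989e12   # H100 SXM, dense BF16 peak

# Common approximation: ~6 FLOPs per parameter per token for forward + backward.
achieved_flops = 6 * num_params * tokens_per_second
mfu = achieved_flops / (num_gpus * peak_flops_per_gpu)
print(f"MFU ~= {mfu:.1%}")    # ~50.0% with these illustrative numbers
```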

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron
reacted to takarajordan's post with ❤️ 12 days ago
I'm super excited to release my first open-source text dataset:

WorldScenario 20K is a novel dataset of 20,000 synthetically generated multi-stakeholder scenarios designed to simulate real-world decision-making processes. Each scenario explores a unique environmental, societal, or economic issue.

I used the brand-new meta-llama/Llama-3.3-70B-Instruct model to generate the scenarios, then ran the dataset through some post-processing to clean it and evaluate it for diversity.
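For anyone curious how this kind of generation can be wired up, here is a minimal sketch under assumed prompts and sampling parameters, using the huggingface_hub Inference API client (not necessarily the author's actual pipeline):

```python
# Minimal sketch of LLM-based scenario generation -- the prompt wording,
# sampling parameters, and use of InferenceClient are illustrative assumptions.
from huggingface_hub import InferenceClient

client = InferenceClient(model="meta-llama/Llama-3.3-70B-Instruct")

PROMPT = (
    "Write a multi-stakeholder scenario about a real-world environmental, "
    "societal, or economic issue. Name the stakeholders, their competing "
    "interests, and the decision that has to be made."
)

scenarios = []
for _ in range(5):  # the real dataset has 20,000 rows
    out = client.chat_completion(
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=512,
        temperature=0.9,  # higher temperature for more diverse scenarios
    )
    scenarios.append(out.choices[0].message.content)

# Post-processing (cleaning, deduplication, diversity checks) would follow here.
```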

I'd appreciate some feedback and thoughts on my new release! Thanks!

takarajordan/WorldScenario_20K