Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4,500 years on a single GPU.
If they had needed all this time, we would have GPU stories from the time of the Pharaohs: "Alas, Lord of the Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates; this shall delay the building of your computing temple by many moons."
But instead, they parallelized the training across 24k H100s, which brought it down to a few months. This required parallelizing across 4 dimensions: data, tensor, context, and pipeline. It is infamously hard to do, and it usually makes for bloated code repos that hold together only by magic.
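To give a concrete flavor of those four axes, here is a minimal sketch that carves a cluster into a 4D device mesh with PyTorch's init_device_mesh. It is not taken from any of the libraries discussed here; the mesh sizes and dimension names are illustrative assumptions.

```python
# Minimal sketch of carving a GPU cluster into the four parallelism axes.
# Not any particular library's actual code; the mesh shape (2x2x2x2 = 16 GPUs)
# and dimension names are illustrative assumptions.
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

def build_4d_mesh():
    mesh = init_device_mesh(
        "cuda",
        mesh_shape=(2, 2, 2, 2),                 # dp x cp x pp x tp
        mesh_dim_names=("dp", "cp", "pp", "tp"),
    )
    # Each named sub-mesh yields the process group for one parallelism axis:
    dp_group = mesh["dp"].get_group()  # data parallel: replicate the model, shard the batches
    tp_group = mesh["tp"].get_group()  # tensor parallel: shard individual weight matrices
    # "cp" (context parallel) shards the sequence axis; "pp" (pipeline parallel) shards the layer stack.
    return mesh, dp_group, tp_group

if __name__ == "__main__":
    # Launched with e.g. `torchrun --nproc_per_node=8 --nnodes=2 ...` so the world size matches the mesh.
    dist.init_process_group("nccl")
    build_4d_mesh()
```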
But now we don't need huge repos anymore! Instead of building mega-training codebases, Hugging Face colleagues cooked in the other direction, towards tiny 4D-parallelism libs. One team built Nanotron, already widely used in industry. And now a team has released Picotron, a radical approach that implements 4D parallelism in just a few hundred lines of code, a real feat of engineering that makes it much easier to understand what's actually happening!
It's tiny, yet powerful: measured in MFU (Model FLOPs Utilization, i.e. the fraction of the hardware's compute potential the training actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is close to what the huge libs reach. (Caution: the team is running further benchmarks to verify this.)
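For context, MFU is simply achieved training FLOPs divided by the hardware's theoretical peak. A back-of-the-envelope sketch using the standard ~6·N FLOPs-per-token approximation; the throughput number below is a placeholder assumption, not a measured Picotron result:

```python
# Back-of-the-envelope MFU estimate, using the standard ~6 * N FLOPs-per-token
# approximation for training a dense transformer (forward + backward pass).
# The throughput below is a placeholder assumption, not a measured number.
def mfu(n_params: float, tokens_per_second: float, n_gpus: int, peak_flops_per_gpu: float) -> float:
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    peak_flops_per_second = n_gpus * peak_flops_per_gpu
    return achieved_flops_per_second / peak_flops_per_second

# Example: a 1.7B-parameter model, an assumed 400k tokens/s of training throughput,
# and 8 GPUs at ~989 TFLOP/s each (H100 SXM, dense BF16 peak).
print(f"MFU ~ {mfu(1.7e9, 400_000, 8, 989e12):.0%}")
```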
Here's the Space for our new article, which leverages LLMs with reinforcement learning to design high-quality small molecules. Check it out at alimotahharynia/GPT-2-Drug-Generator. You can also read the article here: https://arxiv.org/abs/2411.14157. I would be happy to receive your feedback.
RAGOndevice: High-Performance Local AI Document Analysis Assistant

Core Value
RAGOndevice is a high-performance AI system that runs locally with no cloud dependency. Using CohereForAI's optimized 7B model, it enables professional-grade document analysis on standard PCs.

On-device AI Advantages
1. Efficient Resource Utilization
- Optimized 7B Model: Runs on standard PCs
- Local Processing: Instant response without the cloud
- Low-Spec Compatible: Performs well on regular GPUs
- Optimized Memory: Ensures stable operation
2. Data Security & Cost Efficiency
- Complete Privacy: No external data transmission
- Offline Operation: No internet required
- No Subscription: One-time installation
- Resource Optimization: Uses existing hardware
Key Features
1. Powerful Document Analysis
- Multi-Format Support: TXT, CSV, PDF, Parquet (see the loader sketch after this list)
- Intelligent Analysis: Automatic structure recognition
- OCR Support: Advanced PDF text extraction
- Real-time Chat: Natural language interaction
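To illustrate what multi-format ingestion like this can look like, here is a generic loader sketch; the library choices (pandas, pypdf) are assumptions for illustration, not necessarily what RAGOndevice uses.

```python
# Generic multi-format ingestion sketch (not RAGOndevice's actual loader).
# The library choices (pandas, pypdf) are assumptions for illustration;
# OCR for scanned PDFs would need an extra tool and is omitted.
from pathlib import Path

import pandas as pd
from pypdf import PdfReader

def load_text(path: str) -> str:
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".txt":
        return p.read_text(encoding="utf-8", errors="ignore")
    if suffix in (".csv", ".parquet"):
        df = pd.read_csv(p) if suffix == ".csv" else pd.read_parquet(p)
        return df.to_string(index=False)
    if suffix == ".pdf":
        reader = PdfReader(str(p))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    raise ValueError(f"Unsupported format: {suffix}")
```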
2. Local RAG System
- Efficient Search: TF-IDF based local search (a minimal sketch follows this list)
- Context Understanding: Accurate information retrieval
- Wikipedia Integration: Rich background knowledge
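A TF-IDF retriever of this kind fits in a few lines. Here is a minimal sketch with scikit-learn, shown only to illustrate the general approach rather than RAGOndevice's own implementation:

```python
# Minimal local TF-IDF retrieval sketch (an illustration of the general approach,
# not RAGOndevice's implementation). Document chunking and prompt assembly are omitted.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_k_chunks(chunks: list[str], query: str, k: int = 3) -> list[str]:
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(chunks)          # one sparse row per chunk
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix)[0]   # similarity of the query to each chunk
    best = scores.argsort()[::-1][:k]
    return [chunks[i] for i in best]

# The retrieved chunks would then be prepended to the prompt sent to the local 7B model.
```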
Use Cases
- Enterprise: Secure confidential document processing
- Personal Research: Private data analysis
- Education: Personal learning material analysis
- Development: Local codebase analysis
Differentiators
- Independent Operation: Zero cloud dependency
- Instant Response: No network latency
- Complete Security: Full data control
- Cost Efficiency: No ongoing costs
Future Plans
- Enhanced model optimization
- Local knowledge base expansion
- Hardware optimization
- Extended file support
RAGOndevice democratizes high-performance AI, providing an optimal local AI solution for security-sensitive environments.

The Power of Local AI: Experience enterprise-grade AI capabilities right on your device!
After some heated discussion, we want to clarify our intent regarding storage limits on the Hub.
TL;DR:
- Public storage is free and, barring blatant abuse, unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible.
- Private storage is paid above a significant free tier (1 TB if you have a paid account, 100 GB otherwise).
We continuously optimize our infrastructure to scale our storage for the coming years of growth in machine learning, to the benefit of the community.