Training a model to reason in continuous latent space, based on Meta's Coconut. If it all works, I will apply it to the MiniCPM-o SVD-LR. The endgame is a multimodal, adaptive, and efficient foundational on-device AI model.
Exciting breakthrough in e-commerce recommendation systems! Walmart Global Tech researchers have developed a novel Triple Modality Fusion (TMF) framework that revolutionizes how we make product recommendations.
>> Key Innovation
The framework ingeniously combines three distinct data types:
- Visual data to capture product aesthetics and context
- Textual information for detailed product features
- Graph data to understand complex user-item relationships
>> Technical Architecture
The system leverages a Large Language Model (Llama2-7B) as its backbone and introduces several sophisticated components:
Modality Fusion Module
- All-Modality Self-Attention (AMSA) for unified representation
- Cross-Modality Attention (CMA) mechanism for deep feature integration
- Custom FFN adapters to align different modality embeddings
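The heart of the fusion module is attention between modalities. As a purely illustrative sketch of the core idea (not the TMF paper's implementation), here is single-head cross-modality attention in NumPy, where visual tokens query text tokens; all shapes and variable names are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_mod, context_mod):
    """One modality (queries) attends over another (keys/values).
    Illustrative only -- not the paper's CMA module."""
    d = query_mod.shape[-1]
    scores = query_mod @ context_mod.T / np.sqrt(d)  # (Nq, Nk) similarity
    weights = softmax(scores, axis=-1)               # attention over context tokens
    return weights @ context_mod                     # fused representation (Nq, d)

rng = np.random.default_rng(0)
visual = rng.normal(size=(5, 64))  # 5 visual tokens, dim 64
text = rng.normal(size=(7, 64))    # 7 text tokens, dim 64
fused = cross_attention(visual, text)
print(fused.shape)  # (5, 64): each visual token now carries text context
```

In the full framework this would run in both directions and be followed by the FFN adapters that project each modality into the LLM's embedding space.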
Advanced Training Strategy
- Curriculum learning approach with three complexity levels
- Parameter-Efficient Fine-Tuning using LoRA
- Special token system for behavior and item representation
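LoRA's trick is to freeze the pretrained weight and train only a low-rank additive update. A minimal sketch (dimensions, rank, and scaling here are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 8, 16

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

def lora_forward(x):
    # base projection plus scaled low-rank update: W x + (alpha/r) * B A x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# B is zero-initialized, so at the start of training LoRA changes nothing
assert np.allclose(lora_forward(x), W @ x)
# trainable parameters: r*(d_in + d_out) = 1024 vs the full d_in*d_out = 4096
print(r * (d_in + d_out), d_in * d_out)
```

The parameter saving grows with layer size: for an LLM-scale projection the low-rank factors are a tiny fraction of the frozen matrix.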
>> Real-World Impact
The results are remarkable:
- 38.25% improvement in Electronics recommendations
- 43.09% boost in Sports category accuracy
- Significantly higher human evaluation scores compared to traditional methods
Currently deployed in Walmart's production environment, this research demonstrates how combining multiple data modalities with advanced LLM architectures can dramatically improve recommendation accuracy and user satisfaction.
Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years on a single GPU.
If they had needed all this time, we would have GPU stories from the time of the Pharaohs: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates; this shall delay the building of your computing temple by many moons."
But instead, they parallelized the training across 24k H100s, bringing it down to just a few months. This required parallelizing across 4 dimensions: data, tensor, context, and pipeline. It is infamously hard to do, making for bloated code repos that hold together only by magic.
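To get a feel for what a 4D layout means, here is a toy sketch that maps a flat GPU rank onto coordinates in a data x tensor x context x pipeline mesh. The degrees chosen are hypothetical, not Meta's actual configuration:

```python
# Hypothetical 4D device mesh; the real Llama-3.1 layout is not given here.
dp, tp, cp, pp = 128, 8, 2, 12  # data, tensor, context, pipeline degrees

def rank_to_coords(rank, dims=(dp, tp, cp, pp)):
    """Map a flat GPU rank to (data, tensor, context, pipeline) coordinates."""
    coords = []
    for d in reversed(dims):  # peel off the fastest-varying axis first
        rank, c = divmod(rank, d)
        coords.append(c)
    return tuple(reversed(coords))

total = dp * tp * cp * pp
print(total)                  # 24576 GPUs, i.e. ~24k H100s
print(rank_to_coords(0))      # (0, 0, 0, 0)
print(rank_to_coords(24575))  # (127, 7, 1, 11): the last GPU in the mesh
```

Each axis then gets its own communication group: gradient all-reduce along the data axis, activation sharding along tensor and context, and point-to-point sends along pipeline.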
But no, we don't need huge repos anymore! Instead of building mega-training codebases, Hugging Face colleagues cooked in the other direction, towards tiny 4D-parallelism libs. A team has built Nanotron, already widely used in industry. And now a team releases Picotron, a radical approach that codes 4D parallelism in just a few hundred lines, a real feat of engineering, making it much easier to understand what's actually happening!
It's tiny, yet powerful: counting in MFU (Model FLOPs Utilization, i.e. how much of the hardware's compute potential the model actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is really close to what the huge libs reach. (Caution: the team is running further benchmarks to verify this.)
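For reference, MFU is just the model FLOPs achieved per second divided by the hardware's peak. A back-of-envelope sketch using the standard ~6N training FLOPs-per-token estimate; the throughput number below is made up for illustration, not a Picotron benchmark:

```python
# Back-of-envelope MFU, using the standard ~6*N training FLOPs per token.
n_params = 1.7e9        # SmolLM-1.7B
n_gpus = 8
peak_flops = 989e12     # H100 BF16 dense peak, per GPU
tokens_per_sec = 3.9e5  # hypothetical measured training throughput

achieved = 6 * n_params * tokens_per_sec  # model FLOPs actually computed per second
mfu = achieved / (n_gpus * peak_flops)
print(f"MFU = {mfu:.1%}")                 # ~50%
```

Plugging in a real measured tokens/sec is all it takes to sanity-check a claimed MFU number.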
Here's the Space for our new article, which leverages LLMs with reinforcement learning to design high-quality small molecules. Check it out at alimotahharynia/GPT-2-Drug-Generator. You can also access the article here: https://arxiv.org/abs/2411.14157. I would be happy to receive your feedback.
RAGOndevice: High-Performance Local AI Document Analysis Assistant

>> Core Value
RAGOndevice is a high-performance AI system running locally without cloud dependency. Using CohereForAI's optimized 7B model, it enables professional-grade document analysis on standard PCs.

>> On-Device AI Advantages
1. Efficient Resource Utilization
- Optimized 7B Model: Runs on standard PCs
- Local Processing: Instant response without cloud
- Low-Spec Compatible: Performs well on regular GPUs
- Optimized Memory: Ensures stable operation

2. Data Security & Cost Efficiency
- Complete Privacy: No external data transmission
- Offline Operation: No internet required
- No Subscription: One-time installation
- Resource Optimization: Uses existing hardware

>> Key Features
1. Powerful Document Analysis
- Multi-Format Support: TXT, CSV, PDF, Parquet
- Intelligent Analysis: Automatic structure recognition
- OCR Support: Advanced PDF text extraction
- Real-Time Chat: Natural language interaction

2. Use Cases
- Enterprise: Secure confidential document processing
- Personal Research: Private data analysis
- Education: Personal learning material analysis
- Development: Local codebase analysis

>> Differentiators
- Independent Operation: Zero cloud dependency
- Instant Response: No network latency
- Complete Security: Full data control
- Cost Efficiency: No ongoing costs

>> Future Plans
- Enhanced model optimization
- Local knowledge base expansion
- Hardware optimization
- Extended file support

RAGOndevice democratizes high-performance AI, providing the optimal local AI solution for security-sensitive environments.

Power of Local AI: experience enterprise-grade AI capabilities right on your device!