All HF Hub posts

AdinaY posted an update 2 days ago
MiniCPM-V 4.5 🚀 New MLLM for image, multi-image & video understanding, running even on your phone, released by OpenBMB

openbmb/MiniCPM-V-4_5

✨ SOTA vision language capability
✨ 96× video token compression > high-FPS & long video reasoning
✨ Switchable fast vs deep thinking modes
✨ Strong OCR, document parsing, supports 30+ languages
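
If you want to poke at it locally, here is a minimal single-image chat sketch, assuming MiniCPM-V 4.5 keeps the `AutoModel` + `model.chat()` remote-code interface of earlier MiniCPM-V releases; the image path is a placeholder and the model card is the authority on exact arguments.

```python
# Minimal single-image chat sketch. Assumes MiniCPM-V 4.5 keeps the AutoModel +
# chat() interface of earlier MiniCPM-V releases; see the model card for details.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-4_5"
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="cuda"
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
msgs = [{"role": "user", "content": [image, "Describe this image."]}]

# chat() handles image preprocessing and text generation in one call.
answer = model.chat(msgs=msgs, tokenizer=tokenizer)
print(answer)
```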

merve posted an update 2 days ago
The first vision language model built on openai/gpt-oss-20b just dropped! 🔥

InternVL3.5 comes with 32 models 🤯 pre-trained, fine-tuned, and aligned variants in various sizes: OpenGVLab/internvl35-68ac87bd52ebe953485927fb
It comes with either gpt-oss or Qwen3 for the LLM part ⬇️

codelion posted an update 2 days ago
I recently added a recipe to ellora that improves the reasoning capabilities of Gemma-3-1B using self-supervised learning. The model now shows step-by-step thinking in <think> tags before answering.

Logic puzzle accuracy: 61% → 84%. 3 hours of training on a single GPU. 🧠

Used GRPO, where the model generates multiple responses and learns to prefer the better reasoning. It works surprisingly well for making smaller models more transparent.
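
For a sense of what that looks like in code, here is a minimal sketch with TRL's GRPOTrainer and a toy reward that favors completions containing <think> tags; the dataset and reward function are placeholders, not the actual ellora recipe (that lives in the Colab below).

```python
# Minimal GRPO sketch with TRL: sample several completions per prompt and push the
# model toward the ones a reward function prefers. The reward and dataset here are
# toy placeholders, not the ellora recipe linked below.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def reward_thinking(completions, **kwargs):
    # Reward completions that show their work inside <think>...</think> tags.
    return [1.0 if "<think>" in c and "</think>" in c else 0.0 for c in completions]

train_dataset = Dataset.from_dict(
    {"prompt": ["Solve step by step: if 3x + 2 = 11, what is x?"]}  # placeholder data
)

config = GRPOConfig(
    output_dir="gemma-3-1b-grpo",
    num_generations=4,              # completions sampled per prompt
    per_device_train_batch_size=4,  # global batch must be divisible by num_generations
    max_completion_length=256,
)

trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",
    reward_funcs=reward_thinking,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```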

🔗 Colab: https://colab.research.google.com/github/codelion/ellora/blob/main/Ellora_Recipe_2_Reasoning_LoRA_with_Self-Rewarding_GRPO.ipynb

🤗 Model: codelion/gemma-3-1b-it-reasoning-grpo-lora

💻 Code: https://github.com/codelion/ellora

ginipick posted an update 2 days ago
🎉 Fashion Fit 360: The New Standard in AI Virtual Try-On!

🚀 Now Live and Free to Use! Say goodbye to online shopping uncertainty - "Will this look good on me?" - with our revolutionary solution! Fashion Fit 360 is a cutting-edge AI-powered virtual fitting service that transforms your fashion shopping experience.

LINK: ginigen/Fashion-Fit360

✨ Core Features
🔄 360-Degree Multi-Pose Generation
Transform a single front-facing photo into 6 different viewing angles!
Front, side, and back views for complete visualization
Experience a real fitting room mirror effect
Check fit and style from every perspective

👗 15 Fashion Item Categories
Apparel: Tops, bottoms, dresses
Jewelry: Necklaces, earrings, rings, bracelets
Accessories: Sunglasses, eyewear, hats, ties, bow ties, belts
Essentials: Bags, shoes

🎯 Perfect For:
🛍️ Online Shopping Enthusiasts: Preview before purchase - zero return hassles!
💍 Jewelry Lovers: Virtually try expensive pieces before investing
🎁 Thoughtful Gift-Givers: Test items on recipient photos beforehand
👔 Business Professionals: Preview suit and tie combinations
👗 Fashion Designers: Rapidly visualize design samples

💡 Why Fashion Fit 360?
Fashion Fit 360 delivers innovation beyond conventional services. While most virtual fitting platforms only support clothing, we offer complete support for all 15 item categories, from apparel to accessories. Unlike competitors that provide only front views, Fashion Fit 360 generates 6 poses for true 360-degree visualization, so you can verify the actual fit from every angle. Performance is unmatched: results in under 20 seconds, one-click simplicity, no complex configuration. Plus, download all generated images as a convenient ZIP file, eliminating tedious individual saves.

🔥 Key Differentiators
🎨 360-Degree Multi-Pose Image Generation
🤖 FLUX.1-Fill based OmniTry integrated model with Flux.1 KONTEXT LoRA technology
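
The post doesn't include code, but as a hedged sketch of what FLUX.1-Fill-based try-on generally looks like with diffusers (not the actual Fashion Fit 360 pipeline), masked inpainting plus a try-on LoRA might be wired up like this; the LoRA repo id, file paths, and prompt are placeholders.

```python
# Hedged sketch of mask-based virtual try-on with FLUX.1-Fill in diffusers.
# Not the Fashion Fit 360 pipeline; the LoRA id, paths, and prompt are placeholders.
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Hypothetical try-on LoRA adapter; swap in whatever adapter you actually use.
pipe.load_lora_weights("your-org/your-tryon-lora")

person = load_image("person.jpg")              # front-facing photo (placeholder path)
garment_mask = load_image("garment_mask.png")  # white where the garment is painted in

result = pipe(
    prompt="a person wearing a fitted navy blazer, photorealistic studio lighting",
    image=person,
    mask_image=garment_mask,
    guidance_scale=30.0,
    num_inference_steps=40,
).images[0]
result.save("tryon_front.png")
```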

prithivMLmods posted an update 2 days ago
OpenGVLab's InternVL3_5-2B-MPO [Mixed Preference Optimization (MPO)] is a compact vision-language model in the InternVL3.5 series. You can now experience it in the Tiny VLMs Lab, an app featuring 15+ multimodal VLMs ranging from 250M to 4B parameters. These models support tasks such as OCR, reasoning, single-shot answering with small models, and captioning (including ablated variants), across a broad range of visual categories. They are also capable of handling images with complex, sensitive, or nuanced content, while adapting to varying aspect ratios and resolutions.

✨ Space/App: prithivMLmods/Tiny-VLMs-Lab
🫙 Model: OpenGVLab/InternVL3_5-2B-MPO
↗️ Collection: OpenGVLab/internvl35-68ac87bd52ebe953485927fb
🗞️ Paper: https://arxiv.org/pdf/2508.18265
↗️ Multimodal Space Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the relevant spaces, collections, and model cards.

codelion posted an update about 11 hours ago
I wanted to share a technique that's been working really well for recovering performance after INT4 quantization.

Typically, quantizing an LLM to INT4 for inference (unlike, say, INT8) incurs some accuracy loss. Instead of accepting the quality loss, we used the FP16 model as a teacher to train a tiny LoRA adapter (rank=16) for the quantized model. The cool part: the model generates its own training data using the Magpie technique, so no external datasets are needed. This is critical because we want to stay as close as possible to the distribution of the model's natural responses.
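
Here is a condensed sketch of the idea: quantize the model to 4-bit, attach a rank-16 LoRA, and distill from the FP16 teacher so the adapter only has to learn the systematic error INT4 introduces. The hyperparameters, prompt list, and training loop below are illustrative, not the exact ellora recipe.

```python
# Sketch of accuracy-recovery LoRA: a 4-bit student with a rank-16 adapter learns to
# match its FP16 teacher's token distribution. Illustrative, not the exact ellora recipe.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "Qwen/Qwen3-0.6B"
tok = AutoTokenizer.from_pretrained(model_id)

teacher = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
).eval()

student = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4"),
    device_map="cuda",
)
student = prepare_model_for_kbit_training(student)
student = get_peft_model(student, LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"))

# In the real recipe the prompts are generated Magpie-style: the model is shown only the
# chat template's user-turn header and completes it, so the training data stays inside
# its own distribution. A fixed list keeps this sketch self-contained.
prompts = ["Explain what a LoRA adapter is in two sentences."]

optimizer = torch.optim.AdamW([p for p in student.parameters() if p.requires_grad], lr=1e-4)

for step, prompt in enumerate(prompts):
    batch = tok(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    student_logits = student(**batch).logits

    # KL between teacher and student next-token distributions: the adapter only needs
    # to close the gap opened by INT4 quantization.
    loss = F.kl_div(
        F.log_softmax(student_logits.float(), dim=-1),
        F.softmax(teacher_logits.float(), dim=-1),
        reduction="batchmean",
    )
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: distillation loss {loss.item():.4f}")
```

In practice you would also distill on teacher-generated responses rather than just prompt tokens, and save the adapter separately so it ships alongside the INT4 weights, like the pre-trained adapter linked below.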

Last year, Apple's foundation models paper (https://arxiv.org/pdf/2407.21075) proposed a similar technique and found that "By using accuracy-recovery LoRA adapters with only rank 16, Alpaca win rate can be improved by 7-18%, GSM8K accuracy is boosted by 5-10%" (page 47).

We saw similar results on Qwen3-0.6B:

Perplexity: 2.40 → 2.09 (only 5.7% degradation from FP16 baseline)
Memory: Only 0.28GB vs 1.0GB for FP16 (75% reduction)
Speed: 3.0x faster inference than FP16
Quality: Generates correct, optimized code solutions

- Pre-trained adapter: codelion/Qwen3-0.6B-accuracy-recovery-lora
- GitHub repo: https://github.com/codelion/ellora

Happy to answer questions about the implementation or help anyone trying to replicate this. The key insight is that quantization errors are systematic and learnable - a small adapter can bridge the gap without negating the benefits of quantization.

Has anyone else experimented with self-distillation for quantization recovery? Would love to hear about different approaches!

openfree posted an update 2 days ago
🔒 Ansim Blur: Privacy-First Face Blurring for the AI Era

🚨 The Privacy Crisis is Now
Smart CCTVs 📹, delivery robots 🤖, and autonomous vehicles 🚗 are everywhere. Your face is being captured, transmitted, and stored without your knowledge or consent.

openfree/Face-blurring

The privacy threat is real:
24/7 surveillance cameras recording your every move
Companies harvesting facial biometric data at scale
Your face becoming a commodity without your permission

💡 The Solution: Ansim Blur
Real-time face anonymization powered by YOLOv8 🎯 (see the sketch after this list)
✅ Process images, videos, and live streams
✅ Automatic GPU/CPU detection for universal deployment
✅ Choose between Gaussian blur or mosaic pixelation
✅ Fine-tune detection sensitivity for your needs
✅ Preserve audio tracks in video processing
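
As a rough illustration of the approach (not the actual Ansim Blur code), a minimal detect-then-blur loop with ultralytics YOLOv8 and OpenCV could look like this; the face-detection weights filename and image path are placeholders.

```python
# Minimal face-blur sketch: detect faces with a YOLOv8 face checkpoint, then Gaussian-blur
# each detection. Not the actual Ansim Blur code; weights and image paths are placeholders.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n-face.pt")  # placeholder: any YOLOv8 face-detection checkpoint

image = cv2.imread("street.jpg")
results = model(image, conf=0.35)  # conf = detection sensitivity threshold

for x1, y1, x2, y2 in results[0].boxes.xyxy.cpu().numpy().astype(int):
    face = image[y1:y2, x1:x2]
    if face.size == 0:
        continue
    # Kernel size must be odd; scale it with the face so large faces are fully obscured.
    k = max(31, (max(x2 - x1, y2 - y1) // 2) | 1)
    image[y1:y2, x1:x2] = cv2.GaussianBlur(face, (k, k), 0)

cv2.imwrite("street_blurred.jpg", image)
```

Swapping the Gaussian blur for a resize-down/resize-up pair gives the mosaic option, and running the same loop per frame with cv2.VideoCapture covers video.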
πŸ›‘οΈ Real-World Applications
Enterprise Use Cases

Privacy compliance for robotics and drone footage
CCTV feed anonymization for regulatory requirements
Customer data protection in retail analytics

Personal Protection

Anonymize bystanders before sharing content online
Protect family members' privacy in shared videos
Avoid portrait rights issues in content creation

📊 Technical Specifications

Model: YOLOv8-face (optimized variant)
Performance: 30fps real-time processing on RTX 3060
Accuracy: 95%+ face detection rate
Formats: JPG, PNG, MP4, AVI, MOV

🌍 Why This Matters
"Face blurring will become mandatory for all public-facing cameras"
With GDPR in Europe, CCPA in California, and similar regulations worldwide, biometric data protection is becoming non-negotiable. Soon, every camera-equipped system will require built-in face anonymization capabilities.
🤝 Join the Movement
Why open source?
Because privacy isn't a premium feature; it's a fundamental right.

As technology advances, so must our commitment to privacy protection 🛡️

AdinaY posted an update 2 days ago
🇨🇳 China's State Council just released its "AI+" Action Plan (2025)

<The State Council's Guidance on Deepened Implementation of the 'AI+' Strategy>
zh-ai-community/china-ai-policy-research

✨ Goal: By 2035, AI will deeply empower all sectors, reshape productivity & society

✨ Focus on 6 pillars:
>Science & Tech
>Industry
>Consumption
>Public welfare
>Governance
>Global cooperation

✨ Highlights:
>Models: advance theory, efficient training/inference, evaluation system
>Data: high-quality datasets, IP/copyright reform, new incentives
>Compute: boost chips & clusters, improve national network, promote cloud standardization, and ensure inclusive, efficient, green, secure supply.
>Applications: AI-as-a-service, test bases, new standards
>Open-source: support communities, encourage contributions (incl. university credits & recognition), foster new application approaches, and build globally impactful ecosystems 👀
>Talent, policy & safety frameworks to secure sustainable growth

jeffboudier posted an update 3 days ago
Quick 30-second demo of the new Hub > Azure AI integration to deploy HF models in your own Azure account. Now with Python and CLI support!

GG @alvarobartt @kramp @pagezyhf

tsungyi posted an update about 14 hours ago
Cosmos Reason just topped the Physical Reasoning Leaderboard on Hugging Face. 👏🔥

Cosmos Reason is an open, customizable, commercial-ready, 7B-parameter reasoning vision language model (VLM) for physical AI and robotics. The VLM empowers robots and vision AI agents to reason like humans, leveraging prior knowledge, physics understanding, and common sense to understand and operate intelligently in the real world.

This model unlocks advanced capabilities for robotics, autonomous vehicles, and real-world operations, from cities to high-tech factories.

Key use cases include:
Data curation & annotation: Automate high-quality dataset curation and annotation at scale.
Robot planning & reasoning: Serve as the "brain" for deliberate, methodical decision-making with vision language action (VLA) models.
Video analytics AI agents: Extract actionable insights and perform root-cause analysis on massive video datasets.

Ready to build the next generation of physical AI? Get started 👉 nvidia/Cosmos-Reason1-7B
Try the preview here: https://build.nvidia.com/nvidia/cosmos-reason1-7b
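
If you'd rather query the weights locally than use the hosted preview, a minimal sketch could look like the following. This assumes the checkpoint loads through transformers' generic image-text-to-text classes (Cosmos Reason builds on a Qwen2.5-VL-style architecture); the image URL is a placeholder and the model card is the authority on exact classes and prompt format.

```python
# Minimal local-inference sketch for Cosmos Reason. Assumes the checkpoint works with
# transformers' generic image-text-to-text classes; confirm exact usage on the model card.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "nvidia/Cosmos-Reason1-7B"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/robot_arm.jpg"},  # placeholder
            {"type": "text", "text": "Is it safe for the robot to grasp this object? Explain step by step."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```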