ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 84
Improving Vision-Language-Action Model with Online Reinforcement Learning Paper • 2501.16664 • Published Jan 28 • 1
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published 4 days ago • 33
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published 8 days ago • 28
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation Paper • 2501.16764 • Published Jan 28 • 22
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published Jan 10 • 69
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published Jan 10 • 62
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images Paper • 2501.04689 • Published Jan 8 • 17 • 5
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 265 • 42
DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization Paper • 2501.03271 • Published Jan 5 • 11
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 44
Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation Paper • 2412.18176 • Published Dec 24, 2024 • 15
Health AI Developer Foundations (HAI-DEF) Collection Groups models released for use in health AI by Google. Read more about HAI-DEF at https://developers.google.com/health-ai-developer-foundations • 4 items • Updated 7 days ago • 29
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration Paper • 2412.04440 • Published Dec 5, 2024 • 20
Post: Your AI toolkit just got a major upgrade! I updated the Journalists on Hugging Face community's collection with tools for investigative work, content creation, and data analysis.
Sharing these new additions with the links in case it's helpful:
- @wendys-llc's excellent 6-part video series on AI for investigative journalism https://www.youtube.com/playlist?list=PLewNEVDy7gq1_GPUaL0OQ31QsiHP5ncAQ
- @jeremycaplan's curated AI Spaces on HF https://wondertools.substack.com/p/huggingface
- @Xenova's Whisper Timestamped (with diarization!) for private, on-device transcription Xenova/whisper-speaker-diarization & Xenova/whisper-word-level-timestamps
- Flux models for image gen & LoRAs autotrain-projects/train-flux-lora-ease
- FineGrain's object cutter finegrain/finegrain-object-cutter and object eraser (this one's cool) finegrain/finegrain-object-eraser
- FineVideo: massive open-source annotated dataset + explorer HuggingFaceFV/FineVideo-Explorer
- Qwen2 chat demos, including 2.5 & multimodal versions (crushing it on handwriting recognition) Qwen/Qwen2.5 & Qwen/Qwen2-VL
- GOT-OCR integration stepfun-ai/GOT_official_online_demo
- HTML to Markdown converter maxiw/HTML-to-Markdown
- Text-to-SQL query tool by @davidberenstein1957 for HF datasets davidberenstein1957/text-to-sql-hub-datasets
There's a lot of potential here for journalism and beyond. Give these a try and let me know what you build! You can also add your favorite ones if you're part of the community!
Check it out: https://huggingface.co/JournalistsonHF
#AIforJournalism #HuggingFace #OpenSourceAI
ARCLE: The Abstraction and Reasoning Corpus Learning Environment for Reinforcement Learning Paper • 2407.20806 • Published Jul 30, 2024 • 1