3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark Paper • 2412.07825 • Published 15 days ago • 12
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17 • 74
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18 • 224
Marqo-FashionCLIP and Marqo-FashionSigLIP Collection SOTA multimodal models for fashion product embeddings -> https://github.com/marqo-ai/marqo-FashionCLIP/ • 11 items • Updated 13 days ago • 9
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Paper • 2408.16768 • Published Aug 29 • 26
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published Aug 29 • 56
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Paper • 2408.15881 • Published Aug 28 • 21
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 98