Kyle Tuft

Chilangosta

AI & ML interests

None yet

Recent Activity

liked a Space 1 day ago

Qwen/Qwen2.5-VL-32B-Instruct

liked a model 2 days ago

ydeng9/OpenVLThinker-7B

upvoted a paper 2 days ago

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Organizations

None yet

Chilangosta's activity

upvoted a paper 2 days ago

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Paper • 2503.16418 • Published 6 days ago • 32

upvoted a paper 8 days ago

VGGT: Visual Geometry Grounded Transformer

Paper • 2503.11651 • Published 12 days ago • 20

upvoted 2 papers 9 days ago

TPDiff: Temporal Pyramid Video Diffusion Model

Paper • 2503.09566 • Published 14 days ago • 42

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published 12 days ago • 75

upvoted 2 papers 16 days ago

EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer

Paper • 2503.07027 • Published 17 days ago • 26

VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control

Paper • 2503.05639 • Published 19 days ago • 22

upvoted 2 papers 18 days ago

EgoLife: Towards Egocentric Life Assistant

Paper • 2503.03803 • Published 21 days ago • 38

Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published 21 days ago • 85

upvoted an article 20 days ago

Remote VAEs for decoding with HF endpoints 🤗

Article • Published about 1 month ago • 37

upvoted a paper 22 days ago

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Paper • 2503.01743 • Published 23 days ago • 77

upvoted a paper 29 days ago

VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing

Paper • 2502.17258 • Published about 1 month ago • 77

upvoted 9 papers about 1 month ago

Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening

Paper • 2502.12146 • Published Feb 17 • 16

Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation

Paper • 2502.08690 • Published Feb 12 • 41

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

Paper • 2502.10391 • Published Feb 14 • 32

CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation

Paper • 2502.08639 • Published Feb 12 • 40

Scaling Pre-training to One Hundred Billion Data for Vision Language Models

Paper • 2502.07617 • Published Feb 11 • 29

Dual Caption Preference Optimization for Diffusion Models

Paper • 2502.06023 • Published Feb 9 • 9

FlashVideo:Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

Paper • 2502.05179 • Published Feb 7 • 24

VideoRoPE: What Makes for Good Video Rotary Position Embedding?

Paper • 2502.05173 • Published Feb 7 • 64

Fast Video Generation with Sliding Tile Attention

Paper • 2502.04507 • Published Feb 6 • 50