Words or Vision: Do Vision-Language Models Have Blind Faith in Text? (arXiv:2503.02199)
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models (arXiv:2503.06749)
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers (arXiv:2502.15894)
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers (arXiv:2502.14377)
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above (arXiv:2502.14127)
OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning (arXiv:2502.11271)
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models (arXiv:2502.09696)
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU (arXiv:2502.08910)
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation (arXiv:2502.08690)
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding (arXiv:2502.08946)