Jaehyun Jun

btjhjeon

https://btjhjeon.github.io/

AI & ML interests

Multimodal

Recent Activity

updated a collection about 20 hours ago

Multimodal Benchmarks

upvoted a paper about 20 hours ago

EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents

btjhjeon's activity

upvoted 3 papers about 20 hours ago

EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents

Paper • 2501.11858 • Published 5 days ago • 2

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Paper • 2501.13826 • Published 3 days ago • 18

Temporal Preference Optimization for Long-Form Video Understanding

Paper • 2501.13919 • Published 3 days ago • 17

upvoted a paper 3 days ago

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper • 2501.13106 • Published 4 days ago • 66

upvoted 3 papers 4 days ago

upvoted a paper 6 days ago

Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

Paper • 2501.09755 • Published 10 days ago • 33

upvoted a paper 8 days ago

FAST: Efficient Action Tokenization for Vision-Language-Action Models

Paper • 2501.09747 • Published 10 days ago • 22

upvoted 4 papers 10 days ago

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Paper • 2501.08326 • Published 12 days ago • 31

Multimodal LLMs Can Reason about Aesthetics in Zero-Shot

Paper • 2501.09012 • Published 11 days ago • 10

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Paper • 2501.07783 • Published 12 days ago • 7

MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents

Paper • 2501.08828 • Published 11 days ago • 28

upvoted 2 papers 11 days ago

Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding

Paper • 2501.07888 • Published 12 days ago • 13

A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following

Paper • 2501.08187 • Published 12 days ago • 24

upvoted a paper 12 days ago

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published 16 days ago • 40

upvoted 2 papers 13 days ago

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published 16 days ago • 59

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Paper • 2501.05510 • Published 17 days ago • 38

upvoted 2 papers 14 days ago

On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis

Paper • 2501.04377 • Published 18 days ago • 14

An Empirical Study of Autoregressive Pre-training from Videos

Paper • 2501.05453 • Published 17 days ago • 37