Yongxin Guo

Yongxin-Guo

https://gyxxyg.github.io/yongxinguo/

gyxxyg

Recent Activity

new activity about 17 hours ago

Yongxin-Guo/TRACE: Missing ${SPLIT}.caption_coco_format.json in dense_video_caption/ActivityNet_Captions

updated a dataset about 17 hours ago

Yongxin-Guo/TRACE

Yongxin-Guo's activity

upvoted a paper 1 day ago

Parallelized Autoregressive Visual Generation

Paper • 2412.15119 • Published 6 days ago • 44

upvoted 6 papers 5 days ago

Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published 8 days ago • 87

Autoregressive Video Generation without Vector Quantization

Paper • 2412.14169 • Published 7 days ago • 13

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 7 days ago • 103

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published 6 days ago • 45

Progressive Multimodal Reasoning via Active Retrieval

Paper • 2412.14835 • Published 6 days ago • 66

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 6 days ago • 327

upvoted a paper 8 days ago

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 13 days ago • 74

upvoted 7 papers 9 days ago

GenEx: Generating an Explorable World

Paper • 2412.09624 • Published 13 days ago • 84

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Paper • 2412.10302 • Published 12 days ago • 7

Large Concept Models: Language Modeling in a Sentence Representation Space

Paper • 2412.08821 • Published 14 days ago • 7

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published 19 days ago • 121

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Paper • 2412.07760 • Published 15 days ago • 49

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published 13 days ago • 90

Phi-4 Technical Report

Paper • 2412.08905 • Published 14 days ago • 92

upvoted 3 papers about 2 months ago

GPT-4o System Card

Paper • 2410.21276 • Published Oct 25 • 82

Can Knowledge Editing Really Correct Hallucinations?

Paper • 2410.16251 • Published Oct 21 • 54

LOGO -- Long cOntext aliGnment via efficient preference Optimization

Paper • 2410.18533 • Published Oct 24 • 42

upvoted 2 papers 2 months ago

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Paper • 2410.13848 • Published Oct 17 • 31

MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

Paper • 2410.13754 • Published Oct 17 • 74