CogView: Mastering Text-to-Image Generation via Transformers Paper • 2105.13290 • Published May 26, 2021
CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations Paper • 2402.04236 • Published Feb 6, 2024
Relay Diffusion: Unifying diffusion process across resolutions for image synthesis Paper • 2309.03350 • Published Sep 4, 2023
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers Paper • 2204.14217 • Published Apr 28, 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers Paper • 2205.15868 • Published May 29, 2022
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer Paper • 2405.04312 • Published May 7, 2024
LVBench: An Extreme Long Video Understanding Benchmark Paper • 2406.08035 • Published Jun 12, 2024
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1, 2025
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published Aug 29, 2024
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents Paper • 2408.06327 • Published Aug 12, 2024
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12, 2024