39 114

liu

zhaocheng

AI & ML interests

None yet

Organizations

None yet

zhaocheng's activity

upvoted 2 papers about 1 month ago

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published Sep 10 • 55

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published Sep 12 • 42

upvoted 3 papers 2 months ago

Automated Design of Agentic Systems

Paper • 2408.08435 • Published Aug 15 • 38

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8 • 154

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3 • 75

upvoted 8 papers 3 months ago

CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis

Paper • 2407.13301 • Published Jul 18 • 54

OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

Paper • 2407.16224 • Published Jul 23 • 23

EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19 • 42

Visual Text Generation in the Wild

Paper • 2407.14138 • Published Jul 19 • 8

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Paper • 2407.07053 • Published Jul 9 • 41

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13 • 50

Vision language models are blind

Paper • 2407.06581 • Published Jul 9 • 81

Unveiling Encoder-Free Vision-Language Models

Paper • 2406.11832 • Published Jun 17 • 49

upvoted 20 papers 4 months ago

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28 • 94

RegMix: Data Mixture as Regression for Language Model Pre-training

Paper • 2407.01492 • Published Jul 1 • 33

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published Jul 1 • 76

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27 • 51

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published Jun 17 • 57

Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

Paper • 2404.09833 • Published Apr 15 • 29

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Paper • 2405.01434 • Published May 2 • 51

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published May 14 • 19

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 125

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3 • 42

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

Paper • 2406.09162 • Published Jun 13 • 13

Needle In A Multimodal Haystack

Paper • 2406.07230 • Published Jun 11 • 52

Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

Paper • 2406.07546 • Published Jun 11 • 8

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Paper • 2406.10210 • Published Jun 14 • 76

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Paper • 2406.08418 • Published Jun 12 • 28

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11 • 55

Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

Paper • 2406.08487 • Published Jun 12 • 11

What If We Recaption Billions of Web Images with LLaMA-3?

Paper • 2406.08478 • Published Jun 12 • 39

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published Jun 10 • 65

upvoted a collection 5 months ago

Awesome SFT datasets

A curated list of interesting datasets to fine-tune language models with. • 43 items • Updated Apr 12 • 115

upvoted a paper 5 months ago

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 182

upvoted 2 papers 9 months ago

Training-Free Consistent Text-to-Image Generation

Paper • 2402.03286 • Published Feb 5 • 64

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Paper • 2312.04461 • Published Dec 7, 2023 • 56

upvoted a collection 9 months ago

Tulu V2 Suite

The set of models associated with the paper "Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2" • 19 items • Updated 24 days ago • 43

upvoted a paper 9 months ago

Transformers are Multi-State RNNs

Paper • 2401.06104 • Published Jan 11 • 34