Jialiang Cheng

Julius-L

AI & ML interests

None yet

Recent Activity

upvoted a collection 24 days ago

Deepseek Papers

updated a collection about 2 months ago

multimodal dataset

Organizations

None yet

Julius-L's activity

upvoted a collection 24 days ago

Deepseek Papers

Collection

Deepseek papers collection • 18 items • Updated 24 days ago • 168

upvoted a paper about 2 months ago

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published Jan 10 • 48

upvoted 3 papers 4 months ago

NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks

Paper • 2410.20650 • Published Oct 28, 2024 • 17

A Survey of Small Language Models

Paper • 2410.20011 • Published Oct 25, 2024 • 40

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training

Paper • 2410.19313 • Published Oct 25, 2024 • 19

upvoted 15 papers 5 months ago

Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities

Paper • 2408.07666 • Published Aug 14, 2024 • 2

Memory-Efficient LLM Training with Online Subspace Descent

Paper • 2408.12857 • Published Aug 23, 2024 • 14

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 76

Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization

Paper • 2409.12903 • Published Sep 19, 2024 • 22

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171

What Matters for Model Merging at Scale?

Paper • 2410.03617 • Published Oct 4, 2024 • 8

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

Paper • 2410.02367 • Published Oct 3, 2024 • 48

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Paper • 2409.20566 • Published Sep 30, 2024 • 56

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

Paper • 2409.17481 • Published Sep 26, 2024 • 47

VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

Paper • 2409.17066 • Published Sep 25, 2024 • 28