LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models Paper • 2411.06839 • Published Nov 11, 2024 • 1
LLM-Neo Collection Model hub for LLM-Neo, including Llama3.1-Neo-1B-100w and Minitron-4B-Depth-Neo-10w. • 3 items • Updated Nov 20, 2024 • 4
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18, 2024 • 53
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities Paper • 2212.06385 • Published Dec 13, 2022
RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer Paper • 2304.05659 • Published Apr 12, 2023
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast Paper • 2405.14507 • Published May 23, 2024
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models Paper • 2404.02657 • Published Apr 3, 2024
Weight-Inherited Distillation for Task-Agnostic BERT Compression Paper • 2305.09098 • Published May 16, 2023
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation Paper • 2406.09961 • Published Jun 14, 2024 • 54