Collections including paper arxiv:2412.09871
-
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Paper • 2403.20327 • Published • 48
Round and Round We Go! What makes Rotary Positional Encodings useful?
Paper • 2410.06205 • Published • 1
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 95
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Paper • 2410.20771 • Published • 3
-
LIMA: Less Is More for Alignment
Paper • 2305.11206 • Published • 24
Garment3DGen: 3D Garment Stylization and Texture Generation
Paper • 2403.18816 • Published • 23
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Paper • 2403.18118 • Published • 12
The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 80
-
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Paper • 2403.12943 • Published • 15
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577 • Published • 43
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Paper • 2404.02747 • Published • 13
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper • 2404.02733 • Published • 22
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 127
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 53
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper • 2402.03766 • Published • 14
LLM Agent Operating System
Paper • 2403.16971 • Published • 68
-
Neural Machine Translation of Rare Words with Subword Units
Paper • 1508.07909 • Published • 4
A Formal Perspective on Byte-Pair Encoding
Paper • 2306.16837 • Published • 3
Byte-Pair Encoding for Text-to-SQL Generation
Paper • 1910.08962 • Published • 2
Pattern Discovery in Time Series with Byte Pair Encoding
Paper • 2106.00614 • Published • 2
-
Functional Interpolation for Relative Positions Improves Long Context Transformers
Paper • 2310.04418 • Published • 4
SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs
Paper • 2106.09997 • Published • 2
Neural Machine Translation of Rare Words with Subword Units
Paper • 1508.07909 • Published • 4
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Paper • 2403.14438 • Published • 2
-
SELF: Language-Driven Self-Evolution for Large Language Model
Paper • 2310.00533 • Published • 2
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
Paper • 2310.00576 • Published • 2
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
Paper • 2305.13169 • Published • 3
Transformers Can Achieve Length Generalization But Not Robustly
Paper • 2402.09371 • Published • 15