Collections

Discover the best community collections!

Collections including paper arxiv:2412.09871

Papers - Multilingual - Benchmarks

HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2, 2024 • 23
ByT5: Towards a token-free future with pre-trained byte-to-byte models

Paper • 2105.13626 • Published May 28, 2021 • 3
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 95

Papers - Multilingual - Encoders - BPE

Poro 34B and the Blessing of Multilinguality

Paper • 2404.01856 • Published Apr 2, 2024 • 15
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 95

Papers - Embeddings

Gecko: Versatile Text Embeddings Distilled from Large Language Models

Paper • 2403.20327 • Published Mar 29, 2024 • 48
Round and Round We Go! What makes Rotary Positional Encodings useful?

Paper • 2410.06205 • Published Oct 8, 2024 • 1
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 95
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Paper • 2410.20771 • Published Oct 28, 2024 • 3

LIMA: Less Is More for Alignment

Paper • 2305.11206 • Published May 18, 2023 • 24
Garment3DGen: 3D Garment Stylization and Texture Generation

Paper • 2403.18816 • Published Mar 27, 2024 • 23
EgoLifter: Open-world 3D Segmentation for Egocentric Perception

Paper • 2403.18118 • Published Mar 26, 2024 • 12
The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26, 2024 • 80

Papers - Attention - Cross

Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers

Paper • 2403.12943 • Published Mar 19, 2024 • 15
Masked Audio Generation using a Single Non-Autoregressive Transformer

Paper • 2401.04577 • Published Jan 9, 2024 • 43
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

Paper • 2404.02747 • Published Apr 3, 2024 • 13
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

Paper • 2404.02733 • Published Apr 3, 2024 • 22

To read... eventually

A collection of papers that I have read or plan to read, all in one place. Covers a wide range of topics.

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 127
Evolutionary Optimization of Model Merging Recipes

Paper • 2403.13187 • Published Mar 19, 2024 • 53
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

Paper • 2402.03766 • Published Feb 6, 2024 • 14
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25, 2024 • 68

Papers - Encoder - Byte-Pair Encoding

Neural Machine Translation of Rare Words with Subword Units

Paper • 1508.07909 • Published Aug 31, 2015 • 4
A Formal Perspective on Byte-Pair Encoding

Paper • 2306.16837 • Published Jun 29, 2023 • 3
Byte-Pair Encoding for Text-to-SQL Generation

Paper • 1910.08962 • Published Oct 20, 2019 • 2
Pattern Discovery in Time Series with Byte Pair Encoding

Paper • 2106.00614 • Published May 30, 2021 • 2

Papers - Chinchilla

Training Compute-Optimal Large Language Models

Paper • 2203.15556 • Published Mar 29, 2022 • 10
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 95

Papers - Encoders

Functional Interpolation for Relative Positions Improves Long Context Transformers

Paper • 2310.04418 • Published Oct 6, 2023 • 4
SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs

Paper • 2106.09997 • Published Jun 18, 2021 • 2
Neural Machine Translation of Rare Words with Subword Units

Paper • 1508.07909 • Published Aug 31, 2015 • 4
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

Paper • 2403.14438 • Published Mar 21, 2024 • 2

Papers - Training

SELF: Language-Driven Self-Evolution for Large Language Model

Paper • 2310.00533 • Published Oct 1, 2023 • 2
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length

Paper • 2310.00576 • Published Oct 1, 2023 • 2
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity

Paper • 2305.13169 • Published May 22, 2023 • 3
Transformers Can Achieve Length Generalization But Not Robustly

Paper • 2402.09371 • Published Feb 14, 2024 • 15
