Collections
Collections including paper arxiv:2212.11685
-
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Paper • 2002.08155 • Published • 2
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise
Paper • 2212.11685 • Published • 2
ReNoise: Real Image Inversion Through Iterative Noising
Paper • 2403.14602 • Published • 19
-
Pretraining-Based Natural Language Generation for Text Summarization
Paper • 1902.09243 • Published • 2
Learning to Reason and Memorize with Self-Notes
Paper • 2305.00833 • Published • 4
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise
Paper • 2212.11685 • Published • 2
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Paper • 2408.16293 • Published • 25
-
Measuring the Effects of Data Parallelism on Neural Network Training
Paper • 1811.03600 • Published • 2
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Paper • 1804.04235 • Published • 2
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Paper • 1905.11946 • Published • 3
Yi: Open Foundation Models by 01.AI
Paper • 2403.04652 • Published • 62
-
The Impact of Depth and Width on Transformer Language Model Generalization
Paper • 2310.19956 • Published • 9
Retentive Network: A Successor to Transformer for Large Language Models
Paper • 2307.08621 • Published • 170
RWKV: Reinventing RNNs for the Transformer Era
Paper • 2305.13048 • Published • 14
Attention Is All You Need
Paper • 1706.03762 • Published • 44
-
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
Paper • 2210.17432 • Published • 1
TESS: Text-to-Text Self-Conditioned Simplex Diffusion
Paper • 2305.08379 • Published • 1
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Paper • 2308.12219 • Published • 1
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper • 2310.17680 • Published • 69
-
Language Modeling Is Compression
Paper • 2309.10668 • Published • 82
Small-scale proxies for large-scale Transformer training instabilities
Paper • 2309.14322 • Published • 19
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
Paper • 2309.15129 • Published • 6
Vision Transformers Need Registers
Paper • 2309.16588 • Published • 77