Collections including paper arxiv:2212.11685

Papers - Text - Pre-training - Synthetic Noise

Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise

Paper • 2212.11685 • Published Dec 22, 2022 • 2

Papers - Training - Synthetic Noise

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

Paper • 2002.08155 • Published Feb 19, 2020 • 2
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise

Paper • 2212.11685 • Published Dec 22, 2022 • 2
ReNoise: Real Image Inversion Through Iterative Noising

Paper • 2403.14602 • Published Mar 21, 2024 • 19

Papers - Text - Pre-training - Research

Pretraining-Based Natural Language Generation for Text Summarization

Paper • 1902.09243 • Published Feb 25, 2019 • 2
Learning to Reason and Memorize with Self-Notes

Paper • 2305.00833 • Published May 1, 2023 • 4
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise

Paper • 2212.11685 • Published Dec 22, 2022 • 2
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Paper • 2408.16293 • Published Aug 29, 2024 • 25

Papers - Training Research

Measuring the Effects of Data Parallelism on Neural Network Training

Paper • 1811.03600 • Published Nov 8, 2018 • 2
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Paper • 1804.04235 • Published Apr 11, 2018 • 2
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Paper • 1905.11946 • Published May 28, 2019 • 3
Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7, 2024 • 62

LLM architecture

The Impact of Depth and Width on Transformer Language Model Generalization

Paper • 2310.19956 • Published Oct 30, 2023 • 9
Retentive Network: A Successor to Transformer for Large Language Models

Paper • 2307.08621 • Published Jul 17, 2023 • 170
RWKV: Reinventing RNNs for the Transformer Era

Paper • 2305.13048 • Published May 22, 2023 • 14
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 44

SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control

Paper • 2210.17432 • Published Oct 31, 2022 • 1
TESS: Text-to-Text Self-Conditioned Simplex Diffusion

Paper • 2305.08379 • Published May 15, 2023 • 1
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning

Paper • 2308.12219 • Published Aug 23, 2023 • 1
CodeFusion: A Pre-trained Diffusion Model for Code Generation

Paper • 2310.17680 • Published Oct 26, 2023 • 69

Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 82
Small-scale proxies for large-scale Transformer training instabilities

Paper • 2309.14322 • Published Sep 25, 2023 • 19
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval

Paper • 2309.15129 • Published Sep 25, 2023 • 6
Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 77
