SpaceByte: Towards Deleting Tokenization from Large Language Modeling. Paper • 2404.14408 • Published Apr 22, 2024
T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings. Paper • 2406.19223 • Published Jun 27, 2024
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information. Paper • 2502.14258 • Published 25 days ago
Foundation Text-Generation Models Below 360M Parameters. Collection • Great candidates for fine-tuning, targeting Wllama and Transformers.js on mobile devices; ordered by parameter count. • 35 items • Updated 1 day ago
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models. Paper • 2503.08686 • Published 5 days ago
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training. Paper • 2411.15124 • Published Nov 22, 2024
Cheems: Wonderful Matrices More Efficient and More Effective Architecture. Paper • 2407.16958 • Published Jul 24, 2024
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture. Paper • 2412.11834 • Published Dec 16, 2024
story writing favourites. Collection • Models I personally liked for generating stories in the past. Not a recommendation; many of these are outdated. • 20 items • Updated 10 days ago
Sparse Autoencoders. Collection • SAEs are tools for understanding the internal representations of neural networks; they can be loaded using https://github.com/EleutherAI/sae • 9 items • Updated 19 days ago
Pythia Scaling Suite. Collection • Pythia is the first LLM suite designed specifically to enable scientific research on LLMs; to learn more, see https://github.com/EleutherAI/pythia • 18 items • Updated 19 days ago
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective. Paper • 2502.17262 • Published 20 days ago