RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
Paper: arXiv 2505.03005
Hugging Face space for RWKV-X related developments, including build assets, etc. Nothing in here is considered an "official release"; we treat this as a giant file dump.