Hierarchical Autoregressive Transformers: Combining Byte-~and Word-Level Processing for Robust, Adaptable Language Models Paper • 2501.10322 • Published Jan 17 • 1 • 2