Article Gotchas in Tokenizer Behavior Every Developer Should Know By qgallouedec • Apr 18 • 40
OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning Paper • 2506.00338 • Published May 31 • 10
OWSM-CTC: Ultra-Fast Speech Foundation Models Collection CTC-based models from the OWSM project, designed for fast non-autoregressive inference: https://www.wavlab.org/activities/2024/owsm/ • 2 items • Updated Mar 8 • 1
OWSM: Fully Open Speech Recognition and Translation Models Collection A collection of models related to the Open Whisper-style Speech Models (OWSM) project from CMU: https://www.wavlab.org/activities/2024/owsm/ • 21 items • Updated Mar 8 • 2
E-Branchformer: Branchformer with Enhanced merging for speech recognition Paper • 2210.00077 • Published Sep 30, 2022 • 2
Open Whisper-style Speech Models (OWSM) Collection Fully open Whisper-style speech foundation models developed by CMU WAVLab: https://www.wavlab.org/activities/2024/owsm/ • 21 items • Updated Jun 3 • 6
Magpie-Llama3.1 Datasets Collection Dataset built with Meta Llama 3.1 70B. • 6 items • Updated Jan 13 • 4
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification Paper • 2402.12654 • Published Feb 20, 2024 • 1
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer Paper • 2401.16658 • Published Jan 30, 2024 • 14