Splitformer

1. Overview

Splitformer is a 36.7M-parameter Conformer-based ASR model trained from scratch on 1,000 hours of LibriSpeech with an early-exit objective.

This architecture introduces parallel downsampling layers before the first and last exits, improving accuracy at those exits over an early-exit baseline at the cost of a small parameter overhead and with inference speed essentially unchanged. A minimal sketch of the layout follows.
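
The sketch below shows one plausible reading of that layout in PyTorch: exit heads every two encoder layers, with a strided convolutional branch computed in parallel and fused back into the main path just before the first and last exits. The names (`SplitformerEncoder`, `ExitHead`), the use of `nn.TransformerEncoderLayer` as a stand-in for a Conformer block, and the exact fusion scheme are assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExitHead(nn.Module):
    """CTC-style classifier attached to an intermediate encoder layer."""

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x).log_softmax(dim=-1)


class SplitformerEncoder(nn.Module):
    """Hypothetical early-exit encoder: an exit head every `exit_every`
    layers, plus a parallel strided-convolution branch fused into the
    main path right before the first and last exits."""

    def __init__(self, d_model: int = 256, n_layers: int = 12,
                 exit_every: int = 2, vocab_size: int = 1000):
        super().__init__()
        # nn.TransformerEncoderLayer stands in for a Conformer block here.
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        self.exit_layers = list(range(exit_every - 1, n_layers, exit_every))
        self.heads = nn.ModuleList(
            ExitHead(d_model, vocab_size) for _ in self.exit_layers
        )
        # Parallel downsampling branches before the first and last exits.
        self.down_branches = nn.ModuleDict({
            str(i): nn.Conv1d(d_model, d_model, kernel_size=3, stride=2, padding=1)
            for i in (self.exit_layers[0], self.exit_layers[-1])
        })

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        exit_log_probs = []
        for i, layer in enumerate(self.layers):
            if str(i) in self.down_branches:
                # Downsample in time, upsample back, and add to the main path.
                z = self.down_branches[str(i)](x.transpose(1, 2))
                z = F.interpolate(z, size=x.size(1))
                x = x + z.transpose(1, 2)
            x = layer(x)
            if i in self.exit_layers:
                exit_log_probs.append(self.heads[self.exit_layers.index(i)](x))
        return exit_log_probs  # one CTC log-prob tensor per exit
```

Training with an early-exit objective then typically amounts to summing a CTC loss over every element of `exit_log_probs`, so each exit learns to transcribe on its own.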

Our code for training and inference is available on our GitHub repository.

2. Results on LibriSpeech

Each cell reports WER (%) as test-clean / test-other.

| Exit layer | EE-baseline (31.5M) | Splitformer (36.7M) | Wav2Vec2 (94.0M) | WavLM (94.7M) |
|:----------:|:-------------------:|:-------------------:|:----------------:|:-------------:|
| 2          | 31.0 / 51.0         | 28.1 / 48.3         | 33.7 / 56.0      | 28.0 / 48.5   |
| 4          | 11.7 / 27.8         | 10.8 / 26.4         | 17.4 / 36.7      | 13.9 / 27.3   |
| 6          | 7.1 / 19.8          | 6.7 / 19.2          | 9.6 / 23.7       | 8.7 / 18.4    |
| 8          | 5.8 / 16.6          | 5.5 / 16.3          | 5.8 / 15.9       | 4.8 / 12.4    |
| 10         | 5.3 / 15.3          | 5.1 / 15.1          | 4.5 / 12.6       | 4.0 / 9.5     |
| 12         | 5.1 / 14.8          | 4.8 / 14.7          | 4.3 / 12.2       | 3.6 / 8.8     |
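
These per-exit WERs define a latency/accuracy trade-off at inference time. One common way to exploit it, sketched below on top of the hypothetical `SplitformerEncoder` above, is confidence-based exiting: stop the forward pass at the first exit whose greedy posteriors look confident enough. This threshold policy is an assumption, not necessarily the paper's protocol; the table above reports WER at each fixed exit.

```python
import torch


@torch.no_grad()
def early_exit_decode(enc: SplitformerEncoder, x: torch.Tensor,
                      threshold: float = 0.85):
    """Stop at the first exit whose mean per-frame max posterior exceeds
    `threshold`; fall through to the last exit otherwise. (Hypothetical
    policy built on the SplitformerEncoder sketch above.)"""
    for i, layer in enumerate(enc.layers):
        if str(i) in enc.down_branches:
            z = enc.down_branches[str(i)](x.transpose(1, 2))
            z = torch.nn.functional.interpolate(z, size=x.size(1))
            x = x + z.transpose(1, 2)
        x = layer(x)
        if i in enc.exit_layers:
            log_probs = enc.heads[enc.exit_layers.index(i)](x)
            confidence = log_probs.exp().max(dim=-1).values.mean().item()
            if confidence >= threshold or i == enc.exit_layers[-1]:
                # Remaining layers are never computed: that is the speedup.
                return i + 1, log_probs.argmax(dim=-1)  # (exit depth, greedy CTC ids)
```

Lowering `threshold` pushes decoding toward the upper rows of the table (faster, but higher WER); raising it recovers the full 12-layer accuracy at full cost.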

3. Citation

@misc{lasbordes2025splitformer,
      title={Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices},
      author={Maxence Lasbordes and Daniele Falavigna and Alessio Brutti},
      year={2025},
      note={Proc. of EUSIPCO 2025},
}