---
license: mit
datasets:
- openslr/librispeech_asr
language:
- en
pipeline_tag: automatic-speech-recognition
---
# Splitformer
<div align="center" style="line-height: 1;">
<a href="https://github.com/augustgw/early-exit-transformer" target="_blank" style="margin: 2px;">
<img alt="GitHub" src="https://img.shields.io/badge/GitHub-Splitformer-181717?logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://www.arxiv.org/abs/2506.18035" target="_blank" style="margin: 2px;">
<img alt="arXiv" src="https://img.shields.io/badge/arXiv-2506.18035-B31B1B?logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
## 1. Overview
**Splitformer** is a 36.7M-parameter Conformer-based ASR model trained from scratch on 1,000 hours of the **LibriSpeech dataset** with an **early-exit objective**.
The architecture introduces **parallel downsampling layers** before the first and last exits, which improves recognition performance at those exits with minimal extra overhead and without slowing down inference.
Our code for training and inference is available in our [GitHub](https://github.com/augustgw/early-exit-transformer) repository.
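For intuition, the sketch below shows the general shape of an early-exit encoder with parallel downsampling branches, written in plain PyTorch. It is a minimal, hypothetical example for this card: the module names (`DownsampledBranch`, `EarlyExitEncoder`), layer sizes, and the linear CTC-style exit heads are assumptions made for illustration, not the actual Splitformer implementation (see the GitHub repository above for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DownsampledBranch(nn.Module):
    """Illustrative parallel branch: downsample along time, process, upsample back.

    Hypothetical sketch of the idea described above, not the authors' implementation.
    """

    def __init__(self, dim: int, stride: int = 2):
        super().__init__()
        self.down = nn.Conv1d(dim, dim, kernel_size=stride, stride=stride)
        self.proc = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.up = nn.ConvTranspose1d(dim, dim, kernel_size=stride, stride=stride)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); the 1-D convs expect (batch, dim, time).
        y = x.transpose(1, 2)
        y = self.up(torch.relu(self.proc(self.down(y))))
        y = y.transpose(1, 2)
        # Pad the time axis in case the stride does not divide the sequence length.
        y = F.pad(y, (0, 0, 0, x.size(1) - y.size(1)))
        return x + y  # residual fusion with the full-resolution stream


class EarlyExitEncoder(nn.Module):
    """Toy encoder: a stack of layers with an exit head every `exit_every` layers."""

    def __init__(self, dim: int = 144, num_layers: int = 12, exit_every: int = 2,
                 vocab_size: int = 256):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
             for _ in range(num_layers)]
        )
        self.exit_every = exit_every
        self.exit_heads = nn.ModuleList(
            [nn.Linear(dim, vocab_size) for _ in range(num_layers // exit_every)]
        )
        # Parallel downsampling branches before the first and last exits only.
        self.first_branch = DownsampledBranch(dim)
        self.last_branch = DownsampledBranch(dim)

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        logits = []
        for i, layer in enumerate(self.layers, start=1):
            x = layer(x)
            if i % self.exit_every == 0:
                k = i // self.exit_every
                if k == 1:
                    x = self.first_branch(x)
                elif k == len(self.exit_heads):
                    x = self.last_branch(x)
                logits.append(self.exit_heads[k - 1](x))
        return logits  # one logit tensor per exit
```

With the defaults above, `EarlyExitEncoder()(torch.randn(1, 200, 144))` returns six logit tensors, one per exit (layers 2, 4, ..., 12), which an early-exit objective would supervise jointly.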
## 2. Results on LibriSpeech
Word Error Rate (%, lower is better) at each exit layer on the LibriSpeech test-clean and test-other sets, compared with an early-exit baseline (EE-baseline), Wav2Vec2, and WavLM:
<table>
<thead>
<tr>
<th rowspan="2">Layer</th>
<th colspan="2">EE-baseline (31.5M)</th>
<th colspan="2">Splitformer (36.7M)</th>
<th colspan="2">Wav2Vec2 (94.0M)</th>
<th colspan="2">WavLM (94.7M)</th>
</tr>
<tr>
<th>test-clean</th>
<th>test-other</th>
<th>test-clean</th>
<th>test-other</th>
<th>test-clean</th>
<th>test-other</th>
<th>test-clean</th>
<th>test-other</th>
</tr>
</thead>
<tbody>
<tr>
<td>2</td>
<td>31.0</td>
<td>51.0</td>
<td>28.1</td>
<td>48.3</td>
<td>33.7</td>
<td>56.0</td>
<td>28.0</td>
<td>48.5</td>
</tr>
<tr>
<td>4</td>
<td>11.7</td>
<td>27.8</td>
<td>10.8</td>
<td>26.4</td>
<td>17.4</td>
<td>36.7</td>
<td>13.9</td>
<td>27.3</td>
</tr>
<tr>
<td>6</td>
<td>7.1</td>
<td>19.8</td>
<td>6.7</td>
<td>19.2</td>
<td>9.6</td>
<td>23.7</td>
<td>8.7</td>
<td>18.4</td>
</tr>
<tr>
<td>8</td>
<td>5.8</td>
<td>16.6</td>
<td>5.5</td>
<td>16.3</td>
<td>5.8</td>
<td>15.9</td>
<td>4.8</td>
<td>12.4</td>
</tr>
<tr>
<td>10</td>
<td>5.3</td>
<td>15.3</td>
<td>5.1</td>
<td>15.1</td>
<td>4.5</td>
<td>12.6</td>
<td>4.0</td>
<td>9.5</td>
</tr>
<tr>
<td>12</td>
<td>5.1</td>
<td>14.8</td>
<td>4.8</td>
<td>14.7</td>
<td>4.3</td>
<td>12.2</td>
<td>3.6</td>
<td>8.8</td>
</tr>
</tbody>
</table>
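Because WER improves steadily with exit depth, a deployed system can trade accuracy for latency by stopping at the earliest exit whose prediction looks confident enough. The snippet below is a hypothetical illustration of such a policy using a mean max-probability heuristic over the per-exit logits; it is not the exit-selection rule from the paper, and for simplicity it runs the full encoder instead of halting layer computation at the chosen exit.

```python
import torch


@torch.no_grad()
def early_exit_decode(model: torch.nn.Module, feats: torch.Tensor,
                      threshold: float = 0.9):
    """Return (exit_index, greedy token ids) from the earliest confident exit.

    `model` is assumed to return one (batch, time, vocab) logit tensor per exit,
    as in the sketch above. The confidence measure (mean per-frame max softmax
    probability) is an illustrative assumption, not the paper's method.
    """
    all_logits = model(feats)
    for exit_idx, logits in enumerate(all_logits):
        probs = logits.softmax(dim=-1)
        confidence = probs.max(dim=-1).values.mean().item()
        if confidence >= threshold:
            return exit_idx, logits.argmax(dim=-1)
    # No exit cleared the threshold: fall back to the final (deepest) exit.
    return len(all_logits) - 1, all_logits[-1].argmax(dim=-1)
```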
## 3. Citation
```bibtex
@misc{lasbordes2025splitformer,
  title={Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices},
  author={Maxence Lasbordes and Daniele Falavigna and Alessio Brutti},
  year={2025},
  note={Proc. of EUSIPCO 2025},
  eprint={2506.18035},
  archivePrefix={arXiv},
}
```