Update README.md
README.md
CHANGED
@@ -118,15 +118,22 @@ Nachmani et al., 2020; Chazan et al., 2021).
 
 ### Results
 
-The results of the evaluation demonstrate the superiority of EnCodec compared to the baselines across different bandwidths (1.5, 3, 6, and 12 kbps).
+The results of the evaluation demonstrate the superiority of EnCodec compared to the baselines across different bandwidths (1.5, 3, 6, and 12 kbps).
 
-When comparing EnCodec with the baselines at the same bandwidth, EnCodec consistently outperforms them in terms of MUSHRA score.
+When comparing EnCodec with the baselines at the same bandwidth, EnCodec consistently outperforms them in terms of MUSHRA score.
+Notably, EnCodec achieves better performance, on average, at 3 kbps compared to Lyra-v2 at 6 kbps and Opus at 12 kbps.
+Additionally, by incorporating the language model over the codes, it is possible to achieve a bandwidth reduction of approximately 25-40%.
+For example, the bandwidth of the 3 kbps model can be reduced to 1.9 kbps.
 
-Furthermore, it is observed that as the bandwidth increases, the compression ratio decreases. This behavior can be attributed to the small size of the Transformer model used, which makes it challenging to effectively model all codebooks together.
 
 #### Summary
 
-EnCodec is a state-of-the-art real-time neural audio compression model that excels in producing high-fidelity audio samples at various sample rates and bandwidths.
+EnCodec is a state-of-the-art real-time neural audio compression model that excels in producing high-fidelity audio samples at various sample rates and bandwidths.
+The model's performance was evaluated across different settings, ranging from 24kHz monophonic at 1.5 kbps to 48kHz stereophonic, showcasing both subjective and
+objective results. Notably, EnCodec incorporates a novel spectrogram-only adversarial loss, effectively reducing artifacts and enhancing sample quality.
+Training stability and interpretability were further enhanced through the introduction of a gradient balancer for the loss weights.
+Additionally, the study demonstrated that a compact Transformer model can be employed to achieve an additional bandwidth reduction of up to 40% without compromising
+quality, particularly in applications where low latency is not critical (e.g., music streaming).
 
 
 ## Citation
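
As a quick sanity check on the figures in the added Results text, the 3 kbps to 1.9 kbps example can be compared against the stated 25-40% range. The snippet below is only an illustrative sketch: the helper name `bandwidth_reduction` is hypothetical and is not part of EnCodec or of the README being changed.

```python
def bandwidth_reduction(original_kbps: float, reduced_kbps: float) -> float:
    """Return the relative bandwidth reduction as a fraction of the original.

    Hypothetical helper for illustration only; not an EnCodec API.
    """
    if original_kbps <= 0:
        raise ValueError("original_kbps must be positive")
    return (original_kbps - reduced_kbps) / original_kbps


# Figures taken from the README text: the 3 kbps model reduced to 1.9 kbps.
reduction = bandwidth_reduction(3.0, 1.9)
print(f"{reduction:.1%}")  # 36.7%
```

The result, roughly 37%, falls inside the 25-40% range quoted in the same paragraph.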