ArthurZ/encodec_24khz

ArthurZ (HF staff) committed on Jun 15, 2023

Commit ec73fe3 (1 parent: be02c2a)

Update README.md

README.md +11 -4

@@ -118,15 +118,22 @@ Nachmani et al., 2020; Chazan et al., 2021).
 ### Results
-The results of the evaluation demonstrate the superiority of EnCodec compared to the baselines across different bandwidths (1.5, 3, 6, and 12 kbps). Figure 3 provides an overview of the streamable setup results, while Table 1 offers a category-wise breakdown. Although alternative quantizers such as Gumbel-Softmax and DiffQ were explored, their preliminary results did not surpass or match the performance of EnCodec, so they are not included in the report.
-When comparing EnCodec with the baselines at the same bandwidth, EnCodec consistently outperforms them in terms of MUSHRA score. Notably, EnCodec achieves better performance, on average, at 3 kbps compared to Lyra-v2 at 6 kbps and Opus at 12 kbps. Additionally, by incorporating the language model over the codes, it is possible to achieve a bandwidth reduction of approximately 25-40%. For example, the bandwidth of the 3 kbps model can be reduced to 1.9 kbps.
-Furthermore, it is observed that as the bandwidth increases, the compression ratio decreases. This behavior can be attributed to the small size of the Transformer model used, which makes it challenging to effectively model all codebooks together.
+The results of the evaluation demonstrate the superiority of EnCodec compared to the baselines across different bandwidths (1.5, 3, 6, and 12 kbps).
+When comparing EnCodec with the baselines at the same bandwidth, EnCodec consistently outperforms them in terms of MUSHRA score.
+Notably, EnCodec achieves better performance, on average, at 3 kbps compared to Lyra-v2 at 6 kbps and Opus at 12 kbps.
+Additionally, by incorporating the language model over the codes, it is possible to achieve a bandwidth reduction of approximately 25-40%.
+For example, the bandwidth of the 3 kbps model can be reduced to 1.9 kbps.
 #### Summary
-EnCodec is a state-of-the-art real-time neural audio compression model that excels in producing high-fidelity audio samples at various sample rates and bandwidths. The model's performance was evaluated across different settings, ranging from 24kHz monophonic at 1.5 kbps to 48kHz stereophonic, showcasing both subjective and objective results (Figure 3 and Table 4). Notably, EnCodec incorporates a novel spectrogram-only adversarial loss, effectively reducing artifacts and enhancing sample quality. Training stability and interpretability were further enhanced through the introduction of a gradient balancer for the loss weights. Additionally, the study demonstrated that a compact Transformer model can be employed to achieve an additional bandwidth reduction of up to 40% without compromising quality, particularly in applications where low latency is not critical (e.g., music streaming).
+EnCodec is a state-of-the-art real-time neural audio compression model that excels in producing high-fidelity audio samples at various sample rates and bandwidths.
+The model's performance was evaluated across different settings, ranging from 24kHz monophonic at 1.5 kbps to 48kHz stereophonic, showcasing both subjective and
+objective results. Notably, EnCodec incorporates a novel spectrogram-only adversarial loss, effectively reducing artifacts and enhancing sample quality.
+Training stability and interpretability were further enhanced through the introduction of a gradient balancer for the loss weights.
+Additionally, the study demonstrated that a compact Transformer model can be employed to achieve an additional bandwidth reduction of up to 40% without compromising
+quality, particularly in applications where low latency is not critical (e.g., music streaming).
 ## Citation
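
The Results text quotes a bandwidth reduction of roughly 25-40% and gives 3 kbps reduced to 1.9 kbps as an example. As a quick sanity check on that arithmetic, here is a minimal sketch; the helper name `bandwidth_reduction` is hypothetical and not part of the README or any EnCodec API. It only computes the relative saving from the two figures quoted above.

```python
def bandwidth_reduction(original_kbps: float, reduced_kbps: float) -> float:
    """Return the relative bandwidth saving as a fraction of the original rate."""
    if original_kbps <= 0:
        raise ValueError("original_kbps must be positive")
    return (original_kbps - reduced_kbps) / original_kbps


# Figures quoted in the Results text: the 3 kbps model reduced to 1.9 kbps.
saving = bandwidth_reduction(3.0, 1.9)
print(f"{saving:.1%}")  # 36.7%
```

The example works out to about 36.7%, which falls inside the 25-40% range stated in the README.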