license: mit

This is a streamlined version of WavTokenizer-large-speech-75token. It exposes the model through separate encoder and decoder components, giving a clean, efficient interface for inference.

  • Model size reduced from 1.75 GB to ~330 MB by keeping only the components needed for inference
  • Split interface (82 MB encoder, 248 MB decoder), so each half can be used on its own
  • Simplified integration: a single .py file

The model is split into:

  • encoder/: Handles audio encoding
  • decoder/: Handles decoding and synthesis
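
Because the two halves live in separate folders, it should be possible to fetch only the one you need. The following is a minimal sketch, assuming the files sit under `encoder/` and `decoder/` at the repository root as listed above, and using `snapshot_download` from `huggingface_hub` with its `allow_patterns` filter. The helper names `component_patterns` and `download_components` are hypothetical, and the repository id is left as a parameter because it is not stated here.

```python
from typing import List

# Folder names taken from the list above.
KNOWN_COMPONENTS = ("encoder", "decoder")


def component_patterns(components: List[str]) -> List[str]:
    """Build glob patterns matching every file under each requested folder."""
    unknown = [name for name in components if name not in KNOWN_COMPONENTS]
    if unknown:
        raise ValueError(f"unknown component(s): {unknown}")
    return [f"{name}/*" for name in components]


def download_components(repo_id: str, components: List[str]) -> str:
    """Download only the requested folders and return the local snapshot path.

    Assumption: repo_id is the Hub id of this repository (not given in this
    README). The interface .py file is not matched by these patterns and
    would need to be fetched separately.
    """
    # Imported lazily so component_patterns works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=component_patterns(components),
    )
```

For example, `download_components(repo_id, ["encoder"])` would pull roughly the 82 MB encoder folder and skip the decoder, which may be useful when only tokenization is needed.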