Lin Z

linz

AI & ML interests

None yet

Recent Activity

Organizations

MLX Community

linz's activity

reacted to hexgrad's post with 🔥 2 days ago
Happy New Year! 🌃 af_sky landed in Kokoro, along with an article: hexgrad/Kokoro-82M
  • 2 replies
reacted to hexgrad's post with 🚀🔥 2 days ago
📣 Looking for labeled, high-quality synthetic audio/TTS data 📣 Have you been or are you currently calling API endpoints from OpenAI, ElevenLabs, etc? Do you have labeled audio data sitting around gathering dust? Let's talk! Join https://discord.gg/QuGxSWBfQy or comment down below.

If your data exceeds quantity & quality thresholds and is approved into the next hexgrad/Kokoro-82M training mix, and you permissively DM me the data under an effective Apache license, then I will DM back the corresponding voicepacks for YOUR data if/when the next Apache-licensed Kokoro base model drops.

What does this mean? If you've been calling closed-source TTS or audio API endpoints to:
- Build voice agents
- Make long-form audio, like audiobooks or podcasts
- Handle customer support, etc
Then YOU can contribute to the training mix and get useful artifacts in return. ❤️

More details at hexgrad/Kokoro-82M#21
reacted to hexgrad's post with 👍🤗 2 days ago
Tonight, Adam & Michael join the 82M Apache TTS model in hexgrad/Kokoro-82M
reacted to hexgrad's post with ❤️🔥 2 days ago
Merry Christmas! 🎄 Open sourced a small TTS model at hexgrad/Kokoro-82M
  • 2 replies
reacted to hexgrad's post with 🔥🚀 2 days ago
🚀 Shipmas Day 2.5 🚀 Kokoro v0.22 packs 5 languages in 82M params! 🇺🇸🇬🇧🇫🇷🇯🇵🇰🇷🇨🇳 hexgrad/Kokoro-TTS

Feedback appreciated, whether positive or negative. Non-English languages haven't been validated by the model creator(s), so if you're a native speaker, criticize away!

「ココロティーティーエスは、英語と日本語に加えて、中国語、韓国語、フランス語を話すことができるようになりました。」 ("Kokoro TTS can now speak Chinese, Korean, and French, in addition to English and Japanese.")

Wav converted to mp4 using FFmpeg, since audio attachments aren't allowed in Posts. You may have to unmute the video.
reacted to hexgrad's post with 🔥 2 days ago
self.brag(): Kokoro finally got 300 votes in Pendrokar/TTS-Spaces-Arena after @Pendrokar was kind enough to add it 3 weeks ago.
Discounting the small sample size of votes, I think it is safe to say that hexgrad/Kokoro-TTS is currently a top 3 model among the contenders in that Arena. This is notable because:
- At 82M params, Kokoro is one of the smaller models in the Arena
- MeloTTS has 52M params
- F5 TTS has 330M params
- XTTSv2 has 467M params
reacted to fdaudens's post with 👍 2 days ago
The rapid progress in small audio models is mind-blowing! 🤯 Just tested OuteTTS v0.2 - cloned my voice from a 10s clip with impressive accuracy and natural prosody.

At 500M parameters, it's efficient enough to run on basic hardware but powerful enough for professional use.

This could transform how we produce audio content for news - think instant translated interviews keeping original voices, or scaled audio article production!

Demo and Model on the Hub: OuteAI/OuteTTS-0.2-500M h/t @reach-vb
  • 3 replies
reacted to hexgrad's post with 🔥👍 2 days ago
hexgrad/Kokoro-TTS just got an upgrade that substantially improves TTS naturalness for short bursts while maintaining parity for longer utterances! 🔥

Read more and listen to before/after audio samples at https://hf.co/blog/hexgrad/kokoro-short-burst-upgrade

(Probably would have made that Article a Post instead, if audio could be embedded into Posts.)
  • 2 replies
reacted to hexgrad's post with 🔥 2 days ago
reacted to Pendrokar's post with 👍❤️ 2 days ago
TTS: Sorry, I just cannot get the hype behind F5 TTS. It has now gathered a thousand votes in the TTS Arena fork and **has remained in #8 spot** against the _mostly_ Open TTS adversaries.

The voice sample used is the same as XTTS. F5 has so far been unstable, being unemotional/monotone/depressed and mispronouncing words (_awestruck_).

If you have suggestions please give feedback in the following thread:
mrfakename/E2-F5-TTS#32