A small experiment I did with a subset of the moespeech JP dataset. The models (GPT and SoVITS) are made to run with GPT-SoVITS.

I used 6 hours of audio for training. The selected audio samples were categorized into frequency bands (100–500 Hz) in 50 Hz intervals, and each band received equal representation in the final dataset so the model learns from a diverse range of voice frequencies. Samples outside the 3–10 second range were discarded due to GPT-SoVITS limitations. A sketch of this preprocessing is shown below.
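
Below is a minimal, hypothetical sketch of that preprocessing step: it filters clips to 3–10 seconds and buckets them by mean fundamental frequency (F0) into 50 Hz bands between 100 and 500 Hz, then balances the bands. The file paths, pitch-estimation choices, and downsampling strategy are illustrative assumptions, not the exact pipeline used for this model.

```python
# Hypothetical preprocessing sketch: keep 3-10 s clips and bucket them by
# mean F0 into 50 Hz bands (100-500 Hz), then balance the bands.
import glob
import random
import librosa
import numpy as np

MIN_SEC, MAX_SEC = 3.0, 10.0
BANDS = [(lo, lo + 50) for lo in range(100, 500, 50)]  # 100-150, ..., 450-500 Hz

buckets = {band: [] for band in BANDS}

for path in glob.glob("clips/*.wav"):
    y, sr = librosa.load(path, sr=None)
    duration = len(y) / sr
    if not (MIN_SEC <= duration <= MAX_SEC):
        continue  # GPT-SoVITS expects roughly 3-10 s clips
    # Estimate F0 with pYIN; unvoiced frames come back as NaN.
    f0, _, _ = librosa.pyin(y, fmin=80, fmax=600, sr=sr)
    mean_f0 = np.nanmean(f0)
    if np.isnan(mean_f0):
        continue
    for lo, hi in BANDS:
        if lo <= mean_f0 < hi:
            buckets[(lo, hi)].append(path)
            break

# Equal representation: downsample every band to the size of the smallest non-empty one.
per_band = min(len(files) for files in buckets.values() if files)
balanced = [p for files in buckets.values()
            for p in random.sample(files, min(per_band, len(files)))]
print(f"{len(balanced)} clips kept across {len(BANDS)} bands")
```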

The model handles Japanese only and tends to produce a slightly higher pitch than the reference audio (this can be corrected by using a low temperature of 0.3). Compared to the base model from GPT-SoVITS, the inflections are much more natural, including laughing, sighing, and other nuances.
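
For reference, here is a minimal inference sketch assuming a local GPT-SoVITS api_v2.py server on port 9880; the endpoint and parameter names may differ between GPT-SoVITS versions, so check the documentation for the release you use. The reference clip, transcript, and output path are placeholder assumptions.

```python
# Minimal inference sketch (assumes a local GPT-SoVITS api_v2.py server on port 9880;
# parameter names may vary between versions).
import requests

payload = {
    "text": "こんにちは、今日はいい天気ですね。",       # text to synthesize (Japanese only)
    "text_lang": "ja",
    "ref_audio_path": "reference.wav",                  # 3-10 s reference clip
    "prompt_text": "リファレンス音声の書き起こし",      # transcript of the reference clip
    "prompt_lang": "ja",
    "temperature": 0.3,                                 # low temperature keeps pitch close to the reference
}

resp = requests.post("http://127.0.0.1:9880/tts", json=payload, timeout=300)
resp.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(resp.content)
```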

The license is cc-by-nc-nd-4.0.

