I don't like phonemizer. Can we do away with it?
kokoro.py relies on the phonemizer library. Could you drop it? It pulls in a lot of dependencies:
phonemizer 3.3.0 requires dlinfo
phonemizer 3.3.0 requires segments
segments 2.2.1 requires clldutils
segments 2.2.1 requires csvw
clldutils 3.24.0 requires bibtexparser
clldutils 3.24.0 requires colorlog
clldutils 3.24.0 requires pylatexenc
csvw 3.5.1 requires babel
csvw 3.5.1 requires isodate
csvw 3.5.1 requires jsonschema
csvw 3.5.1 requires language-tags
csvw 3.5.1 requires rdflib
csvw 3.5.1 requires rfc3986
csvw 3.5.1 requires uritemplate
jsonschema 4.23.0 requires jsonschema-specifications
jsonschema 4.23.0 requires referencing
jsonschema 4.23.0 requires rpds-py
I propose modifications along these lines to the kokoro.py script:
Add this:
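import subprocess

def direct_espeak(text, lang='en-us'):
    # Call the espeak-ng CLI directly: -q suppresses audio, --ipa prints IPA phonemes
    cmd = ['espeak-ng', '-q', '--ipa', '-v', lang, text]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.strip()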
Modify to this:
def phonemize(text, lang, norm=True):
    if norm:
        text = normalize_text(text)
    # Map the 'a'/'b' language codes to espeak language codes
    espeak_lang = 'en-us' if lang == 'a' else 'en-gb'
    # Get phonemes from espeak
    ps = direct_espeak(text, espeak_lang)
    if not ps:
        return ''
    # Apply the same post-processing
    ps = ps.replace('ʲ', 'j').replace('r', 'ɹ').replace('x', 'k').replace('ɬ', 'l')
    ps = re.sub(r'(?<=[a-zɹː])(?=hˈʌndɹɪd)', ' ', ps)
    ps = re.sub(r' z(?=[;:,.!?¡¿—…"«»"" ]|$)', 'z', ps)
    if lang == 'a':
        ps = re.sub(r'(?<=nˈaɪn)ti(?!ː)', 'di', ps)
    ps = ''.join(filter(lambda p: p in VOCAB, ps))
    return ps.strip()
You'd only need to ensure that espeak-ng is on the system's PATH or, better yet, bundle the espeak-ng .exe and/or .dll files with the project (those particular files are for Windows, obviously).
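To make the PATH-or-bundled idea concrete, here is a rough sketch of how the lookup could work. The names `find_espeak`, `direct_espeak_checked`, and the `bundled_dir` parameter are made up for illustration and are not part of kokoro.py. It assumes a bundled copy would sit in a single directory as `espeak-ng` or `espeak-ng.exe`. Passing `encoding="utf-8"` is also an assumption on my part, intended to keep the IPA characters from being decoded with the Windows locale codec.

```python
import shutil
import subprocess
from pathlib import Path


def find_espeak(bundled_dir=None, name="espeak-ng"):
    """Return a path to the espeak-ng executable, or None if it cannot be found.

    A bundled copy (hypothetical layout: <bundled_dir>/espeak-ng[.exe]) is
    preferred; otherwise fall back to whatever is on the system PATH.
    """
    if bundled_dir is not None:
        for candidate in (Path(bundled_dir) / name, Path(bundled_dir) / f"{name}.exe"):
            if candidate.is_file():
                return str(candidate)
    return shutil.which(name)


def direct_espeak_checked(text, lang="en-us", bundled_dir=None):
    """Like direct_espeak, but fails loudly instead of returning an empty string."""
    exe = find_espeak(bundled_dir)
    if exe is None:
        raise RuntimeError("espeak-ng not found in the bundled directory or on PATH")
    result = subprocess.run(
        [exe, "-q", "--ipa", "-v", lang, text],
        capture_output=True,
        encoding="utf-8",  # assumption: decode the IPA output as UTF-8 on every platform
    )
    if result.returncode != 0:
        raise RuntimeError(f"espeak-ng failed: {result.stderr.strip()}")
    return result.stdout.strip()
```

With something like this, `phonemize` could call `direct_espeak_checked(text, espeak_lang)` in place of `direct_espeak`, so a missing binary shows up as a clear error rather than as silently empty phonemes.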