
I don't like phonemizer, can we do away with it?

#67
by ctranslate2-4you - opened

kokoro.py relies on the phonemizer library. Can you drop it? It pulls in a lot of dependencies, as follows:

phonemizer 3.3.0 requires dlinfo
phonemizer 3.3.0 requires segments

segments 2.2.1 requires clldutils
segments 2.2.1 requires csvw

clldutils 3.24.0 requires bibtexparser
clldutils 3.24.0 requires colorlog
clldutils 3.24.0 requires pylatexenc

csvw 3.5.1 requires babel
csvw 3.5.1 requires isodate
csvw 3.5.1 requires jsonschema
csvw 3.5.1 requires language-tags
csvw 3.5.1 requires rdflib
csvw 3.5.1 requires rfc3986
csvw 3.5.1 requires uritemplate

jsonschema 4.23.0 requires jsonschema-specifications
jsonschema 4.23.0 requires referencing
jsonschema 4.23.0 requires rpds-py
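
For anyone who wants to reproduce a list like the one above on their own install, here is a rough sketch using the standard library's importlib.metadata. The helper name `walk_requirements` is made up for illustration, the name extraction is a simple regex rather than a full requirement parser, environment markers other than extras are not evaluated, and the output will depend on which versions happen to be installed.

```python
import re
from importlib import metadata


def walk_requirements(dist_name, seen=None):
    """Yield 'pkg version requires dep' lines for dist_name and its dependencies."""
    seen = set() if seen is None else seen
    key = dist_name.lower().replace("_", "-")
    if key in seen:
        return  # already reported this package
    seen.add(key)

    try:
        version = metadata.version(dist_name)
        raw = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return  # not installed in this environment, nothing to report

    deps = []
    for req in raw:
        # Skip optional extras such as: pytest; extra == "test"
        if "extra ==" in req:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
        if match:
            deps.append(match.group(0))

    for dep in deps:
        yield f"{dist_name} {version} requires {dep}"
    for dep in deps:
        yield from walk_requirements(dep, seen)
```

Calling `print("\n".join(walk_requirements("phonemizer")))` in an environment where phonemizer is installed should print lines in the same "X requires Y" shape as above.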

I propose something like this these modifications to kokoro.py script:

Add this:

import subprocess

def direct_espeak(text, lang='en-us'):
    # -q: no audio output; --ipa: print IPA phonemes to stdout; -v: voice/language
    cmd = ['espeak-ng', '-q', '--ipa', '-v', lang, text]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.strip()
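
If it helps, a slightly more defensive variant of the helper above could look like the sketch below. It assumes espeak-ng may print more than one line of IPA for multi-sentence input (so the lines are joined with spaces) and that its output is UTF-8. The name `espeak_ipa` and the timeout value are only illustrative and are not part of kokoro.py.

```python
import subprocess


def espeak_ipa(text, lang="en-us", timeout=30):
    """Return IPA for text from the espeak-ng CLI, or '' if the call fails."""
    cmd = ["espeak-ng", "-q", "--ipa", "-v", lang, text]
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            encoding="utf-8",  # assumption: espeak-ng writes UTF-8 to stdout
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # espeak-ng could not be started (e.g. not on PATH) or it hung
        return ""
    if result.returncode != 0:
        return ""
    # Collapse the per-line output into a single space-separated string
    lines = (line.strip() for line in result.stdout.splitlines())
    return " ".join(line for line in lines if line)
```

Returning an empty string on failure matches the `if not ps: return ''` check in the modified phonemize function, so the caller would not need to change.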

Modify to this:

def phonemize(text, lang, norm=True):
    if norm:
        text = normalize_text(text)
    
    # Map the 'a'/'b' language codes to espeak language codes
    espeak_lang = 'en-us' if lang == 'a' else 'en-gb'
    
    # Get phonemes from espeak
    ps = direct_espeak(text, espeak_lang)
    if not ps:
        return ''
        
    # Apply the same post-processing
    ps = ps.replace('ʲ', 'j').replace('r', 'ɹ').replace('x', 'k').replace('ɬ', 'l')
    ps = re.sub(r'(?<=[a-zɹː])(?=hˈʌndɹɪd)', ' ', ps)
    ps = re.sub(r' z(?=[;:,.!?¡¿—…"«»"" ]|$)', 'z', ps)
    if lang == 'a':
        ps = re.sub(r'(?<=nˈaɪn)ti(?!ː)', 'di', ps)
    ps = ''.join(filter(lambda p: p in VOCAB, ps))
    return ps.strip()

You'd only need to ensure that espeak-ng is on the system's PATH or, better yet, bundle the library's .exe and/or .dll files (those particular files are for Windows, obviously).
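
As a rough sketch of that lookup, something like the following could prefer a bundled copy and fall back to PATH. The function name `find_espeak_ng` and the `bundle_dir` parameter are hypothetical; where a bundled copy would actually live is up to the maintainers.

```python
import shutil


def find_espeak_ng(bundle_dir=None):
    """Return the path to espeak-ng, preferring a bundled copy, else None."""
    if bundle_dir is not None:
        # Search only the (hypothetical) bundle directory first
        bundled = shutil.which("espeak-ng", path=str(bundle_dir))
        if bundled:
            return bundled
    # Fall back to the regular system PATH
    return shutil.which("espeak-ng")
```

The returned path could then be used as the first element of the `cmd` list instead of the bare 'espeak-ng' name, and a `None` result is a chance to show a clear "please install espeak-ng" message.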
