---
title: All In One Translation
emoji: π
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
short_description: Convert text/image/audio/video from src language to English
---
Liked the setup? Leave a like at the top left; it takes only 2 seconds.
## Replication
- Requirements
  - A free API key from https://detectlanguage.com/ for automatic language detection from text (a minimal detection sketch follows this list).
  - A GPU for Whisper model inference; it is much slower on CPU.
- Notes
  - The `pytesseract` library (for image-to-text) is easier to install on Linux machines.
  - If you have a GPU, you can use more sophisticated image-to-text models.
  - The image-to-text setup works best for non-decorative, normal-sized fonts.
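
Below is a minimal sketch of the language-detection call, assuming the official `detectlanguage` Python client (`pip install detectlanguage`) and an API key stored in a `DETECTLANGUAGE_API_KEY` environment variable; the actual variable and function names in app.py may differ.

```python
import os

import detectlanguage  # pip install detectlanguage

# Free API key from https://detectlanguage.com/; the env var name is an assumption.
detectlanguage.configuration.api_key = os.environ["DETECTLANGUAGE_API_KEY"]

def detect_source_language(text: str) -> str:
    """Return the most likely ISO language code for `text`, e.g. 'es'."""
    return detectlanguage.simple_detect(text)

print(detect_source_language("Buenos días, ¿cómo estás?"))  # -> "es"
```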
The Space consists of four parts (a minimal Gradio layout sketch follows this list):
- Text translator - Input (Input Text, Target language), Output (Translated text in target language, Source language name)
- Image translator - Input (Image with any text, Source language, Target language), Output (Image text in source language, Image text translated to target language)
- Audio translator - Input (Audio in any language, Model size, Target language), Output (Transcribed original text, Transcribed text translated to target language, Original language name)
- Video translator - Input (Video, Model size, Target language), Output (Translated text version of the audio) [Not yet implemented]
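
Here is a minimal sketch of how these parts can map onto a Gradio `TabbedInterface`. The callbacks are placeholders and the component choices (dropdown options, labels) are assumptions rather than what app.py actually uses; the video tab is omitted since it is not yet implemented.

```python
import gradio as gr

# Placeholder callbacks; the real app wires these to the translation logic.
def translate_text(text, target_lang):
    return "translated text", "detected source language"

def translate_image(image_path, source_lang, target_lang):
    return "extracted text", "translated text"

def translate_audio(audio_path, model_size, target_lang):
    return "transcription", "translation", "detected language"

text_tab = gr.Interface(
    fn=translate_text,
    inputs=[gr.Textbox(label="Input Text"),
            gr.Dropdown(["English", "French"], label="Target language")],
    outputs=[gr.Textbox(label="Translated text"), gr.Textbox(label="Source language")],
)
image_tab = gr.Interface(
    fn=translate_image,
    inputs=[gr.Image(type="filepath"),
            gr.Dropdown(["English"], label="Source language"),
            gr.Dropdown(["English", "French"], label="Target language")],
    outputs=[gr.Textbox(label="Image text"), gr.Textbox(label="Translated text")],
)
audio_tab = gr.Interface(
    fn=translate_audio,
    inputs=[gr.Audio(type="filepath"),
            gr.Dropdown(["tiny", "base", "small"], label="Model size"),
            gr.Dropdown(["English", "French"], label="Target language")],
    outputs=[gr.Textbox(label="Transcription"), gr.Textbox(label="Translation"),
             gr.Textbox(label="Language")],
)

demo = gr.TabbedInterface(
    [text_tab, image_tab, audio_tab],
    ["Text translator", "Image translator", "Audio translator"],
)

if __name__ == "__main__":
    demo.launch()
```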
## Demo
### Text translator
### Image translator
- Works best with simple fonts; performance deteriorates with decorative fonts.
- For now, you have to choose the source language; choosing "English" works for almost all Latin-script languages (Spanish, Romanian, etc.).
- Uses the `pytesseract` library for image-to-text conversion. Its installation is a bit involved; follow this link for installation instructions (a minimal OCR sketch follows this list).
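
A minimal sketch of the OCR step with `pytesseract`, assuming the Tesseract binary is already installed; using `lang="eng"` is what lets English cover most Latin-script text, and the file name in the example is hypothetical.

```python
from PIL import Image
import pytesseract  # pip install pytesseract; also requires the tesseract-ocr binary

def image_to_text(image_path: str, tesseract_lang: str = "eng") -> str:
    """Extract text from an image; 'eng' works for most Latin-script languages."""
    return pytesseract.image_to_string(Image.open(image_path), lang=tesseract_lang)

# Example: text = image_to_text("sign.png", tesseract_lang="eng")
```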
### Audio translator
- Since this is a free-tier Space, inference takes a long time (~1000 seconds for 10 seconds of audio).
- With Hugging Face Pro you can attach a GPU and get reasonable inference times, but for now this is just a demo.
- If you have an OpenAI API key, you can call the Whisper speech-to-text model via the API. Since I don't have one, I used the `whisper` library locally, where you have to provide the inference hardware yourself (a minimal sketch follows this list).
- Here is a 10-second translation of the famous Russian song "Kukushka".
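
A minimal sketch of the local Whisper path using the openai-whisper library; `task="translate"` makes Whisper output English regardless of the source language, and the audio file name in the example is hypothetical.

```python
import torch
import whisper  # pip install openai-whisper

def translate_audio(audio_path: str, model_size: str = "small") -> tuple[str, str]:
    """Return (English translation, detected language code) for an audio file."""
    device = "cuda" if torch.cuda.is_available() else "cpu"  # CPU works, but far slower
    model = whisper.load_model(model_size, device=device)
    # task="translate" asks Whisper to translate the speech into English.
    result = model.transcribe(audio_path, task="translate")
    return result["text"], result["language"]

# Example: text, lang = translate_audio("kukushka_10s.mp3", model_size="small")
```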
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference