aletrn/ai-pronunciation-trainer

alessandro trinca tornidor committed on Mar 7

Commit 187549a (1 parent: 595d5ff)

doc: update README.md

Files changed (1)

README.md +11 -11

@@ -4,7 +4,7 @@ emoji: 🎤
 colorFrom: red
 colorTo: blue
 sdk: gradio
-sdk_version: 5.20.0
 app_file: app.py
 pinned: false
 license: mit
@@ -22,7 +22,7 @@ My [HuggingFace Space](https://huggingface.co/spaces/aletrn/ai-pronunciation-tra
 ## Installation
 To run the program locally, you need to install the requirements and run the main python file.
-These commands assume you have an active virtualenv (locally I'm using python 3.12, on HuggingFace the gradio SDK - version 5.20.0 at the moment - uses python 3.10):
 ```bash
 pip install -r requirements.txt
@@ -42,7 +42,7 @@ Currently the best way to exec the project is using the Gradio frontend:
 python app.py
 ```
-I upgraded the old custom frontend ([email protected], [email protected]) and backend (pytorch==2.6.0, torchaudio==2.6.0) libraries. On macOS intel it's possible to install from [pypi.org](https://pypi.org/project/torch/) only until the library version [2.2.2](https://pypi.org/project/torch/2.2.2/)
 (see [this github issue](https://github.com/instructlab/instructlab/issues/1469) and [this deprecation notice](https://dev-discuss.pytorch.org/t/pytorch-macos-x86-builds-deprecation-starting-january-2024/1690)).
 In case of missing TTS voices needed by the Text-to-Speech in-browser SpeechSynthesis feature (e.g. on Windows 11 you need to install manually the TTS voices for the languages you need), right now the Gradio frontend raises an alert message with a JavaScript message.
@@ -122,16 +122,16 @@ pnpm playwright test --workers 1 --retries 4 --project=chromium
 - Upgraded Speech-to-Text German [Silero](https://github.com/snakers4/silero-models) model that blocked the upgrade to PyTorch > 2.x
 - Upgraded PyTorch > 2.x
 - Improved backend tests with the [mutation test suite](https://en.wikipedia.org/wiki/Mutation_testing) [Cosmic Ray](https://cosmic-ray.readthedocs.io)
-- E2E [playwright](https://playwright.dev) tests
-- Added a new frontend based on [Gradio](https://gradio.app)
-- add an updated online version ([HuggingFace Space](https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer))
-- Only on the Gradio frontend version - it's possible to insert custom sentences to read and evaluate
-- Gradio frontend version - play the isolated words in the recordings, to compare the 'ideal' pronunciation with the learner pronunciation
-- Gradio frontend version - re-added the Text-to-Speech in-browser (it works only if there are installed the required language packages. In case of failures there is the backend Text-to-Speech feature)
 - Fixed a [bug](https://github.com/Thiagohgl/ai-pronunciation-trainer/issues/14) with [whisper](https://huggingface.co/docs/transformers/model_doc/whisper) not properly transcribing the end timestamp for the last word in the recorded audio (in the end I solved it switching to [whisper python pip package](https://pypi.org/project/openai-whisper/))
 - Added [faster whisper](https://pypi.org/project/faster-whisper/) model support:
-    - it avoids `None` values on `end_ts` timestamps for the last elements, unlike the HuggingFace Whisper's output
-    - it uses silero-vad to detect long silences within the audio
 ### TODO

 colorFrom: red
 colorTo: blue
 sdk: gradio
+sdk_version: 5.18.0
 app_file: app.py
 pinned: false
 license: mit
 ## Installation
 To run the program locally, you need to install the requirements and run the main python file.
+These commands assume you have an active virtualenv (locally I'm using python 3.12, on HuggingFace the gradio SDK - version 5.6.0 at the moment - uses python 3.10):
 ```bash
 pip install -r requirements.txt
 python app.py
 ```
+I upgraded the old custom frontend ([email protected], [email protected]) and backend (pytorch==2.5.1, torchaudio==2.5.1) libraries. On macOS intel it's possible to install from [pypi.org](https://pypi.org/project/torch/) only until the library version [2.2.2](https://pypi.org/project/torch/2.2.2/)
 (see [this github issue](https://github.com/instructlab/instructlab/issues/1469) and [this deprecation notice](https://dev-discuss.pytorch.org/t/pytorch-macos-x86-builds-deprecation-starting-january-2024/1690)).
 In case of missing TTS voices needed by the Text-to-Speech in-browser SpeechSynthesis feature (e.g. on Windows 11 you need to install manually the TTS voices for the languages you need), right now the Gradio frontend raises an alert message with a JavaScript message.
 - Upgraded Speech-to-Text German [Silero](https://github.com/snakers4/silero-models) model that blocked the upgrade to PyTorch > 2.x
 - Upgraded PyTorch > 2.x
 - Improved backend tests with the [mutation test suite](https://en.wikipedia.org/wiki/Mutation_testing) [Cosmic Ray](https://cosmic-ray.readthedocs.io)
+- Added E2E [playwright](https://playwright.dev) tests
+- Added a new frontend based on [Gradio](https://gradio.app) with an updated online version ([HuggingFace Space](https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer))
+- It's possible to insert custom sentences to read and evaluate
+- Play the isolated words in the recordings, to compare the 'ideal' pronunciation with the learner pronunciation
+- re-added the Text-to-Speech in-browser (it works only if there are installed the required language packages; in case of failures there is the backend Text-to-Speech feature - Gradio frontend version)
 - Fixed a [bug](https://github.com/Thiagohgl/ai-pronunciation-trainer/issues/14) with [whisper](https://huggingface.co/docs/transformers/model_doc/whisper) not properly transcribing the end timestamp for the last word in the recorded audio (in the end I solved it switching to [whisper python pip package](https://pypi.org/project/openai-whisper/))
 - Added [faster whisper](https://pypi.org/project/faster-whisper/) model support:
+  - it avoids `None` values on `end_ts` timestamps for the last elements, unlike the HuggingFace Whisper's output
+  - it uses silero-vad to detect long silences within the audio
+- webApp frontend - improved css on mobile devices
 ### TODO