alessandro trinca tornidor committed on
Commit
187549a
·
1 Parent(s): 595d5ff

doc: update README.md

Files changed (1)
  1. README.md +11 -11
README.md CHANGED
@@ -4,7 +4,7 @@ emoji: 🎤
  colorFrom: red
  colorTo: blue
  sdk: gradio
- sdk_version: 5.20.0
  app_file: app.py
  pinned: false
  license: mit
@@ -22,7 +22,7 @@ My [HuggingFace Space](https://huggingface.co/spaces/aletrn/ai-pronunciation-tra
  ## Installation

  To run the program locally, you need to install the requirements and run the main python file.
- These commands assume you have an active virtualenv (locally I'm using python 3.12, on HuggingFace the gradio SDK - version 5.20.0 at the moment - uses python 3.10):

  ```bash
  pip install -r requirements.txt
@@ -42,7 +42,7 @@ Currently the best way to exec the project is using the Gradio frontend:
  python app.py
  ```

- I upgraded the old custom frontend ([email protected], [email protected]) and backend (pytorch==2.6.0, torchaudio==2.6.0) libraries. On macOS intel it's possible to install from [pypi.org](https://pypi.org/project/torch/) only until the library version [2.2.2](https://pypi.org/project/torch/2.2.2/)
  (see [this github issue](https://github.com/instructlab/instructlab/issues/1469) and [this deprecation notice](https://dev-discuss.pytorch.org/t/pytorch-macos-x86-builds-deprecation-starting-january-2024/1690)).

  In case of missing TTS voices needed by the Text-to-Speech in-browser SpeechSynthesis feature (e.g. on Windows 11 you need to install manually the TTS voices for the languages you need), right now the Gradio frontend raises an alert message with a JavaScript message.
@@ -122,16 +122,16 @@ pnpm playwright test --workers 1 --retries 4 --project=chromium
  - Upgraded Speech-to-Text German [Silero](https://github.com/snakers4/silero-models) model that blocked the upgrade to PyTorch > 2.x
  - Upgraded PyTorch > 2.x
  - Improved backend tests with the [mutation test suite](https://en.wikipedia.org/wiki/Mutation_testing) [Cosmic Ray](https://cosmic-ray.readthedocs.io)
- - E2E [playwright](https://playwright.dev) tests
- - Added a new frontend based on [Gradio](https://gradio.app)
- - add an updated online version ([HuggingFace Space](https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer))
- - Only on the Gradio frontend version - it's possible to insert custom sentences to read and evaluate
- - Gradio frontend version - play the isolated words in the recordings, to compare the 'ideal' pronunciation with the learner pronunciation
- - Gradio frontend version - re-added the Text-to-Speech in-browser (it works only if there are installed the required language packages. In case of failures there is the backend Text-to-Speech feature)
  - Fixed a [bug](https://github.com/Thiagohgl/ai-pronunciation-trainer/issues/14) with [whisper](https://huggingface.co/docs/transformers/model_doc/whisper) not properly transcribing the end timestamp for the last word in the recorded audio (in the end I solved it switching to [whisper python pip package](https://pypi.org/project/openai-whisper/))
  - Added [faster whisper](https://pypi.org/project/faster-whisper/) model support:
- - it avoids `None` values on `end_ts` timestamps for the last elements, unlike the HuggingFace Whisper's output
- - it uses silero-vad to detect long silences within the audio

  ### TODO
 
  colorFrom: red
  colorTo: blue
  sdk: gradio
+ sdk_version: 5.18.0
  app_file: app.py
  pinned: false
  license: mit
 
  ## Installation

  To run the program locally, you need to install the requirements and run the main python file.
+ These commands assume you have an active virtualenv (locally I'm using python 3.12, on HuggingFace the gradio SDK - version 5.6.0 at the moment - uses python 3.10):

  ```bash
  pip install -r requirements.txt
 
  python app.py
  ```

+ I upgraded the old custom frontend ([email protected], [email protected]) and backend (pytorch==2.5.1, torchaudio==2.5.1) libraries. On macOS intel it's possible to install from [pypi.org](https://pypi.org/project/torch/) only until the library version [2.2.2](https://pypi.org/project/torch/2.2.2/)
  (see [this github issue](https://github.com/instructlab/instructlab/issues/1469) and [this deprecation notice](https://dev-discuss.pytorch.org/t/pytorch-macos-x86-builds-deprecation-starting-january-2024/1690)).

  In case of missing TTS voices needed by the Text-to-Speech in-browser SpeechSynthesis feature (e.g. on Windows 11 you need to install manually the TTS voices for the languages you need), right now the Gradio frontend raises an alert message with a JavaScript message.
 
  - Upgraded Speech-to-Text German [Silero](https://github.com/snakers4/silero-models) model that blocked the upgrade to PyTorch > 2.x
  - Upgraded PyTorch > 2.x
  - Improved backend tests with the [mutation test suite](https://en.wikipedia.org/wiki/Mutation_testing) [Cosmic Ray](https://cosmic-ray.readthedocs.io)
+ - Added E2E [playwright](https://playwright.dev) tests
+ - Added a new frontend based on [Gradio](https://gradio.app) with an updated online version ([HuggingFace Space](https://huggingface.co/spaces/aletrn/ai-pronunciation-trainer))
+ - It's possible to insert custom sentences to read and evaluate
+ - Play the isolated words in the recordings, to compare the 'ideal' pronunciation with the learner pronunciation
+ - re-added the Text-to-Speech in-browser (it works only if there are installed the required language packages; in case of failures there is the backend Text-to-Speech feature - Gradio frontend version)
  - Fixed a [bug](https://github.com/Thiagohgl/ai-pronunciation-trainer/issues/14) with [whisper](https://huggingface.co/docs/transformers/model_doc/whisper) not properly transcribing the end timestamp for the last word in the recorded audio (in the end I solved it switching to [whisper python pip package](https://pypi.org/project/openai-whisper/))
  - Added [faster whisper](https://pypi.org/project/faster-whisper/) model support:
+ - it avoids `None` values on `end_ts` timestamps for the last elements, unlike the HuggingFace Whisper's output
+ - it uses silero-vad to detect long silences within the audio
+ - webApp frontend - improved css on mobile devices

  ### TODO