TODO
- should rename instances of "models" to "voice models"
Project/task management
- Should find tool for project/task management
- Tool should support:
- hierarchical tasks
- custom labels and or priorities on tasks
- being able to filter tasks based on those labels
- being able to close and resolve tasks
- Being able to integrate with vscode
- Access for multiple people (in a team)
- Should migrate the content of this file into tool
- Potential candidates
- GitHub projects
- Does not yet support hierarchical tasks so no
- Trello
- Does not seem to support hierarchical tasks either
- Notion
- Seems to support hierarchical tasks, but is complicated
- Todoist
- seems to support hierarchical tasks, custom labels, filtering on those labels, and multiple users, and there are unofficial plugins for VS Code.
Front end
Modularization
- Improve modularization of web code using helper functions defined here
- Split front-end modules into further sub-modules.
- Structure of web folder should be:
web
manage_models
__init__.py
main.py
manage_audio
__init__.py
main.py
generate_song_covers
__init__.py
main.py
one_click_generation
__init__.py
main.py
accordions
__init__.py
options_x.py
... ?
multi_step_generation
__init__.py
main.py
accordions
__init__.py
step_X.py
...
common.py
- For `multi_step_generation/step_X.py`, its potential render function might have to take the set of all "input tracks" in the multi-step generation tab, so these will then have to be defined in `multi_step_generation/main.py`. Other components passed to `multi_step_generation/main.py` might also need to be passed further down to `multi_step_generation/step_X.py`
- For `one_click_generation/option_X.py`, its potential render function should render the accordion for the given options and return the components defined in the accordion? Other components passed to `one_click_generation/main.py` might also need to be passed further down to `one_click_generation/option_X.py`
- Import components instead of passing them as inputs to render functions (DIFFICULT TO IMPLEMENT)
- We have had problems before with component ids when components are instantiated outside a Blocks context in a separate module and then imported into other modules and rendered in their Blocks contexts.
Multi-step generation
If possible merge two consecutive event listeners using `update_cached_songs` in the song retrieval accordion.
add a description describing how to use each accordion and suggestions for workflows
add option for adding more input tracks to the mix song step
- new components should be created dynamically based on a textfield with names and a button for creating new component
- when creating a new component a new transfer button and dropdown should also be created
- and the transfer choices for all dropdowns should be updated to also include the new input track
- we need to consider how we want to handle vertical space
- should we make a new row once more than 3 tracks are on one row?
- yes, and a new slider should also be created on a new row
- right under the first row (which itself is under the row with the song dir dropdown)
should also have the possibility to add more tracks to the pitch shift accordion.
add a confirmation box with a warning if trying to transfer an output track to an input track that is not empty.
- could also have the possibility to ask the user to create a new input track and transfer the output track to it.
- this would just be the same pop-up confirmation box as before, but in addition to the yes and cancel options it will also have a "transfer to new input track" option.
- we need custom JavaScript for this.
Common
- fix problem with typing of `block.launch()`
- problem stems from doing `from gradio import routes`
- so instead we should import from `gradio.routes` directly
- open a PR with the changes
- save default values for options for song generation in a `SongCoverOptionDefault` enum
- then reference this enum across the two tabs
- and also use `list[SongCoverOptionDefault]` as input to the reset settings click event listener in the single click generation tab
- Persist state of app (currently selected settings etc.) across re-renders
- This includes:
- refreshing a browser window
- Opening app in new browser window
- Maybe it should also include when app is started anew?
- Possible solutions
Use `gr.BrowserState` to allow state to be preserved across page loads.
Save any changes to components to a session dictionary and load from it upon refresh
- See here
- Problem is that this solution might not work with accordions or other types of blocks
- should use `.expand()` and `.collapse()` event listeners on accordions to programmatically reset the state of accordions to what they were before the user refreshed the page
Use localstorage
Whenever the state of a component is changed save the new state to a custom JSON file.
- Then whenever the app is refreshed load the current state of components from the JSON file
- This solution should probably work for Block types that are not components
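The JSON-file approach above could be sketched as follows; the file layout and helper names are assumptions:

```python
import json
from pathlib import Path


def save_component_state(state_file: Path, component_id: str, value) -> None:
    """Persist a component's latest value to a custom JSON file."""
    state = json.loads(state_file.read_text()) if state_file.is_file() else {}
    state[component_id] = value
    state_file.write_text(json.dumps(state))


def load_component_state(state_file: Path, component_id: str, default=None):
    """Restore a component's value when the app is refreshed."""
    if state_file.is_file():
        return json.loads(state_file.read_text()).get(component_id, default)
    return default
```

Each component's change event would call `save_component_state`, and the app's load event would call `load_component_state` to re-populate components.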
- need to fix the `INFO: Could not find files for the given pattern(s)` message on startup of the web application on Windows (DIFFICULT TO IMPLEMENT)
- this is an error that gradio needs to fix
- Remove reset button on slider components (DIFFICULT TO IMPLEMENT)
- this is a gradio feature that needs to be removed.
- Fix that gradio removes special symbols from audio paths when loaded into audio components (DIFFICULT TO IMPLEMENT)
- includes parentheses, question marks, etc.
- it's a gradio bug so report?
- Add button for cancelling any currently running jobs (DIFFICULT TO IMPLEMENT)
- Not supported by Gradio natively
- Also difficult to implement manually as Gradio seems to run the called backend functions in separate threads
- don't show an error upon missing confirmation (DIFFICULT TO IMPLEMENT)
- can return `gr.update()` instead of raising an error in the relevant event listener function
- but problem is that subsequent steps will still be executed in this case
- clearing temporary files with the `delete_cache` parameter only seems to work if all windows are closed before closing the app process (DIFFICULT TO IMPLEMENT)
- this is a gradio bug so report?
Online hosting optimization
- make `concurrency_id` and concurrency limit on components dependent on whether the GPU is used or not
- if only cpu then there should be no limit
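A possible way to derive these settings, assuming a `GPU_CONCURRENCY_LIMIT` environment variable as the override mechanism (names are assumptions):

```python
import os


def concurrency_settings(gpu_available: bool) -> dict:
    """Derive queue settings for event listeners from available hardware."""
    if gpu_available:
        # GPU-bound jobs share one queue; allow an override via env var
        limit = int(os.environ.get("GPU_CONCURRENCY_LIMIT", "1"))
        return {"concurrency_id": "gpu", "concurrency_limit": limit}
    # CPU-only deployments: no limit on concurrent calls
    return {"concurrency_id": "cpu", "concurrency_limit": None}
```

The returned dict could be unpacked into the keyword arguments of each event listener registration.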
- increase value of `default_concurrency_limit` in `Block.queue` so that the same event listener can be called multiple times concurrently
- use `Block.launch()` with `max_file_size` to prevent too large uploads
- define as many functions with async as possible to increase responsiveness of app
- and then use `Block.launch()` with `max_threads` set to an appropriate value representing the number of concurrent threads that can be run on the server (default is 40)
- consider setting `max_size` in `Block.queue()` to explicitly limit the number of people that can be in the queue at the same time
- clearing of temporary files should happen after a user logs in and out
- and in this case it should only be temporary files for the active user that are cleared
- Is that even possible to control?
- enable server-side rendering (requires installing node and setting `ssr_mode=True` in `.launch()`) (DIFFICULT TO IMPLEMENT)
- Also need to set `GRADIO_NODE_PATH` to point to the node executable
- problem is that on Windows there is an ERR_UNSUPPORTED_ESM_URL_SCHEME error which needs to be fixed by gradio
- on Linux it works but it is not possible to shut down the server using CTRL+C
Back end
generate_song_cover.py
intermediate file prefixes should be made into enums
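A rough sketch of such an enum; the prefix strings below are made up for illustration, not the actual prefixes used in `generate_song_cover.py`:

```python
from enum import Enum


class IntermediateFilePrefix(str, Enum):
    """Hypothetical prefixes for intermediate audio files in the pipeline."""

    VOCALS = "1_Vocals"
    INSTRUMENTALS = "1_Instrumentals"
    MAIN_VOCALS = "2_Vocals_Main"
    DEREVERBED_VOCALS = "3_Vocals_DeReverb"
    CONVERTED_VOCALS = "4_Vocals_Converted"
```

Mixing in `str` lets the members be used directly in path construction, e.g. `f"{IntermediateFilePrefix.VOCALS}_{song_name}.wav"`.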
find framework for caching intermediate results rather than relying on our homemade system
Support specific audio formats for intermediate audio file?
- it might require some more code to support custom output format for all pipeline functions.
expand `_get_model_name` so that it can take any audio file in an intermediate audio folder as input (DIFFICULT TO IMPLEMENT)
- Function should then try to recursively
- look for a corresponding json metadata file
- find the model name in that file if it exists
- otherwise find the path in the input field in the metadata file
- repeat
- should also consider whether the input audio file belongs to a step before the audio conversion step
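The recursive lookup could be sketched roughly as below, assuming each intermediate file has a sibling `.json` file with hypothetical `model_name` and `input` fields:

```python
import json
from pathlib import Path
from typing import Optional


def find_model_name(audio_path: Path) -> Optional[str]:
    """Walk JSON metadata backwards through the pipeline to find a model name.

    Field names ('model_name', 'input') are assumptions for illustration.
    """
    meta_path = audio_path.with_suffix(".json")
    if not meta_path.is_file():
        return None
    metadata = json.loads(meta_path.read_text())
    if "model_name" in metadata:
        return metadata["model_name"]
    parent = metadata.get("input")
    # recurse into the file that produced this one, if recorded
    return find_model_name(Path(parent)) if parent else None
```

Files from steps before voice conversion would simply bottom out with `None`, which matches the "consider whether the file belongs to a step before conversion" point above.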
use pydantic models to constrain numeric inputs (DIFFICULT TO IMPLEMENT)
- for inputs to the `convert` function for example
- Use `Annotated[basic type, Field(constraint)]` syntax along with a `@validate_call` decorator on functions
- Problem is that pyright does not support `Annotated` so we would have to switch to mypy
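A sketch of the `Annotated` plus `@validate_call` approach using pydantic v2; the parameter name and bounds are assumptions, not the real `convert` signature:

```python
from typing import Annotated

from pydantic import Field, validate_call


@validate_call
def convert(n_semitones: Annotated[int, Field(ge=-24, le=24)] = 0) -> int:
    """Hypothetical pipeline function with a range-constrained input."""
    return n_semitones
```

Calling `convert(100)` would then raise a `ValidationError` at the boundary instead of failing deep inside the pipeline.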
manage_models.py
- use pandas.read_json to load public models table (DIFFICULT TO IMPLEMENT)
CLI
Add remaining CLI interfaces
- Interface for `core.manage_models`
- Interface for `core.manage_audio`
- Interfaces for individual pipeline functions defined in `core.generate_song_covers`
python package management
- need to make project version (in `pyproject.toml`) dynamic so that it is updated automatically when a new release is made
- once diffq-fixed is used by audio-separator we can remove the url dependency on windows
- we will still need to wait for uv to make it easy to install package with torch dependency
- also it is still necessary to install pytorch first as it is not on pypi index
- figure out way of making ./urvc commands execute faster
- when ultimate rvc is downloaded as a pypi package the exposed commands are much faster so investigate this
- update dependencies in pyproject.toml
- use latest compatible version of all packages
- remove commented out code, unless strictly necessary
Audio separation
- expand back-end function(s) so that they are parametrized by both model type as well as model settings
- Need to decide whether we only want to support common model settings or also settings that are unique to each model
- It will probably be the latter, which will then require some extra checks.
- Need to decide which models supported by `audio_separator` we want to support
- Not all of them seem to work
- Probably MDX models and MDXC models
- Maybe also VR and demucs?
- Revisit online guide for optimal models and settings
- In multi-step generation tab
- Expand audio-separation accordion so that model can be selected and appropriate settings for that model can then be selected.
- Model specific settings should expand based on selected model
- In one-click generation
- Should have a "vocal extraction" option accordion
- Should be able to choose which audio separation steps to include in pipeline
- possible steps
- step 1: separating audio from instrumentals
- step 2: separating main vocals from background vocals
- step 3: de-reverbing vocals
- Should pick steps from dropdown?
- For each selected step a new sub-accordion with options for that step will then appear
- Each accordion should include general settings
- We should decide whether model specific settings should also be supported
- We should also decide whether each sub-accordion should have a setting for choosing a model and, if so, render model-specific settings based on the chosen model
- Alternative layout:
- have option to choose number of separation steps
- then dynamically render sub accordions for each of the selected number of steps
- In this case it should be possible to choose models for each accordion
- this field should be initially empty
- Other settings should probably have sensible defaults that are the same
- It might also be a good idea to then have an "examples" pane with recommended combinations of extractions steps
- When one of these is selected, then the selected number of accordions with the preset settings should be filled out
- optimize pre-processing
- Alternatives to the `audio-separator` package:
- Deezer Spleeter
- supports both CLI and python package
- Asteroid
- Nuzzle
GitHub
Actions
- linting with Ruff
- typechecking with Pyright
- running all tests
- automatic building and publishing of project to pypi
- includes automatic update of project version number
- or use pre-commit?
README
- Fill out TBA sections in README
- Add note about not using with VPN?
- Add different emblems/badges in header
- like test coverage, build status, etc. (look at other projects for inspiration)
- spice up text with emojis (look at tiangolo's projects for inspiration)
Releases
Make regular releases like done for Applio
- Will be an `.exe` file that when run unzips contents into an application folder, where `./urvc run` can then be executed.
- Could it be possible to have the `.exe` file just start the web app when clicked?
Could also include pypi package as a release?
use pyinstaller to bundle the app into an executable that also includes sox and ffmpeg as dependencies (DLLs)
Other
- In the future consider detaching repo from where it is forked from:
- because it is not possible to make the repo private otherwise
- see: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/detaching-a-fork
Incorporate upstream changes
- Incorporate RVC code from rvc-cli (i.e. changes from Applio)
- more options for voice conversion and more efficient voice conversion
- batch conversion sub-tab
- TTS tab
- Model training tab
- support more pre-trained models
- sub-tab under "manage models" tab
- support for querying online database with many models that can be downloaded
- support for audio and model analysis.
- Voice blending tab
- Incorporate latest changes from RVC-WebUI
Vocal Conversion
- support arbitrary combination of pitch detection algorithms
- Investigate using onnx models for inference speedup on cpu
- Add more pitch detection methods
- pm
- harvest
- dio
- rmvpe+
- Implement multi-gpu Inference
TTS conversion
- also include original edge voice as output
Model management
Training models
- have learning rate for training
- have a quick training button
- or have preprocess dataset, extract features and generate index happen by default
- Support a loss/training graph
Download models
- Support batch downloading multiple models
- requires a tabular request form where both a link column and a name column have to be filled out
- we can allow selecting multiple items from public models table and then copying them over
- support querying the online database for models matching a given search string like what is done in the Applio app
- first n rows of online database should be shown by default in public models table
- more rows should be retrieved by scrolling down or clicking a button
- user search string should filter/narrow returned number of rows in public models table
- When clicking a set of rows they should then be copied over for downloading in the "download" table
- support a column with preview sample in public models table
- Only possible if voice snippets are also returned when querying the online database
- Otherwise we can always support voice snippets for voice models that have already been downloaded
- run model on sample text ("the quick brown fox jumps over the lazy dog") after it is downloaded
- save the results in an `audio/model_preview` folder
- Preview can then be loaded into a preview audio component when selecting a model from a dropdown
- or if we replace the dropdown with a table with two columns we can have the audio track displayed in the second column
Model analysis
we could provide a new tab to analyze an existing model like what is done in applio
- or this tab could be consolidated with the delete model tab?
we could also provide extra model information after model is downloaded
- potentially in a dropdown to expand?
Audio management
General
- Support audio information tool like in applio?
- A new tab where you can upload a song to analyze?
- more elaborate solution:
- tab where you
- can select any song directory
- select any step in the audio generation pipeline
- then select any intermediate audio file generated in that step
- Then have the possibility to
- Listen to the song
- see a table with its metadata (based on its associated `.json` file)
- add timestamp to json files so they can be sorted in table according to creation date
- And other statistics in a separate component (graph etc.)
- Could have delete buttons both at the level of song_directory, step, and for each song?
- Also consider splitting intermediate audio tracks for each step in to subfolder (0,1,2,3...)
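Adding the timestamp when metadata is written could look like this; the helper name is hypothetical:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_metadata(json_path: Path, data: dict) -> None:
    """Write metadata with a sortable ISO-8601 creation timestamp."""
    stamped = {**data, "created_at": datetime.now(timezone.utc).isoformat()}
    json_path.write_text(json.dumps(stamped, indent=2))
```

ISO-8601 strings sort lexicographically by creation date, so the metadata table can order rows without parsing.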
Other settings
- rework other settings tab
- this should also contain other settings such as the ability to change the theme of the app
- there should be a button to apply settings which will reload the app with the new settings
Audio post-processing
- Support more effects from the `pedalboard` package:
- Guitar-style effects: Chorus, Distortion, Phaser, Clipping
- Loudness and dynamic range effects: Compressor, Gain, Limiter
- Equalizers and filters: HighpassFilter, LadderFilter, LowpassFilter
- Spatial effects: Convolution, Delay, Reverb
- Pitch effects: PitchShift
- Lossy compression: GSMFullRateCompressor, MP3Compressor
- Quality reduction: Resample, Bitcrush
- NoiseGate
- PeakFilter
Audio Mixing
Add main gain loudness slider?
Add option to equalize output audio with respect to input audio
- i.e. song cover gain (and possibly also more general dynamics) should be the same as those for source song.
- check to see if pydub has functionality for this
- otherwise a simple solution would be computing the RMS loudness of the input and output tracks in dB and adding the difference to the output file in the mixing function (using pydub): `rms = np.sqrt(np.mean(np.square(signal)))`, `dB = 20 * np.log10(rms)`
- When this option is selected the option to set the main gain of the output should be disabled?
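The RMS-matching idea can be sketched with plain Python (no numpy); the resulting dB difference could then be applied to the output track, e.g. with pydub's `apply_gain`:

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a sample sequence."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def gain_difference_db(input_signal, output_signal):
    """dB gain to apply to the output so its RMS loudness matches the input's."""
    return 20 * math.log10(rms(input_signal) / rms(output_signal))
```

For example, an output track at half the input's amplitude yields a difference of about +6 dB.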
add more equalization options
- using `pydub.effects` and `pydub.scipy_effects`?
Custom UI
- Experiment with new themes including Building new ones
- first of all, make a new theme that is like the default gradio 4 theme in terms of using semi-transparent orange as the main color and semi-transparent grey as the secondary color. The new gradio 5 theme is good apart from using solid colors, so maybe use that as the base theme.
- Support both dark and light theme in app?
- Add Support for changing theme in app?
- Use Applio theme as inspiration for default theme?
- Experiment with using custom CSS
- Pass `css={css_string}` to `gr.Blocks` and use `elem_classes` and `elem_id` to have components target the styles defined in the CSS string.
- Experiment with custom DataFrame styling
- Experiment with custom Javascript
- Look for opportunities for defining new useful custom components
Real-time vocal conversion
- Should support being used as OBS plugin
- Latency is a real issue
- Implementations details:
- implement back-end in Rust?
- implement front-end using svelte?
- implement desktop application using C++ or C#?
- see https://github.com/w-okada/voice-changer and https://github.com/RVC-Project/obs-rvc for inspiration
AI assistant mode
- similar to vocal conversion streaming but instead of converting your voice on the fly, it should:
- take your voice,
- do some language modelling (with an LLM or something)
- then produce an appropriate verbal response
- We already have Kyutai's Moshi
- Maybe that model can be finetuned to reply with a voice
- i.e. your favorite singer, actor, best friend, family member.
Ultimate RVC bot for discord
- maybe also make a forum on discord?
Make app production ready
have a "report a bug" tab like in applio?
should have separate accounts for users when hosting online
- use `gr.LoginButton` and `gr.LogoutButton`?
deploy using docker
Host on own web-server with Nginx
Consider having the concurrency limit be dynamic, i.e. instead of always being 1 for jobs using the GPU, have it depend upon what resources are available.
- We can set the GPU_CONCURRENCY limit to `os.environ["GPU_CONCURRENCY_LIMIT"]` or 1 and then pass GPU_CONCURRENCY as input to the places where event listeners are defined
Colab notebook
- find way of saving virtual environment with python 3.11 in colab notebook (DIFFICULT TO IMPLEMENT)
- so that this environment can be loaded directly rather than downloading all dependencies every time app is opened
Testing
- Add example audio files to use for testing
- Should be located in `audio/examples`
- could have sub-folders `input` and `output`
- in the `output` folder we have `output_audio.ext` files each with a corresponding `input_audio.json` file containing metadata explaining arguments used to generate the output
- We can then test that actual output is close enough to expected output using an audio similarity metric.
- Setup unit testing framework using pytest
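A pytest-style sketch of such a comparison, using a naive sample-wise metric as a stand-in for a real audio similarity measure (e.g. spectral comparison):

```python
def audio_similarity(expected, actual):
    """Naive similarity in [0, 1]: 1 minus mean absolute difference
    normalized by peak amplitude. A placeholder for a proper metric."""
    diff = sum(abs(x - y) for x, y in zip(expected, actual)) / len(expected)
    peak = max(max(abs(s) for s in expected + actual), 1e-9)
    return 1.0 - diff / peak


def test_output_close_to_expected():
    # in the real suite, these would be samples decoded from
    # audio/examples/output and the freshly generated file
    expected = [0.0, 0.5, 1.0, 0.5]
    actual = [0.01, 0.49, 0.98, 0.51]
    assert audio_similarity(expected, actual) > 0.95
```

The threshold and metric are assumptions; a perceptual metric would be more robust to phase and encoding differences.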