TODO
- should rename instances of "models" to "voice models"
Project/task management
- Should find tool for project/task management
- Tool should support:
- hierarchical tasks
- custom labels and or priorities on tasks
- being able to filter tasks based on those labels
- being able to close and resolve tasks
- Being able to integrate with vscode
- Access for multiple people (in a team)
- Should migrate the content of this file into tool
- Potential candidates
- GitHub projects
- Does not yet support hierarchical tasks so no
- Trello
- Does not seem to support hierarchical tasks either
- Notion
- Seems to support hierarchical tasks, but is complicated
- Todoist
- seems to support hierarchical tasks, custom labels, filtering on those labels, and multiple users, and there are unofficial plugins for VS Code.
Front end
Modularization
- Improve modularization of web code using helper functions defined here
- Split front-end modules into further sub-modules.
- Structure of web folder should be:
web
manage_models
__init__.py
main.py
manage_audio
__init__.py
main.py
generate_song_covers
__init__.py
main.py
one_click_generation
__init__.py
main.py
accordions
__init__.py
options_x.py
... ?
multi_step_generation
__init__.py
main.py
accordions
__init__.py
step_X.py
...
common.py
- For `multi_step_generation/step_X.py`, its potential render function might have to take the set of all "input tracks" in the multi-step generation tab, so these will then have to be defined in `multi_step_generation/main.py`. Other components passed to `multi_step_generation/main.py` might also need to be passed further down to `multi_step_generation/step_X.py`
- For `one_click_generation/option_X.py`, its potential render function should render the accordion for the given options and return the components defined in the accordion? Other components passed to `one_click_generation/main.py` might also need to be passed further down to `one_click_generation/option_X.py`
- Import components instead of passing them as inputs to render functions (DIFFICULT TO IMPLEMENT)
- We have had problems before with component ids when components are instantiated outside a Blocks context in a separate module and then imported into other modules and rendered in their Blocks contexts.
Multi-step generation
If possible merge two consecutive event listeners using `update_cached_songs` in the song retrieval accordion.
add a description describing how to use each accordion and suggestions for workflows
add option for adding more input tracks to the mix song step
- new components should be created dynamically based on a textfield with names and a button for creating new component
- when creating a new component a new transfer button and dropdown should also be created
- and the transfer choices for all dropdowns should be updated to also include the new input track
- we need to consider how we want to handle vertical space
- should we make a new row once more than 3 tracks are on one row?
- yes, and a new slider should also be created on a new row
- right under the first row (which itself is under the row with the song dir dropdown)
should also have the possibility to add more tracks to the pitch shift accordion.
add a confirmation box with a warning if trying to transfer an output track to an input track that is not empty.
- could also have the possibility to ask the user to create a new input track and transfer the output track to it.
- this would just be the same pop-up confirmation box as before, but in addition to the yes and cancel options it will also have a "transfer to new input track" option.
- we need custom JavaScript for this.
Common
- fix problem with typing of `block.launch()`
- problem stems from doing `from gradio import routes`
- so instead we should import from `gradio.routes` directly
- open a PR with the changes
- save default values for options for song generation in a `SongCoverOptionDefault` enum
- then reference this enum across the two tabs
- and also use `list[SongCoverOptionDefault]` as input to the reset settings click event listener in the single click generation tab
- Persist state of app (currently selected settings etc.) across re-renders
- This includes:
- refreshing a browser window
- Opening app in new browser window
- Maybe it should also include when app is started anew?
- Possible solutions
Use `gr.BrowserState` to allow state to be preserved across page loads.
Save any changes to components to a session dictionary and load from it upon refresh
- See here
- Problem is that this solution might not work with accordions or other types of blocks
- should use `.expand()` and `.collapse()` event listeners on accordions to programmatically reset the state of accordions to what they were before the user refreshed the page
Use localstorage
Whenever the state of a component is changed save the new state to a custom JSON file.
- Then whenever the app is refreshed load the current state of components from the JSON file
- This solution should probably work for Block types that are not components
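The JSON-file approach above could be sketched as follows; the file layout and helper names are assumptions:

```python
import json
from pathlib import Path


def save_component_state(state_file: Path, component_id: str, value) -> None:
    """Persist a component's latest value to a custom JSON file."""
    state = json.loads(state_file.read_text()) if state_file.is_file() else {}
    state[component_id] = value
    state_file.write_text(json.dumps(state))


def load_component_state(state_file: Path, component_id: str, default=None):
    """Restore a component's value when the app is refreshed."""
    if state_file.is_file():
        return json.loads(state_file.read_text()).get(component_id, default)
    return default
```

Each component's change event would call `save_component_state`, and the app's load event would call `load_component_state` to re-populate components.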
- need to fix the `INFO: Could not find files for the given pattern(s)` message on startup of the web application on Windows (DIFFICULT TO IMPLEMENT)
- this is an error that gradio needs to fix
- Remove reset button on slider components (DIFFICULT TO IMPLEMENT)
- this is a gradio feature that needs to be removed.
- Fix that gradio removes special symbols from audio paths when loaded into audio components (DIFFICULT TO IMPLEMENT)
- includes parentheses, question marks, etc.
- it's a gradio bug so report?
- Add button for cancelling any currently running jobs (DIFFICULT TO IMPLEMENT)
- Not supported by Gradio natively
- Also difficult to implement manually as Gradio seems to run the called backend functions in separate threads
- don't show an error upon missing confirmation (DIFFICULT TO IMPLEMENT)
- can return `gr.update()` instead of raising an error in the relevant event listener function
- but problem is that subsequent steps will still be executed in this case
- clearing temporary files with the `delete_cache` parameter only seems to work if all windows are closed before closing the app process (DIFFICULT TO IMPLEMENT)
- this is a gradio bug so report?
Online hosting optimization
- make `concurrency_id` and concurrency limit on components dependent on whether the GPU is used or not
- if only cpu then there should be no limit
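A possible way to derive these settings, assuming a `GPU_CONCURRENCY_LIMIT` environment variable as the override mechanism (names are assumptions):

```python
import os


def concurrency_settings(gpu_available: bool) -> dict:
    """Derive queue settings for event listeners from available hardware."""
    if gpu_available:
        # GPU-bound jobs share one queue; allow an override via env var
        limit = int(os.environ.get("GPU_CONCURRENCY_LIMIT", "1"))
        return {"concurrency_id": "gpu", "concurrency_limit": limit}
    # CPU-only deployments: no limit on concurrent calls
    return {"concurrency_id": "cpu", "concurrency_limit": None}
```

The returned dict could be unpacked into the keyword arguments of each event listener registration.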
- increase value of `default_concurrency_limit` in `Block.queue` so that the same event listener can be called multiple times concurrently
- use `Block.launch()` with `max_file_size` to prevent too large uploads
- define as many functions with async as possible to increase responsiveness of app
- and then use `Block.launch()` with `max_threads` set to an appropriate value representing the number of concurrent threads that can be run on the server (default is 40)
- consider setting `max_size` in `Block.queue()` to explicitly limit the number of people that can be in the queue at the same time
- clearing of temporary files should happen after a user logs in and out
- and in this case it should only be temporary files for the active user that are cleared
- Is that even possible to control?
- enable server-side rendering (requires installing node and setting `ssr_mode=True` in `.launch()`) (DIFFICULT TO IMPLEMENT)
- Also need to set `GRADIO_NODE_PATH` to point to the node executable
- problem is that on Windows there is an ERR_UNSUPPORTED_ESM_URL_SCHEME error which needs to be fixed by gradio
- on Linux it works but it is not possible to shut down the server using CTRL+C
Back end
generate_song_cover.py
intermediate file prefixes should be made into enums
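A rough sketch of such an enum; the prefix strings below are made up for illustration, not the actual prefixes used in `generate_song_cover.py`:

```python
from enum import Enum


class IntermediateFilePrefix(str, Enum):
    """Hypothetical prefixes for intermediate audio files in the pipeline."""

    VOCALS = "1_Vocals"
    INSTRUMENTALS = "1_Instrumentals"
    MAIN_VOCALS = "2_Vocals_Main"
    DEREVERBED_VOCALS = "3_Vocals_DeReverb"
    CONVERTED_VOCALS = "4_Vocals_Converted"
```

Mixing in `str` lets the members be used directly in path construction, e.g. `f"{IntermediateFilePrefix.VOCALS}_{song_name}.wav"`.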
find framework for caching intermediate results rather than relying on our homemade system
Support specific audio formats for intermediate audio file?
- it might require some more code to support custom output format for all pipeline functions.
expand `_get_model_name` so that it can take any audio file in an intermediate audio folder as input (DIFFICULT TO IMPLEMENT)
- Function should then try to recursively
- look for a corresponding json metadata file
- find the model name in that file if it exists
- otherwise find the path in the input field in the metadata file
- repeat
- should also consider whether the input audio file belongs to a step before the audio conversion step
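The recursive lookup could be sketched roughly as below, assuming each intermediate file has a sibling `.json` file with hypothetical `model_name` and `input` fields:

```python
import json
from pathlib import Path
from typing import Optional


def find_model_name(audio_path: Path) -> Optional[str]:
    """Walk JSON metadata backwards through the pipeline to find a model name.

    Field names ('model_name', 'input') are assumptions for illustration.
    """
    meta_path = audio_path.with_suffix(".json")
    if not meta_path.is_file():
        return None
    metadata = json.loads(meta_path.read_text())
    if "model_name" in metadata:
        return metadata["model_name"]
    parent = metadata.get("input")
    # recurse into the file that produced this one, if recorded
    return find_model_name(Path(parent)) if parent else None
```

Files from steps before voice conversion would simply bottom out with `None`, which matches the "consider whether the file belongs to a step before conversion" point above.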
use pydantic models to constrain numeric inputs (DIFFICULT TO IMPLEMENT)
- for inputs to the `convert` function for example
- Use `Annotated[basic type, Field(constraint)]` syntax along with a `@validate_call` decorator on functions
- Problem is that pyright does not support `Annotated` so we would have to switch to mypy
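A sketch of the `Annotated` plus `@validate_call` approach using pydantic v2; the parameter name and bounds are assumptions, not the real `convert` signature:

```python
from typing import Annotated

from pydantic import Field, validate_call


@validate_call
def convert(n_semitones: Annotated[int, Field(ge=-24, le=24)] = 0) -> int:
    """Hypothetical pipeline function with a range-constrained input."""
    return n_semitones
```

Calling `convert(100)` would then raise a `ValidationError` at the boundary instead of failing deep inside the pipeline.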
manage_models.py
- use pandas.read_json to load public models table (DIFFICULT TO IMPLEMENT)
CLI
Add remaining CLI interfaces
- Interface for `core.manage_models`
- Interface for `core.manage_audio`
- Interfaces for individual pipeline functions defined in `core.generate_song_covers`
python package management
- need to make project version (in `pyproject.toml`) dynamic so that it is updated automatically when a new release is made
- once diffq-fixed is used by audio-separator we can remove the url dependency on windows
- we will still need to wait for uv to make it easy to install package with torch dependency
- also it is still necessary to install pytorch first as it is not on pypi index
- figure out way of making ./urvc commands execute faster
- when ultimate rvc is downloaded as a pypi package the exposed commands are much faster so investigate this
- update dependencies in pyproject.toml
- use latest compatible version of all packages
- remove commented out code, unless strictly necessary
Audio separation
- expand back-end function(s) so that they are parametrized by both model type as well as model settings
- Need to decide whether we only want to support common model settings or also settings that are unique to each model
- It will probably be the latter, which will then require some extra checks.
- Need to decide which models supported by `audio_separator` we want to support
- Not all of them seem to work
- Probably MDX models and MDXC models
- Maybe also VR and demucs?
- Revisit online guide for optimal models and settings
- In multi-step generation tab
- Expand audio-separation accordion so that model can be selected and appropriate settings for that model can then be selected.
- Model specific settings should expand based on selected model
- In one-click generation
- Should have a "vocal extraction" option accordion
- Should be able to choose which audio separation steps to include in pipeline
- possible steps
- step 1: separating audio from instrumentals
- step 2: separating main vocals from background vocals
- step 3: de-reverbing vocals
- Should pick steps from dropdown?
- For each selected step a new sub-accordion with options for that step will then appear
- Each accordion should include general settings
- We should decide whether model specific settings should also be supported
- We should also decide whether each sub-accordion should have a setting for choosing a model and, if so, render model-specific settings based on the chosen model
- Alternative layout:
- have option to choose number of separation steps
- then dynamically render sub accordions for each of the selected number of steps
- In this case it should be possible to choose models for each accordion
- this field should be initially empty
- Other settings should probably have sensible defaults that are the same
- It might also be a good idea to then have an "examples" pane with recommended combinations of extractions steps
- When one of these is selected, then the selected number of accordions with the preset settings should be filled out
- optimize pre-processing
- Alternatives to the `audio-separator` package:
- Deezer Spleeter
- supports both CLI and python package
- Asteroid
- Nuzzle
GitHub
Actions
- linting with Ruff
- typechecking with Pyright
- running all tests
- automatic building and publishing of project to pypi
- includes automatic update of project version number
- or use pre-commit?
README
- Fill out TBA sections in README
- Add note about not using with VPN?
- Add different emblems/badges in header
- like test coverage, build status, etc. (look at other projects for inspiration)
- spice up text with emojis (look at tiangolo's projects for inspiration)
Releases
Make regular releases like done for Applio
- Will be an `.exe` file that when run unzips contents into an application folder, where `./urvc run` can then be executed.
- Could it be possible to have the `.exe` file just start the web app when clicked?
Could also include pypi package as a release?
use pyinstaller to bundle the app into an executable that also includes sox and ffmpeg as dependencies (DLLs)
Other
- In the future consider detaching repo from where it is forked from:
- because it is not possible to make the repo private otherwise
- see: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/detaching-a-fork
Incorporate upstream changes
- Incorporate RVC code from rvc-cli (i.e. changes from Applio)
- more options for voice conversion and more efficient voice conversion
- batch conversion sub-tab
- TTS tab
- Model training tab
- support more pre-trained models
- sub-tab under "manage models" tab
- support for querying online database with many models that can be downloaded
- support for audio and model analysis.
- Voice blending tab
- Incorporate latest changes from RVC-WebUI
Vocal Conversion
- support arbitrary combination of pitch detection algorithms
- Investigate using onnx models for inference speedup on cpu
- Add more pitch detection methods
- pm
- harvest
- dio
- rmvpe+
- Implement multi-gpu Inference
TTS conversion
- also include original edge voice as output
Model management
Training models
- have learning rate for training
- have a quick training button
- or have preprocess dataset, extract features and generate index happen by default
- Support a loss/training graph
Download models
- Support batch downloading multiple models
- requires a tabular request form where both a link column and a name column have to be filled out
- we can allow selecting multiple items from public models table and then copying them over
- support querying the online database for models matching a given search string like what is done in the Applio app
- first n rows of online database should be shown by default in public models table
- more rows should be retrieved by scrolling down or clicking a button
- user search string should filter/narrow returned number of rows in public models table
- When clicking a set of rows they should then be copied over for downloading in the "download" table
- support a column with preview sample in public models table
- Only possible if voice snippets are also returned when querying the online database
- Otherwise we can always support voice snippets for voice models that have already been downloaded
- run model on sample text ("the quick brown fox jumps over the lazy dog") after it is downloaded
- save the results in an `audio/model_preview` folder
- Preview can then be loaded into a preview audio component when selecting a model from a dropdown
- or if we replace the dropdown with a table with two columns we can have the audio track displayed in the second column
Model analysis
we could provide a new tab to analyze an existing model like what is done in applio
- or this tab could be consolidated with the delete model tab?
we could also provide extra model information after model is downloaded
- potentially in a dropdown to expand?
Audio management
General
- Support audio information tool like in applio?
- A new tab where you can upload a song to analyze?
- more elaborate solution:
- tab where you
- can select any song directory
- select any step in the audio generation pipeline
- then select any intermediate audio file generated in that step
- Then have the possibility to
- Listen to the song
- see a table with its metadata (based on its associated `.json` file)
- add timestamp to json files so they can be sorted in table according to creation date
- And other statistics in a separate component (graph etc.)
- Could have delete buttons both at the level of song_directory, step, and for each song?
- Also consider splitting intermediate audio tracks for each step in to subfolder (0,1,2,3...)
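Adding the timestamp when metadata is written could look like this; the helper name is hypothetical:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_metadata(json_path: Path, data: dict) -> None:
    """Write metadata with a sortable ISO-8601 creation timestamp."""
    stamped = {**data, "created_at": datetime.now(timezone.utc).isoformat()}
    json_path.write_text(json.dumps(stamped, indent=2))
```

ISO-8601 strings sort lexicographically by creation date, so the metadata table can order rows without parsing.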
Other settings
- rework other settings tab
- this should also contain other settings such as the ability to change the theme of the app
- there should be a button to apply settings which will reload the app with the new settings
Audio post-processing
- Support more effects from the `pedalboard` package:
- Guitar-style effects: Chorus, Distortion, Phaser, Clipping
- Loudness and dynamic range effects: Compressor, Gain, Limiter
- Equalizers and filters: HighpassFilter, LadderFilter, LowpassFilter
- Spatial effects: Convolution, Delay, Reverb
- Pitch effects: PitchShift
- Lossy compression: GSMFullRateCompressor, MP3Compressor
- Quality reduction: Resample, Bitcrush
- NoiseGate
- PeakFilter
Audio Mixing
Add main gain loudness slider?
Add option to equalize output audio with respect to input audio
- i.e. song cover gain (and possibly also more general dynamics) should be the same as those for source song.
- check to see if pydub has functionality for this
- otherwise a simple solution would be computing the RMS loudness of the input and output tracks in dB and adding the difference to the output file in the mixing function (using pydub): `rms = np.sqrt(np.mean(np.square(signal)))`, `dB = 20 * np.log10(rms)`
- When this option is selected the option to set the main gain of the output should be disabled?
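The RMS-matching idea can be sketched with plain Python (no numpy); the resulting dB difference could then be applied to the output track, e.g. with pydub's `apply_gain`:

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a sample sequence."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def gain_difference_db(input_signal, output_signal):
    """dB gain to apply to the output so its RMS loudness matches the input's."""
    return 20 * math.log10(rms(input_signal) / rms(output_signal))
```

For example, an output track at half the input's amplitude yields a difference of about +6 dB.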
add more equalization options
- using `pydub.effects` and `pydub.scipy_effects`?
Custom UI
- Experiment with new themes including Building new ones
- first of all, make a new theme that is like the default gradio 4 theme in terms of using semi-transparent orange as the main color and semi-transparent grey as the secondary color. The new gradio 5 theme is good apart from using solid colors, so maybe use that as the base theme.
- Support both dark and light theme in app?
- Add Support for changing theme in app?
- Use Applio theme as inspiration for default theme?
- Experiment with using custom CSS
- Pass `css={css_string}` to `gr.Blocks` and use `elem_classes` and `elem_id` to have components target the styles defined in the CSS string.
- Experiment with custom DataFrame styling
- Experiment with custom Javascript
- Look for opportunities for defining new useful custom components
Real-time vocal conversion
- Should support being used as OBS plugin
- Latency is a real issue
- Implementations details:
- implement back-end in Rust?
- implement front-end using svelte?
- implement desktop application using C++ or C#?
- see https://github.com/w-okada/voice-changer and https://github.com/RVC-Project/obs-rvc for inspiration
AI assistant mode
- similar to vocal conversion streaming but instead of converting your voice on the fly, it should:
- take your voice,
- do some language modelling (with an LLM or something)
- then produce an appropriate verbal response
- We already have Kyutai's Moshi
- Maybe that model can be finetuned to reply with a voice
- i.e. your favorite singer, actor, best friend, family member.
Ultimate RVC bot for discord
- maybe also make a forum on discord?
Make app production ready
have a "report a bug" tab like in applio?
should have separate accounts for users when hosting online
- use `gr.LoginButton` and `gr.LogoutButton`?
deploy using docker
Host on own web-server with Nginx
Consider having the concurrency limit be dynamic, i.e. instead of always being 1 for jobs using the GPU, have it depend upon what resources are available.
- We can set the GPU_CONCURRENCY limit to `os.environ["GPU_CONCURRENCY_LIMIT"]` or 1 and then pass GPU_CONCURRENCY as input to the places where event listeners are defined
Colab notebook
- find way of saving virtual environment with python 3.11 in colab notebook (DIFFICULT TO IMPLEMENT)
- so that this environment can be loaded directly rather than downloading all dependencies every time app is opened
Testing
- Add example audio files to use for testing
- Should be located in `audio/examples`
- could have sub-folders `input` and `output`
- in the `output` folder we have `output_audio.ext` files each with a corresponding `input_audio.json` file containing metadata explaining arguments used to generate the output
- We can then test that actual output is close enough to expected output using an audio similarity metric.
- Setup unit testing framework using pytest
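A pytest-style sketch of such a comparison, using a naive sample-wise metric as a stand-in for a real audio similarity measure (e.g. spectral comparison):

```python
def audio_similarity(expected, actual):
    """Naive similarity in [0, 1]: 1 minus mean absolute difference
    normalized by peak amplitude. A placeholder for a proper metric."""
    diff = sum(abs(x - y) for x, y in zip(expected, actual)) / len(expected)
    peak = max(max(abs(s) for s in expected + actual), 1e-9)
    return 1.0 - diff / peak


def test_output_close_to_expected():
    # in the real suite, these would be samples decoded from
    # audio/examples/output and the freshly generated file
    expected = [0.0, 0.5, 1.0, 0.5]
    actual = [0.01, 0.49, 0.98, 0.51]
    assert audio_similarity(expected, actual) > 0.95
```

The threshold and metric are assumptions; a perceptual metric would be more robust to phase and encoding differences.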