# TODO

* should rename instances of "models" to "voice models"

## Project/task management

* Should find a tool for project/task management
* Tool should support:
  * hierarchical tasks
  * custom labels and/or priorities on tasks
  * being able to filter tasks based on those labels
  * being able to close and resolve tasks
  * being able to integrate with VSCode
  * access for multiple people (in a team)
* Should migrate the content of this file into that tool
* Potential candidates:
  * GitHub Projects
    * Does not yet support hierarchical tasks, so no
  * Trello
    * Does not seem to support hierarchical tasks either
  * Notion
    * Seems to support hierarchical tasks, but is complicated
  * Todoist
    * Seems to support hierarchical tasks, custom labels, filtering on those labels and multiple users, and there are unofficial plugins for VSCode

## Front end

### Modularization

* Improve modularization of web code using helper functions defined [here](https://huggingface.co/spaces/WoWoWoWololo/wrapping-layouts/blob/main/app.py)
* Split front-end modules into further sub-modules.
  * Structure of the `web` folder should be:
    * `web`
      * `manage_models`
        * `__init__.py`
        * `main.py`
      * `manage_audio`
        * `__init__.py`
        * `main.py`
      * `generate_song_covers`
        * `__init__.py`
        * `main.py`
        * `one_click_generation`
          * `__init__.py`
          * `main.py`
          * `accordions`
            * `__init__.py`
            * `options_x.py` ... ?
        * `multi_step_generation`
          * `__init__.py`
          * `main.py`
          * `accordions`
            * `__init__.py`
            * `step_X.py` ...
        * `common.py`
  * For `multi_step_generation/step_X.py`, its potential render function might have to take the set of all "input tracks" in the multi-step generation tab, so these will then have to be defined in `multi_step_generation/main.py`. Other components passed to `multi_step_generation/main.py` might also need to be passed further down to `multi_step_generation/step_X.py` (see the sketch at the end of this section).
  * For `one_click_generation/option_X.py`, its potential render function should render the accordion for the given options and return the components defined in the accordion? Other components passed to `one_click_generation/main.py` might also need to be passed further down to `one_click_generation/option_X.py`
* Import components instead of passing them as inputs to render functions (DIFFICULT TO IMPLEMENT)
  * We have had problems before with component IDs when components are instantiated outside a `Blocks` context in a separate module, then imported into other modules and rendered in their `Blocks` contexts.
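A minimal sketch of what such a render helper could look like, assuming the shared components are created in `multi_step_generation/main.py` and passed down; the signature, component names and transfer behaviour here are all hypothetical:

```python
# multi_step_generation/step_X.py (hypothetical)
from collections.abc import Sequence

import gradio as gr


def render(input_tracks: Sequence[gr.Audio], song_dir: gr.Dropdown) -> gr.Audio:
    """Render the accordion for this step and return its output track.

    The shared components (`input_tracks`, `song_dir`) are instantiated in
    `multi_step_generation/main.py` inside the tab's Blocks context, so they
    keep valid component ids when referenced here.
    """
    with gr.Accordion("Step X", open=False):
        output_track = gr.Audio(label="Output track")
        transfer_btn = gr.Button("Transfer to input track 1")
        # Event listeners may freely reference the components passed down.
        transfer_btn.click(
            lambda track: track,
            inputs=output_track,
            outputs=input_tracks[0],
        )
    return output_track
```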
### Multi-step generation

* If possible, merge the two consecutive event listeners using `update_cached_songs` in the song retrieval accordion.
* add a description of how to use each accordion and suggestions for workflows
* add option for adding more input tracks to the mix song step
  * new components should be created dynamically, based on a textfield with names and a button for creating the new component
  * when creating a new component, a new transfer button and dropdown should also be created
  * and the transfer choices for all dropdowns should be updated to also include the new input track
  * we need to consider how we want to handle vertical space
    * should we make a new row once more than 3 tracks are on one row?
      * yes, and the new slider should also be created on a new row
        * right under the first row (which itself is under the row with the song dir dropdown)
  * should also have the possibility to add more tracks to the pitch shift accordion.
* add a confirmation box with a warning when trying to transfer an output track to an input track that is not empty.
  * could also give the user the possibility to create a new input track and transfer the output track to it.
    * this would just be the same pop-up confirmation box as before, but in addition to the yes and cancel options it will also have a "transfer to new input track" option.
    * we need custom JavaScript for this.

### Common

* fix problem with typing of `Block.launch()`
  * problem stems from doing `from gradio import routes`
  * so instead should import from `gradio.routes` directly
  * open a PR with the changes
* save default values for song generation options in a `SongCoverOptionDefault` enum (see the sketch at the end of this section)
  * then reference this enum across the two tabs
  * and also use `list[SongCoverOptionDefault]` as input to the reset settings click event listener in the single click generation tab.
* Persist state of app (currently selected settings etc.) across re-renders
  * This includes:
    * refreshing a browser window
    * opening the app in a new browser window
    * Maybe it should also include when the app is started anew?
  * Possible solutions:
    * use `gr.BrowserState` to allow state to be preserved across page loads
    * Save any changes to components to a session dictionary and load from it upon refresh
      * See [here](https://github.com/gradio-app/gradio/issues/3106#issuecomment-1694704623)
      * Problem is that this solution might not work with accordions or other types of blocks
        * should use `.expand()` and `.collapse()` event listeners on accordions to programmatically reset the state of accordions to what they were before the user refreshed the page
    * Use localStorage
      * see [here](https://huggingface.co/spaces/YiXinCoding/gradio-chat-history/blob/main/app.py) and [here](https://huggingface.co/spaces/radames/gradio_window_localStorage/blob/main/app.py)
    * Whenever the state of a component is changed, save the new state to a custom JSON file.
      * Then, whenever the app is refreshed, load the current state of components from the JSON file
      * This solution should probably work for Block types that are not components
* need to fix the `INFO: Could not find files for the given pattern(s)` message on startup of the web application on windows (DIFFICULT TO IMPLEMENT)
  * this is an error that gradio needs to fix
* Remove reset button on slider components (DIFFICULT TO IMPLEMENT)
  * this is a gradio feature that needs to be removed.
* Fix that gradio removes special symbols from audio paths when loaded into audio components (DIFFICULT TO IMPLEMENT)
  * includes parentheses, question marks, etc.
  * it's a gradio bug, so report it?
* Add button for cancelling any currently running jobs (DIFFICULT TO IMPLEMENT)
  * Not supported natively by Gradio
  * Also difficult to implement manually, as Gradio seems to run the called backend functions in thread environments
* don't show an error upon missing confirmation (DIFFICULT TO IMPLEMENT)
  * can return `gr.update()` instead of raising an error in the relevant event listener function
  * but the problem is that subsequent steps will still be executed in this case
* clearing temporary files with the `delete_cache` parameter only seems to work if all windows are closed before closing the app process (DIFFICULT TO IMPLEMENT)
  * this is a gradio bug, so report it?
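A minimal sketch of the defaults enum; the option names and values here are illustrative assumptions, not the real option set:

```python
from enum import Enum


class SongCoverOptionDefault(Enum):
    """Default values for song generation options, shared by both tabs."""

    # Hypothetical options; the real names and values live in the web modules.
    N_SEMITONES = 0
    MAIN_GAIN = 0
    INST_GAIN = 0
    OUTPUT_SR = 44100


# The reset listener in the single click generation tab can then restore
# all components from `list(SongCoverOptionDefault)` in one update.
defaults = [option.value for option in SongCoverOptionDefault]
```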
## Online hosting optimization

* make `concurrency_id` and concurrency limit on components dependent on whether a GPU is used or not
  * if only a CPU is used, there should be no limit
* increase the value of `default_concurrency_limit` in `Block.queue` so that the same event listener can be called multiple times concurrently (see the sketch at the end of this section)
* use `Block.launch()` with `max_file_size` to prevent too-large uploads
* define as many functions as possible with `async` to increase responsiveness of the app
  * and then use `Block.launch()` with `max_threads` set to an appropriate value representing the number of concurrent threads that can be run on the server (default is 40)
* Persist state of app (currently selected settings etc.) across re-renders
* consider setting `max_size` in `Block.queue()` to explicitly limit the number of people that can be in the queue at the same time
* clearing of temporary files should happen after a user logs in and out
  * and in this case it should only be the temporary files of the active user that are cleared
  * Is that even possible to control?
* enable server-side rendering (requires installing node and setting `ssr_mode = True` in `.launch`) (DIFFICULT TO IMPLEMENT)
  * Also need to set `GRADIO_NODE_PATH` to point to the node executable
  * problem is that on windows there is an `ERR_UNSUPPORTED_ESM_URL_SCHEME` error which needs to be fixed by gradio
    * see here: https://github.com/nodejs/node/issues/31710
  * on linux it works, but it is not possible to shut down the server using CTRL+C
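A minimal sketch of how these queue and launch settings could be wired together; the concrete numbers are placeholder assumptions:

```python
import os

import gradio as gr

with gr.Blocks() as app:
    ...  # components and event listeners

# Let each event listener serve several users concurrently and cap how
# many requests may wait in the queue at once.
app.queue(default_concurrency_limit=5, max_size=20)

app.launch(
    max_file_size="100mb",  # reject uploads larger than this
    max_threads=80,  # thread pool for event handlers (default 40)
    ssr_mode=os.name != "nt",  # SSR currently breaks on Windows
)
```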
## Back end

### `generate_song_cover.py`

* intermediate file prefixes should be made into enums
* find a framework for caching intermediate results rather than relying on our homemade system
  * Joblib:
  * scikit-learn:
* Support specific audio formats for intermediate audio files?
  * it might require some more code to support a custom output format for all pipeline functions.
* expand `_get_model_name` so that it can take any audio file in an intermediate audio folder as input (DIFFICULT TO IMPLEMENT)
  * The function should then recursively:
    * look for a corresponding JSON metadata file
    * find the model name in that file if it exists
    * otherwise find the path in the input field in the metadata file
    * repeat
  * should also consider whether the input audio file belongs to a step before the audio conversion step
* use pydantic models to constrain numeric inputs (DIFFICULT TO IMPLEMENT)
  * for inputs to the `convert` function, for example
  * Use the `Annotated[<basic type>, Field(<constraint>)]` syntax along with a `@validate_call` decorator on functions (see the sketch at the end of this section)
  * Problem is that pyright does not support `Annotated`, so we would have to switch to mypy
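A minimal sketch of the pydantic approach; the parameter names and ranges on `convert` are illustrative assumptions:

```python
from typing import Annotated

from pydantic import Field, validate_call


@validate_call
def convert(
    audio_path: str,
    n_semitones: Annotated[int, Field(ge=-12, le=12)] = 0,
    index_rate: Annotated[float, Field(ge=0.0, le=1.0)] = 0.5,
) -> str:
    """Convert vocals; out-of-range numeric inputs are rejected at call time."""
    ...


convert("song.wav", n_semitones=24)  # raises pydantic.ValidationError
```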
### `manage_models.py`

* use `pandas.read_json` to load the public models table (DIFFICULT TO IMPLEMENT)

## CLI

### Add remaining CLI interfaces

* Interface for `core.manage_models`
* Interface for `core.manage_audio`
* Interfaces for individual pipeline functions defined in `core.generate_song_covers`

## python package management

* need to make project version (in `pyproject.toml`) dynamic so that it is updated automatically when a new release is made
* once diffq-fixed is used by audio-separator we can remove the url dependency on windows
  * we will still need to wait for uv to make it easy to install a package with a torch dependency
  * also it is still necessary to install pytorch first, as it is not on the pypi index
* figure out a way of making `./urvc` commands execute faster
  * when ultimate rvc is downloaded as a pypi package the exposed commands are much faster, so investigate this
* update dependencies in pyproject.toml
  * use the latest compatible version of all packages
  * remove commented-out code, unless strictly necessary

## Audio separation

* expand back-end function(s) so that they are parametrized by both model type and model settings
  * Need to decide whether we only want to support common model settings or also settings that are unique to each model
    * It will probably be the latter, which will then require some extra checks.
  * Need to decide which of the models supported by `audio_separator` we want to support
    * Not all of them seem to work
    * Probably MDX models and MDXC models
    * Maybe also VR and demucs?
  * Revisit online guide for optimal models and settings
* In multi-step generation tab:
  * Expand audio-separation accordion so that a model can be selected and appropriate settings for that model can then be selected.
  * Model-specific settings should expand based on the selected model
* In one-click generation:
  * Should have a "vocal extraction" option accordion
  * Should be able to choose which audio separation steps to include in the pipeline
    * possible steps:
      * step 1: separating audio from instrumentals
      * step 2: separating main vocals from background vocals
      * step 3: de-reverbing vocals
    * Should pick steps from a dropdown?
    * For each selected step, a new sub-accordion with options for that step will then appear
      * Each accordion should include general settings
      * We should decide whether model-specific settings should also be supported
      * We should also decide whether each sub-accordion should have a setting for choosing a model and, if so, render model-specific settings based on the chosen model
  * Alternative layout:
    * have an option to choose the number of separation steps
    * then dynamically render sub-accordions for each of the selected number of steps (see the sketch at the end of this section)
      * In this case it should be possible to choose a model for each accordion
        * this field should be initially empty
      * Other settings should probably have sensible defaults that are the same
    * It might also be a good idea to then have an "examples" pane with recommended combinations of extraction steps
      * When one of these is selected, the selected number of accordions with the preset settings should be filled out
* optimize pre-processing
  * check
* Alternatives to the `audio-separator` package:
  * [Deezer Spleeter](https://github.com/deezer/spleeter)
    * supports both CLI and python package
  * [Asteroid](https://github.com/asteroid-team/asteroid)
  * [Nuzzle](https://github.com/nussl/nussl)
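A minimal sketch of the dynamic sub-accordion layout using Gradio's `@gr.render` decorator; the per-step settings shown are placeholder assumptions:

```python
import gradio as gr

with gr.Blocks() as demo:
    n_steps = gr.Slider(1, 3, value=1, step=1, label="Number of separation steps")

    @gr.render(inputs=n_steps)
    def render_steps(n: float) -> None:
        # Re-runs whenever the slider changes: one sub-accordion per step.
        for i in range(int(n)):
            with gr.Accordion(f"Separation step {i + 1}", open=False):
                gr.Dropdown(choices=[], label="Model")  # initially empty
                gr.Slider(0, 1, value=0.5, label="Segment overlap")

demo.launch()
```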
## GitHub

### Actions

* linting with Ruff
* typechecking with Pyright
* running all tests
* automatic building and publishing of project to pypi
  * includes automatic update of project version number
* or use pre-commit?

### README

* Fill out TBA sections in README
* Add note about not using with VPN?
* Add different emblems/badges in header
  * like test coverage, build status, etc. (look at other projects for inspiration)
* spice up text with emojis (look at tiangolo's projects for inspiration)

### Releases

* Make regular releases like is done for Applio
  * Will be an `.exe` file that, when run, unzips contents into an application folder, where `./urvc run` can then be executed.
  * Could it be possible to have the `.exe` file just start the web app when clicked?
* Could also include the pypi package as a release?
* use pyinstaller to bundle the app into an executable that also includes sox and ffmpeg as dependencies (DLLs)

### Other

* In the future, consider detaching the repo from the one it is forked from:
  * because it is not possible to make the repo private otherwise
  * see:

## Incorporate upstream changes

* Incorporate RVC code from [rvc-cli](https://github.com/blaisewf/rvc-cli) (i.e. changes from Applio)
  * more options for voice conversion and more efficient voice conversion
  * batch conversion sub-tab
  * TTS tab
  * Model training tab
  * support more pre-trained models
    * sub-tab under "manage models" tab
  * support for querying an online database with many models that can be downloaded
  * support for audio and model analysis.
  * Voice blending tab
* Incorporate latest changes from [RVC-WebUI](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)

## Vocal Conversion

* support arbitrary combinations of pitch detection algorithms
  * source:
* Investigate using onnx models for inference speedup on cpu
* Add more pitch detection methods
  * pm
  * harvest
  * dio
  * rmvpe+
* Implement multi-gpu inference

## TTS conversion

* also include original edge voice as output
  * source:

## Model management

### Training models

* have learning rate for training
  * source:
* have a quick training button
  * or have preprocess dataset, extract features and generate index happen by default
* Support a loss/training graph
  * source:

### Download models

* Support batch downloading of multiple models
  * requires a tabular request form where both a link column and a name column have to be filled out
  * we can allow selecting multiple items from the public models table and then copying them over
* support querying the online database for models matching a given search string, like what is done in the applio app
  * the first n rows of the online database should be shown by default in the public models table
  * more rows should be retrieved by scrolling down or clicking a button
  * the user search string should filter/narrow the returned number of rows in the public models table
  * when clicking a set of rows, they should then be copied over for downloading in the "download" table
* support a column with a preview sample in the public models table
  * Only possible if voice snippets are also returned when querying the online database
  * Otherwise we can always support voice snippets for voice models that have already been downloaded
* run model on sample text ("quick brown fox runs over the lazy") after it is downloaded
  * save the results in an `audio/model_preview` folder
  * The preview can then be loaded into a preview audio component when selecting a model from a dropdown
  * or, if we replace the dropdown with a table with two columns, we can have the audio track displayed in the second column

### Model analysis

* we could provide a new tab to analyze an existing model, like what is done in applio
  * or this tab could be consolidated with the delete model tab?
* we could also provide extra model information after a model is downloaded
  * potentially in a dropdown to expand?

## Audio management

### General

* Support audio information tool like in applio?
  * A new tab where you can upload a song to analyze?
* more elaborate solution:
  * tab where you:
    * can select any song directory
    * select any step in the audio generation pipeline
    * then select any intermediate audio file generated in that step
  * then have the possibility to:
    * Listen to the song
    * see a table with its metadata (based on its associated `.json` file)
      * add a timestamp to the json files so they can be sorted in the table according to creation date (see the sketch at the end of this section)
    * and other statistics in a separate component (graph etc.)
  * Could have delete buttons both at the level of song directory, step, and for each song?
  * Also consider splitting intermediate audio tracks for each step into subfolders (0, 1, 2, 3, ...)
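A minimal sketch of stamping the per-file JSON metadata with a creation timestamp; the field name and helper are illustrative assumptions:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_metadata(audio_path: Path, metadata: dict) -> None:
    """Write a metadata .json next to an intermediate audio file, stamped
    so that files can later be sorted by creation date in the table."""
    metadata["created_at"] = datetime.now(timezone.utc).isoformat()
    audio_path.with_suffix(".json").write_text(json.dumps(metadata, indent=2))
```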
## Other settings

* rework other settings tab
  * this should also contain other settings, such as the ability to change the theme of the app
  * there should be a button to apply settings, which will reload the app with the new settings

## Audio post-processing

* Support more effects from the `pedalboard` package:
  * Guitar-style effects: Chorus, Distortion, Phaser, Clipping
  * Loudness and dynamic range effects: Compressor, Gain, Limiter
  * Equalizers and filters: HighpassFilter, LadderFilter, LowpassFilter
  * Spatial effects: Convolution, Delay, Reverb
  * Pitch effects: PitchShift
  * Lossy compression: GSMFullRateCompressor, MP3Compressor
  * Quality reduction: Resample, Bitcrush
  * NoiseGate
  * PeakFilter

## Audio Mixing

* Add main gain loudness slider?
* Add option to equalize output audio with respect to input audio
  * i.e. the song cover gain (and possibly also more general dynamics) should be the same as for the source song.
  * check to see if pydub has functionality for this (see the sketch at the end of this section)
  * otherwise a simple solution would be computing the difference between the RMS loudness (in dB) of the input and output tracks:

    ```python
    import numpy as np

    # `signal`: numpy array with the samples of an audio track
    rms = np.sqrt(np.mean(np.square(signal)))
    dB = 20 * np.log10(rms)
    # add dB to output file in mixing function (using pydub)
    ```

  * When this option is selected, the option to set the main gain of the output should be disabled?
* add more equalization options
  * using `pydub.effects` and `pydub.scipy_effects`?
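pydub does have the needed building blocks: a minimal sketch of matching the cover's loudness to the source using `AudioSegment.dBFS` (file names are placeholders):

```python
from pydub import AudioSegment


def match_loudness(source_path: str, cover_path: str) -> AudioSegment:
    """Apply gain to the song cover so its average loudness (dBFS)
    matches that of the source song."""
    source = AudioSegment.from_file(source_path)
    cover = AudioSegment.from_file(cover_path)
    return cover.apply_gain(source.dBFS - cover.dBFS)


match_loudness("source.mp3", "cover.mp3").export("cover_matched.mp3", format="mp3")
```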
## Custom UI

* Experiment with new themes, including [building new ones](https://www.gradio.app/guides/theming-guid)
  * first of all, make a new theme that is like the default gradio 4 theme in terms of using semi-transparent orange as the main color and semi-transparent grey as the secondary color. The new gradio 5 theme is good apart from using solid colors, so maybe use that as the base theme.
* Support both dark and light theme in app?
* Add support for changing theme in app?
* Use Applio theme as inspiration for default theme?
* Experiment with using custom CSS
  * Pass `css = {css_string}` to `gr.Blocks` and use `elem_classes` and `elem_id` to have components target the styles defined in the CSS string.
* Experiment with [custom DataFrame styling](https://www.gradio.app/guides/styling-the-gradio-dataframe)
* Experiment with custom JavaScript
* Look for opportunities for defining new useful custom components

## Real-time vocal conversion

* Should support being used as an OBS plugin
* Latency is a real issue
* Implementation details:
  * implement back-end in Rust?
  * implement front-end using Svelte?
  * implement desktop application using C++ or C#?
  * see and for inspiration

## AI assistant mode

* similar to vocal conversion streaming, but instead of converting your voice on the fly, it should:
  * take your voice,
  * do some language modelling (with an LLM or something)
  * then produce an appropriate verbal response
* We already have Kyutai's [moshi](https://moshi.chat/?queue_id=talktomoshi)
  * Maybe that model can be finetuned to reply with a given voice
    * i.e. your favorite singer, actor, best friend, or family member.

## Ultimate RVC bot for discord

* maybe also make a forum on discord?

## Make app production ready

* have a "report a bug" tab like in applio?
* should have separate accounts for users when hosting online
  * use `gr.LoginButton` and `gr.LogoutButton`?
* deploy using docker
  * See
* Host on own web-server with Nginx
  * see
* Consider having the concurrency limit be dynamic, i.e. instead of always being 1 for jobs using the gpu, consider having it depend upon what resources are available.
  * We can have the app set the GPU_CONCURRENCY limit to `os.environ["GPU_CONCURRENCY_LIMIT"]` or 1 and then pass GPU_CONCURRENCY as input to the places where event listeners are defined

## Colab notebook

* find a way of saving a virtual environment with python 3.11 in the colab notebook (DIFFICULT TO IMPLEMENT)
  * so that this environment can be loaded directly rather than downloading all dependencies every time the app is opened

## Testing

* Add example audio files to use for testing
  * Should be located in `audio/examples`
  * could have sub-folders `input` and `output`
    * in the `output` folder we have `output_audio.ext` files, each with a corresponding `input_audio.json` file containing metadata explaining the arguments used to generate the output
  * We can then test that the actual output is close enough to the expected output using an audio similarity metric (see the sketch at the end of this section).
* Set up unit testing framework using pytest
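A minimal sketch of such a regression test, assuming `librosa` for loading audio and a plain RMS difference as a placeholder similarity metric; the example id and the pipeline call are hypothetical:

```python
from pathlib import Path

import librosa
import numpy as np
import pytest

EXAMPLES = Path("audio/examples")


def rms_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square difference between two signals, trimmed to equal length."""
    n = min(len(a), len(b))
    return float(np.sqrt(np.mean((a[:n] - b[:n]) ** 2)))


@pytest.mark.skip(reason="enable once the pipeline call is wired up")
@pytest.mark.parametrize("name", ["song1"])  # hypothetical example id
def test_output_close_to_expected(name: str, tmp_path: Path) -> None:
    actual_path = tmp_path / f"{name}.wav"
    # Placeholder: run the pipeline with the arguments recorded in the
    # corresponding input_audio.json file, writing the result to actual_path.
    expected, sr = librosa.load(EXAMPLES / "output" / f"{name}.wav", sr=None)
    actual, _ = librosa.load(actual_path, sr=sr)
    assert rms_distance(actual, expected) < 0.05  # tolerance is a guess
```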