TODO

  • should rename instances of "models" to "voice models"

Project/task management

  • Should find tool for project/task management
  • Tool should support:
    • hierarchical tasks
    • custom labels and or priorities on tasks
    • being able to filter tasks based on those labels
    • being able to close and resolve tasks
    • Being able to integrate with vscode
    • Access for multiple people (in a team)
  • Should migrate the content of this file into tool
  • Potential candidates
    • GitHub projects
      • Does not yet support hierarchical tasks so no
    • Trello
      • Does not seem to support hierarchical tasks either
    • Notion
      • Seems to support hierarchical tasks, but is complicated
    • Todoist
      • Seems to support hierarchical tasks, custom labels, filtering on those labels, and multiple users, and there are unofficial plugins for VS Code.

Front end

Modularization

  • Improve modularization of web code using helper functions defined here
  • Split front-end modules into further sub-modules.
    • Structure of web folder should be:
      • web
        • manage_models
          • __init__.py
          • main.py
        • manage_audio
          • __init__.py
          • main.py
        • generate_song_covers
          • __init__.py
          • main.py
          • one_click_generation
            • __init__.py
            • main.py
            • accordions
              • __init__.py
              • options_x.py ... ?
          • multi_step_generation
            • __init__.py
            • main.py
            • accordions
              • __init__.py
              • step_X.py ...
        • common.py
      • For multi_step_generation/step_X.py, its potential render function might have to take the set of all "input tracks" in the multi-step generation tab as input, so these will then have to be defined in multi_step_generation/main.py. Other components passed to multi_step_generation/main.py might also need to be passed further down to multi_step_generation/step_X.py
      • For one_click_generation/option_X.py, its potential render function should render the accordion for the given options and return the components defined in the accordion? Other components passed to one_click_generation/main.py might also need to be passed further down to one_click_generation/option_X.py (see the sketch after this list)
    • Import components instead of passing them as inputs to render functions (DIFFICULT TO IMPLEMENT)
      • We have had problems before with component ids when components are instantiated outside a Blocks context in a separate module, then imported into other modules and rendered in their Blocks contexts.
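
A minimal sketch of the render-function pattern described above, where a sub-module exposes a render function that is called from the parent tab's Blocks context and receives the components it needs as arguments; all module, function, and component names here are hypothetical:

    # one_click_generation/accordions/option_x.py (hypothetical module)
    import gradio as gr


    def render(song_dir: gr.Dropdown) -> gr.Slider:
        """Render an options accordion inside the caller's Blocks context.

        Components created by the parent module (here the song_dir dropdown)
        are passed down as arguments instead of being imported, so they keep
        the ids of the Blocks context they were instantiated in.
        """
        with gr.Accordion("Options", open=False):
            pitch = gr.Slider(-12, 12, value=0, step=1, label="Pitch shift")
            apply_btn = gr.Button("Apply")
            # Event listeners can reference both local components and the
            # ones passed down from the parent module.
            apply_btn.click(lambda song, p: None, inputs=[song_dir, pitch])
        return pitch

The parent module (e.g. one_click_generation/main.py) would then call render(song_dir) inside its own Blocks context and wire the returned components into its event listeners.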

Multi-step generation

  • If possible merge two consecutive event listeners using update_cached_songs in the song retrieval accordion.

  • add description describing how to use each accordion and suggestions for workflows

  • add option for adding more input tracks to the mix song step

    • new components should be created dynamically based on a text field with names and a button for creating a new component (see the sketch at the end of this section)
    • when creating a new component a new transfer button and dropdown should also be created
    • and the transfer choices for all dropdowns should be updated to also include the new input track
    • we need to consider how we want to handle vertical space
      • should we make a new row once more than 3 tracks are on one row?
        • yes, and the new slider should also be created on a new row
        • right under the first row (which itself is under the row with the song dir dropdown)
  • should also have the possibility to add more tracks to the pitch shift accordion.

  • add a confirmation box with warning if trying to transfer output track to input track that is not empty.

    • could also have the possibility to ask the user to create a new input track and transfer the output track to it.
    • this would just be the same pop-up confirmation box as before, but in addition to the yes and cancel options it will also have a "transfer to new input track" option.
    • we need custom JavaScript for this.
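
A rough sketch of how dynamically added input tracks could work, assuming the @gr.render decorator is available in the Gradio version used; the transfer buttons and dropdowns are omitted and all component names are hypothetical:

    import gradio as gr

    with gr.Blocks() as demo:
        track_names = gr.State([])  # names of dynamically added input tracks
        new_name = gr.Textbox(label="New track name")
        add_btn = gr.Button("Add input track")
        add_btn.click(
            lambda names, name: names + [name] if name else names,
            inputs=[track_names, new_name],
            outputs=track_names,
        )

        @gr.render(inputs=[track_names])
        def render_tracks(names: list[str]) -> None:
            # Re-rendered whenever track_names changes: three tracks per row,
            # with a row of matching gain sliders directly underneath.
            for i in range(0, len(names), 3):
                with gr.Row():
                    for name in names[i : i + 3]:
                        gr.Audio(label=name)
                with gr.Row():
                    for name in names[i : i + 3]:
                        gr.Slider(-20, 20, value=0, label=f"{name} gain (dB)")

    demo.launch()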

Common

  • fix problem with typing of Block.launch()
    • problem stems from doing from gradio import routes
    • so instead should import from gradio.routes directly
    • open a PR with the changes
  • save default values for song generation options in a SongCoverOptionDefault enum (see the sketch at the end of this section)
    • then reference this enum across the two tabs
    • and also use list[SongCoverOptionDefault] as input to the reset-settings click event listener in the one-click generation tab.
  • Persist state of app (currently selected settings etc.) across re-renders
    • This includes:
      • refreshing a browser window
      • Opening app in new browser window
      • Maybe it should also include when app is started anew?
    • Possible solutions
      • use gr.BrowserState to allow state to be preserved across page loads.

      • Save any changes to components to a session dictionary and load from it upon refresh

        • See here
        • Problem is that this solution might not work with accordions or other types of blocks
          • should use .expand() and .collapse() event listeners on accordions to programmatically reset the state of accordions to what they were before the user refreshed the page
      • Use localStorage

      • Whenever the state of a component is changed save the new state to a custom JSON file.

        • Then whenever the app is refreshed load the current state of components from the JSON file
        • This solution should probably work for Block types that are not components
  • need to fix the "INFO: Could not find files for the given pattern(s)" message on startup of the web application on Windows (DIFFICULT TO IMPLEMENT)
    • this is an error that Gradio needs to fix
  • Remove reset button on slider components (DIFFICULT TO IMPLEMENT)
    • this is a gradio feature that needs to be removed.
  • Fix that Gradio removes special symbols from audio paths when loaded into audio components (DIFFICULT TO IMPLEMENT)
    • includes parentheses, question marks, etc.
    • it's a Gradio bug, so report it?
  • Add button for cancelling any currently running jobs (DIFFICULT TO IMPLEMENT)
    • Not supported by Gradio natively
    • Also difficult to implement manually, as Gradio seems to run the called backend functions in separate threads
  • don't show an error upon missing confirmation (DIFFICULT TO IMPLEMENT)
    • can return gr.update() instead of raising an error in the relevant event listener function
    • but problem is that subsequent steps will still be executed in this case
  • clearing temporary files with the delete_cache parameter only seems to work if all windows are closed before closing the app process (DIFFICULT TO IMPLEMENT)
    • this is a Gradio bug, so report it?
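
A minimal sketch of the default-values enum mentioned above; the member names and values are hypothetical placeholders:

    from enum import Enum


    class SongCoverOptionDefault(Enum):
        """Hypothetical default values for song generation options."""

        N_OCTAVES = 0
        INDEX_RATE = 0.5
        MAIN_GAIN = 0
        OUTPUT_SR = 44100


    # Referenced in both tabs instead of hard-coded literals, e.g.
    # gr.Slider(..., value=SongCoverOptionDefault.INDEX_RATE.value),
    # and used to reset all settings in the one-click generation tab:
    # reset_btn.click(
    #     lambda: [option.value for option in SongCoverOptionDefault],
    #     outputs=[n_octaves, index_rate, main_gain, output_sr],
    # )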

Online hosting optimization

  • make concurrency_id and concurrency_limit on event listeners depend on whether the GPU is used or not
    • if only the CPU is used then there should be no limit
  • increase the value of default_concurrency_limit in Block.queue() so that the same event listener can be called multiple times concurrently
  • use Block.launch() with max_file_size to prevent too large uploads (see the sketch at the end of this section)
  • define as many functions with async as possible to increase responsiveness of app
    • and then use Block.launch() with max_threads set to an appropriate value representing the number of concurrent threads that can be run on the server (default is 40)
  • Persist state of app (currently selected settings etc.) across re-renders
  • consider setting max_size in Block.queue() to explicitly limit the number of people that can be in the queue at the same time
  • clearing of temporary files should happen after a user logs in and out
    • and in this case it should only be temporary files for the active user that are cleared
      • Is that even possible to control?
  • enable server-side rendering (requires installing node and setting ssr_mode=True in .launch()) (DIFFICULT TO IMPLEMENT)
    • Also needs GRADIO_NODE_PATH to be set to point to the node executable
    • problem is that on Windows there is an ERR_UNSUPPORTED_ESM_URL_SCHEME error, which needs to be fixed by Gradio
    • on Linux it works, but it is not possible to shut down the server using CTRL+C
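
A sketch of the queue and launch settings mentioned in this section; the concrete numbers are placeholders to be tuned for the hosting environment:

    import gradio as gr

    with gr.Blocks() as app:
        ...  # tabs and event listeners

    app.queue(
        default_concurrency_limit=5,  # let the same event listener run concurrently
        max_size=20,                  # cap how many requests can wait in the queue
    )
    app.launch(
        max_file_size="100mb",  # reject overly large uploads
        max_threads=40,         # threads available for running (async) handlers
    )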

Back end

generate_song_cover.py

manage_models.py

  • use pandas.read_json to load public models table (DIFFICULT TO IMPLEMENT)

CLI

Add remaining CLI interfaces

  • Interface for core.manage_models
  • Interface for core.manage_audio
  • Interfaces for individual pipeline functions defined in core.generate_song_covers

python package management

  • need to make project version (in pyproject.toml) dynamic so that it is updated automatically when a new release is made
  • once diffq-fixed is used by audio-separator we can remove the URL dependency on Windows
    • we will still need to wait for uv to make it easy to install a package with a torch dependency
    • also it is still necessary to install pytorch first, as it is not on the PyPI index
  • figure out a way of making ./urvc commands execute faster
    • when Ultimate RVC is downloaded as a PyPI package the exposed commands are much faster, so investigate this
  • update dependencies in pyproject.toml
    • use latest compatible version of all packages
    • remove commented out code, unless strictly necessary

Audio separation

  • expand back-end function(s) so that they are parametrized by both model type and model settings (see the sketch at the end of this section)
    • Need to decide whether we only want to support common model settings or also settings that are unique to each model
      • It will probably be the latter, which will then require some extra checks.
    • Need to decide which of the models supported by audio_separator we want to support
      • Not all of them seem to work
      • Probably MDX models and MDXC models
      • Maybe also VR and demucs?
    • Revisit online guide for optimal models and settings
  • In multi-step generation tab
    • Expand the audio-separation accordion so that a model can be selected and appropriate settings for that model can then be chosen.
      • Model specific settings should expand based on selected model
  • In one-click generation
    • Should have a "vocal extraction" options accordion
      • Should be able to choose which audio separation steps to include in pipeline
        • possible steps
          • step 1: separating audio from instrumentals
          • step 2: separating main vocals from background vocals
          • step 3: de-reverbing vocals
        • Should pick steps from dropdown?
        • For each selected step a new sub-accordion with options for that step will then appear
          • Each accordion should include general settings
          • We should decide whether model specific settings should also be supported
          • We should also decide whether each sub-accordion should have a setting for choosing a model and, if so, render model-specific settings based on the chosen model
      • Alternative layout:
        • have option to choose number of separation steps
        • then dynamically render sub accordions for each of the selected number of steps
          • In this case it should be possible to choose models for each accordion
            • this field should be initially empty
          • Other settings should probably have sensible defaults that are the same
        • It might also be a good idea to then have an "examples" pane with recommended combinations of extractions steps
        • When one of these is selected, then the selected number of accordions with the preset settings should be filled out
    • optimize pre-processing
    • Alternatives to audio-separator package:
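
A sketch of how the back-end separation function(s) could be parametrized by model type plus common and model-specific settings; the model names and settings are illustrative, and the actual call into the audio-separator package (or an alternative) is left as a placeholder:

    from dataclasses import dataclass, field
    from typing import Any


    @dataclass
    class SeparationStep:
        """One audio-separation step: a model plus its settings.

        Common settings are explicit fields; anything unique to the chosen
        model goes into model_settings and is validated per model.
        """

        model_name: str
        segment_size: int = 256
        overlap: float = 0.25
        model_settings: dict[str, Any] = field(default_factory=dict)


    # Hypothetical whitelist of supported models and their extra settings.
    SUPPORTED_MODELS: dict[str, set[str]] = {
        "UVR-MDX-NET-Voc_FT": {"denoise"},
        "htdemucs": {"shifts"},
    }


    def separate_audio(audio_path: str, step: SeparationStep) -> list[str]:
        """Run one separation step and return the paths of the stems."""
        if step.model_name not in SUPPORTED_MODELS:
            raise ValueError(f"Unsupported model: {step.model_name}")
        unknown = set(step.model_settings) - SUPPORTED_MODELS[step.model_name]
        if unknown:
            raise ValueError(f"Unknown settings for {step.model_name}: {unknown}")
        # Placeholder: call into the audio-separator package (or an
        # alternative) with the validated settings here.
        raise NotImplementedError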

GitHub

Actions

  • linting with Ruff
  • typechecking with Pyright
  • running all tests
  • automatic building and publishing of project to pypi
    • includes automatic update of project version number
  • or use pre-commit?

README

  • Fill out TBA sections in README
  • Add note about not using with VPN?
  • Add different emblems/badges in header
    • like test coverage, build status, etc. (look at other projects for inspiration)
  • spice up text with emojis (look at tiangolo's projects for inspiration)

Releases

  • Make regular releases, as is done for Applio

    • Will be an .exe file that when run unzips contents into application folder, where ./urvc run can then be executed.
    • Could it be possible to have .exe file just start webapp when clicked?
  • Could also include pypi package as a release?

  • use PyInstaller to package the app into an executable that also includes sox and ffmpeg as dependencies (DLLs)

Other

Incorporate upstream changes

  • Incorporate RVC code from rvc-cli (i.e. changes from Applio)
    • more options for voice conversion and more efficient voice conversion
    • batch conversion sub-tab
    • TTS tab
    • Model training tab
    • support more pre-trained models
      • sub-tab under "manage models" tab
    • support for querying online database with many models that can be downloaded
    • support for audio and model analysis.
    • Voice blending tab
  • Incorporate latest changes from RVC-WebUI

Vocal Conversion

  • support arbitrary combination of pitch detection algorithms
  • Investigate using onnx models for inference speedup on cpu
  • Add more pitch detection methods
    • pm
    • harvest
    • dio
    • rmvpe+
  • Implement multi-GPU inference

TTS conversion

Model management

Training models

Download models

  • Support batch downloading multiple models
    • requires a tabular request form where both a link column and a name column have to be filled out
    • we can allow selecting multiple items from public models table and then copying them over
  • support querying the online database for models matching a given search string, like what is done in the Applio app
    • first n rows of online database should be shown by default in public models table
      • more rows should be retrieved by scrolling down or clicking a button
    • user search string should filter/narrow returned number of rows in public models table
    • When clicking a set of rows they should then be copied over for downloading in the "download" table
  • support a column with preview sample in public models table
    • Only possible if voice snippets are also returned when querying the online database
  • Otherwise we can always support voice snippets for voice models that have already been downloaded
    • run the model on sample text ("quick brown fox runs over the lazy") after it is downloaded
    • save the results in an audio/model_preview folder
    • Preview can then be loaded into a preview audio component when selecting a model from a dropdown
    • or if we replace the dropdown with a table with two columns we can have the audio track displayed in the second column

Model analysis

  • we could provide a new tab to analyze an existing model, like what is done in Applio

    • or this tab could be consolidated with the delete model tab?
  • we could also provide extra model information after model is downloaded

    • potentially in a dropdown to expand?

Audio management

General

  • Support an audio information tool like in Applio?
    • A new tab where you can upload a song to analyze?
  • more elaborate solution:
    • tab where you
      • can select any song directory
      • select any step in the audio generation pipeline
      • then select any intermediate audio file generated in that step
      • Then have the possibility to
        • Listen to the song
        • see a table with its metadata (based on its associated .json file)
          • add timestamp to json files so they can be sorted in the table according to creation date (see the sketch after this list)
        • And other statistics in a separate component (graph etc.)
    • Could have delete buttons both at the level of song_directory, step, and for each song?
    • Also consider splitting intermediate audio tracks for each step into subfolders (0, 1, 2, 3, ...)
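
A minimal sketch of stamping the metadata .json files with a creation time so the table can be sorted by it; the file layout is an assumption:

    import json
    import time
    from pathlib import Path


    def save_metadata(json_path: Path, metadata: dict) -> None:
        """Write metadata for an intermediate audio file, stamping creation time."""
        metadata = {**metadata, "created_at": time.time()}
        json_path.write_text(json.dumps(metadata, indent=2))


    def list_metadata_sorted(song_dir: Path) -> list[dict]:
        """Load all metadata files in a song directory, newest first."""
        records = [json.loads(p.read_text()) for p in song_dir.glob("*.json")]
        return sorted(records, key=lambda r: r.get("created_at", 0), reverse=True)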

Other settings

  • rework other settings tab
    • this should also contain other settings such as the ability to change the theme of the app
    • there should be a button to apply settings which will reload the app with the new settings

Audio post-processing

  • Support more effects from the pedalboard package (see the example after this list).
    • Guitar-style effects: Chorus, Distortion, Phaser, Clipping
    • Loudness and dynamic range effects: Compressor, Gain, Limiter
    • Equalizers and filters: HighpassFilter, LadderFilter, LowpassFilter
    • Spatial effects: Convolution, Delay, Reverb
    • Pitch effects: PitchShift
    • Lossy compression: GSMFullRateCompressor, MP3Compressor
    • Quality reduction: Resample, Bitcrush
    • NoiseGate
    • PeakFilter
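
A small example chaining a few of the listed pedalboard effects; the input and output file names are placeholders:

    from pedalboard import Chorus, Compressor, Pedalboard, Reverb
    from pedalboard.io import AudioFile

    board = Pedalboard([
        Compressor(threshold_db=-16, ratio=2.5),
        Chorus(rate_hz=1.0),
        Reverb(room_size=0.25),
    ])

    with AudioFile("vocals.wav") as f:  # placeholder input path
        audio = f.read(f.frames)
        samplerate = f.samplerate

    effected = board(audio, samplerate)

    with AudioFile("vocals_fx.wav", "w", samplerate, effected.shape[0]) as out:
        out.write(effected)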

Audio Mixing

  • Add main gain loudness slider?

  • Add option to equalize output audio with respect to input audio

    • i.e. song cover gain (and possibly also more general dynamics) should be the same as those of the source song.
    • check to see if pydub has functionality for this
    • otherwise a simple solution would be to compute the loudness (in dB, from the RMS) of the input and output tracks and apply the difference to the output (see the sketch at the end of this section for a pydub-based variant):
      import numpy as np
      rms = np.sqrt(np.mean(np.square(signal)))
      dB = 20 * np.log10(rms)
      # compute dB for both the input and output track, then add the difference to the output file in the mixing function (using pydub)

    • When this option is selected, the option to set the main gain of the output should be disabled?
  • add more equalization options

    • using pydub.effects and pydub.scipy_effects?
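
A sketch of the pydub-based variant referenced above, matching the cover's average loudness (dBFS) to that of the source song; the paths are placeholders:

    from pydub import AudioSegment

    source = AudioSegment.from_file("source_song.wav")  # placeholder paths
    cover = AudioSegment.from_file("song_cover.wav")

    # dBFS is the average loudness relative to full scale; applying the
    # difference as gain makes the cover match the source loudness.
    gain_db = source.dBFS - cover.dBFS
    matched = cover.apply_gain(gain_db)
    matched.export("song_cover_matched.wav", format="wav")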

Custom UI

  • Experiment with new themes including Building new ones
    • first of all, make a new theme that is like the default Gradio 4 theme in terms of using semi-transparent orange as the main color and semi-transparent grey as the secondary color. The new Gradio 5 theme is good apart from using solid colors, so maybe use that as the base theme.
    • Support both dark and light theme in app?
    • Add Support for changing theme in app?
    • Use Applio theme as inspiration for default theme?
  • Experiment with using custom CSS
    • Pass css=css_string to gr.Blocks and use elem_classes and elem_id to have components target the styles defined in the CSS string (see the example after this list).
  • Experiment with custom DataFrame styling
  • Experiment with custom Javascript
  • Look for opportunities for defining new useful custom components
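
A minimal example of the custom-CSS approach mentioned above: pass a CSS string to gr.Blocks and target components via elem_id and elem_classes; the selectors and components are only illustrative:

    import gradio as gr

    css = """
    #generate-btn { background: rgba(255, 140, 0, 0.8); }
    .subtle-label label { color: grey; }
    """

    with gr.Blocks(css=css) as demo:
        gr.Textbox(label="Song", elem_classes=["subtle-label"])
        gr.Button("Generate", elem_id="generate-btn")

    demo.launch()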

Real-time vocal conversion

AI assistant mode

  • similar to vocal conversion streaming but instead of converting your voice on the fly, it should:
    • take your voice,
    • do some language modelling (with an LLM or something)
    • then produce an appropriate verbal response
  • We already have Kyutai's Moshi
    • Maybe that model can be fine-tuned to reply with a voice
    • i.e. your favorite singer, actor, best friend, family member.

Ultimate RVC bot for discord

  • maybe also make a forum on discord?

Make app production ready

  • have a "report a bug" tab like in applio?

  • should have separate accounts for users when hosting online

    • use gr.LoginButton and gr.LogoutButton?
  • deploy using docker

  • Host on own web-server with Nginx

  • Consider having the concurrency limit be dynamic, i.e. instead of always being 1 for jobs using the GPU, consider having it depend upon what resources are available.

    • We can have the app set the GPU_CONCURRENCY limit to os.environ["GPU_CONCURRENCY_LIMIT"] or 1 and then pass GPU_CONCURRENCY as input to the places where event listeners are defined (see the sketch below)
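
A sketch of the environment-driven concurrency limit described above; the event listener shown is just a placeholder:

    import os

    import gradio as gr

    # Fall back to 1 concurrent GPU job unless the host overrides it.
    GPU_CONCURRENCY_LIMIT = int(os.environ.get("GPU_CONCURRENCY_LIMIT", "1"))

    with gr.Blocks() as app:
        audio_in = gr.Audio(label="Input")
        audio_out = gr.Audio(label="Output")
        convert_btn = gr.Button("Convert")
        convert_btn.click(
            lambda audio: audio,  # placeholder for the GPU-bound conversion
            inputs=audio_in,
            outputs=audio_out,
            concurrency_id="gpu",
            concurrency_limit=GPU_CONCURRENCY_LIMIT,
        )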

Colab notebook

  • find a way of saving a virtual environment with Python 3.11 in the Colab notebook (DIFFICULT TO IMPLEMENT)
    • so that this environment can be loaded directly rather than downloading all dependencies every time the app is opened

Testing

  • Add example audio files to use for testing
    • Should be located in audio/examples
    • could have sub-folders input and output
      • in the output folder we have output_audio.ext files, each with a corresponding input_audio.json file containing metadata describing the arguments used to generate the output
      • We can then test that the actual output is close enough to the expected output using an audio similarity metric (see the sketch after this list).
  • Set up a unit testing framework using pytest
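
A rough sketch of the expected-output test, assuming the audio/examples layout described above; the file naming and the similarity metric (plain RMS error) are assumptions, and running the actual pipeline is left as a placeholder:

    import json
    from pathlib import Path

    import numpy as np
    import pytest
    import soundfile as sf

    EXAMPLES_DIR = Path("audio/examples")


    def rms_error(a: np.ndarray, b: np.ndarray) -> float:
        n = min(len(a), len(b))
        return float(np.sqrt(np.mean((a[:n] - b[:n]) ** 2)))


    @pytest.mark.parametrize("meta_path", sorted(EXAMPLES_DIR.glob("output/*.json")))
    def test_output_close_to_expected(meta_path: Path) -> None:
        pipeline_args = json.loads(meta_path.read_text())
        expected, _sr = sf.read(meta_path.with_suffix(".wav"))
        # Placeholder: run the pipeline with pipeline_args and load the
        # resulting audio as `actual`.
        actual = expected
        assert rms_error(np.asarray(actual), np.asarray(expected)) < 0.05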