---
title: HRVC
app_file: src/ultimate_rvc/web/main.py
sdk: gradio
sdk_version: 5.6.0
---
# Ultimate RVC
An extension of [AICoverGen](https://github.com/SociallyIneptWeeb/AICoverGen) that provides several new features and improvements, enabling users to generate song covers using RVC with ease. Ideal for people who want to incorporate singing functionality into their AI assistant/chatbot/vtuber, or for people who want to hear their favourite characters sing their favourite song.
<!-- Showcase: TBA -->
![ ](images/webui_generate.png?raw=true)
Ultimate RVC is under constant development and testing, but you can try it out right now locally or on Google Colab!
## New Features
* Easy and automated setup using launcher scripts for both Windows and Debian-based Linux systems
* Caching system which saves intermediate audio files as needed, thereby reducing inference time as much as possible. For example, if song A has already been converted using model B and now you want to convert song A using model C, then vocal extraction can be skipped and inference time reduced drastically
* Ability to listen to intermediate audio files in the UI. This is useful for getting an idea of what is happening in each step of the song cover generation pipeline
* A "multi-step" song cover generation tab: here you can try out each step of the song cover generation pipeline in isolation. For example, if you already have extracted vocals available and only want to convert these using your voice model, then you can do that here. This tab is also useful for experimenting with settings for each step of the song cover generation pipeline
* An overhaul of the song input component for the song cover generation pipeline. Cached input songs can now be selected from a dropdown, so that you don't have to supply the YouTube link of a song each time you want to convert it.
* A new "manage models" tab, which collects and revamps all existing functionality for managing voice models, as well as adds some new features, such as the ability to delete existing models
* A new "manage audio" tab, which allows you to interact with all audio generated by the app. Currently, this tab supports deleting audio files.
* Lots of visual and performance improvements resulting from updating from Gradio 3 to Gradio 5 and from Python 3.9 to Python 3.12
* A redistributable package on PyPI, which allows you to access the Ultimate RVC project without cloning any repositories.
## Colab notebook
If you do not have a sufficiently powerful NVIDIA GPU, you can try Ultimate RVC out using Google Colab.
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JackismyShephard/ultimate-rvc/blob/main/notebooks/ultimate_rvc_colab.ipynb)
To run the Ultimate RVC project locally, follow the setup guide below.
## Setup
The Ultimate RVC project currently supports Windows and Debian-based Linux distributions, namely Ubuntu 22.04 and Ubuntu 24.04. Support for other platforms is not guaranteed.
To set up the project, follow the steps below and execute the provided commands in an appropriate terminal. On Windows this terminal should be **PowerShell**, while on Debian-based Linux distributions it should be a **bash**-compliant shell.
### Install Git
Follow the instructions [here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) to install Git on your computer.
### Set execution policy (Windows only)
To execute the subsequent commands on Windows, it is necessary to first grant
PowerShell permission to run scripts. This can be done at a user level as follows:
```console
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
```
### Clone Ultimate RVC repository
```console
git clone https://github.com/JackismyShephard/ultimate-rvc
cd ultimate-rvc
```
### Install dependencies
```console
./urvc install
```
Note that on Linux, this command will install the CUDA 12.4 toolkit system-wide, if it is not already available. In case you have problems, you may need to install the toolkit manually.
## Usage
### Start the app
```console
./urvc run
```
Once the output message `Running on local URL: http://127.0.0.1:7860` appears, you can click on the link to open a tab with the web app.
### Manage models
#### Download models
![ ](images/webui_dl_model.png?raw=true)
Navigate to the `Download model` subtab under the `Manage models` tab, paste the download link to an RVC model and give the model a unique name.
You may search the [AI Hub Discord](https://discord.gg/aihub), where already trained voice models are available for download.
The downloaded zip file should contain the `.pth` model file and an optional `.index` file.
Once the two input fields are filled in, simply click `Download`! Once the output message says `[NAME] Model successfully downloaded!`, you should be able to use the model in the `Generate song covers` tab!
#### Upload models
![ ](images/webui_upload_model.png?raw=true)
This option is for people who have trained RVC v2 models locally and would like to use them for AI cover generation.
Navigate to the `Upload model` subtab under the `Manage models` tab, and follow the instructions.
Once the output message says `Model with name [NAME] successfully uploaded!`, you should be able to use it in the `Generate song covers` tab!
#### Delete RVC models
TBA
### Generate song covers
#### One-click generation
![ ](images/webui_generate.png?raw=true)
* From the Voice model dropdown menu, select the voice model to use.
* In the song input field, paste the link to any song on YouTube or the full path to a local audio file, or select a cached input song.
* Pitch should be set to either -12, 0, or 12 depending on the original vocals and the RVC AI model. This ensures the voice is not *out of tune*.
* Other advanced options for vocal conversion, audio mixing, etc. can be viewed by clicking the appropriate accordion arrow to expand.
Once all options are filled in, click `Generate` and the AI-generated cover should appear in less than a few minutes, depending on your GPU.
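
To give a feel for what the pitch values mean, here is a minimal sketch that assumes the pitch setting is expressed in semitones in twelve-tone equal temperament, where 12 semitones make up one octave. The function name `frequency_ratio` is hypothetical and not part of Ultimate RVC; it only illustrates the arithmetic.

```python
def frequency_ratio(semitones: float) -> float:
    """Return the factor by which frequencies are scaled for a given shift.

    Assumes equal temperament: each semitone scales frequency by 2 ** (1 / 12),
    so a shift of 12 semitones (one octave) doubles the frequency.
    """
    return 2.0 ** (semitones / 12)


# One octave up, no shift, and one octave down.
for shift in (12, 0, -12):
    print(f"{shift:+d} semitones -> frequency x {frequency_ratio(shift)}")
```

Under this assumption, a pitch of 12 raises the vocals by a full octave and -12 lowers them by a full octave, which is why those values keep the converted voice in the same key as the original song.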
#### Multi-step generation
TBA
## CLI
### Manual download of RVC models
Unzip (if needed) and transfer the `.pth` and `.index` files to a new folder in the [rvc models](models/rvc) directory. Each folder should only contain one `.pth` and one `.index` file.
The directory structure should look something like this:
```text
├── models
│   ├── audio_separator
│   └── rvc
│       ├── John
│       │   ├── JohnV2.pth
│       │   └── added_IVF2237_Flat_nprobe_1_v2.index
│       ├── May
│       │   ├── May.pth
│       │   └── added_IVF2237_Flat_nprobe_1_v2.index
│       └── hubert_base.pt
├── notebooks
├── notes
└── src
```
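
If you place models manually, a small script can confirm that each folder follows the layout above. The following is a minimal sketch and not part of Ultimate RVC: the function names `check_model_dir` and `check_models_root` are hypothetical, and it assumes that a missing `.index` file is acceptable (since the index file is described as optional for downloaded models) while more than one is an error.

```python
from pathlib import Path


def check_model_dir(model_dir: Path) -> list[str]:
    """Return a list of layout problems found in one voice model folder."""
    problems = []
    n_pth = len(list(model_dir.glob("*.pth")))
    n_index = len(list(model_dir.glob("*.index")))
    if n_pth != 1:
        problems.append(f"expected exactly one .pth file, found {n_pth}")
    # Assumption: zero .index files is fine, more than one is not.
    if n_index > 1:
        problems.append(f"expected at most one .index file, found {n_index}")
    return problems


def check_models_root(rvc_root: Path) -> dict[str, list[str]]:
    """Check every sub-folder of the rvc models directory.

    Loose files such as hubert_base.pt are ignored; only folders are checked.
    """
    return {
        folder.name: check_model_dir(folder)
        for folder in sorted(rvc_root.iterdir())
        if folder.is_dir()
    }


root = Path("models/rvc")
if root.is_dir():
    for name, problems in check_models_root(root).items():
        print(name, "OK" if not problems else "; ".join(problems))
```

Run it from the root of the cloned repository so that the relative path `models/rvc` resolves correctly.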
### Running the pipeline
#### Usage
```console
./urvc cli song-cover run-pipeline [OPTIONS] SOURCE MODEL_NAME
```
##### Arguments
* `SOURCE`: A YouTube URL, the path to a local audio file or the path to a song directory. [required]
* `MODEL_NAME`: The name of the voice model to use for vocal conversion. [required]
##### Options
* `--n-octaves INTEGER`: The number of octaves to pitch-shift the converted vocals by. Use 1 for male-to-female and -1 for vice-versa. [default: 0]
* `--n-semitones INTEGER`: The number of semi-tones to pitch-shift the converted vocals, instrumentals, and backup vocals by. Altering this slightly reduces sound quality. [default: 0]
* `--f0-method [rmvpe|mangio-crepe]`: The method to use for pitch detection during vocal conversion. Best option is RMVPE (clarity in vocals), then Mangio-Crepe (smoother vocals). [default: rmvpe]
* `--index-rate FLOAT RANGE`: A decimal number e.g. 0.5. Controls how much of the accent in the voice model to keep in the converted vocals. Increase to bias the conversion towards the accent of the voice model. [default: 0.5; 0<=x<=1]
* `--filter-radius INTEGER RANGE`: A number between 0 and 7. If >=3: apply median filtering to the pitch results harvested during vocal conversion. Can help reduce breathiness in the converted vocals. [default: 3; 0<=x<=7]
* `--rms-mix-rate FLOAT RANGE`: A decimal number e.g. 0.25. Controls how much to mimic the loudness of the input vocals (0) or a fixed loudness (1) during vocal conversion. [default: 0.25; 0<=x<=1]
* `--protect FLOAT RANGE`: A decimal number e.g. 0.33. Controls protection of voiceless consonants and breath sounds during vocal conversion. Decrease to increase protection at the cost of indexing accuracy. Set to 0.5 to disable. [default: 0.33; 0<=x<=0.5]
* `--hop-length INTEGER`: Controls how often the CREPE-based pitch detection algorithm checks for pitch changes during vocal conversion. Measured in milliseconds. Lower values lead to longer conversion times and a higher risk of voice cracks, but better pitch accuracy. Recommended value: 128. [default: 128]
* `--room-size FLOAT RANGE`: The room size of the reverb effect applied to the converted vocals. Increase for longer reverb time. Should be a value between 0 and 1. [default: 0.15; 0<=x<=1]
* `--wet-level FLOAT RANGE`: The loudness of the converted vocals with reverb effect applied. Should be a value between 0 and 1. [default: 0.2; 0<=x<=1]
* `--dry-level FLOAT RANGE`: The loudness of the converted vocals without reverb effect applied. Should be a value between 0 and 1. [default: 0.8; 0<=x<=1]
* `--damping FLOAT RANGE`: The absorption of high frequencies in the reverb effect applied to the converted vocals. Should be a value between 0 and 1. [default: 0.7; 0<=x<=1]
* `--main-gain INTEGER`: The gain to apply to the post-processed vocals. Measured in dB. [default: 0]
* `--inst-gain INTEGER`: The gain to apply to the pitch-shifted instrumentals. Measured in dB. [default: 0]
* `--backup-gain INTEGER`: The gain to apply to the pitch-shifted backup vocals. Measured in dB. [default: 0]
* `--output-sr INTEGER`: The sample rate of the song cover. [default: 44100]
* `--output-format [mp3|wav|flac|ogg|m4a|aac]`: The audio format of the song cover. [default: mp3]
* `--output-name TEXT`: The name of the song cover.
* `--help`: Show this message and exit.
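
When scripting the pipeline from Python, it can help to assemble the command line programmatically and check range-restricted options before launching a long conversion. The following is a minimal sketch under stated assumptions: `build_pipeline_command` and `RANGES` are hypothetical names, not part of Ultimate RVC, and the ranges are copied from the option list above. The function only builds the argument list; it does not run anything.

```python
# Ranges copied from the documented options; keys use underscores instead of dashes.
RANGES = {
    "index_rate": (0, 1),
    "filter_radius": (0, 7),
    "rms_mix_rate": (0, 1),
    "protect": (0, 0.5),
    "room_size": (0, 1),
    "wet_level": (0, 1),
    "dry_level": (0, 1),
    "damping": (0, 1),
}


def build_pipeline_command(source: str, model_name: str, **options) -> list[str]:
    """Build the argument list for `./urvc cli song-cover run-pipeline`.

    Keyword names map to flags by replacing underscores with dashes,
    e.g. n_octaves=1 becomes `--n-octaves 1`.
    """
    cmd = ["./urvc", "cli", "song-cover", "run-pipeline"]
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}, got {value}")
        cmd += ["--" + name.replace("_", "-"), str(value)]
    # Positional arguments come last, matching `[OPTIONS] SOURCE MODEL_NAME`.
    cmd += [source, model_name]
    return cmd


cmd = build_pipeline_command(
    "song.wav", "John", n_octaves=1, index_rate=0.5, output_format="wav"
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Here `song.wav` and `John` are placeholder values for a local audio file and a voice model name.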
## Update to latest version
```console
./urvc update
```
## Development mode
When developing new features or debugging, it is recommended to run the app in development mode. This enables hot reloading, which means that the app will automatically reload when changes are made to the code.
```console
./urvc dev
```
## PyPI package
The Ultimate RVC project is also available as a [distributable package](https://pypi.org/project/ultimate-rvc/) on [PyPI](https://pypi.org/).
### Installation
The package can be installed with pip in a **Python 3.12**-based environment. This first requires installing PyTorch with CUDA support:
```console
pip install torch==2.5.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
```
Additionally, on Windows the `diffq` package must be installed manually as follows:
```console
pip install https://huggingface.co/JackismyShephard/ultimate-rvc/resolve/main/diffq-0.2.4-cp312-cp312-win_amd64.whl
```
The Ultimate RVC project package can then be installed as follows:
```console
pip install ultimate-rvc
```
### Usage
The `ultimate-rvc` package can be used as a Python library, but it is primarily intended to be used as a command line tool. The package exposes two top-level commands:
* `urvc` which lets the user generate song covers directly from their terminal
* `urvc-web` which starts a local instance of the Ultimate RVC web application
For more information on either command, supply the option `--help`.
## Environment Variables
The behaviour of the Ultimate RVC project can be customized via a number of environment variables. Currently these environment variables control only logging behaviour. They are as follows:
* `URVC_CONSOLE_LOG_LEVEL`: The log level for console logging. If not set, defaults to `ERROR`.
* `URVC_FILE_LOG_LEVEL`: The log level for file logging. If not set, defaults to `INFO`.
* `URVC_LOGS_DIR`: The directory in which log files will be stored. If not set, logs will be stored in a `logs` directory in the current working directory.
* `URVC_NO_LOGGING`: If set to `1`, logging will be disabled.
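
When launching Ultimate RVC from a Python script, these variables can be passed through the environment of the child process. The following is a minimal sketch: `logging_env` is a hypothetical helper, not part of the project, and it assumes that level names such as `INFO` and `ERROR` are accepted for both console and file logging.

```python
import os


def logging_env(console_level=None, file_level=None, logs_dir=None,
                disable=False, base=None) -> dict[str, str]:
    """Return a copy of the environment with the URVC logging variables set.

    Only the variables that are explicitly requested are written, so
    anything left out falls back to the documented defaults.
    """
    env = dict(os.environ if base is None else base)
    if console_level is not None:
        env["URVC_CONSOLE_LOG_LEVEL"] = console_level
    if file_level is not None:
        env["URVC_FILE_LOG_LEVEL"] = file_level
    if logs_dir is not None:
        env["URVC_LOGS_DIR"] = str(logs_dir)
    if disable:
        env["URVC_NO_LOGGING"] = "1"
    return env


env = logging_env(console_level="INFO", logs_dir="my_logs")
print({k: v for k, v in env.items() if k.startswith("URVC_")})
```

The returned dictionary can be supplied as the `env` argument of `subprocess.run`, for example when starting `urvc-web`. The directory name `my_logs` is a placeholder.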
## Terms of Use
The use of the converted voice for the following purposes is prohibited:
* Criticizing or attacking individuals.
* Advocating for or opposing specific political positions, religions, or ideologies.
* Publicly displaying strongly stimulating expressions without proper zoning.
* Selling of voice models and generated voice clips.
* Impersonation of the original owner of the voice with malicious intentions to harm/hurt others.
* Fraudulent purposes that lead to identity theft or fraudulent phone calls.
## Disclaimer
I am not liable for any direct, indirect, consequential, incidental, or special damages arising out of or in any way connected with the use/misuse or inability to use this software.