---
license: mit
title: RVC MAKER
sdk: gradio
emoji: 👀
colorFrom: blue
app_file: main/app/app.py
colorTo: blue
pinned: true
---
# RVC MAKER

A high-quality voice conversion tool focused on ease of use and performance.


## Description

This project is an all-in-one, easy-to-use voice conversion tool. Its goal is high-quality, high-performance voice conversion that lets users change voices smoothly and naturally.

## Project Features

  • Music separation (MDX-Net/Demucs)

  • Voice conversion (File conversion/Batch conversion/Conversion with Whisper/Text-to-speech conversion)

  • Background music editing

  • Apply effects to audio

  • Generate training data (From linked paths)

  • Model training (v1/v2, high-quality encoders)

  • Model fusion

  • Read model information

  • Export models to ONNX

  • Download from pre-existing model repositories

  • Search for models on the web

  • Pitch extraction

  • Support for audio conversion inference using ONNX models

  • ONNX RVC models also support indexing for inference

  • Multiple model options:

    - **F0**: pm, dio, mangio-crepe-tiny, mangio-crepe-small, mangio-crepe-medium, mangio-crepe-large, mangio-crepe-full, crepe-tiny, crepe-small, crepe-medium, crepe-large, crepe-full, fcpe, fcpe-legacy, rmvpe, rmvpe-legacy, harvest, yin, pyin, swipe

    - **F0_ONNX**: Some models are converted to ONNX to support accelerated extraction

    - **F0_HYBRID**: Multiple options can be combined, such as hybrid[rmvpe+harvest], or you can try combining all options together

    - **EMBEDDERS**: contentvec_base, hubert_base, japanese_hubert_base, korean_hubert_base, chinese_hubert_base, portuguese_hubert_base

    - **EMBEDDERS_ONNX**: All the above embedding models have ONNX versions pre-converted for accelerated embedding extraction

    - **EMBEDDERS_TRANSFORMERS**: All the above embedding models have versions pre-converted to Hugging Face for use as an alternative to Fairseq

    - **SPIN_EMBEDDERS**: A new embedding extraction model that may provide higher quality than older extractions
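The hybrid F0 mode blends the output of several extractors. As an illustration only (a minimal numpy sketch, not the project's actual implementation), combining two f0 tracks frame by frame might look like:

```python
import numpy as np

def combine_f0(tracks):
    """Blend several f0 tracks (Hz, 0 = unvoiced) by averaging voiced frames."""
    stacked = np.stack(tracks)                      # (n_methods, n_frames)
    voiced = stacked > 0                            # which methods call each frame voiced
    counts = voiced.sum(axis=0)                     # number of voiced estimates per frame
    summed = np.where(voiced, stacked, 0.0).sum(axis=0)
    return np.where(counts > 0, summed / np.maximum(counts, 1), 0.0)

# Hypothetical outputs of two extractors (e.g. rmvpe and harvest) on four frames
rmvpe_like = np.array([220.0, 221.0, 0.0, 219.0])
harvest_like = np.array([218.0, 0.0, 0.0, 221.0])
print(combine_f0([rmvpe_like, harvest_like]))       # frame-wise mean over voiced estimates
```

Averaging only the voiced estimates keeps one method's dropout (a 0 frame) from dragging the blended pitch toward zero.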

## Usage Instructions

Detailed usage instructions will be provided when I truly have free time...

## Installation and Usage

  • Step 1: Install Python from the official website (REQUIRES PYTHON 3.10.x OR PYTHON 3.11.x)
  • Step 2: Download FFmpeg, extract it, and add it to your PATH
  • Step 3: Download and extract the source code
  • Step 4: Navigate to the source code directory and open Command Prompt or Terminal
  • Step 5: Install the required libraries. First create and activate a virtual environment:

```
python -m venv env
env\Scripts\activate
```

If you have an NVIDIA GPU, run one of the following depending on your CUDA version (you may need to change `cu117` to `cu128`, etc.):

If using Torch 2.3.1:

```
python -m pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu117
```

If using Torch 2.6.0:

```
python -m pip install torch==2.6.0 torchaudio==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu117
```

Then run:

```
python -m pip install -r requirements.txt
```
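Since Step 1 pins the Python version, a small self-check (my own sketch, not part of the project) can catch a mismatch before you install anything:

```python
import sys

# The project requires Python 3.10.x or 3.11.x (see Step 1 above).
supported = sys.version_info[:2] in ((3, 10), (3, 11))
print("Python %d.%d: %s" % (sys.version_info.major, sys.version_info.minor,
                            "supported" if supported else "unsupported, use 3.10 or 3.11"))
```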

  • Step 6: Run the `run_app` file to open the user interface (note: keep the Command Prompt or Terminal open while the interface is running)
  • Alternatively, launch it from Command Prompt or Terminal in the source code directory:

```
env\Scripts\python.exe main\app\app.py --open
```

  • To allow the interface to access files outside the project, add `--allow_all_disk` to the command

To use TensorBoard for training monitoring, run the `tensorboard` file, or:

```
env\Scripts\python.exe main\app\tensorboard.py
```

## Command-Line Usage

```
python main\app\parser.py --help
```

## NOTES

  • This project only supports NVIDIA GPUs
  • Currently, new encoders like MRF HIFIGAN do not yet have complete pre-trained datasets
  • The MRF HIFIGAN and REFINEGAN encoders do not support training without pitch (f0) guidance

## Terms of Use

  • You must ensure that the audio content you upload and convert through this project does not violate the intellectual property rights of third parties.

  • The project must not be used for any illegal activities, including but not limited to fraud, harassment, or causing harm to others.

  • You are solely responsible for any damages arising from improper use of the product.

  • I will not be responsible for any direct or indirect damages arising from the use of this project.

## This Project Is Built Based on the Following Projects

| Project | Author/Organization | License |
|---|---|---|
| Vietnamese-RVC | Phạm Huỳnh Anh | Apache License 2.0 |
| Applio | IAHispano | MIT License |
| Python-audio-separator | Nomad Karaoke | MIT License |
| Retrieval-based-Voice-Conversion-WebUI | RVC Project | MIT License |
| RVC-ONNX-INFER-BY-Anh | Phạm Huỳnh Anh | MIT License |
| Torch-Onnx-Crepe-By-Anh | Phạm Huỳnh Anh | MIT License |
| Hubert-No-Fairseq | Phạm Huỳnh Anh | MIT License |
| Local-attention | Phil Wang | MIT License |
| TorchFcpe | CN_ChiTu | MIT License |
| FcpeONNX | Yury | MIT License |
| ContentVec | Kaizhi Qian | MIT License |
| Mediafiredl | Santiago Ariel Mansilla | MIT License |
| Noisereduce | Tim Sainburg | MIT License |
| World.py-By-Anh | Phạm Huỳnh Anh | MIT License |
| Mega.py | O'Dwyer Software | Apache 2.0 License |
| Gdown | Kentaro Wada | MIT License |
| Whisper | OpenAI | MIT License |
| PyannoteAudio | pyannote | MIT License |
| AudioEditingCode | Hila Manor | MIT License |
| StftPitchShift | Jürgen Hock | MIT License |
| Codename-RVC-Fork-3 | Codename;0 | MIT License |

## Model Repository for Model Search Tool

## Pitch Extraction Methods in RVC

This document provides detailed information on the pitch extraction methods used, including their advantages, limitations, strengths, and reliability based on personal experience.

| Method | Type | Advantages | Limitations | Strength | Reliability |
|---|---|---|---|---|---|
| pm | Praat | Fast | Less accurate | Low | Low |
| dio | PYWORLD | Suitable for rap | Less accurate at high frequencies | Medium | Medium |
| harvest | PYWORLD | More accurate than DIO | Slower processing | High | Very high |
| crepe | Deep Learning | High accuracy | Requires GPU | Very high | Very high |
| mangio-crepe | Crepe finetune | Optimized for RVC | Sometimes less accurate than original crepe | Medium to high | Medium to high |
| fcpe | Deep Learning | Accurate, real-time | Requires powerful GPU | Good | Medium |
| fcpe-legacy | Old | Accurate, real-time | Older | Good | Medium |
| rmvpe | Deep Learning | Effective for singing voices | Resource-intensive | Very high | Excellent |
| rmvpe-legacy | Old | Supports older systems | Older | High | Good |
| yin | Librosa | Simple, efficient | Prone to octave errors | Medium | Low |
| pyin | Librosa | More stable than YIN | More complex computation | Good | Good |
| swipe | WORLD | High accuracy | Sensitive to noise | High | Good |
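The autocorrelation idea underlying the YIN-family methods in the table can be sketched in a few lines of numpy (an illustrative toy, not the project's implementation):

```python
import numpy as np

def estimate_f0(frame, sr, fmin=50.0, fmax=1000.0):
    """Toy f0 estimator: pick the autocorrelation peak within the allowed lag range."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # search lags between 1/fmax and 1/fmin seconds
    lag = lo + int(np.argmax(corr[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(sr // 10) / sr                  # 100 ms analysis frame
tone = np.sin(2 * np.pi * 220.0 * t)          # pure 220 Hz test tone
print(round(estimate_f0(tone, sr), 1))        # close to 220 Hz
```

Real methods such as pyin add difference-function normalization, thresholding, and probabilistic smoothing on top of this idea, which is what reduces the octave errors the table notes for plain yin.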

## Bug Reporting

  • If you encounter an error while using this source code, I sincerely apologize for the poor experience.

  • You can report bugs by opening an issue on the repository's issue tracker.