{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## 测试效果\n", "\n", "- 测试代码: [speed_test.ipynb](speed_test.ipynb)\n", "- 测试环境: Intel i5-12400 CPU, 48GB RAM, 1x NVIDIA GeForce RTX 4070\n", "- 运行环境: Ubuntu 24.04.1 LTS, cuda 12.4, python 3.10.16\n", "- 测试说明: 单任务执行的数据(非并发测试)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 默认情况下使用" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "import asyncio\n", "import torchaudio\n", "\n", "import sys\n", "sys.path.append('third_party/Matcha-TTS')\n", "\n", "from cosyvoice.cli.cosyvoice import CosyVoice2\n", "from cosyvoice.utils.file_utils import load_wav\n", "\n", "prompt_text = '希望你以后能够做得比我还好哟'\n", "prompt_speech_16k = load_wav('./asset/zero_shot_prompt.wav', 16000)\n", "\n", "# cosyvoice = CosyVoice2('./pretrained_models/CosyVoice2-0.5B', load_jit=False, load_trt=False, fp16=True)\n", "cosyvoice = CosyVoice2('./pretrained_models/CosyVoice2-0.5B', load_jit=True, load_trt=True, fp16=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 使用vllm加速llm推理\n", "\n", "#### 1. **安装依赖**\n", "\n", "(该依赖环境下可以运行原本cosyvoice2代码)\n", "```bash\n", "pip install -r requirements_vllm.txt\n", "```\n", "\n", "#### 2. **文件复制**\n", "将 pretrained_models/CosyVoice2-0.5B/CosyVoice-BlankEN 文件夹下的部分文件复制到下载的CosyVoice2-0.5B模型文件夹下,并替换 config.json 文件中的 Qwen2ForCausalLM 为 CosyVoice2Model。\n", "```bash\n", "cp pretrained_models/CosyVoice2-0.5B/CosyVoice-BlankEN/{config.json,tokenizer_config.json,vocab.json,merges.txt} pretrained_models/CosyVoice2-0.5B/\n", "sed -i 's/Qwen2ForCausalLM/CosyVoice2Model/' pretrained_models/CosyVoice2-0.5B/config.json\n", "```\n", "\n", "#### **注意:**\n", "\n", "- 使用 load_trt 后,需要进行 **预热** 10次推理以上,使用流式推理预热效果较好\n", "- 在 jupyter notebook 中,如果要使用 **vllm** 运行下列代码,需要将vllm_use_cosyvoice2_model.py正确复制到 vllm 包中,并注册到 _VLLM_MODELS 字典中。运行下面的 code 完成" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import shutil\n", "\n", "# 获取vllm包的安装路径\n", "try:\n", " import vllm\n", "except ImportError:\n", " raise ImportError(\"vllm package not installed\")\n", "\n", "\n", "vllm_path = os.path.dirname(vllm.__file__)\n", "print(f\"vllm package path: {vllm_path}\")\n", "\n", "# 定义目标路径\n", "target_dir = os.path.join(vllm_path, \"model_executor\", \"models\")\n", "target_file = os.path.join(target_dir, \"cosyvoice2.py\")\n", "\n", "# 复制模型文件\n", "source_file = \"./cosyvoice/llm/vllm_use_cosyvoice2_model.py\"\n", "if not os.path.exists(source_file):\n", " raise FileNotFoundError(f\"Source file {source_file} not found\")\n", "\n", "shutil.copy(source_file, target_file)\n", "print(f\"Copied {source_file} to {target_file}\")\n", "\n", "# 修改registry.py文件\n", "registry_path = os.path.join(target_dir, \"registry.py\")\n", "new_entry = ' \"CosyVoice2Model\": (\"cosyvoice2\", \"CosyVoice2Model\"), # noqa: E501\\n'\n", "\n", "# 读取并修改文件内容\n", "with open(registry_path, \"r\") as f:\n", " lines = f.readlines()\n", "\n", "# 检查是否已存在条目\n", "entry_exists = any(\"CosyVoice2Model\" in line for line in lines)\n", "\n", "if not entry_exists:\n", " # 寻找插入位置\n", " insert_pos = None\n", " for i, line in enumerate(lines):\n", " if line.strip().startswith(\"**_FALLBACK_MODEL\"):\n", " insert_pos = i + 1\n", " break\n", " \n", " if insert_pos is None:\n", " raise ValueError(\"Could not find insertion point in registry.py\")\n", " \n", " # 插入新条目\n", " lines.insert(insert_pos, new_entry)\n", " \n", " # 写回文件\n", " with open(registry_path, 
\"w\") as f:\n", " f.writelines(lines)\n", " print(\"Successfully updated registry.py\")\n", "else:\n", " print(\"Entry already exists in registry.py, skipping modification\")\n", "\n", "print(\"All operations completed successfully!\")" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "failed to import ttsfrd, use WeTextProcessing instead\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.\n", "/opt/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.\n", " deprecate(\"LoRACompatibleLinear\", \"1.0.0\", deprecation_message)\n", "2025-03-08 00:37:04,867 INFO input frame rate=25\n", "/opt/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:115: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'\n", " warnings.warn(\n", "2025-03-08 00:37:06,103 WETEXT INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/tn/zh_tn_tagger.fst\n", "2025-03-08 00:37:06,103 INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/tn/zh_tn_tagger.fst\n", "2025-03-08 00:37:06,104 WETEXT INFO /opt/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/tn/zh_tn_verbalizer.fst\n", "2025-03-08 00:37:06,104 INFO /opt/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/tn/zh_tn_verbalizer.fst\n", "2025-03-08 00:37:06,104 WETEXT INFO skip building fst for zh_normalizer ...\n", "2025-03-08 00:37:06,104 INFO skip building fst for zh_normalizer ...\n", "2025-03-08 00:37:06,313 WETEXT INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/tn/en_tn_tagger.fst\n", "2025-03-08 00:37:06,313 INFO found existing fst: /opt/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/tn/en_tn_tagger.fst\n", "2025-03-08 00:37:06,314 WETEXT INFO /opt/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/tn/en_tn_verbalizer.fst\n", "2025-03-08 00:37:06,314 INFO /opt/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/tn/en_tn_verbalizer.fst\n", "2025-03-08 00:37:06,314 WETEXT INFO skip building fst for en_normalizer ...\n", "2025-03-08 00:37:06,314 INFO skip building fst for en_normalizer ...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "INFO 03-08 00:37:07 __init__.py:207] Automatically detected platform cuda.\n", "WARNING 03-08 00:37:07 registry.py:352] Model architecture CosyVoice2Model is already registered, and will be overwritten by the new model class .\n", "WARNING 03-08 00:37:07 config.py:2517] Casting torch.bfloat16 to torch.float16.\n", "INFO 03-08 00:37:07 config.py:560] This model supports multiple tasks: {'embed', 'classify', 'reward', 'generate', 'score'}. Defaulting to 'generate'.\n", "INFO 03-08 00:37:07 config.py:1624] Chunked prefill is enabled with max_num_batched_tokens=1024.\n", "WARNING 03-08 00:37:08 utils.py:2164] CUDA was previously initialized. We must use the `spawn` multiprocessing start method. Setting VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. 
See https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html#python-multiprocessing for more information.\n", "INFO 03-08 00:37:10 __init__.py:207] Automatically detected platform cuda.\n", "INFO 03-08 00:37:11 core.py:50] Initializing a V1 LLM engine (v0.7.3.dev213+gede41bc7.d20250219) with config: model='./pretrained_models/CosyVoice2-0.5B', speculative_config=None, tokenizer='./pretrained_models/CosyVoice2-0.5B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=./pretrained_models/CosyVoice2-0.5B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={\"level\":3,\"custom_ops\":[\"none\"],\"splitting_ops\":[\"vllm.unified_attention\",\"vllm.unified_attention_with_output\"],\"use_inductor\":true,\"compile_sizes\":[],\"use_cudagraph\":true,\"cudagraph_num_of_warmups\":1,\"cudagraph_capture_sizes\":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],\"max_capture_size\":512}\n", "WARNING 03-08 00:37:11 utils.py:2298] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,list_loras,load_config,pin_lora,remove_lora,scheduler_config not implemented in \n", "INFO 03-08 00:37:11 parallel_state.py:948] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0\n", "INFO 03-08 00:37:11 gpu_model_runner.py:1055] Starting to load model ./pretrained_models/CosyVoice2-0.5B...\n", "INFO 03-08 00:37:11 cuda.py:157] Using Flash Attention backend on V1 engine.\n", "WARNING 03-08 00:37:11 topk_topp_sampler.py:46] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.\n", "WARNING 03-08 00:37:11 rejection_sampler.py:47] FlashInfer is not available. Falling back to the PyTorch-native implementation of rejection sampling. 
For the best performance, please install FlashInfer.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/torch/utils/_device.py:106: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n", " return func(*args, **kwargs)\n", "Loading pt checkpoint shards: 0% Completed | 0/1 [00:00\n", "2025-03-08 00:39:03,237 INFO not enough text token to decode, wait for more\n", "2025-03-08 00:39:03,252 INFO get fill token, need to append more text token\n", "2025-03-08 00:39:03,253 INFO append 5 text token\n", "2025-03-08 00:39:03,311 INFO get fill token, need to append more text token\n", "2025-03-08 00:39:03,312 INFO append 5 text token\n", "2025-03-08 00:39:03,456 INFO no more text token, decode until met eos\n", "2025-03-08 00:39:04,861 INFO yield speech len 15.16, rtf 0.1072180145334128\n", "100%|██████████| 1/1 [00:01<00:00, 1.88s/it]\n" ] } ], "source": [ "def text_generator():\n", " yield '收到好友从远方寄来的生日礼物,'\n", " yield '那份意外的惊喜与深深的祝福'\n", " yield '让我心中充满了甜蜜的快乐,'\n", " yield '笑容如花儿般绽放。'\n", "\n", " \n", "for i, j in enumerate(cosyvoice.inference_zero_shot(text_generator(), prompt_text, prompt_speech_16k, stream=False)):\n", " torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2025-03-08 00:39:04,878 INFO get tts_text generator, will skip text_normalize!\n", " 0%| | 0/1 [00:00\n", "2025-03-08 00:39:05,152 INFO not enough text token to decode, wait for more\n", "2025-03-08 00:39:05,169 INFO get fill token, need to append more text token\n", "2025-03-08 00:39:05,169 INFO append 5 text token\n", "2025-03-08 00:39:05,292 INFO get fill token, need to append more text token\n", "2025-03-08 00:39:05,293 INFO append 5 text token\n", "2025-03-08 00:39:05,438 INFO no more text token, decode until met eos\n", "2025-03-08 00:39:05,638 INFO yield speech len 1.84, rtf 0.26492670826289966\n", "2025-03-08 00:39:05,841 INFO yield speech len 2.0, rtf 0.10065567493438721\n", "2025-03-08 00:39:06,164 INFO yield speech len 2.0, rtf 0.16065263748168945\n", "2025-03-08 00:39:06,422 INFO yield speech len 2.0, rtf 0.12791669368743896\n", "2025-03-08 00:39:06,697 INFO yield speech len 2.0, rtf 0.13690149784088135\n", "2025-03-08 00:39:06,998 INFO yield speech len 2.0, rtf 0.14957869052886963\n", "2025-03-08 00:39:07,335 INFO yield speech len 1.0, rtf 0.3356931209564209\n", "100%|██████████| 1/1 [00:02<00:00, 2.46s/it]\n" ] } ], "source": [ "def text_generator():\n", " yield '收到好友从远方寄来的生日礼物,'\n", " yield '那份意外的惊喜与深深的祝福'\n", " yield '让我心中充满了甜蜜的快乐,'\n", " yield '笑容如花儿般绽放。'\n", "for i, j in enumerate(cosyvoice.inference_zero_shot(text_generator(), prompt_text, prompt_speech_16k, stream=True)):\n", " torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " 0%| | 0/1 [00:00