Upload folder using huggingface_hub
- .gitattributes +1 -0
- README.md +73 -0
- README_zh.md +63 -0
- TensorRT-9.2.0.5.tar.gz +3 -0
- fmha_plugins/9.2_plugin_cuda11/fMHAPlugin.so +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
+fmha_plugins/9.2_plugin_cuda11/fMHAPlugin.so filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,73 @@
---
license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/blob/main/LICENSE.txt
language:
- en
---

# HunyuanDiT TensorRT Acceleration

English | [中文](https://huggingface.co/Tencent-Hunyuan/TensorRT-libs/blob/main/README_zh.md)

We provide a TensorRT version of [HunyuanDiT](https://github.com/Tencent/HunyuanDiT) for inference acceleration (faster than Flash Attention). You can convert the PyTorch model to a TensorRT model using the following steps.

## 1. Download dependencies from Hugging Face

```shell
cd HunyuanDiT
# Use the huggingface-cli tool to download the model.
huggingface-cli download Tencent-Hunyuan/TensorRT-libs --local-dir ./ckpts/t2i/model_trt
```
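
After the download completes, the dependencies tracked in this repository (the `TensorRT-9.2.0.5.tar.gz` tarball and `fmha_plugins/9.2_plugin_cuda11/fMHAPlugin.so`) should be present locally. A quick sanity check:

```shell
# List the downloaded dependencies; the TensorRT tarball and the fMHA plugin should appear.
ls -R ./ckpts/t2i/model_trt
```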

## 2. Install the TensorRT dependencies

```shell
sh trt/install.sh
```
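
If the installation succeeded, the TensorRT Python bindings should be importable. A minimal check, assuming `trt/install.sh` installs the `tensorrt` package into the active Python environment:

```shell
# Print the installed TensorRT version; an ImportError means the install did not take effect.
python -c "import tensorrt; print(tensorrt.__version__)"
```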

## 3. Build the TensorRT engine

### Method 1: Use a prebuilt engine

We provide some prebuilt TensorRT engines.

| Supported GPU | Download Link | Remote Path |
|:----------------:|:---------------------------------------------------------------------------------------------------------------:|:---------------------------------:|
| GeForce RTX 3090 | [HuggingFace](https://huggingface.co/Tencent-Hunyuan/TensorRT-engine/blob/main/engines/RTX3090/model_onnx.plan) | `engines/RTX3090/model_onnx.plan` |
| GeForce RTX 4090 | [HuggingFace](https://huggingface.co/Tencent-Hunyuan/TensorRT-engine/blob/main/engines/RTX4090/model_onnx.plan) | `engines/RTX4090/model_onnx.plan` |
| A100 | [HuggingFace](https://huggingface.co/Tencent-Hunyuan/TensorRT-engine/blob/main/engines/A100/model_onnx.plan) | `engines/A100/model_onnx.plan` |

Use the following command to download the engine and place it in the specified location.

```shell
huggingface-cli download Tencent-Hunyuan/TensorRT-engine <Remote Path> --local-dir ./ckpts/t2i/model_trt/engine
```
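
For example, to fetch the prebuilt engine for a GeForce RTX 4090, substitute the matching remote path from the table above:

```shell
# Download the RTX 4090 engine into the expected engine directory.
huggingface-cli download Tencent-Hunyuan/TensorRT-engine engines/RTX4090/model_onnx.plan --local-dir ./ckpts/t2i/model_trt/engine
```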

### Method 2: Build your own engine

If you are using a different GPU, you can build the engine using the following command.

```shell
# Set the TensorRT build environment variables first. We provide a script to set up the environment.
source trt/activate.sh

# Option 1: Build the TensorRT engine. By default, it will read the `ckpts` folder in the current directory.
sh trt/build_engine.sh

# Option 2: If your model directory is not `ckpts`, you need to specify the model directory.
sh trt/build_engine.sh </path/to/ckpts>
```
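
To check the resulting plan file before running inference, one option is the `trtexec` tool that ships with the TensorRT distribution (a sketch; it assumes `trt/activate.sh` puts `trtexec` on your `PATH` and that the engine was written to the path below):

```shell
# Deserialize and exercise the engine; a non-zero exit code means the plan is unusable on this GPU.
trtexec --loadEngine=./ckpts/t2i/model_trt/engine/model_onnx.plan
```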

## 4. Run inference using the TensorRT model

```shell
# Run inference using the prompt enhancement model + the HunyuanDiT TensorRT model.
python sample_t2i.py --prompt "渔舟唱晚" --infer-mode trt

# Disable prompt enhancement to save GPU memory.
python sample_t2i.py --prompt "渔舟唱晚" --infer-mode trt --no-enhance
```
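
To render several prompts in one go, a plain shell loop over `sample_t2i.py` works (a sketch; the second prompt is a placeholder):

```shell
# Hypothetical batch run: each iteration launches a fresh process, so the model is reloaded every time.
for prompt in "渔舟唱晚" "日出江花红胜火"; do
    python sample_t2i.py --prompt "$prompt" --infer-mode trt --no-enhance
done
```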
README_zh.md
ADDED
@@ -0,0 +1,63 @@
# HunyuanDiT TensorRT Acceleration

[English](https://huggingface.co/Tencent-Hunyuan/TensorRT-libs/blob/main/README.md) | 中文

We provide the code and dependencies for converting the text-to-image model in [HunyuanDiT](https://github.com/Tencent/HunyuanDiT) to TensorRT for inference acceleration (faster than Flash Attention). You can use our TensorRT model by following the steps below.

## 1. Download the TensorRT dependencies from Hugging Face

```shell
cd HunyuanDiT

# Download the dependencies
huggingface-cli download Tencent-Hunyuan/TensorRT-libs --local-dir ./ckpts/t2i/model_trt
```

## 2. Install the TensorRT dependencies

```shell
sh trt/install.sh
```

## 3. Build the TensorRT engine

### Method 1: Use a prebuilt engine

This repository provides several prebuilt TensorRT engines.

| Supported GPU | Download Link | Remote Path |
|:----------------:|:---------------------------------------------------------------------------------------------------------------:|:---------------------------------:|
| GeForce RTX 3090 | [HuggingFace](https://huggingface.co/Tencent-Hunyuan/TensorRT-engine/blob/main/engines/RTX3090/model_onnx.plan) | `engines/RTX3090/model_onnx.plan` |
| GeForce RTX 4090 | [HuggingFace](https://huggingface.co/Tencent-Hunyuan/TensorRT-engine/blob/main/engines/RTX4090/model_onnx.plan) | `engines/RTX4090/model_onnx.plan` |
| A100 | [HuggingFace](https://huggingface.co/Tencent-Hunyuan/TensorRT-engine/blob/main/engines/A100/model_onnx.plan) | `engines/A100/model_onnx.plan` |

Use the following command to download the engine and place it in the specified location.

```shell
huggingface-cli download Tencent-Hunyuan/TensorRT-engine <Remote Path> --local-dir ./ckpts/t2i/model_trt/engine
```

### Method 2: Build your own engine

If your GPU is not listed in the table above, you can build an engine for it with the following command.

```shell
# Set the TensorRT build environment variables first. We provide a script to set them up in one step.
source trt/activate.sh

# Option 1: Build the TensorRT engine. By default, it reads the `ckpts` folder in the current directory.
sh trt/build_engine.sh

# Option 2: If your model directory is not `ckpts`, specify the model directory.
sh trt/build_engine.sh </path/to/ckpts>
```

## 4. Run inference using the TensorRT model

```shell
# Run inference with prompt enhancement + the HunyuanDiT text-to-image TensorRT model.
python sample_t2i.py --prompt "渔舟唱晚" --infer-mode trt

# Disable prompt enhancement (useful when GPU memory is limited).
python sample_t2i.py --prompt "渔舟唱晚" --infer-mode trt --no-enhance
```
TensorRT-9.2.0.5.tar.gz
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d4ae57919c3747836e60ffb6b36a3975c9145e41cbe2027e7f6c9d1071c8e2b8
size 2453863376
fmha_plugins/9.2_plugin_cuda11/fMHAPlugin.so
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:913f1697f0a16aa25d7814a6ee482d82cc2d083439f2efd5047ed4e728217687
size 99438864