Jarvis73 committed
Commit 76de374
1 Parent(s): 2758b30

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ fmha_plugins/9.2_plugin_cuda11/fMHAPlugin.so filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,73 @@
+ ---
+ license: other
+ license_name: tencent-hunyuan-community
+ license_link: https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/blob/main/LICENSE.txt
+ language:
+ - en
+ ---
+
+ # HunyuanDiT TensorRT Acceleration
+
+ English | [中文](https://huggingface.co/Tencent-Hunyuan/TensorRT-libs/blob/main/README_zh.md)
+
+ We provide a TensorRT version of [HunyuanDiT](https://github.com/Tencent/HunyuanDiT) for inference acceleration
+ (faster than Flash Attention). You can convert the Torch model to a TensorRT engine using the following steps.
+
+ ## 1. Download dependencies from Hugging Face.
+
+ ```shell
+ cd HunyuanDiT
+ # Use the huggingface-cli tool to download the model.
+ huggingface-cli download Tencent-Hunyuan/TensorRT-libs --local-dir ./ckpts/t2i/model_trt
+ ```
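+
+ If the download succeeds, the dependencies are placed under `./ckpts/t2i/model_trt`. As an optional sanity check (a sketch only; the exact file list may vary between releases), you can list the folder:
+
+ ```shell
+ # List the downloaded TensorRT dependencies; file names may differ by release.
+ ls -lh ./ckpts/t2i/model_trt
+ ```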
+
+ ## 2. Install the TensorRT dependencies.
+
+ ```shell
+ sh trt/install.sh
+ ```
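+
+ The bundled library is TensorRT 9.2.0.5 (see `TensorRT-9.2.0.5.tar.gz` in this repository). Assuming the install script also sets up the TensorRT Python bindings (an assumption; the script's behavior is not documented here), you can verify the installation with a quick version check:
+
+ ```shell
+ # Print the TensorRT version; expect a 9.2.x release if the install succeeded.
+ python -c "import tensorrt; print(tensorrt.__version__)"
+ ```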
+
+ ## 3. Build the TensorRT engine.
+
+ ### Method 1: Use a prebuilt engine
+
+ We provide prebuilt TensorRT engines for the following GPUs.
+
+ | Supported GPU | Download Link | Remote Path |
+ |:----------------:|:---------------------------------------------------------------------------------------------------------------:|:---------------------------------:|
+ | GeForce RTX 3090 | [HuggingFace](https://huggingface.co/Tencent-Hunyuan/TensorRT-engine/blob/main/engines/RTX3090/model_onnx.plan) | `engines/RTX3090/model_onnx.plan` |
+ | GeForce RTX 4090 | [HuggingFace](https://huggingface.co/Tencent-Hunyuan/TensorRT-engine/blob/main/engines/RTX4090/model_onnx.plan) | `engines/RTX4090/model_onnx.plan` |
+ | A100 | [HuggingFace](https://huggingface.co/Tencent-Hunyuan/TensorRT-engine/blob/main/engines/A100/model_onnx.plan) | `engines/A100/model_onnx.plan` |
+
+ Use the following command to download an engine, substituting `<Remote Path>` from the table above, and place it in the expected location.
+
+ ```shell
+ huggingface-cli download Tencent-Hunyuan/TensorRT-engine <Remote Path> --local-dir ./ckpts/t2i/model_trt/engine
+ ```
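+
+ For example, on a GeForce RTX 4090 the Remote Path from the table above is `engines/RTX4090/model_onnx.plan`, so the command becomes:
+
+ ```shell
+ # Download the prebuilt RTX 4090 engine into the folder the pipeline expects.
+ huggingface-cli download Tencent-Hunyuan/TensorRT-engine engines/RTX4090/model_onnx.plan --local-dir ./ckpts/t2i/model_trt/engine
+ ```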
+
+ ### Method 2: Build your own engine
+
+ If you are using a GPU not listed above, you can build an engine for it with the following commands.
+
+ ```shell
+ # Set the TensorRT build environment variables first. We provide a script to set up the environment.
+ source trt/activate.sh
+
+ # Option 1: Build the TensorRT engine. By default, it reads the `ckpts` folder in the current directory.
+ sh trt/build_engine.sh
+
+ # Option 2: If your model directory is not `ckpts`, specify the model directory.
+ sh trt/build_engine.sh </path/to/ckpts>
+ ```
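+
+ One hedged way to confirm that a freshly built (or downloaded) engine deserializes on your GPU is TensorRT's bundled `trtexec` tool. This sketch assumes `trt/activate.sh` puts `trtexec` on your `PATH` and that the plan file lands at the path below; both are assumptions, so adjust to your actual build output:
+
+ ```shell
+ # Load the serialized engine and run a short benchmark pass.
+ # A failure here usually indicates a GPU or TensorRT version mismatch.
+ trtexec --loadEngine=./ckpts/t2i/model_trt/engine/model_onnx.plan
+ ```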
+
+ ## 4. Run inference with the TensorRT model.
+
+ ```shell
+ # Run inference with the prompt-enhancement model + the HunyuanDiT TensorRT model.
+ python sample_t2i.py --prompt "渔舟唱晚" --infer-mode trt
+
+ # Disable prompt enhancement (to save GPU memory).
+ python sample_t2i.py --prompt "渔舟唱晚" --infer-mode trt --no-enhance
+ ```
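+
+ The same flags should accept your own prompts. For instance, assuming `sample_t2i.py` takes arbitrary prompt strings (as the examples above suggest):
+
+ ```shell
+ # English prompt, TensorRT inference, prompt enhancement disabled.
+ python sample_t2i.py --prompt "a fishing boat at sunset" --infer-mode trt --no-enhance
+ ```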
+
README_zh.md ADDED
@@ -0,0 +1,63 @@
+ # Hunyuan-DiT TensorRT Acceleration
+
+ [English](https://huggingface.co/Tencent-Hunyuan/TensorRT-libs/blob/main/README.md) | 中文
+
+ We provide the code and dependencies for converting the text-to-image model of [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT) to TensorRT for inference acceleration
+ (faster than Flash Attention). You can use our TensorRT model by following the steps below.
+
+ ## 1. Download the TensorRT dependencies from Hugging Face
+
+ ```shell
+ cd HunyuanDiT
+
+ # Download the dependencies
+ huggingface-cli download Tencent-Hunyuan/TensorRT-libs --local-dir ./ckpts/t2i/model_trt
+ ```
+
+ ## 2. Install the TensorRT dependencies
+
+ ```shell
+ sh trt/install.sh
+ ```
+
+ ## 3. Build the TensorRT engine
+
+ ### Method 1: Use a prebuilt engine
+
+ This repository provides several prebuilt TensorRT engines.
+
+ | Supported GPU | Download Link | Remote Path |
+ |:----------------:|:---------------------------------------------------------------------------------------------------------------:|:---------------------------------:|
+ | GeForce RTX 3090 | [HuggingFace](https://huggingface.co/Tencent-Hunyuan/TensorRT-engine/blob/main/engines/RTX3090/model_onnx.plan) | `engines/RTX3090/model_onnx.plan` |
+ | GeForce RTX 4090 | [HuggingFace](https://huggingface.co/Tencent-Hunyuan/TensorRT-engine/blob/main/engines/RTX4090/model_onnx.plan) | `engines/RTX4090/model_onnx.plan` |
+ | A100 | [HuggingFace](https://huggingface.co/Tencent-Hunyuan/TensorRT-engine/blob/main/engines/A100/model_onnx.plan) | `engines/A100/model_onnx.plan` |
+
+ Use the following command to download an engine and place it in the expected location.
+
+ ```shell
+ huggingface-cli download Tencent-Hunyuan/TensorRT-engine <Remote Path> --local-dir ./ckpts/t2i/model_trt/engine
+ ```
+
+ ### Method 2: Build your own engine
+
+ If your GPU is not listed in the table above, you can build an engine for it with the following commands.
+
+ ```shell
+ # First, set the TensorRT build environment variables. We provide a script that sets them up in one step.
+ source trt/activate.sh
+
+ # Option 1: Build the TensorRT engine. By default, it reads the `ckpts` folder in the current directory.
+ sh trt/build_engine.sh
+
+ # Option 2: If your model directory is not `ckpts`, specify the model directory.
+ sh trt/build_engine.sh </path/to/ckpts>
+ ```
+
+ ## 4. Run inference with the TensorRT model.
+
+ ```shell
+ # Run inference with the prompt-enhancement model + the text-to-image TensorRT model.
+ python sample_t2i.py --prompt "渔舟唱晚" --infer-mode trt
+
+ # Disable prompt enhancement (useful when GPU memory is limited).
+ python sample_t2i.py --prompt "渔舟唱晚" --infer-mode trt --no-enhance
+ ```
TensorRT-9.2.0.5.tar.gz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d4ae57919c3747836e60ffb6b36a3975c9145e41cbe2027e7f6c9d1071c8e2b8
+ size 2453863376
fmha_plugins/9.2_plugin_cuda11/fMHAPlugin.so ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:913f1697f0a16aa25d7814a6ee482d82cc2d083439f2efd5047ed4e728217687
+ size 99438864