# Introducing Whisper-TikTok 🤖🎥

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=MatteoFasulo/Whisper-TikTok&type=Date)](https://star-history.com/#MatteoFasulo/Whisper-TikTok&Date)

## Table of Contents

- [Introduction](#introduction)
- [Streamlit Web App](#streamlit-web-app)
- [Demo Video](#demo-video)
- [How it Works](#how-it-works)
- [Web App (Online)](#web-app-online)
- [Local Installation](#local-installation)
- [Dependencies](#dependencies)
- [Web-UI (Local)](#web-ui-local)
- [Command-Line](#command-line)
- [Usage Examples](#usage-examples)
- [Additional Resources](#additional-resources)
- [Code of Conduct](#code-of-conduct)
- [Contributing](#contributing)
- [Upcoming Features](#upcoming-features)
- [Acknowledgments](#acknowledgments)
- [License](#license)
## Introduction

Whisper-TikTok is an AI-powered tool that combines **Edge TTS**, **OpenAI-Whisper**, and **FFMPEG** to craft captivating TikTok videos. Using OpenAI's Whisper model, it generates an accurate **transcription** of the synthesized audio, which becomes the subtitle track of the TikTok video assembled with **FFMPEG**. The program also integrates the **Microsoft Edge Cloud Text-to-Speech (TTS) API** to give the video a vibrant **voiceover**. This API is a deliberate choice: it delivers a remarkably **natural and authentic** voice, setting it apart from the monotonous, artificial voiceovers common in many TikTok videos.

## Streamlit Web App

![Webui](docs/WebuiDemo.png)

## Demo Video

<https://github.com/MatteoFasulo/Whisper-TikTok/assets/74818541/68e25504-c305-4144-bd39-c9acc218c3a4>

## How it Works

Employing Whisper-TikTok is a breeze: simply modify [video.json](video.json). The JSON file contains the following fields:

- `series`: The name of the series.
- `part`: The part number of the video.
- `text`: The text to be spoken in the video.
- `outro`: The outro text to be spoken in the video.
- `tags`: The tags to be used for the video.

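As an illustration, one entry might look like the following. The field names come from the list above; the values (and the assumption that the file holds a list of such entries) are made up for this example:

```json
[
  {
    "series": "Crazy Facts",
    "part": "1",
    "text": "Did you know that honey never spoils?",
    "outro": "Follow for more facts!",
    "tags": "facts,interesting,fyp"
  }
]
```
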
Summarizing the program's functionality:

> Furnished with a structured JSON dataset containing details such as the **series name**, **video part number**, **video text** and **outro text**, the program orchestrates the synthesis of a video incorporating the provided text and outro. Subsequently, the generated video is stored within the designated `output` folder.

<details>
<summary>Details</summary>

The program conducts the **sequence of actions** outlined below:

1. Retrieve **environment variables** from the optional `.env` file.
2. Validate the presence of a **PyTorch** installation with **CUDA** support. If the requisite dependencies are **absent**, the **program falls back to the CPU instead of the GPU**.
3. Download a random video from platforms like YouTube, e.g., a Minecraft parkour gameplay clip.
4. Load the OpenAI Whisper model into memory.
5. Extract the video text from the provided JSON file and send a **Text-to-Speech** request to the Microsoft Edge Cloud TTS API, saving the response as an `.mp3` audio file.
6. Use the OpenAI Whisper model to generate a detailed **transcription** of the `.mp3` file, available in `.srt` format.
7. Select a **random background** video from the dedicated folder.
8. Integrate the `.srt` file into the chosen video using FFMPEG, creating the final `.mp4` output.
9. Upload the video to TikTok using your TikTok session cookie. This step requires a TikTok account that you are logged into in your browser; the required `cookies.txt` file can then be generated by following [this guide](https://github.com/kairi003/Get-cookies.txt-LOCALLY) and must be placed in the root folder of the project.
10. Voila! In a matter of minutes, you've crafted a captivating TikTok video while sipping your favorite coffee ☕️.

</details>

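For reference, the `.srt` transcript produced in step 6 consists of numbered cues with `HH:MM:SS,mmm` timestamps. The format can be sketched with a couple of small helpers (illustrative code written for this README, not taken from the project):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset as the HH:MM:SS,mmm timestamp used in .srt files."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Render one numbered subtitle cue in SRT syntax."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


print(srt_cue(1, 0.0, 2.5, "Did you know that honey never spoils?"))
```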

## Web App (Online)

A public web app is hosted on Streamlit; the link below takes you straight to it.

> <https://convert.streamlit.app>

## Local Installation

Whisper-TikTok has undergone rigorous testing on Windows 10, Windows 11, and Ubuntu 23.04 systems equipped with **Python versions 3.8, 3.9, and 3.11**.

If you want to run Whisper-TikTok locally, you can clone the repository using the following command:

```bash
git clone https://github.com/MatteoFasulo/Whisper-TikTok.git
```

> There is also a Docker image available for Whisper-TikTok, which can be used to run the program in a containerized environment.

## Dependencies

To streamline the installation of the necessary dependencies, execute the following command within your terminal:

```bash
pip install -U -r requirements.txt
```

It also requires the command-line tool [**FFMPEG**](https://ffmpeg.org/) to be installed on your system, which is available from most package managers:

```bash
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
```

> Please note that for optimal performance, it's advisable to have a GPU when using the OpenAI Whisper model for speech recognition. The program will still work without a GPU, just more slowly: GPUs efficiently handle fp16 computation, while CPUs fall back to fp32 or fp64 (depending on your machine), which are slower.

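The CPU fallback described above boils down to a device check at startup. A minimal sketch of that check (the helper name is mine, not the project's actual code; it assumes PyTorch may be missing or CPU-only):

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-capable PyTorch build is available, else "cpu"."""
    try:
        import torch  # optional dependency; may be absent or built without CUDA
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```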

## Web-UI (Local)

To run the Web-UI locally, execute the following command within your terminal:

```bash
streamlit run app.py --server.port=8501 --server.address=0.0.0.0
```

## Command-Line

To run the program from the command line, execute the following command within your terminal:

```bash
python main.py
```

### CLI Options

Whisper-TikTok supports the following command-line options:

```
python main.py [OPTIONS]

Options:
  --model TEXT          Model to use [tiny|base|small|medium|large] (Default: small)
  --non_english         Use the general model, not the English-specific one. (Flag)
  --url TEXT            YouTube URL to download as the background video. (Default: https://www.youtube.com/watch?v=intRX7BRA90)
  --tts TEXT            Voice to use for TTS (Default: en-US-ChristopherNeural)
  --list-voices         Use `edge-tts --list-voices` to list all voices.
  --random_voice        Random voice for TTS (Flag)
  --gender TEXT         Gender of the random TTS voice [Male|Female].
  --language TEXT       Language of the random TTS voice (e.g., en-US)
  --sub_format TEXT     Subtitle format to use [u|i|b] (Default: b) | b (Bold), u (Underline), i (Italic)
  --sub_position INT    Subtitle position to use [1-9] (Default: 5)
  --font TEXT           Font to use for subtitles (Default: Lexend Bold)
  --font_color TEXT     Font color to use for subtitles, in HEX format (Default: #FFF000).
  --font_size INT       Font size to use for subtitles (Default: 21)
  --max_characters INT  Maximum number of characters per line (Default: 38)
  --max_words INT       Maximum number of words per segment (Default: 2)
  --upload_tiktok       Upload the video to TikTok (Flag)
  -v, --verbose         Verbose (Flag)
```

> If you use the `--random_voice` option, please specify both the `--gender` and `--language` arguments. You will also need to pass `--non_english` if you want a non-English voice; otherwise the program will use the English model. The Whisper model auto-detects the language of the audio file and uses the corresponding model.

## Usage Examples

- Generate a TikTok video using a specific Whisper model and TTS voice:

```bash
python main.py --model medium --tts en-US-EricNeural
```

- Generate a TikTok video without using the English model:

```bash
python main.py --non_english --tts de-DE-KillianNeural
```

- Use a custom YouTube video as the background video:

```bash
python main.py --url https://www.youtube.com/watch?v=dQw4w9WgXcQ --tts en-US-JennyNeural
```

- Modify the font color of the subtitles (quote the HEX value so the shell does not treat `#` as a comment):

```bash
python main.py --sub_format b --font_color "#FFF000" --tts en-US-JennyNeural
```

- Generate a TikTok video with a random TTS voice:

```bash
python main.py --random_voice --gender Male --language en-US
```

- List all available voices:

```bash
edge-tts --list-voices
```

## Additional Resources

### Accelerate Video Creation

> Contributed by [@duozokker](https://github.com/duozokker)

**reddit2json** is a Python script that transforms Reddit post URLs into a JSON file, streamlining the creation of `video.json` files. Beyond converting Reddit links, it can also translate Reddit post content using DeepL and modify content through custom OpenAI GPT calls.

#### reddit2json: Directly Convert Reddit Links to JSON

reddit2json processes a list of Reddit post URLs and converts them into a JSON format that can be used directly for video creation, providing a faster and more efficient way to generate `video.json` files.

[The detailed README for reddit2json](https://github.com/duozokker/reddit2json/blob/main/README.md) includes instructions for installation, setting up the `.env` file, example calls, and more.

## Code of Conduct

Please review our [Code of Conduct](./CODE_OF_CONDUCT.md) before contributing to Whisper-TikTok.

## Contributing

We welcome contributions from the community! Please see our [Contributing Guidelines](./CONTRIBUTING.md) for more information.

## Upcoming Features

- Integration with the OpenAI API to generate more advanced responses.
- Generate content extracted from Reddit: <https://github.com/MatteoFasulo/Whisper-TikTok/issues/22>

## Acknowledgments

- A huge thanks to [@rany2](https://www.github.com/rany2) for the [edge-tts](https://github.com/rany2/edge-tts) package, which made it possible to use the Microsoft Edge Cloud TTS API with Whisper-TikTok.
- We also acknowledge [@OpenAI](https://github.com/openai/whisper)'s Whisper model for robust speech recognition via large-scale weak supervision.
- Thanks as well to [@jianfch](https://github.com/jianfch/stable-ts) for the stable-ts package, which made it possible to use the OpenAI Whisper model with Whisper-TikTok in a stable manner, with font color and subtitle format options.

## License

Whisper-TikTok is licensed under the [Apache License, Version 2.0](https://github.com/MatteoFasulo/Whisper-TikTok/blob/main/LICENSE).