---
title: Videodubber
emoji: ๐Ÿƒ
colorFrom: purple
colorTo: pink
sdk: streamlit
sdk_version: 1.38.0
app_file: app.py
pinned: false
---

# Video Dubber

A program that automatically dubs video files into many languages.

This Python script extracts the audio from a video file, transcribes it,
translates it into a different language, generates a new audio file with
the translated text, and then merges it with the original video.
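The stages described above form a chain in which each stage consumes the output of the previous one. The sketch below only illustrates that data flow: `dub_video` and all five stage callables are hypothetical names, not functions from this repository, and the actual script may organize the work differently.

```python
from typing import Callable


def dub_video(
    video_path: str,
    extract_audio: Callable[[str], str],   # video path -> audio path
    transcribe: Callable[[str], str],      # audio path -> source-language text
    translate: Callable[[str], str],       # source text -> translated text
    synthesize: Callable[[str], str],      # translated text -> dubbed audio path
    merge: Callable[[str, str], str],      # (video path, dubbed audio path) -> output video path
) -> str:
    """Run the stages in order and return the path of the dubbed video."""
    audio_path = extract_audio(video_path)
    source_text = transcribe(audio_path)
    translated_text = translate(source_text)
    dubbed_audio_path = synthesize(translated_text)
    return merge(video_path, dubbed_audio_path)
```

Passing the stages in as callables keeps the sketch independent of any particular speech, translation, or audio library.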

## Prerequisites

-   Python 3.8 or higher
-   [FFmpeg](https://ffmpeg.org/download.html)
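
Both prerequisites can be checked before running anything else. The following is a minimal sketch using only the standard library. `missing_prerequisites` is a hypothetical helper, not part of this repository, and it assumes FFmpeg is installed as an `ffmpeg` executable on your `PATH`.

```python
import shutil
import sys


def missing_prerequisites(min_python=(3, 8), tools=("ffmpeg",)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # sys.version_info compares element-wise against a plain tuple.
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or higher is required")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```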

## Technologies Used

-   [Google Cloud Text-to-Speech
    API](https://cloud.google.com/text-to-speech): Used to generate the
    audio for the translated text.
-   [Google Cloud Translate API](https://cloud.google.com/translate):
    Used to translate the transcribed text into a different language.
-   [Whisper ASR](https://www.openai.com/research/whisper/): Used to
    transcribe the audio from the video file.
-   [spaCy](https://spacy.io/): Used for natural language processing
    tasks, such as tokenization and syllable counting.
-   [PyDub](http://pydub.com/): Used for manipulating audio files.
-   [MoviePy](https://zulko.github.io/moviepy/): Used for extracting the
    audio from the video file.

## Installation

1.  Clone this repository:

        git clone https://github.com/am-sokolov/videodubber.git

2.  Install the required Python packages:

        pip install -r requirements.txt

## Google Cloud Credentials

This script uses Google Cloud's Text-to-Speech and Translate APIs, which
require authentication. Follow these steps to get your credentials:

1.  Create a new project in the [Google Cloud
    Console](https://console.cloud.google.com/).
2.  Enable the
    [Text-to-Speech](https://cloud.google.com/text-to-speech/docs/quickstart-client-libraries)
    and [Translate](https://cloud.google.com/translate/docs/setup) APIs
    for your project.
3.  Create a new service account for your project in the [Service
    Accounts](https://console.cloud.google.com/iam-admin/serviceaccounts)
    page.
4.  Create a new JSON key for your service account, and download it.
    This is your credentials file.
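
A downloaded service account key is a JSON document, so it can be sanity-checked before use. The sketch below is an illustration rather than code from this repository: `load_service_account_key` and `use_credentials` are hypothetical helpers, and the field list covers only a few of the fields such a key normally contains. Setting `GOOGLE_APPLICATION_CREDENTIALS` is the standard way to point Google client libraries at a key file. Whether the script does this itself with the `--credentials` path is an assumption.

```python
import json
import os
from pathlib import Path

# A subset of the fields found in a service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_service_account_key(path):
    """Parse a key file; raise ValueError if it does not look like a service account key."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("credentials file must contain a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"credentials file is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {data['type']!r}")
    return data


def use_credentials(path):
    """Validate the key file, then expose it to Google client libraries."""
    load_service_account_key(path)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(Path(path).resolve())
```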

## Usage

Run the script with the following command:

    python main.py --input <path_to_video_file> --voice <target_voice> --credentials <path_to_credentials_file> --source_language <source_language>

-   `<path_to_video_file>`: Path to the source video file

-   `<target_voice>`: Target dubbing voice name from [Google Cloud
    Text-to-Speech
    Voices](https://cloud.google.com/text-to-speech/docs/voices).
    Default is "es-US-Neural2-B". Recommended voices are:

    -   English: "en-US-Neural2-J"
    -   Spanish: "es-US-Neural2-B"
    -   German: "de-DE-Neural2-D"
    -   Italian: "it-IT-Neural2-C"
    -   French: "fr-FR-Neural2-D"
    -   Russian: "ru-RU-Wavenet-D"
    -   Hindi: "hi-IN-Neural2-B"

    You are free to use any other voice.

-   `<path_to_credentials_file>`: Path to the Google Cloud credentials
    JSON file

-   `<source_language>`: Source language, e.g. "english".
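
Every voice name in the list above starts with a locale such as `es-US`, followed by the voice family and a variant letter. The sketch below splits a voice name into its language and locale. `split_voice_name` is a hypothetical helper, and it is an assumption that the script derives the translation target from the voice name in this way.

```python
def split_voice_name(voice):
    """Split a voice name such as 'es-US-Neural2-B' into (language, locale)."""
    parts = voice.split("-")
    if len(parts) < 3:
        raise ValueError(f"unexpected voice name: {voice!r}")
    language, region = parts[0], parts[1]
    return language, f"{language}-{region}"
```

For example, `split_voice_name("es-US-Neural2-B")` returns `("es", "es-US")`.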

The fully supported source languages are currently: English, German, French,
Italian, Catalan, Chinese, Croatian, Danish, Dutch, Finnish, Greek,
Japanese, Korean, Lithuanian, Macedonian, Polish, Portuguese, Romanian,
Russian, Spanish, Swedish, Ukrainian.
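
For reference, the command-line options described in this section could be declared with `argparse` roughly as follows. This is a sketch, not the parser from `main.py`: the flag names and the default voice come from the text above, while `build_parser`, the help strings, and which options are required are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the four options documented in the Usage section."""
    parser = argparse.ArgumentParser(description="Dub a video into another language.")
    parser.add_argument("--input", required=True,
                        help="Path to the source video file")
    parser.add_argument("--voice", default="es-US-Neural2-B",
                        help="Google Cloud Text-to-Speech voice name")
    parser.add_argument("--credentials", required=True,
                        help="Path to the Google Cloud credentials JSON file")
    # Assumption: the source language is optional in this sketch.
    parser.add_argument("--source_language", default=None,
                        help='Source language, e.g. "english"')
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input, args.voice, args.credentials, args.source_language)
```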

## Output

The script will create a new video file with the same name as the input
video file, but with "\_translated" appended to the name. The new video
file will have the original video with the new translated audio track.
Additionally, the script will create a `.wav` audio file with the same
name as the input video file, containing only the translated audio.
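
The naming rule can be expressed with `pathlib`. The sketch below assumes the outputs are written next to the input and that the dubbed video keeps the input's extension. `output_paths` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path


def output_paths(video_path):
    """Return (dubbed video path, translated audio path) for an input video."""
    video = Path(video_path)
    # "talk.mp4" -> "talk_translated.mp4" (assumes the extension is kept)
    dubbed_video = video.with_name(f"{video.stem}_translated{video.suffix}")
    # "talk.mp4" -> "talk.wav"
    translated_audio = video.with_suffix(".wav")
    return dubbed_video, translated_audio
```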

## Testing

You can test this script with any video that contains narration. For
example, you can use this [free video of US President Donald Trump
speaking at the Young Black Leadership Summit at the White
House](https://www.videvo.net/video/us-president-donald-trump-speaks-to-african-americans-young-black-leadership-summit-at-the-white-house-8/613121/).

Here are the step-by-step instructions for testing:

1.  Download the video from the link above.

2.  Save the video file in the same directory as the script under the
    name `trump_speech.mp4`.

3.  Run the script with the downloaded video file as the input:

        python main.py --input trump_speech.mp4 --voice de-DE-Neural2-B --credentials path_to_credentials.json --source_language english

    Replace `path_to_credentials.json` with the path to your Google
    Cloud credentials JSON file.

4.  The script will create a new `.wav` audio file named
    `trump_speech.wav` in the same directory. This file contains the
    translated audio.

5.  Listen to the `trump_speech.wav` file to verify that the script
    worked correctly. The audio should be a translation of the original
    speech in the video.

Feel free to replace `de-DE-Neural2-B` with the desired target voice.
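
Before listening, a quick automated check can confirm that the generated `.wav` file is readable and not empty. The sketch below uses the standard-library `wave` module, which only handles uncompressed PCM WAV files. It is an assumption that the script writes its output in that format, and `wav_duration_seconds` is a hypothetical helper.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        frame_count = wav_file.getnframes()
        frame_rate = wav_file.getframerate()
    return frame_count / float(frame_rate)


if __name__ == "__main__":
    duration = wav_duration_seconds("trump_speech.wav")
    print(f"translated audio is {duration:.1f} s long")
```

A duration close to that of the original video suggests the dubbing covered the whole clip. It says nothing about translation quality, so listening is still needed.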

## License

Alexey Sokolov (c). This project is licensed under the terms of the MIT
license included in this repository.