Overview
The MultiTalk dataset is a new multilingual 2D video dataset featuring over 420 hours of talking videos in 20 languages. It contains 293,812 clips with a resolution of 512x512, a frame rate of 25 fps, and an average duration of 5.19 seconds per clip. The distribution across languages is balanced: each language accounts for between 2.0% and 9.7% of the total duration.

Detailed statistics
Language | Total Duration (h) | #Clips | Avg. Duration (s) | Annotation |
---|---|---|---|---|
Arabic | 10.32 | 9048 | 4.11 | arabic.json |
Catalan | 41.0 | 29232 | 5.05 | catalan.json |
Croatian | 41.0 | 25465 | 5.80 | croatian.json |
Czech | 18.9 | 11228 | 6.06 | czech.json |
Dutch | 17.05 | 14187 | 4.33 | dutch.json |
English | 15.49 | 11082 | 5.03 | english.json |
French | 13.17 | 11576 | 4.10 | french.json |
German | 16.25 | 10856 | 5.39 | german.json |
Greek | 17.53 | 12698 | 4.97 | greek.json |
Hindi | 24.41 | 16120 | 5.45 | hindi.json |
Italian | 13.59 | 9753 | 5.02 | italian.json |
Japanese | 8.36 | 5990 | 5.03 | japanese.json |
Mandarin | 8.73 | 6096 | 5.15 | mandarin.json |
Polish | 21.58 | 15181 | 5.12 | polish.json |
Portuguese | 41.0 | 25321 | 5.83 | portuguese.json |
Russian | 26.32 | 17811 | 5.32 | russian.json |
Spanish | 23.65 | 18758 | 4.54 | spanish.json |
Thai | 10.95 | 7595 | 5.19 | thai.json |
Turkish | 12.9 | 11165 | 4.16 | turkish.json |
Ukrainian | 41.0 | 24650 | 5.99 | ukrainian.json |
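
The headline figures in the overview can be recomputed from this table. The Python sketch below copies the per-language hours and clip counts into a dictionary and derives the totals, the mean clip length, and each language's share of the total duration. The names `STATS` and `summarize` are illustrative and are not part of the repository.

```python
# Per-language (total hours, clip count), copied from the statistics table.
STATS = {
    "arabic": (10.32, 9048),
    "catalan": (41.0, 29232),
    "croatian": (41.0, 25465),
    "czech": (18.9, 11228),
    "dutch": (17.05, 14187),
    "english": (15.49, 11082),
    "french": (13.17, 11576),
    "german": (16.25, 10856),
    "greek": (17.53, 12698),
    "hindi": (24.41, 16120),
    "italian": (13.59, 9753),
    "japanese": (8.36, 5990),
    "mandarin": (8.73, 6096),
    "polish": (21.58, 15181),
    "portuguese": (41.0, 25321),
    "russian": (26.32, 17811),
    "spanish": (23.65, 18758),
    "thai": (10.95, 7595),
    "turkish": (12.9, 11165),
    "ukrainian": (41.0, 24650),
}


def summarize(stats):
    """Return total hours, total clips, mean clip length (s), and min/max share of total duration (%)."""
    total_hours = sum(hours for hours, _ in stats.values())
    total_clips = sum(clips for _, clips in stats.values())
    mean_seconds = total_hours * 3600 / total_clips
    shares = [100 * hours / total_hours for hours, _ in stats.values()]
    return total_hours, total_clips, mean_seconds, min(shares), max(shares)


if __name__ == "__main__":
    hours, clips, mean_s, lo, hi = summarize(STATS)
    # prints: 423.2 h, 293812 clips, 5.19 s/clip, share 2.0%-9.7%
    print(f"{hours:.1f} h, {clips} clips, {mean_s:.2f} s/clip, share {lo:.1f}%-{hi:.1f}%")
```

Because the table rounds each language's hours, the recomputed mean is approximate, but it matches the reported 5.19 s to two decimal places. The 2.0% to 9.7% range in the overview is a share of duration, not of clip count.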
Download
Usage
Prepare the environment:
```sh
pip install pytube
pip install opencv-python
```
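Before running the download script, you can confirm that both packages are importable. This minimal sketch uses only the standard library. Note that `opencv-python` installs the module `cv2`. The helper name `missing_modules` is hypothetical, and the check covers only the two packages listed above, not any other tools that `dataset.sh` may rely on.

```python
import importlib.util

# Import names of the packages installed above: pytube -> "pytube", opencv-python -> "cv2".
REQUIRED_MODULES = ("pytube", "cv2")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names of modules that cannot be found by the current interpreter (without importing them)."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("Environment looks ready.")
```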
Run the script:
```sh
cd MultiTalk_Dataset
```
Pass the languages you want to download as arguments to the script. To download all 20 languages, run:
```sh
sh dataset.sh arabic catalan croatian czech dutch english french german greek hindi italian japanese mandarin polish portuguese russian spanish thai turkish ukrainian
```
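If you drive the download from Python instead of the shell, a small wrapper can reject misspelled language names before the script starts. This is a hypothetical sketch and is not part of the repository. It assumes that `dataset.sh` accepts exactly the 20 lowercase names used in the command above and is run from inside `MultiTalk_Dataset`. The names `build_command` and `download` are illustrative.

```python
import shlex
import subprocess

# The 20 language names passed to dataset.sh in the download-all command.
LANGUAGES = (
    "arabic", "catalan", "croatian", "czech", "dutch", "english", "french",
    "german", "greek", "hindi", "italian", "japanese", "mandarin", "polish",
    "portuguese", "russian", "spanish", "thai", "turkish", "ukrainian",
)


def build_command(languages=LANGUAGES):
    """Build the argv for `sh dataset.sh <lang> ...`, rejecting unknown language names."""
    unknown = [lang for lang in languages if lang not in LANGUAGES]
    if unknown:
        raise ValueError(f"unsupported language(s): {', '.join(unknown)}")
    return ["sh", "dataset.sh", *languages]


def download(languages=LANGUAGES, dataset_dir="MultiTalk_Dataset"):
    """Run dataset.sh from the dataset directory; raises CalledProcessError if it fails."""
    subprocess.run(build_command(languages), cwd=dataset_dir, check=True)


if __name__ == "__main__":
    # Show the command that would run, without starting the download.
    print(shlex.join(build_command(["arabic", "czech"])))  # prints: sh dataset.sh arabic czech
```

Validating the names up front means a typo fails immediately rather than partway through a long download.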
After downloading, the folder structure is as follows. Each language folder contains the .mp4 clips.
You can change the ${ROOT} folder in the code.
```
${ROOT}
├── multitalk_dataset       # MultiTalk Dataset
│   ├── arabic
│   │   ├── O-VJXuHb390_0.mp4
│   │   ├── O-VJXuHb390_1.mp4
│   │   ├── ...
│   │   └── ...
│   ├── catalan
│   ├── ...
│   └── ...
└── raw_video               # Original videos (you can remove this directory after downloading)
    ├── arabic
    ├── catalan
    ├── ...
    └── ...
```
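After the download finishes, you can compare what is on disk with the #Clips column of the statistics table by counting the .mp4 files in each language folder. The sketch below assumes the layout shown above. `count_clips` is a hypothetical helper, and `root` stands for your `${ROOT}` directory.

```python
from pathlib import Path


def count_clips(root):
    """Count .mp4 files directly inside each language folder under <root>/multitalk_dataset."""
    dataset_dir = Path(root) / "multitalk_dataset"
    return {
        lang_dir.name: sum(1 for _ in lang_dir.glob("*.mp4"))
        for lang_dir in sorted(dataset_dir.iterdir())
        if lang_dir.is_dir()
    }
```

For example, `count_clips("/path/to/ROOT")` returns a dictionary such as `{"arabic": ..., "catalan": ...}`. A count lower than the table's may indicate that some source videos could not be downloaded.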
JSON File Structure
```json
{
    "QrDZjUeiUwc_0": // clip 1
    {
        "youtube_id": "QrDZjUeiUwc", // YouTube video ID
        "duration": {"start_sec": 302.0, "end_sec": 305.56}, // start and end times (in seconds) in the original video
        "bbox": {"top": 0.0, "bottom": 0.8167, "left": 0.4484, "right": 0.9453}, // bounding box in normalized coordinates
        "language": "czech", // language
        "transcript": "já jsem v podstatě obnovil svůj list z minulého roku" // transcript
    },
    "QrDZjUeiUwc_1": // clip 2
    {
        "youtube_id": "QrDZjUeiUwc",
        "duration": {"start_sec": 0.12, "end_sec": 4.12},
        "bbox": {"top": 0.0097, "bottom": 0.55, "left": 0.3406, "right": 0.6398},
        "language": "czech",
        "transcript": "ahoj tady anička a vítejte u dalšího easycheck videa"
    },
    ...
}
```
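The sketch below reads an annotation file and derives a few common quantities from one entry. It makes three assumptions:

* The shipped .json files are plain JSON without the `//` comments shown above, because Python's `json` module rejects comments.
* Each clip is saved as `<language>/<clip id>.mp4`, as the matching names in the folder structure suggest.
* The bbox values are fractions of the source frame's height (top, bottom) and width (left, right).

All helper names are hypothetical. The 1280x720 frame size in the example is made up for illustration.

```python
import json
from pathlib import Path


def load_annotations(json_path):
    """Load one per-language annotation file (e.g. czech.json) into a dict keyed by clip ID."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def clip_path(root, clip_id, clip):
    """Assumed layout: <root>/multitalk_dataset/<language>/<clip_id>.mp4."""
    return Path(root) / "multitalk_dataset" / clip["language"] / f"{clip_id}.mp4"


def clip_seconds(clip):
    """Clip length in seconds, from its start and end times in the original video."""
    return clip["duration"]["end_sec"] - clip["duration"]["start_sec"]


def bbox_to_pixels(bbox, frame_width, frame_height):
    """Assumes normalized bbox values; returns (left, top, right, bottom) in pixels."""
    return (
        round(bbox["left"] * frame_width),
        round(bbox["top"] * frame_height),
        round(bbox["right"] * frame_width),
        round(bbox["bottom"] * frame_height),
    )


if __name__ == "__main__":
    # The first clip from the example above, without the comments.
    clip = {
        "youtube_id": "QrDZjUeiUwc",
        "duration": {"start_sec": 302.0, "end_sec": 305.56},
        "bbox": {"top": 0.0, "bottom": 0.8167, "left": 0.4484, "right": 0.9453},
        "language": "czech",
        "transcript": "já jsem v podstatě obnovil svůj list z minulého roku",
    }
    print(clip_path("/data", "QrDZjUeiUwc_0", clip))
    print(f"{clip_seconds(clip):.2f} s")            # 3.56 s
    print(bbox_to_pixels(clip["bbox"], 1280, 720))   # (574, 0, 1210, 588) for a hypothetical 1280x720 frame
```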