OldDragon committed on
Commit
17a1571
·
verified ·
1 Parent(s): 918dd87

Add 2 files

Files changed (1)
  1. index.html +4 -1
index.html CHANGED
@@ -110,7 +110,10 @@
 </div>
 <p class="text-gray-600 mb-6 text-justify leading-relaxed">
 Speaker diarization aims to solve the problem of "who speaks when". Existing data resources are often concentrated in specific scenarios such as meetings, resulting in insufficient generalization of the speaker diarization model. The M3SD dataset is a carefully organized speaker diarization dataset with detailed metadata, which aims to promote multi-modal, multi-scenario, and multi-language speaker diarization task research. The dataset contains 770+ hours of conversations, covering multiple scenarios such as online and offline meetings, home communications, outdoor conversations, interviews, movie clips, news broadcasts, and multiple languages including English and Chinese. The data comes from YouTube and is pseudo-labeled through a variety of speaker diarization systems. We will provide audio files, annotation files, and video metadata including uid.
-You can also download videos from YouTube based on video meta information for multimodal research. The code for data collection has been open sourced: https://github.com/slwu0209/M3SD.
+You can also download videos from YouTube based on video meta information for multimodal research. The code for data collection has been open sourced:
+<a href="https://github.com/slwu0209/M3SD" class="text-blue-500 hover:underline">
+https://github.com/slwu0209/M3SD
+</a>.
 </p>
 <div class="bg-indigo-50 border-l-4 border-indigo-400 p-4 mb-6">
 <div class="flex">
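The paragraph added in this commit points readers at the metadata `uid` field for retrieving the source videos. As a minimal sketch of that workflow (the real collection code lives in the linked M3SD repository; the helper names and the placeholder uids below are illustrative, and the only detail taken from the text is that each metadata record carries a YouTube `uid`):

```python
# Sketch: turn M3SD-style metadata records into YouTube download commands.
# Assumes each record's "uid" field holds a YouTube video id; the record
# shape and the use of yt-dlp here are assumptions, not the M3SD tooling.

def youtube_url(uid: str) -> str:
    """Build the watch URL for a given YouTube video id."""
    return f"https://www.youtube.com/watch?v={uid}"

def download_commands(records):
    """Yield one yt-dlp invocation per metadata record."""
    for rec in records:
        yield f"yt-dlp {youtube_url(rec['uid'])}"

# Hypothetical metadata entries (placeholder uids, not real dataset ids):
metadata = [{"uid": "PLACEHOLDER1"}, {"uid": "PLACEHOLDER2"}]
for cmd in download_commands(metadata):
    print(cmd)
```

Keeping the uids in the annotation files lets the audio-only release stay lightweight while still supporting the multimodal use case the paragraph describes.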