OldDragon committed on
Commit
17a1571
·
verified ·
1 Parent(s): 918dd87

Add 2 files

Files changed (1)
  1. index.html +4 -1
index.html CHANGED
@@ -110,7 +110,10 @@
 </div>
 <p class="text-gray-600 mb-6 text-justify leading-relaxed">
 Speaker diarization aims to solve the problem of "who speaks when". Existing data resources are often concentrated in specific scenarios such as meetings, resulting in insufficient generalization of the speaker diarization model. The M3SD dataset is a carefully organized speaker diarization dataset with detailed metadata, which aims to promote multi-modal, multi-scenario, and multi-language speaker diarization task research. The dataset contains 770+ hours of conversations, covering multiple scenarios such as online and offline meetings, home communications, outdoor conversations, interviews, movie clips, news broadcasts, and multiple languages including English and Chinese. The data comes from YouTube and is pseudo-labeled through a variety of speaker diarization systems. We will provide audio files, annotation files, and video metadata including uid.
-You can also download videos from YouTube based on video meta information for multimodal research. The code for data collection has been open sourced: https://github.com/slwu0209/M3SD.
+You can also download videos from YouTube based on video meta information for multimodal research. The code for data collection has been open sourced:
+<a href="https://github.com/slwu0209/M3SD" class="text-blue-500 hover:underline">
+https://github.com/slwu0209/M3SD
+</a>.
 </p>
 <div class="bg-indigo-50 border-l-4 border-indigo-400 p-4 mb-6">
 <div class="flex">
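The paragraph added in this commit points readers at the metadata `uid` field for retrieving the source videos. As a minimal sketch of that workflow (the real collection code lives in the linked M3SD repository; the helper names and the placeholder uids below are illustrative, and the only detail taken from the text is that each metadata record carries a YouTube `uid`):

```python
# Sketch: turn M3SD-style metadata records into YouTube download commands.
# Assumes each record's "uid" field holds a YouTube video id; the record
# shape and the use of yt-dlp here are assumptions, not the M3SD tooling.

def youtube_url(uid: str) -> str:
    """Build the watch URL for a given YouTube video id."""
    return f"https://www.youtube.com/watch?v={uid}"

def download_commands(records):
    """Yield one yt-dlp invocation per metadata record."""
    for rec in records:
        yield f"yt-dlp {youtube_url(rec['uid'])}"

# Hypothetical metadata entries (placeholder uids, not real dataset ids):
metadata = [{"uid": "PLACEHOLDER1"}, {"uid": "PLACEHOLDER2"}]
for cmd in download_commands(metadata):
    print(cmd)
```

Keeping the uids in the annotation files lets the audio-only release stay lightweight while still supporting the multimodal use case the paragraph describes.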