Alignment gap issue in Spanish dataset

#2
by gcjavi - opened

Hello,
I was trying to align a Spanish dataset following the tutorial at https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tools/nemo_forced_aligner.html. However, some segments in my dataset are present in the audio files but not transcribed, i.e., a sentence is spoken in the audio but does not appear in the transcription file. When the model encounters such a case, it simply extends the end time of the previous transcribed sentence so that it also covers the untranscribed speech.

For example, suppose the audio contains "Hi, how are you? I would like to talk with you. See you later", where "Hi, how are you?" runs from 0.0 s to 1.0 s, "I would like to talk with you." from 1.0 s to 3.0 s, and "See you later" from 3.0 s to 4.0 s, but the transcription file only contains "Hi, how are you? See you later." The model then aligns "Hi, how are you?" from 0.0 s to 3.0 s and "See you later." from 3.0 s to 4.0 s.

I assume the model works only from the transcribed text and extends the end time of each sentence until it detects the start of the next transcribed one. Is there any way to fix this?
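
To make the example concrete, here is a minimal Python sketch of the aligned output described above, together with a rough check that flags segments that look too long for their text. This is only an illustration and not part of NeMo Forced Aligner: the `Segment` class, the `flag_slow_segments` helper, and the 8 characters-per-second threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned sentence (hypothetical structure, not an NFA type)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def flag_slow_segments(segments, min_chars_per_sec=8.0):
    """Return segments whose text is short relative to their duration.

    A very low characters-per-second rate may indicate that the segment
    absorbed untranscribed speech. The threshold is an arbitrary assumption
    and would need tuning for real data.
    """
    flagged = []
    for seg in segments:
        duration = seg.end - seg.start
        if duration <= 0:
            continue  # skip empty or malformed segments
        if len(seg.text) / duration < min_chars_per_sec:
            flagged.append(seg)
    return flagged


# Alignment from the example: the middle sentence is missing from the
# transcript, so the first segment is stretched from 1.0 s to 3.0 s.
aligned = [
    Segment("Hi, how are you?", 0.0, 3.0),
    Segment("See you later.", 3.0, 4.0),
]

for seg in flag_slow_segments(aligned):
    print(f"Suspicious segment: {seg.text!r} ({seg.start}-{seg.end} s)")
```

With these example values only the first segment is flagged, since its 16 characters are spread over 3 seconds. A check like this can only point at suspicious segments; it does not recover the correct end time.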
Thank you!
