This repository is publicly accessible, but you must accept its conditions before you can access its files and content.
Please read LICENSE.md before downloading this model.
imprt/izanami-wav2vec2-large
This is a Japanese wav2vec 2.0 Large model, pre-trained on 62,215 hours of audio that was extracted by voice activity detection (VAD) from a large-scale corpus of Japanese TV broadcasts.
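The card does not say which VAD was used. Purely to illustrate the idea, a simple energy-based detector splits the waveform into fixed-length frames, marks frames whose RMS level exceeds a threshold as speech, and merges consecutive speech frames into segments. This is not the authors' pipeline: `energy_vad`, the 30 ms frame size, the -40 dBFS threshold, and the 300 ms minimum segment length are all hypothetical choices.

```python
import numpy as np

def energy_vad(signal, sr=16000, frame_ms=30, threshold_db=-40.0, min_speech_ms=300):
    """Hypothetical energy-based VAD (illustration only, not the model's data pipeline).

    Returns a list of (start_sample, end_sample) speech segments.
    """
    signal = np.asarray(signal, dtype=np.float64)
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return []
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

    # RMS level of each frame in dB relative to full scale (amplitude 1.0)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    energy_db = 20 * np.log10(rms + 1e-10)
    is_speech = energy_db > threshold_db

    # Merge runs of consecutive speech frames into [start, end) frame ranges
    segments, start = [], None
    for i, speech in enumerate(is_speech):
        if speech and start is None:
            start = i
        elif not speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, n_frames))

    # Drop segments shorter than the minimum duration, then convert to samples
    min_frames = int(np.ceil(min_speech_ms / frame_ms))
    return [(s * frame_len, e * frame_len) for s, e in segments if e - s >= min_frames]
```

Production VADs are usually model-based and far more robust to music and background noise than a fixed energy threshold.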
This model was trained using code from the official wav2vec 2.0 repository.
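Because the model uses the wav2vec 2.0 architecture, the number of frames it produces for a given input follows from the convolutional feature encoder described in the wav2vec 2.0 paper: seven blocks with kernel widths (10, 3, 3, 3, 3, 2, 2) and strides (5, 2, 2, 2, 2, 2, 2), giving a total stride of 320 samples, i.e. one frame every 20 ms at 16 kHz. The sketch below assumes this model keeps those default encoder settings; `num_output_frames` is a hypothetical helper, not part of any library.

```python
import math

# Assumed to match the default wav2vec 2.0 feature encoder (seven 1-D conv blocks)
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
TOTAL_STRIDE = math.prod(CONV_STRIDES)  # samples between consecutive output frames

def num_output_frames(num_samples: int) -> int:
    """Number of latent frames the encoder produces for num_samples input samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # input too short for this convolution
        # Unpadded 1-D convolution: floor((L - k) / s) + 1
        length = (length - kernel) // stride + 1
    return length
```

For one second of 16 kHz audio (16,000 samples) this gives 49 frames.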
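The feature extractor can be loaded with Hugging Face Transformers and applied to a 16 kHz audio file:
```python
import soundfile as sf
from transformers import AutoFeatureExtractor

# The repository is gated: accept its conditions on the Hugging Face Hub and
# authenticate (e.g. with `huggingface-cli login`) before downloading.
model_name = "imprt/izanami-wav2vec2-large"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)

# The input audio is expected to be sampled at 16 kHz
audio_file = "/path/to/16k_audio_file"
audio_input, sr = sf.read(audio_file)
# Convert the raw waveform into model inputs (a BatchFeature with "input_values")
inputs = feature_extractor(audio_input, sampling_rate=sr)
```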
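The call above assumes the file is already 16 kHz mono; `sf.read` returns a 2-D `(frames, channels)` array for multi-channel files. As a hedged sketch, the hypothetical helper `prepare_waveform` below downmixes to mono and rejects other sampling rates (resample beforehand with a tool of your choice). The second hypothetical helper, `zero_mean_unit_var`, reproduces the per-utterance zero-mean, unit-variance normalization that Transformers' `Wav2Vec2FeatureExtractor` applies when `do_normalize` is enabled; whether this model's preprocessor config enables it is an assumption to check in its `preprocessor_config.json`.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate the usage example above expects

def prepare_waveform(audio: np.ndarray, sr: int) -> np.ndarray:
    """Hypothetical helper: check the sampling rate and downmix to mono float32."""
    if sr != TARGET_SR:
        raise ValueError(f"expected {TARGET_SR} Hz audio, got {sr} Hz; resample first")
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # soundfile returns (frames, channels) for multi-channel files: average the channels
        audio = audio.mean(axis=1)
    return audio

def zero_mean_unit_var(audio: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Hypothetical helper: per-utterance normalization (assumed to match do_normalize=True)."""
    return (audio - audio.mean()) / np.sqrt(audio.var() + eps)
```

Downmixing by averaging channels is the simplest choice; selecting a single channel works equally well when only one contains speech.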
```bibtex
@inproceedings{NEURIPS2020_92d1e1eb,
  author = {Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
  pages = {12449--12460},
  publisher = {Curran Associates, Inc.},
  title = {wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations},
  url = {https://proceedings.neurips.cc/paper_files/paper/2020/file/92d1e1eb1cd6f9fba3227870bb6d7f07-Paper.pdf},
  volume = {33},
  year = {2020}
}
```
Read the LICENSE before using this model.