---
extra_gated_prompt: Please read LICENSE.md before downloading this model.
extra_gated_fields:
  Country: country
  Affiliation: text
  I agree to ALL the statements in LICENSE.md: checkbox
extra_gated_button_content: Acknowledge license
license: other
license_name: imprt-license
license_link: LICENSE.md
language:
- ja
pipeline_tag: feature-extraction
tags:
- wav2vec2
- speech
---

# `imprt/izanami-wav2vec2-large`

This is a Japanese wav2vec 2.0 Large model pre-trained on 62,215 hours of audio extracted from large-scale Japanese TV broadcast recordings by voice activity detection.

This model was trained using code from the [official repository](https://github.com/facebookresearch/fairseq/).

## Usage

```python
import soundfile as sf
from transformers import AutoFeatureExtractor

model = "imprt/izanami-wav2vec2-large"
feature_extractor = AutoFeatureExtractor.from_pretrained(model)

# Load a 16 kHz mono audio file
audio_file = "/path/to/16k_audio_file"
audio_input, sr = sf.read(audio_file)

# Normalize the waveform into model-ready inputs
inputs = feature_extractor(audio_input, sampling_rate=sr)
```

`AutoFeatureExtractor` only normalizes the raw waveform into model inputs; a sketch of passing those inputs through the model itself is given at the end of this card.

## References

```bibtex
@inproceedings{NEURIPS2020_92d1e1eb,
 author = {Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
 pages = {12449--12460},
 publisher = {Curran Associates, Inc.},
 title = {wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations},
 url = {https://proceedings.neurips.cc/paper_files/paper/2020/file/92d1e1eb1cd6f9fba3227870bb6d7f07-Paper.pdf},
 volume = {33},
 year = {2020}
}
```

## License / Terms

Please read the [LICENSE](LICENSE.md) before using this model.
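
## Extracting representations (unofficial sketch)

For completeness, here is a minimal sketch of obtaining contextual representations from the model itself rather than just the preprocessed waveform. It assumes the checkpoint can be loaded with `Wav2Vec2Model` from `transformers`, which is not stated on this card; if the weights are distributed only in fairseq format, use the fairseq loading utilities instead. The model name and file path follow the Usage snippet above.

```python
import soundfile as sf
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2Model

model_name = "imprt/izanami-wav2vec2-large"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
# Assumption: the checkpoint is compatible with transformers' Wav2Vec2Model
model = Wav2Vec2Model.from_pretrained(model_name)
model.eval()

# 16 kHz mono audio, as expected by wav2vec 2.0
audio_input, sr = sf.read("/path/to/16k_audio_file")
inputs = feature_extractor(audio_input, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Contextual representations with shape (batch, frames, hidden_size)
features = outputs.last_hidden_state
print(features.shape)
```

The frame-level features in `last_hidden_state` can then be fed to a downstream head, e.g. a CTC layer for speech recognition fine-tuning.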