---
extra_gated_prompt: Please read LICENSE.md before downloading this model.
extra_gated_fields:
  Country: country
  Affiliation: text
  I agree to ALL the statements in LICENSE.md: checkbox
extra_gated_button_content: Acknowledge license
license: other
license_name: imprt-license
license_link: LICENSE.md
language:
- ja
pipeline_tag: feature-extraction
tags:
- wav2vec2
- speech
---

# `imprt/izanami-wav2vec2-large`

This is a Japanese wav2vec 2.0 Large model pre-trained on 62,215 hours of audio extracted from large-scale Japanese TV broadcast recordings by voice activity detection.

This model was trained using code from the [official repository](https://github.com/facebookresearch/fairseq/).

## Usage

```python
import soundfile as sf
from transformers import AutoFeatureExtractor

model = "imprt/izanami-wav2vec2-large"
feature_extractor = AutoFeatureExtractor.from_pretrained(model)

# Load a 16 kHz mono audio file
audio_file = "/path/to/16k_audio_file"
audio_input, sr = sf.read(audio_file)

# Normalize the waveform into model-ready inputs
inputs = feature_extractor(audio_input, sampling_rate=sr)
```

`AutoFeatureExtractor` only normalizes the raw waveform into model inputs; a sketch of passing those inputs through the model itself is given at the end of this card.

## References

```bibtex
@inproceedings{NEURIPS2020_92d1e1eb,
 author = {Baevski, Alexei and Zhou, Yuhao and Mohamed, Abdelrahman and Auli, Michael},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
 pages = {12449--12460},
 publisher = {Curran Associates, Inc.},
 title = {wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations},
 url = {https://proceedings.neurips.cc/paper_files/paper/2020/file/92d1e1eb1cd6f9fba3227870bb6d7f07-Paper.pdf},
 volume = {33},
 year = {2020}
}
```

## License / Terms

Please read the [LICENSE](LICENSE.md) before using this model.
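
## Extracting representations (unofficial sketch)

For completeness, here is a minimal sketch of obtaining contextual representations from the model itself rather than just the preprocessed waveform. It assumes the checkpoint can be loaded with `Wav2Vec2Model` from `transformers`, which is not stated on this card; if the weights are distributed only in fairseq format, use the fairseq loading utilities instead. The model name and file path follow the Usage snippet above.

```python
import soundfile as sf
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2Model

model_name = "imprt/izanami-wav2vec2-large"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
# Assumption: the checkpoint is compatible with transformers' Wav2Vec2Model
model = Wav2Vec2Model.from_pretrained(model_name)
model.eval()

# 16 kHz mono audio, as expected by wav2vec 2.0
audio_input, sr = sf.read("/path/to/16k_audio_file")
inputs = feature_extractor(audio_input, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Contextual representations with shape (batch, frames, hidden_size)
features = outputs.last_hidden_state
print(features.shape)
```

The frame-level features in `last_hidden_state` can then be fed to a downstream head, e.g. a CTC layer for speech recognition fine-tuning.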