hishab
/

titu_stt_bn_fastconformer

Automatic Speech Recognition

Bangla fastconformer

https://arxiv.org/abs/2311.03196

	---
	license: cc-by-nc-4.0
	language:
	- bn
	library_name: nemo
	pipeline_tag: automatic-speech-recognition
	---
	## Hishab BN FastConformer
	__Hishab BN FastConformer__ is a [FastConformer](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html#fast-conformer)-based model trained on approximately 18K hours of the [MegaBNSpeech]() corpus.

	## Usage
	This model can be used to transcribe Bangla audio. It can also serve as a pre-trained model for fine-tuning on custom datasets with the [NeMo](https://github.com/NVIDIA/NeMo) framework.
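Fine-tuning in NeMo generally starts from a manifest file: a text file with one JSON object per line giving `audio_filepath`, `duration` (in seconds), and `text`. The following is a minimal sketch of writing such a manifest using only the standard library. The helper name `write_manifest` and the example file paths and transcripts are hypothetical and not part of this model card.

```python
import json
from pathlib import Path


def write_manifest(entries, manifest_path):
    """Write one JSON object per line, the layout NeMo ASR manifests use."""
    path = Path(manifest_path)
    with path.open("w", encoding="utf-8") as f:
        for audio_filepath, duration, text in entries:
            record = {
                "audio_filepath": str(audio_filepath),
                "duration": float(duration),  # length of the clip in seconds
                "text": text,
            }
            # ensure_ascii=False keeps Bangla text human-readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


# Hypothetical example entries; replace these with your own clips and transcripts.
example_entries = [
    ("clips/sample_0001.wav", 3.2, "আমি বাংলায় কথা বলি"),
    ("clips/sample_0002.wav", 5.75, "আজ আবহাওয়া ভালো"),
]
```

Calling `write_manifest(example_entries, "train_manifest.json")` would produce a two-line manifest that can then be referenced from a NeMo training configuration.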

	### Installation
	To install [NeMo](https://github.com/NVIDIA/NeMo), follow the installation instructions in the NeMo documentation.

	### Inference
	```py
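	import nemo.collections.asr as nemo_asr

	# Download the pretrained checkpoint from the Hugging Face Hub and load it.
	asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/hishab_bn_fastconformer")

	# Transcribe a list of audio file paths; one result is returned per file.
	transcriptions = asr_model.transcribe(["file.wav"])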
	```
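Before transcribing, it can help to check the format of the input audio. This model card does not state the expected input format; the sketch below assumes 16 kHz mono WAV, which is a common choice for NeMo ASR models but should be confirmed against the model configuration. The helper names `describe_wav` and `matches_assumed_format` are hypothetical.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / float(sample_rate)
    return sample_rate, channels, duration


def matches_assumed_format(path, expected_rate=16000, expected_channels=1):
    """Check a file against the assumed (not card-confirmed) 16 kHz mono format."""
    sample_rate, channels, _ = describe_wav(path)
    return sample_rate == expected_rate and channels == expected_channels
```

Files that fail the check can be resampled or downmixed with an audio tool of your choice before being passed to `asr_model.transcribe`.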
	## Training Datasets

	| Channel Category | Hours |
	| ---------------- | --------- |
	| News | 17,640.00 |
	| Talkshow | 688.82 |
	| Vlog | 0.02 |
	| Crime Show | 4.08 |
	| Total | 18,332.92 |


	## Training Details

	The training dataset comprises 17.64k hours of news channel content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows.
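As a quick check on the figures above, the per-category hours can be summed and expressed as shares of the total. This short sketch uses only the numbers reported in this card; the variable names are illustrative. It shows that news content accounts for roughly 96% of the training hours.

```python
# Hours per channel category, as reported in the training dataset table.
hours_by_category = {
    "News": 17640.00,
    "Talkshow": 688.82,
    "Vlog": 0.02,
    "Crime Show": 4.08,
}

# Total training hours across all categories.
total_hours = sum(hours_by_category.values())

# Fraction of the total contributed by each category.
shares = {name: hours / total_hours for name, hours in hours_by_category.items()}

for name, share in shares.items():
    print(f"{name}: {share:.2%}")
print(f"Total: {total_hours:,.2f} hours")
```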

	## Evaluation


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/64df9253cccd823564c3303b/WvMlp95z2-GXT6AYfwW8Y.png)

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/64df9253cccd823564c3303b/O2RA9TAedIv1OTqgdIap5.png)

	## Citation