
	---
	license: apache-2.0
	language:
	- th
	base_model: biodatlab/whisper-th-medium-combined
	tags:
	- whisper
	- Pytorch
	---

	# Whisper-th-medium-ct2

	whisper-th-medium-ct2 is the CTranslate2 conversion of [biodatlab/whisper-th-medium-combined](https://huggingface.co/biodatlab/whisper-th-medium-combined). It is compatible with [WhisperX](https://github.com/m-bain/whisperX) and [faster-whisper](https://github.com/SYSTRAN/faster-whisper), which together enable:

	- 🤏 About half the size of the original Hugging Face format.
	- ⚡️ Batched inference for up to 70x real-time transcription.
	- 🪶 A faster-whisper backend, requiring <8GB of GPU memory with `beam_size=5`.
	- 🎯 Accurate word-level timestamps using wav2vec2 alignment.
	- 👯‍♂️ Multi-speaker ASR using speaker diarization (includes speaker ID labels).
	- 🗣️ VAD preprocessing, which reduces hallucinations and allows batching with no WER degradation.
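
To make a figure such as "70x real-time" concrete: the speed-up factor is the audio duration divided by the wall-clock processing time. The sketch below is only an illustration. `realtime_factor` is a hypothetical helper, not part of WhisperX or faster-whisper, and the numbers are made up rather than measured on this model.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Illustrative numbers only: a 10-minute file transcribed in 12 seconds.
print(realtime_factor(600.0, 12.0))  # 50.0
```

The actual factor depends on the GPU, `batch_size`, `compute_type`, and the audio itself, so it is worth measuring on your own data.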

	### Usage

	```python
	# Notebook cell syntax; from a shell, run the same command without the leading "!".
	!pip install git+https://github.com/m-bain/whisperx.git

	import whisperx
	import time

	# Settings
	device = "cuda"
	audio_file = "audio.mp3"
	batch_size = 16
	compute_type = "float16"

	# A Hugging Face token is required for the diarization model.
	# You must also accept the model's terms and conditions before use,
	# on the model page:
	# https://huggingface.co/pyannote/segmentation-3.0
	HF_TOKEN = ""


	# Load the model and transcribe
	model = whisperx.load_model("Thaweewat/whisper-th-medium-ct2", device, compute_type=compute_type)
	st_time = time.time()
	audio = whisperx.load_audio(audio_file)
	result = model.transcribe(audio, batch_size=batch_size)

	# Assign speaker labels
	diarize_model = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=device)
	diarize_segments = diarize_model(audio)
	result = whisperx.assign_word_speakers(diarize_segments, result)

	# Join the segment texts into one plain-text transcript, if needed
	combined_text = ' '.join(segment['text'] for segment in result['segments'])

	print(f"Response time: {time.time() - st_time} seconds")
	print(diarize_segments)
	print(result)
	print(combined_text)
	```
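
The usage snippet joins all segment texts into one string, which discards the speaker labels. A speaker-labeled transcript can be sketched as follows. This assumes each entry in `result["segments"]` is a dict with a `"text"` key and, once diarization has been applied, an optional `"speaker"` key. `format_by_speaker` is a hypothetical helper, not part of WhisperX, and the sample data is hand-written so the block runs without a GPU or model download.

```python
from typing import Any, Dict, List, Optional


def format_by_speaker(segments: List[Dict[str, Any]]) -> str:
    """Merge consecutive segments from the same speaker into labeled lines."""
    lines: List[str] = []
    last_speaker: Optional[str] = None
    for seg in segments:
        # Segments without an assigned speaker get a placeholder label.
        speaker = seg.get("speaker", "UNKNOWN")
        text = seg.get("text", "").strip()
        if not text:
            continue  # skip empty segments
        if lines and speaker == last_speaker:
            lines[-1] += " " + text
        else:
            lines.append(f"[{speaker}] {text}")
            last_speaker = speaker
    return "\n".join(lines)


# Hand-written sample standing in for result["segments"] (assumed structure).
sample = [
    {"start": 0.0, "end": 1.2, "text": " สวัสดีครับ", "speaker": "SPEAKER_00"},
    {"start": 1.2, "end": 2.5, "text": " วันนี้เป็นอย่างไรบ้าง", "speaker": "SPEAKER_00"},
    {"start": 2.6, "end": 3.4, "text": " สบายดีค่ะ", "speaker": "SPEAKER_01"},
    {"start": 3.5, "end": 4.0, "text": " ขอบคุณค่ะ"},
]
print(format_by_speaker(sample))
```

With real output, call `format_by_speaker(result["segments"])` after the speaker-assignment step.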