# Making kurianbenoy/faster-speech-to-text-for-malayalam with Jupyter notebooks

## Install packages

In [1]:
!pip install -Uqq nbdev gradio==3.31.0 faster-whisper==0.5.1

## Basic inference code

In [2]:
#|export
import gradio as gr
from faster_whisper import WhisperModel

In [3]:
gr.__version__

'3.31.0'

In [5]:
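def t_asr(folder="vegam-whisper-medium-ml-fp16", audio_file="vegam-whisper-medium-ml-fp16/00b38e80-80b8-4f70-babf-566e848879fc.webm", compute_type="int8", device="cpu"):
    # Load the converted model from the local folder. The default is int8
    # because float16 is not a usable compute type on CPU.
    model = WhisperModel(folder, device=device, compute_type=compute_type)

    # Transcribe with beam search; segments are generated lazily.
    segments, info = model.transcribe(audio_file, beam_size=5)

    # Print each segment with its start and end timestamps.
    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))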

In [6]:
%%time
t_asr(compute_type="int8", device="cuda")

[0.00s -> 4.58s] പാലം കടുക്കുവോളം നാരായണ പാലം കടന്നാലോ കൂരായണ
CPU times: user 11.2 s, sys: 2.2 s, total: 13.4 s
Wall time: 6.54 s
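
One way to put the timing above in context is the real-time factor: wall time divided by audio duration. The sketch below is illustrative only. `real_time_factor` is a hypothetical helper, and it assumes that the end of the last segment (4.58 s) approximates the clip length. Note that the measured wall time also includes loading the model.

```python
def real_time_factor(wall_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return wall_seconds / audio_seconds


# Numbers taken from the cell output above: 6.54 s wall time for a clip
# whose last segment ends at 4.58 s (assumed here to be the clip length).
rtf = real_time_factor(6.54, 4.58)
print(f"RTF = {rtf:.2f}")  # prints "RTF = 1.43"
```

A value above 1 means the run took longer than the audio itself lasts.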


In [7]:
#|export
def transcribe_malayalam_speech(audio_file, compute_type="int8", device="cpu", folder="vegam-whisper-medium-ml-fp16"):
    # Load the model and transcribe the audio with beam search.
    model = WhisperModel(folder, device=device, compute_type=compute_type)
    segments, info = model.transcribe(audio_file, beam_size=5)

    # Collect the text of every segment and join it into one string.
    lst = []
    for segment in segments:
        lst.append(segment.text)

    return " ".join(lst)
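
The join step above can be tried without loading a model. The sketch below uses a hypothetical helper, `join_segments`, which assumes only that each segment exposes a `.text` attribute, as the segments returned by `model.transcribe` do in the code above. Unlike the notebook function, it also strips surrounding whitespace and skips empty pieces.

```python
from types import SimpleNamespace
from typing import Iterable


def join_segments(segments: Iterable) -> str:
    """Concatenate the text of each segment into one space-separated string."""
    # Strip each piece so leading spaces in segment text do not produce
    # double spaces after joining; drop pieces that end up empty.
    pieces = (segment.text.strip() for segment in segments)
    return " ".join(piece for piece in pieces if piece)


# Stand-in objects for illustration only; real segments come from
# WhisperModel.transcribe.
fake_segments = [SimpleNamespace(text=" hello"), SimpleNamespace(text=" world ")]
print(join_segments(fake_segments))  # prints "hello world"
```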

In [8]:
#|export
def gr_transcribe_malayalam_speech(microphone, file_upload, compute_type="int8", device="cpu", folder="vegam-whisper-medium-ml-fp16"):
    warn_output = ""
    if (microphone is not None) and (file_upload is not None):
        warn_output = (
            "WARNING: You've uploaded an audio file and used the microphone. "
            "The recorded file from the microphone will be used and the uploaded audio will be discarded.\n"
        )

    elif (microphone is None) and (file_upload is None):
        return "ERROR: You have to either use the microphone or upload an audio file"

    # Prefer the microphone recording when both inputs are present.
    audio_file = microphone if microphone is not None else file_upload

    model = WhisperModel(folder, device=device, compute_type=compute_type)
    segments, info = model.transcribe(audio_file, beam_size=5)

    lst = []
    for segment in segments:
        lst.append(segment.text)

    # Prepend the warning (if any) so that the user actually sees it.
    return warn_output + " ".join(lst)
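
The microphone-versus-upload decision can also be isolated into a small pure function, which makes it easy to check without launching Gradio. This is a sketch only: `pick_audio_source` is a hypothetical name, and the warning message is a shortened version of the one above.

```python
from typing import Optional, Tuple


def pick_audio_source(
    microphone: Optional[str], file_upload: Optional[str]
) -> Tuple[Optional[str], str]:
    """Return (audio_path, message); audio_path is None when nothing was given."""
    if microphone is None and file_upload is None:
        return None, "ERROR: You have to either use the microphone or upload an audio file"
    message = ""
    if microphone is not None and file_upload is not None:
        message = "WARNING: both inputs were provided; the microphone recording will be used.\n"
    # The microphone recording takes priority, as in the function above.
    path = microphone if microphone is not None else file_upload
    return path, message


print(pick_audio_source(None, "clip.webm"))  # prints "('clip.webm', '')"
```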

In [9]:
%%time
transcribe_malayalam_speech(audio_file="vegam-whisper-medium-ml-fp16/00b38e80-80b8-4f70-babf-566e848879fc.webm")

CPU times: user 40.6 s, sys: 9.76 s, total: 50.3 s
Wall time: 13.6 s


'പാലം കടുക്കുവോളം നാരായണ പാലം കടന്നാലോ കൂരായണ'
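
Both transcription functions above construct a `WhisperModel` on every call, so the measured time includes loading the weights. One possible way to avoid this is to cache the loaded model. The sketch below is an illustration under stated assumptions, not the notebook's method: `make_cached_loader` is a hypothetical helper that wraps any factory callable with `functools.lru_cache`, and a dummy factory stands in for `WhisperModel` so that the example runs without the model files.

```python
from functools import lru_cache
from typing import Callable


def make_cached_loader(factory: Callable) -> Callable:
    """Wrap `factory` so each distinct argument combination is built only once."""

    @lru_cache(maxsize=None)
    def load(folder: str, device: str = "cpu", compute_type: str = "int8"):
        return factory(folder, device=device, compute_type=compute_type)

    return load


# Dummy factory for illustration; in the notebook this would be WhisperModel.
calls = []


def fake_model(folder, device, compute_type):
    calls.append((folder, device, compute_type))
    return object()


get_model = make_cached_loader(fake_model)
first = get_model("vegam-whisper-medium-ml-fp16")
second = get_model("vegam-whisper-medium-ml-fp16")
print(first is second, len(calls))  # prints "True 1"
```

In the notebook, the corresponding call would be `make_cached_loader(WhisperModel)`, assuming a loaded model can safely be reused across requests.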

In [6]:
## Haha, you are burning GPUs and emitting CO2

## Figure out the Whisper demo by Hugging Face

## Make an app with Gradio

In [10]:
import gradio as gr

def greet(name):
    return "Hello " + name + "!!"

iface = gr.Interface(fn=greet, inputs="text", outputs="text")
iface.launch(share=True)

Running on local URL:  http://0.0.0.0:6006
Running on public URL: https://9fa992d2ba37b0af49.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces




In [20]:
#|export
mf_transcribe = gr.Interface(
    fn=gr_transcribe_malayalam_speech,
    inputs=[
        # gr.Audio replaces the deprecated gr.inputs.Audio; the `optional`
        # flag is deprecated in Gradio 3.x and has no effect.
        gr.Audio(source="microphone", type="filepath"),
        gr.Audio(source="upload", type="filepath"),
    ],
    outputs="text",
    title="PALLAKKU (പല്ലക്ക്)",
    description=(
        "Pallakku is a Malayalam speech-to-text demo leveraging the model weights of [vegam-whisper-medium-ml](https://huggingface.co/kurianbenoy/vegam-whisper-medium-ml-fp16)."
    ),
    article="Please note that this demo currently runs on CPU only. In my testing, a 5-second audio file can take up to 15 seconds to transcribe. If you are interested in a GPU-based API instead, feel free to contact the author at kurian.bkk@gmail.com",
    allow_flagging="never",
)


In [24]:
#|export
mf_transcribe.launch(share=True)

Rerunning server... use `close()` to stop if you need to change `launch()` parameters.
----
Running on local URL:  http://0.0.0.0:6010
Running on public URL: https://19b32861466405ac95.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces




## Create a requirements.txt file

In [22]:
%%writefile requirements.txt
gradio==3.31.0
faster-whisper==0.5.1
torch

Writing requirements.txt
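
Before deploying, it can help to confirm that the notebook environment matches the pins written above. The sketch below is illustrative only: `parse_pins` and `mismatches` are hypothetical helpers, they understand only simple `name==version` lines, and unpinned entries such as `torch` are skipped.

```python
from importlib import metadata


def parse_pins(text: str) -> dict:
    """Return {package: version} for lines of the form `name==version`."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def mismatches(pins: dict) -> dict:
    """Return {package: installed_version_or_None} where it differs from the pin."""
    found = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            found[name] = installed
    return found


requirements = "gradio==3.31.0\nfaster-whisper==0.5.1\ntorch\n"
print(parse_pins(requirements))
```

Calling `mismatches(parse_pins(requirements))` in the notebook environment would then list any package whose installed version differs from its pin.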


## Convert this notebook into a Gradio app

In [25]:
from nbdev.export import nb_export
nb_export('app.ipynb', lib_path='.', name='app')
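
To make the export step less opaque: `nb_export` writes the cells marked with the `#|export` directive into a Python module (here `app.py`). The sketch below is a deliberately simplified illustration of that idea, not nbdev's actual implementation. `exported_source` is a hypothetical function, and it only checks whether the first line of a code cell is the directive.

```python
def exported_source(notebook: dict) -> str:
    """Concatenate the bodies of code cells whose first line is `#|export`."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # Notebook JSON stores source either as a string or as a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        lines = text.splitlines()
        if lines and lines[0].replace(" ", "") == "#|export":
            chunks.append("\n".join(lines[1:]).strip())
    return "\n\n".join(chunks)


# A tiny in-memory notebook for illustration.
nb = {
    "cells": [
        {"cell_type": "code", "source": ["#|export\n", "import gradio as gr\n"]},
        {"cell_type": "code", "source": "gr.__version__"},
        {"cell_type": "markdown", "source": "## Heading"},
    ]
}
print(exported_source(nb))  # prints "import gradio as gr"
```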

## Reference

1. [Create A 🤗 Space From A Notebook](https://nbdev.fast.ai/blog/posts/2022-11-07-spaces/index.html)
2. [Nbdev Demo](https://gist.github.com/hamelsmu/35be07d242f3f19063c3a3839127dc67)
3. [Whisper-demo space by 🤗](https://huggingface.co/spaces/whisper-event/whisper-demo)