Transcribe speech and extract action details
Convert audio to text using Whisper
Recognize and transcribe spoken language into text
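
As a minimal sketch of the audio-to-text step, assuming the open-source openai-whisper Python package (which also needs ffmpeg on the system): the function name transcribe_audio and the default "base" model size are illustrative choices, not something specified above. Extracting action details from the resulting text is left out, since the lines above do not say how that is done.

```python
from pathlib import Path


def transcribe_audio(audio_path: str, model_name: str = "base") -> str:
    """Return the transcript of an audio file (hypothetical helper)."""
    # Fail early with a clear error before loading any model weights.
    if not Path(audio_path).is_file():
        raise FileNotFoundError(audio_path)

    # Imported lazily so this module still loads where Whisper is absent.
    # Assumption: installed via `pip install openai-whisper`.
    import whisper

    model = whisper.load_model(model_name)   # downloads weights on first use
    result = model.transcribe(audio_path)    # dict with "text", "segments", "language"
    return result["text"].strip()
```

Calling transcribe_audio("meeting.wav") would return the recognized speech as a single string; the file name here is only a placeholder.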