Detect objects in images and describe them with audio
Generate speech from text using a reference audio clip
Estimate gender, height, and torso area from an image