# Building Your First AI App with Inference Providers
You’ve learned the basics and understand the provider ecosystem. Now let’s build something practical: an AI Meeting Notes app that transcribes audio files and generates summaries with action items.
This project demonstrates real-world AI orchestration using multiple specialized providers within a single application.
## Project Overview
Our app will:
- Accept audio uploads or live microphone input through a web interface
- Transcribe speech using a fast speech-to-text model
- Generate summaries using a powerful language model
- Deploy to the web for easy sharing
**Tech Stack**: Gradio (for the UI) + Inference Providers (for the AI)
## Step 1: Set Up Authentication

Before we start coding, authenticate with Hugging Face using the CLI:

```bash
pip install huggingface_hub
huggingface-cli login
```
When prompted, paste your Hugging Face token. This handles authentication automatically for all your inference calls. You can generate one from your settings page.
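If you want to confirm the login worked before moving on, here is a quick sanity check (a minimal sketch; `whoami` uses the token the CLI just stored):

```python
from huggingface_hub import whoami

# Reads the stored token and asks the Hub which account it belongs to
user = whoami()
print(f"Logged in as: {user['name']}")
```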
## Step 2: Build the User Interface
Now let’s create a simple web interface using Gradio:
```python
import gradio as gr
from huggingface_hub import InferenceClient


def process_meeting_audio(audio_file):
    """Process uploaded audio file and return transcript + summary"""
    if audio_file is None:
        return "Please upload an audio file.", ""

    # We'll implement the AI logic next
    return "Transcript will appear here...", "Summary will appear here..."


# Create the Gradio interface
app = gr.Interface(
    fn=process_meeting_audio,
    inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcript", lines=10),
        gr.Textbox(label="Summary & Action Items", lines=8),
    ],
    title="🎤 AI Meeting Notes",
    description="Upload an audio file to get an instant transcript and summary with action items.",
)

if __name__ == "__main__":
    app.launch()
```
Here we’re using Gradio’s `gr.Audio` component, which lets users either upload an audio file or record from the microphone. We’re keeping things simple with two outputs: a transcript and a summary with action items.
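If you want to be explicit about which input sources the component offers, recent Gradio versions expose a `sources` argument; a small sketch (both sources are typically enabled by default, so this mainly documents intent):

```python
import gradio as gr

# Explicitly enable both file upload and microphone recording (Gradio 4.x)
audio_input = gr.Audio(
    label="Upload Meeting Audio",
    type="filepath",                   # hand our function a path to a temp file
    sources=["upload", "microphone"],  # restrict or reorder input sources here
)
```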
## Step 3: Add Speech Transcription

Now let’s implement the transcription using OpenAI’s `openai/whisper-large-v3` model for fast, reliable speech processing.
```python
def transcribe_audio(audio_file_path):
    """Transcribe audio using an Inference Provider"""
    client = InferenceClient(provider="auto")

    # Pass the file path directly - the client handles file reading
    transcript = client.automatic_speech_recognition(
        audio=audio_file_path,
        model="openai/whisper-large-v3",
    )
    return transcript.text
```
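The `auto` setting selects the first available provider for the model. If you’d rather pin one explicitly, pass its name instead; the sketch below assumes `fal-ai` currently serves this model, which you should verify on the model page:

```python
from huggingface_hub import InferenceClient

# Pin a specific provider instead of letting "auto" choose (a sketch;
# which providers serve a given model can change over time)
client = InferenceClient(provider="fal-ai")

transcript = client.automatic_speech_recognition(
    audio="meeting.wav",  # hypothetical local file, for illustration only
    model="openai/whisper-large-v3",
)
print(transcript.text)
```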
## Step 4: Add AI Summarization

Next, we’ll use a powerful language model, `deepseek-ai/DeepSeek-R1-0528` from DeepSeek, via an Inference Provider. Just like in the previous step, we’ll use the `auto` provider to automatically select the first available provider for the model.
We will define a custom prompt to ensure the output is formatted as a summary with action items and decisions made:
```python
def generate_summary(transcript):
    """Generate summary using an Inference Provider"""
    client = InferenceClient(provider="auto")

    prompt = f"""
    Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: {transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made
    """

    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-0528",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000,
    )
    return response.choices[0].message.content
```
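For long meetings the summary can take a while to generate. The chat completions API also supports streaming, so you could surface tokens as they arrive; a minimal, self-contained sketch (the toy message stands in for the real prompt above):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(provider="auto")

# Same kind of request as above, but stream tokens as they are generated
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[{"role": "user", "content": "Summarize: the meeting ran long."}],
    max_tokens=1000,
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```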
## Step 5: Deploy on Hugging Face Spaces

To deploy, we’ll need to create an `app.py` file and upload it to Hugging Face Spaces. Here is the complete `app.py`:
```python
import gradio as gr
from huggingface_hub import InferenceClient


def transcribe_audio(audio_file_path):
    """Transcribe audio using an Inference Provider"""
    client = InferenceClient(provider="auto")

    # Pass the file path directly - the client handles file reading
    transcript = client.automatic_speech_recognition(
        audio=audio_file_path, model="openai/whisper-large-v3"
    )
    return transcript.text


def generate_summary(transcript):
    """Generate summary using an Inference Provider"""
    client = InferenceClient(provider="auto")

    prompt = f"""
    Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: {transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made
    """

    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-0528",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000,
    )
    return response.choices[0].message.content


def process_meeting_audio(audio_file):
    """Main processing function"""
    if audio_file is None:
        return "Please upload an audio file.", ""

    try:
        # Step 1: Transcribe
        transcript = transcribe_audio(audio_file)

        # Step 2: Summarize
        summary = generate_summary(transcript)

        return transcript, summary
    except Exception as e:
        return f"Error processing audio: {str(e)}", ""


# Create Gradio interface
app = gr.Interface(
    fn=process_meeting_audio,
    inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcript", lines=10),
        gr.Textbox(label="Summary & Action Items", lines=8),
    ],
    title="🎤 AI Meeting Notes",
    description="Upload audio to get instant transcripts and summaries.",
)

if __name__ == "__main__":
    app.launch()
```
Our app will run locally on port 7860, Gradio’s default.
To deploy, we’ll need to create a new Space and upload our files:

1. **Create a new Space**: Go to huggingface.co/new-space
2. **Choose the Gradio SDK** and make your Space public
3. **Upload your files**: Upload `app.py` (see the note on dependencies after this list)
4. **Add your token**: In the Space settings, add `HF_TOKEN` as a secret (generate one from your settings page)
5. **Launch**: Your app will be live at `https://huggingface.co/spaces/your-username/your-space-name`
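Spaces installs extra Python dependencies from a `requirements.txt` uploaded next to `app.py`. A minimal sketch for this app (the Gradio SDK preinstalls `gradio` itself):

```
# Use a recent version so InferenceClient's `provider` argument is available
huggingface_hub
```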
Note: While we used CLI authentication locally, Spaces requires the token as a secret for the deployment environment.
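On Spaces, the secret is exposed to your app as an environment variable, and `InferenceClient` falls back to `HF_TOKEN` automatically, so the code above works unchanged. If you prefer to be explicit, you can pass the token yourself:

```python
import os

from huggingface_hub import InferenceClient

# HF_TOKEN is injected by Spaces from the secret you configured;
# passing it explicitly is equivalent to the automatic fallback
client = InferenceClient(provider="auto", token=os.environ["HF_TOKEN"])
```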
## Next Steps
Congratulations! You’ve created a production-ready AI application that handles real-world tasks, provides a professional interface, scales automatically, and runs cost-efficiently. If you want to explore more providers, check out the Inference Providers page. Or try some of these next steps:
- Improve your prompt: Try different prompts to improve the quality for your use case
- Try different models: Experiment with various speech and text models
- Compare performance: Benchmark speed vs. accuracy across providers