Your First Inference Provider Call
In this guide we’re going to help you make your first API call with Inference Providers.
Many developers avoid using open source AI models because they assume deployment is complex. This guide will show you how to use a state-of-the-art model in under five minutes, with no infrastructure setup required.
We’re going to use the FLUX.1-schnell model, which is a powerful text-to-image model.
This guide assumes you have a Hugging Face account. If you don’t have one, you can create one for free at huggingface.co.
Step 1: Find a Model on the Hub
Visit the Hugging Face Hub and look for models with the “Inference Providers” filter; you can also select the specific provider you want. We’ll go with fal.
For this example, we’ll use FLUX.1-schnell. Navigate to the model page and scroll down to find the inference widget on the right side.
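If you prefer to search programmatically, recent versions of huggingface_hub can filter Hub models by inference provider. The sketch below assumes your installed version supports the inference_provider argument on list_models; the provider name and limit are just examples:
from huggingface_hub import list_models
# List a few text-to-image models served by fal
# (assumes a recent huggingface_hub release with the inference_provider filter)
for model in list_models(inference_provider="fal-ai", task="text-to-image", limit=5):
    print(model.id)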
Step 2: Try the Interactive Widget
Before writing any code, try the widget directly on the model page. You can test the model in the browser from any of the available providers, and copy relevant code snippets to use in your own projects:
- Enter a prompt like “A serene mountain landscape at sunset”
- Click “Generate”
- Watch as the model creates an image in seconds
This widget uses the same endpoint you’re about to implement in code.
You’ll need a Hugging Face account (free at huggingface.co) and available credits to use the model.
Step 3: From Clicks to Code
Now let’s replicate this with Python. Click the “View Code Snippets” button in the widget to see the generated code.
You will need to populate the snippet with a valid Hugging Face User Access Token, which you can find on your settings page.
Set your token as an environment variable:
export HF_TOKEN="your_token_here"
The Python or TypeScript code snippet will use the token from the environment variable.
Install the required package:
pip install huggingface_hub
You can now use the code snippet to generate an image in your app.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="auto",
    api_key=os.environ["HF_TOKEN"],
)

# output is a PIL.Image object
image = client.text_to_image(
    "Astronaut riding a horse",
    model="black-forest-labs/FLUX.1-schnell",
)
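The text_to_image helper returns a standard PIL.Image.Image, so you can persist it with Pillow’s save method (the filename here is just an example):
# Save the generated image to disk (format inferred from the extension)
image.save("astronaut.png")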
What Just Happened?
Nice work! You’ve successfully used a production-grade AI model without any complex setup. In just a few lines of code, you:
- Connected to a powerful text-to-image model
- Generated a custom image from text
- Received the result as a PIL image you can save locally
The model you just used runs on professional infrastructure, handling scaling, optimization, and reliability automatically.
Dive Deeper: Provider Selection
You might have noticed the provider="auto" parameter in the code examples above. This is a key feature of Inference Providers that gives you control over which infrastructure provider handles your request.
auto is powerful because:
- It makes it easy to switch between providers and to test different providers’ performance for your use case.
- It provides a fallback mechanism in case a provider is unavailable.
But if you want to be more specific, you can also specify a provider. Let’s see how.
Understanding Provider Selection
When you use provider="auto" (which is the default), the system automatically selects the first available provider for your chosen model based on your preference order in your Inference Provider settings. This provides:
- Automatic failover: If one provider is unavailable, the system tries the next one
- Simplified setup: No need to research which providers support your model
- Optimal routing: The system handles provider selection for you
Specifying a Specific Provider
Alternatively, you can explicitly choose a provider if you have specific requirements:
import os
from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

# Using automatic provider selection (default)
image_auto = client.text_to_image(
    "Astronaut riding a horse",
    model="black-forest-labs/FLUX.1-schnell",
    provider="auto",  # This is the default
)

# Using a specific provider
image_fal = client.text_to_image(
    "Astronaut riding a horse",
    model="black-forest-labs/FLUX.1-schnell",
    provider="fal-ai",  # Explicitly use Fal AI
)

# Using another specific provider
image_replicate = client.text_to_image(
    "Astronaut riding a horse",
    model="black-forest-labs/FLUX.1-schnell",
    provider="replicate",  # Explicitly use Replicate
)
When to Use Each Approach
Use provider="auto" when:
- You’re just getting started with Inference Providers
- You want the simplest setup and maximum reliability
- You don’t have specific infrastructure requirements
- You want automatic failover if a provider is unavailable
Use a specific provider when:
- You need consistent performance characteristics
- You have specific billing or cost requirements
- You want to test different providers’ performance for your use case
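Keep in mind that pinning a specific provider gives up auto’s built-in failover, so you may want to handle unavailability yourself. Here is a minimal sketch; the provider order and the broad exception handling are illustrative, not prescriptive:
import os
from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

# Try a preferred provider first, then fall back manually
for provider in ("fal-ai", "replicate"):
    try:
        image = client.text_to_image(
            "Astronaut riding a horse",
            model="black-forest-labs/FLUX.1-schnell",
            provider=provider,
        )
        break  # Success: stop trying further providers
    except Exception as err:  # Broad catch for illustration only
        print(f"{provider} failed: {err}")
else:
    raise RuntimeError("All providers failed")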
Next Steps
Now that you’ve seen how easy it is to use AI models, you might wonder:
- What was that “provider” system doing behind the scenes?
- How does billing work?
- What other models can you use?
Continue to the next guide to understand the provider ecosystem and make informed choices about authentication and billing.