One-click deployments from the Hugging Face Hub on Azure AI
This guide introduces the one-click deployment of open-source models from the Hugging Face Hub to Azure AI, where they run as Azure ML Managed Online Endpoints for real-time inference.
TL;DR The Hugging Face Hub is a collaborative platform hosting over a million open-source machine learning models, datasets, and demos. It supports a wide range of tasks across natural language processing, vision, and audio, and provides version-controlled repositories with metadata, model cards, and programmatic access via APIs and popular ML libraries. Azure Machine Learning is a cloud-based platform for building, deploying, and managing machine learning models at scale. It provides managed infrastructure, including powerful CPU and GPU instances, automated scaling, secure endpoints, and monitoring, making it suitable for both experimentation and production deployment. Azure AI Foundry builds on Azure ML but is tailored specifically for generative AI and agent-based applications.
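As an illustration of that programmatic access, the sketch below uses the huggingface_hub Python library to fetch a model repository's metadata from the Hub; the model ID openai-community/gpt2 is just an example.

```python
from huggingface_hub import HfApi

api = HfApi()

# Fetch the Hub metadata for a model repository (illustrative model ID)
info = api.model_info("openai-community/gpt2")

print(info.id)            # repository ID on the Hub
print(info.pipeline_tag)  # task, e.g. "text-generation"
print(info.downloads)     # download count
print(info.tags[:5])      # a few of the repository tags
```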
The integration between the Hugging Face Hub and Azure AI / Azure ML allows users to deploy thousands of Hugging Face models directly onto Azure’s managed infrastructure with minimal configuration. This is achieved through a native model catalog in the Azure AI Foundry Hub and Azure ML Studio, which features Hugging Face models ready for real-time deployment.
The steps required to deploy an open-source model from the Hugging Face Hub to Azure AI as an Azure ML Managed Online Endpoint for real-time inference are the following:
Go to the Hugging Face Hub Models page, and browse all the open-source models available on the Hub.
Alternatively, instead of the Hugging Face Hub, you can start directly from the Hugging Face Collection on Azure ML (public URL, no authentication required) or from the Hugging Face Collection on Azure AI Foundry (requires Azure authentication), and explore the available models using the Azure AI Model Catalog filters to find the models you want to deploy.
Leverage the Hub filters to easily find and discover models based on criteria such as task type, model size (number of parameters), inference engine support, and much more.
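The same filters can also be applied programmatically. Here is a minimal sketch using the huggingface_hub library that lists the most downloaded text-generation models; the task, library, and limit values are just examples.

```python
from huggingface_hub import list_models

# List the five most downloaded text-generation models on the Hub
# (task, library, and limit values are illustrative)
for model in list_models(
    task="text-generation",
    library="transformers",
    sort="downloads",
    limit=5,
):
    print(model.id, model.downloads)
```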
Select the model that you want, click the “Deploy” button within its model card, select the option “Deploy on Azure AI”, and then click “Go to model in Azure AI”. Note that a model may not be available for deployment: the “Deploy” button may be disabled for some models; the “Deploy on Azure AI” option may not be listed, meaning that the model is not supported by any of the inference engines or tasks supported on Azure AI; or the “Deploy on Azure AI” button may read “Request to add”, meaning that the model is not yet available but could be published, so you can request its addition to the Hugging Face Collection in the Azure AI Foundry Hub Model Catalog.
On Azure AI Foundry, you will be redirected to the model card, where you need to click “Use this model” and fill in the configuration values for the endpoint and the deployment, such as the endpoint name, the instance type, or the instance count, among others; then click “Deploy”.
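If you prefer to script this step instead of using the UI, the following sketch uses the azure-ai-ml Python SDK (v2) to create the same kind of Managed Online Endpoint. The subscription details, endpoint name, registry model URI, and instance type below are placeholder assumptions; replace them with your own values, and check the model card in the catalog for the exact model asset name.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

# Placeholder Azure identifiers: replace with your own
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_OR_PROJECT>",
)

# Create the endpoint (the name is an example)
endpoint = ManagedOnlineEndpoint(name="hf-demo-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy a model from the Hugging Face Collection in the Azure ML registry;
# the model URI format and instance type are illustrative assumptions
deployment = ManagedOnlineeployment = ManagedOnlineDeployment(
    name="default",
    endpoint_name="hf-demo-endpoint",
    model="azureml://registries/HuggingFace/models/<MODEL_ASSET_NAME>/labels/latest",
    instance_type="Standard_NC24ads_A100_v4",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route all traffic to the new deployment
endpoint.traffic = {"default": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```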
After the endpoint is created and the deployment is ready, you will be able to send requests to the deployed API. For more information on how to send inference requests, you can either check the “Consume” tab of the Azure ML Endpoint in Azure AI Foundry, or check any of the available Azure AI examples in the documentation.
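As a rough sketch of what such a request can look like (mirroring the details shown in the “Consume” tab), the snippet below posts a payload to the endpoint’s scoring URI using key-based authentication. The endpoint name is the placeholder from the previous sketch, and the payload shape is an assumption that depends on the deployed model’s task.

```python
import requests
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_OR_PROJECT>",
)

# Retrieve the scoring URI and the primary key for the endpoint
endpoint = ml_client.online_endpoints.get(name="hf-demo-endpoint")
keys = ml_client.online_endpoints.get_keys(name="hf-demo-endpoint")

# The payload shape depends on the model's task; "inputs" is a common
# convention for text models, but check the "Consume" tab for the exact schema
response = requests.post(
    endpoint.scoring_uri,
    headers={
        "Authorization": f"Bearer {keys.primary_key}",
        "Content-Type": "application/json",
    },
    json={"inputs": "What is the capital of France?"},
)
print(response.json())
```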