Hugging Face and FriendliAI partner to supercharge model deployment on the Hub

Published January 22, 2025

FriendliAI’s inference infrastructure is now integrated into the Hugging Face Hub as an option in the “Deploy this model” button, simplifying and accelerating generative AI model serving.

A Collaboration to Advance AI Innovation

Hugging Face empowers developers, researchers, and businesses to innovate in AI. A shared priority is building impactful partnerships that simplify workflows and bring cutting-edge tools to the AI community.

Today, we are excited to announce a partnership between Hugging Face and FriendliAI, a leader in accelerated generative AI inference, to enhance how developers deploy and manage AI models. This integration introduces Friendli Endpoints as a deployment option within the Hugging Face Hub, giving developers direct access to high-performance, cost-effective inference infrastructure.

FriendliAI is ranked as the fastest GPU-based generative AI inference provider by Artificial Analysis, with groundbreaking technologies including continuous batching, native quantization, and best-in-class autoscaling. With these technologies, FriendliAI continues to raise the standard for AI inference serving performance, delivering faster processing, lower latency, and substantial cost savings for deploying generative AI models at scale. Through this partnership, Hugging Face users and FriendliAI customers can effortlessly deploy open-source or custom generative AI models with unparalleled efficiency and reliability.

Simplifying Model Deployment

Last year, FriendliAI introduced a Hugging Face integration, enabling users to seamlessly deploy Hugging Face models directly within the Friendli Suite platform. Through this integration, users gained access to thousands of supported open-source models on Hugging Face, as well as the capability to deploy private models effortlessly. The list of model architectures currently supported by FriendliAI can be found here.

Today, we’re taking this integration further by enabling the same capability directly within the Hugging Face Hub, offering 1-click deployment for a seamless user experience. With a Friendli Suite account, you can deploy models straight from the model card on the Hugging Face Hub.

Friendli Inference deployment option in Hugging Face

Selecting Friendli Endpoints takes you to FriendliAI’s model deployment page, which features an intuitive interface for setting up Friendli Dedicated Endpoints, the managed service for generative AI inference. Here, you can deploy the model on NVIDIA H100 GPUs, and while the deployment is being provisioned, you can chat with optimized open-source models directly on the page, making it easy to explore and test their capabilities.

Deploy models with NVIDIA H100 in Friendli Dedicated Endpoints

With FriendliAI’s advanced GPU-optimized inference engine, Dedicated Endpoints delivers fast and cost-effective inference as a managed service. Developers can effortlessly deploy open-source or custom models on NVIDIA H100 GPUs using Friendli Dedicated Endpoints by clicking “Deploy now” on the model deployment page.

H100 GPUs are powerful but can be expensive to operate at scale. With FriendliAI’s optimized service, you can reduce the number of GPUs needed while maintaining peak performance, significantly lowering costs. Beyond cost efficiency, Dedicated Endpoints also simplifies the complexities of managing infrastructure.

Deploy Hugging Face models in the model deployment page
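
Once a dedicated endpoint finishes deploying, you can call it programmatically. The snippet below is a minimal sketch, assuming the endpoint exposes an OpenAI-compatible chat completions API; the base URL, endpoint ID, and token variable are illustrative placeholders, so take the exact values from your Friendli Suite dashboard.

```python
import os

from openai import OpenAI

# Placeholder values: copy the real base URL and endpoint ID
# from the Friendli Suite dashboard after deployment completes.
client = OpenAI(
    base_url="https://api.friendli.ai/dedicated/v1",  # assumed OpenAI-compatible route
    api_key=os.environ["FRIENDLI_TOKEN"],  # personal access token from Friendli Suite
)

response = client.chat.completions.create(
    model="YOUR_ENDPOINT_ID",  # the ID of your dedicated endpoint
    messages=[{"role": "user", "content": "In one sentence, what does continuous batching do?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```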

Run Inference on Open-Source Models with Friendli Serverless Endpoints

Friendli Serverless Endpoints is the perfect solution for developers who want to run inference on open-source models efficiently. The service provides user-friendly APIs for models optimized by FriendliAI, ensuring high performance at low cost. You can chat with these powerful open-source models directly on the model deployment page.

Try out Serverless Endpoints in the model deployment page
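
As a rough illustration of what calling these APIs looks like, the sketch below assumes Friendli Serverless Endpoints speaks the same OpenAI-compatible chat completions protocol; the model ID is a placeholder, so substitute one of the serverless models listed in Friendli Suite.

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",  # assumed serverless route
    api_key=os.environ["FRIENDLI_TOKEN"],  # personal access token from Friendli Suite
)

# Stream tokens back as they are generated.
stream = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",  # placeholder: pick a model offered on Serverless Endpoints
    messages=[{"role": "user", "content": "Explain speculative decoding in two sentences."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```

Because the protocol is OpenAI-compatible under this assumption, existing clients and frameworks that target the OpenAI API can be pointed at the service by swapping the base URL.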

What’s Next

We’re thrilled to deepen the collaboration between FriendliAI and Hugging Face, making open-source AI more accessible to developers worldwide. FriendliAI’s high-speed, cost-efficient inference solution removes the complexities of infrastructure management, freeing users to focus on innovation. Together, we remain committed to transforming how AI is developed and to driving the groundbreaking innovation that shapes the next era of AI.

You can also follow our organization page to stay up to date on future news 🔥
