Red Hat AI Inference Server

May 20, 2025
Resource type: Datasheet

Overview

The true value of AI lies in rapid, accurate responses at scale. Red Hat® AI Inference Server directly enables this by optimizing the inference process—the crucial step allowing AI applications to communicate with large language models (LLMs) and generate a response based on data—across the hybrid cloud, creating faster and more cost-effective model deployments.

Fast and cost-effective inference anywhere

As part of the Red Hat AI platform, Red Hat AI Inference Server provides consistent, fast, and cost-effective inference at scale. AI Inference Server lets you run any generative AI (gen AI) model on any hardware accelerator across datacenter, cloud, and edge environments, giving you the flexibility and choice to meet your business requirements. It enables efficient inference in two ways: by compressing both foundational and trained models with LLM Compressor, and by providing access to a collection of validated and optimized gen AI models that are ready for inference deployment in less time.

Red Hat AI Inference Server works with a wide array of hardware accelerators and models and can run on your choice of infrastructure and operating system (OS), including Red Hat AI platforms, Red Hat Enterprise Linux®, Red Hat OpenShift®, and third-party Linux or Kubernetes distributions, giving customers flexibility to align with any architecture.
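Because AI Inference Server is built on vLLM, a deployment can be sketched with the vLLM CLI. The commands below are a minimal illustration, assuming the `vllm` CLI is installed and a GPU is available; the model ID is an example from the Red Hat AI collection on Hugging Face, so substitute your own.

```shell
# Serve a model with an OpenAI-compatible API on port 8000.
# The model ID below is illustrative; replace it with your chosen model.
vllm serve RedHatAI/granite-3.1-8b-instruct --port 8000

# The server speaks the OpenAI-compatible API, so any OpenAI client or
# plain HTTP request can query it:
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "RedHatAI/granite-3.1-8b-instruct",
       "messages": [{"role": "user", "content": "What is inference?"}]}'
```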

Table 1. Features and benefits

  • Efficient model inferencing with virtual large language model (vLLM). AI Inference Server provides an efficient approach to model inference by optimizing graphics processing unit (GPU) memory usage and inference latency with vLLM.

  • Reduced operational complexity. AI Inference Server provides a consistent platform for deploying and optimizing models across the hybrid cloud. It offers a user-friendly approach to managing advanced machine learning (ML) techniques, including quantization, and integrates with observability tools such as Prometheus and Grafana.

  • Hybrid cloud flexibility. With vLLM at its core, AI Inference Server gives organizations the freedom to run AI models wherever they need them: in datacenters, in cloud environments, and at the edge.
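The Prometheus integration noted above can typically be checked by scraping the server's metrics endpoint, which vLLM exposes on the same port as the API. A sketch, assuming a server already running on the default port 8000:

```shell
# Assumes an AI Inference Server / vLLM instance serving on localhost:8000.
# vLLM publishes Prometheus-format metrics (request counts, latencies,
# KV-cache usage) at /metrics; Prometheus can scrape this endpoint directly.
curl -s http://localhost:8000/metrics | grep '^vllm:' | head
```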

Technical specifications

  • Inference runtime for the hybrid cloud. With the sophisticated, high-performance vLLM inference runtime at its core, AI Inference Server gives businesses a unified platform to run their choice of models across various accelerators, Kubernetes, and Linux environments. It also integrates with observability tools for enhanced monitoring and supports OpenAI-compatible LLM APIs for flexible deployment.
  • LLM Compressor. AI teams can compress both foundational and trained models of any size to reduce compute usage and related costs while maintaining high model response accuracy, and can work with Red Hat for support with their model optimization initiatives.
  • Optimized model repository. Hosted on the Red Hat AI page on Hugging Face, this repository gives AI Inference Server users instant access to a validated and optimized collection of leading AI models ready for inference deployment, helping to improve efficiency by 2-4x without compromising model accuracy.
  • Certified for all Red Hat products. AI Inference Server is included as part of Red Hat OpenShift AI and Red Hat Enterprise Linux AI and is also supported on Red Hat OpenShift and Red Hat Enterprise Linux.
  • Third-party platform deployments. AI Inference Server can be deployed across third-party Linux and Kubernetes platforms and is covered under Red Hat's third-party support policy. In these cases, Red Hat supports only the AI Inference Server component, and the customer is responsible for issues related to their underlying platform if those issues cannot be reproduced on Red Hat Enterprise Linux or Red Hat OpenShift.
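The OpenAI-compatible API mentioned above accepts standard chat-completion requests. The following sketch shows the request shape using only the Python standard library; the endpoint URL and model name are illustrative assumptions, not fixed values of the product.

```python
import json
import urllib.request

# Illustrative values; point these at your own AI Inference Server deployment.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "RedHatAI/granite-3.1-8b-instruct"

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send(payload: dict) -> dict:
    """POST the payload to the server and decode the JSON response."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request(MODEL, "Summarize what vLLM does in one sentence.")
print(json.dumps(payload, indent=2))
# send(payload) returns the completion once a server is running.
```

Because the API is OpenAI-compatible, existing OpenAI client libraries can also be pointed at the server by changing only the base URL.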

Explore a path to fully optimized inference

To discover how AI Inference Server helps deliver fast, cost-effective, and scalable inference, visit the Red Hat AI Inference Server product page.

Tags: Artificial intelligence
