AI010
Red Hat AI Inference Server Technical Overview
Overview
Unlock the full potential of your AI inference infrastructure.
Course Description
- Gain essential insights into AI deployment with this Red Hat AI Inference Server technical overview. Learn how to address the complexities and costs of running AI models in production. Discover how Red Hat's solution, powered by vLLM, optimizes performance and delivers significant cost savings across cloud, on-premises, virtualized, and edge environments. Dive into advanced techniques like quantization and speculative decoding to enhance your AI inference capabilities. This on-demand video content demonstrates model deployment and management within OpenShift AI, showing how to gain efficiency and flexibility for your AI workloads.
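As a taste of the vLLM-based serving covered in this overview, the following minimal sketch queries a running inference server through its OpenAI-compatible API. The base URL, model name, and prompt are placeholder assumptions for illustration, not values taken from the course.

```python
# Minimal sketch: querying a vLLM-based inference endpoint through its
# OpenAI-compatible API. The base URL, model name, and prompt below are
# placeholder assumptions for illustration, not values from the course.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local inference server endpoint
    api_key="EMPTY",                      # a dummy key is accepted by default
)

response = client.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model identifier
    prompt="Explain AI inference in one sentence.",
    max_tokens=64,
)
print(response.choices[0].text)
```

Because the server exposes an OpenAI-compatible API, existing client code can typically be repointed at it by changing only the base URL.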
Course Content Summary
- What is Inference?
- Challenges with Inference
- Red Hat AI Inference Server Solution
- Red Hat AI Portfolio Integration
- Flexibility of Deployment
- LLM Compression Tool (Quantization)
- Performance Optimization Techniques (KV Cache, Speculative Decoding, Tensor Parallel Inference); a brief sketch follows this list
- Case Studies
- Model Deployment and Management
- Storage Connections for Models
- Metrics and Monitoring
- Hugging Face Integration
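Several of the topics above, such as quantization, the KV cache, and tensor parallel inference, correspond to configuration options in vLLM. The sketch below shows roughly how they surface in vLLM's offline Python API; the model identifier, parallelism degree, and memory fraction are illustrative assumptions, not settings from the course.

```python
# Rough sketch of how some listed techniques appear as vLLM options.
# Model name, tensor_parallel_size, and gpu_memory_utilization are
# illustrative assumptions, not settings taken from the course.
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/Llama-3.1-8B-Instruct-quantized.w4a16",  # hypothetical pre-quantized model
    tensor_parallel_size=2,       # tensor parallel inference across two GPUs
    gpu_memory_utilization=0.90,  # memory budget shared by weights and the KV cache
)

outputs = llm.generate(
    ["Why does quantization reduce serving cost?"],
    SamplingParams(temperature=0.2, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```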
Audience for this course
- AI/ML Engineers and Practitioners
- DevOps Engineers
- Cloud Architects and Engineers
- Technical Decision-Makers
Recommended training
- There are no prerequisites for this Technical Overview.
Technology considerations
- N/A