AI010
Red Hat AI Inference Server Technical Overview
Overview
Unlock the full potential of your AI inference infrastructure.
Course Description
- Gain essential insights into AI deployment with this Red Hat AI Inference Server technical overview. Learn how to address the complexities and costs of running AI models in production. Discover how Red Hat's solution, powered by vLLM, optimizes performance and delivers significant cost savings across cloud, on-premises, virtualized, and edge environments. Dive into advanced techniques like quantization and speculative decoding to enhance your AI inference capabilities. This on-demand video content demonstrates model deployment and management within OpenShift AI, showing how to gain efficiency and flexibility for your AI workloads.
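As a taste of the vLLM-based serving covered in this overview, the following minimal sketch queries a running inference server through its OpenAI-compatible API. The base URL, model name, and prompt are placeholder assumptions for illustration, not values taken from the course.

```python
# Minimal sketch: querying a vLLM-based inference endpoint through its
# OpenAI-compatible API. The base URL, model name, and prompt below are
# placeholder assumptions for illustration, not values from the course.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local inference server endpoint
    api_key="EMPTY",                      # a dummy key is accepted by default
)

response = client.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model identifier
    prompt="Explain AI inference in one sentence.",
    max_tokens=64,
)
print(response.choices[0].text)
```

Because the server exposes an OpenAI-compatible API, existing client code can typically be repointed at it by changing only the base URL.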
Course Content Summary
- What is Inference?
- Challenges with Inference
- Red Hat AI Inference Server Solution
- Red Hat AI Portfolio Integration
- Flexibility of Deployment
- LLM Compression Tool (Quantization)
- Performance Optimization Techniques (KV Cache, Speculative Decoding, Tensor Parallel Inference); a brief sketch follows this list
- Case Studies
- Model Deployment and Management
- Storage Connections for Models
- Metrics and Monitoring
- Hugging Face Integration
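Several of the topics above, such as quantization, the KV cache, and tensor parallel inference, correspond to configuration options in vLLM. The sketch below shows roughly how they surface in vLLM's offline Python API; the model identifier, parallelism degree, and memory fraction are illustrative assumptions, not settings from the course.

```python
# Rough sketch of how some listed techniques appear as vLLM options.
# Model name, tensor_parallel_size, and gpu_memory_utilization are
# illustrative assumptions, not settings taken from the course.
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/Llama-3.1-8B-Instruct-quantized.w4a16",  # hypothetical pre-quantized model
    tensor_parallel_size=2,       # tensor parallel inference across two GPUs
    gpu_memory_utilization=0.90,  # memory budget shared by weights and the KV cache
)

outputs = llm.generate(
    ["Why does quantization reduce serving cost?"],
    SamplingParams(temperature=0.2, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```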
Audience for this course
- AI/ML Engineers and Practitioners
- DevOps Engineers
- Cloud Architects and Engineers
- Technical Decision-Makers
Recommended training
- There are no prerequisites for this Technical Overview.
Technology considerations
- N/A