
Artificial intelligence is evolving beyond simple pattern recognition and text generation. Agents are here: systems designed not just to respond, but to act, pursuing goals within a given environment. At their core, agents combine foundation models with augmentation (like tools, retrieval and memory), operating in a continuous loop of perception, planning and action. Essentially, agent = model + augmentation + loop.

This combination transforms models into dynamic entities capable of tackling a variety of tasks, but the current AI ecosystem often feels scattered. Building these agents today frequently involves navigating a maze of disparate tools, frameworks, models, vector databases and custom integrations. This creates the NxM problem: integrating N models with M tools demands on the order of N × M custom implementations (five models and twenty tools already imply a hundred bespoke connectors), resulting in systems that are complex, difficult to scale, challenging to maintain and lacking standardization.

To avoid repeatedly rebuilding tool logic, protocols like Anthropic's Model Context Protocol (MCP) are gaining traction by providing a common language that allows AI clients to discover and use tools exposed by MCP servers. This move towards a "one-to-many" interaction model simplifies integration compared to building bespoke connections for every tool.
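To make that concrete, here is a minimal sketch of an MCP server exposing a single tool, written with the official MCP Python SDK. The tool name and logic are purely illustrative, but any MCP-capable client can discover and call it without bespoke glue code:

```python
# Minimal MCP server sketch using the official MCP Python SDK
# (pip install mcp). The tool below is a hypothetical example.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-tools")

@mcp.tool()
def get_forecast(city: str) -> str:
    """Return a toy forecast for the given city."""
    # A real server would call an actual weather API here.
    return f"Sunny with a high of 22C in {city}"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```

Any client that implements MCP can now enumerate this server's tools and invoke get_forecast, which is exactly the one-to-many property that makes the protocol attractive.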

This is why Red Hat is integrating Llama Stack and MCP into the Red Hat AI portfolio, expanding its unified platform for working with AI models, tools and other necessary components for agentic AI. This builds on Red Hat AI’s existing support for deploying agents built through technologies like LangGraph, LangChain, Crew AI and others.

Red Hat OpenShift AI: A flexible foundation for AI agent development

While standardizing tool interaction helps, developers still need a robust platform specifically designed for building, training, deploying and managing these AI systems. As part of the Red Hat AI portfolio, Red Hat OpenShift AI serves as that enterprise-grade platform for the entire AI lifecycle, empowering you to build AI-powered applications, models and, crucially, agents. OpenShift AI provides flexibility by offering:

  • Streamlined development via an AI API server: To accelerate development and simplify the creation of versatile AI systems, especially those involving retrieval-augmented generation (RAG) and agentic logic, OpenShift AI will incorporate Llama Stack as a unified AI API server. This integration will provide turnkey abstractions for building AI applications, packaging and exposing the necessary AI primitives such as safety, evaluations, RAG and more.
  • Direct access to core components: For teams requiring more granular control, OpenShift AI provides direct access to powerful underlying building blocks. One example is high-performance inference, available today via optimized vLLM integration and model serving. You can use this standalone inference engine and integrate it with your preferred agent framework. Efficient inference is increasingly important in the agentic context, since reasoning and planning are at the heart of any agentic application. Similarly, advanced safety features through TrustyAI are accessible for building custom safety layers, including AI guardrails. (A minimal inference sketch follows this list.)
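As a sketch of what this unified API looks like from the developer's side, assuming a Llama Stack server is already running in front of your model serving stack (client method names and the model ID below follow the llama-stack-client library at the time of writing and may change between releases):

```python
# Minimal inference call through a running Llama Stack server
# (pip install llama-stack-client). Base URL, port and model ID are
# assumptions; point them at your own deployment.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what an AI agent is in one sentence."},
    ],
)
print(response.completion_message.content)
```

Because the API is the contract, swapping the inference backend behind the server (vLLM or otherwise) requires no change to this client code.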

Llama Stack: Your unified AI API server on OpenShift AI

Remember the pre-Kubernetes era? Deploying applications meant wrestling with diverse infrastructure, custom scripts and ad hoc configurations. Kubernetes and OpenShift (as a Kubernetes distribution) changed the game by introducing a standardized control plane: an API server with core primitives (e.g., Pods, Deployments) and extensibility with Custom Resource Definitions (CRDs). This abstracted away infrastructure complexity, allowing developers to focus on applications, not plumbing.

Today, building AI systems and agents often feels like that pre-Kubernetes world of orchestrating workloads. It involves piecing together various tools, models and vector databases while writing "glue code" to connect everything. It's complex and fragmented, and it lacks those important AI API server abstractions.

You can think of Llama Stack as the equivalent of that Kubernetes control plane, but for AI.

Llama Stack introduces a unified AI API server designed specifically for AI workloads, bringing order to this complexity. Here are some key primitives that Llama Stack provides out of the box:

  • Standard AI APIs: Llama Stack offers consistent endpoints for core AI tasks (inference, RAG, agents, safety). This eliminates the need to learn different APIs for each AI tool.
  • Abstracts complexity: It hides the differences between underlying AI tools, databases, inference engines (e.g., vLLM), and protocols like MCP (by integrating MCP client capabilities). This allows AI developers to switch tools or backends without rewriting significant portions of their code.
  • Extensible design: Its provider model allows plugging in different backends (e.g., vector databases) via standard interfaces, similar to Kubernetes' CRDs and operators. This design allows external contributions with the same API foundations.
  • Agent-focused: Includes built-in concepts critical for agent development, such as memory, tools and RAG (see the agent sketch below).
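Putting those primitives together, here is a hedged sketch of Llama Stack's built-in agent abstraction using the llama-stack-client library. The model ID and the builtin::websearch toolgroup are assumptions that depend on how your Llama Stack server is configured:

```python
# Hedged sketch of Llama Stack's built-in agent primitive
# (pip install llama-stack-client). Model ID and toolgroup name
# are assumptions tied to the server's configuration.
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent

client = LlamaStackClient(base_url="http://localhost:8321")

agent = Agent(
    client,
    model="meta-llama/Llama-3.2-3B-Instruct",
    instructions="You are a concise research assistant.",
    tools=["builtin::websearch"],
)

# Sessions give the agent memory across turns.
session_id = agent.create_session("demo-session")
turn = agent.create_turn(
    session_id=session_id,
    messages=[{"role": "user", "content": "What is the Model Context Protocol?"}],
    stream=False,
)
print(turn.output_message.content)
```

The same agent loop (perception, planning, action) described at the top of this post is handled server-side; the client only declares the model, instructions and tools.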

AI is going to continue to evolve, and the ecosystem will need an AI control plane. Llama Stack, which is being integrated within OpenShift AI's generative AI (gen AI) experience, provides exactly that, letting developers focus on crafting AI systems and applications instead of interoperability and hosting. It's your AI platform behind a well-defined API.

Simplifying tool interaction with Llama Stack

Llama Stack simplifies connecting to and interacting with tool servers using protocols like MCP, integrating client capabilities directly into its API layer. This shields developers from the lower-level protocol details when using tools through Llama Stack.
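For example, attaching an MCP server to Llama Stack is designed to be a simple registration call, after which its tools behave like any other toolgroup. The toolgroup ID and endpoint URI below are hypothetical:

```python
# Registering an MCP server with Llama Stack so its tools become
# available to agents. The toolgroup ID and endpoint URI are
# hypothetical; "model-context-protocol" is Llama Stack's MCP
# tool runtime provider.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

client.toolgroups.register(
    toolgroup_id="mcp::weather",
    provider_id="model-context-protocol",
    mcp_endpoint={"uri": "http://localhost:8000/sse"},
)

# The MCP-backed tools can now be listed and handed to agents
# like any other toolgroup.
for tool in client.tools.list(toolgroup_id="mcp::weather"):
    print(tool.identifier)
```

Notice that no MCP protocol details leak into the application code; the server handles the client side of the protocol.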

Llama Stack with Red Hat OpenShift AI

Looking ahead, Llama Stack will become an integral part of the OpenShift AI platform, managed via the Llama Stack Operator (coming soon, initially as a technical preview). We aim to make this powerful AI control plane readily available out of the box for a seamless developer experience.

Development paths and support options with Red Hat OpenShift AI

OpenShift AI is poised to support your specific strategy for building AI agents, whether that strategy centers on Llama Stack or another agentic framework. This composability, combined with the Llama Stack integration, provides flexibility backed by enterprise support.

Llama Stack simplifies AI development by providing a single point of integration for diverse AI tasks through its API-driven architecture. This design allows for extensibility via "providers," enabling vendors, partners and the community to integrate their own implementations of Llama Stack APIs. For agent development, Llama Stack includes a reference implementation and powerful API primitives (like tool interaction APIs, MCP, RAG, post-training and more), which Red Hat, together with the rest of the Llama Stack community, will continue to enhance.
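To see this provider model from the client's perspective, a running Llama Stack server can be introspected for the providers and models currently backing it. This is a hedged sketch; method names follow the llama-stack-client library at the time of writing:

```python
# Inspecting which provider implementations a Llama Stack server
# is running with (pip install llama-stack-client). Output depends
# entirely on how your server was configured.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Which provider serves each API (inference, safety, vector_io, ...)?
for provider in client.providers.list():
    print(provider.api, "->", provider.provider_id)

# Which models are registered against those providers?
for model in client.models.list():
    print(model.identifier)
```

Swapping a provider, say replacing the default inference backend with your own, changes this inventory but not the APIs your application code calls.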

To accommodate different development preferences, there will be multiple ways to build agents. 

  • Build with Llama Stack and OpenShift AI: For the most streamlined and fully supported experience, build your agents using Llama Stack's native capabilities as part of OpenShift AI with Red Hat’s supported providers/implementations. This approach simplifies development and comes with native Red Hat support.
  • BYO providers with Llama Stack and OpenShift AI: Create your agents with Llama Stack integrated into OpenShift AI, while also utilizing your own Llama Stack-compatible provider implementations (for instance, you can connect directly to an OpenAI endpoint instead of using vLLM for inference). In this situation, Red Hat will offer assistance at the API level, but support for your selected provider or implementation will be provided by its vendor or community. 
  • BYO agent framework: You can bring your own agent framework while selectively using Llama Stack APIs, such as inference (OpenAI-compatible), safety, evaluation, etc. In this case, Red Hat will provide support for the Llama Stack APIs you choose to utilize (a sketch of this path follows the list).
  • Building with the core primitives: Finally, you can choose to build directly on OpenShift and OpenShift AI primitives, such as vLLM for inference and running your agent framework as a standard workload. Here, Red Hat supports the platform and its primitives, while support for your chosen framework comes from its vendor or community.
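As one illustration of the BYO agent framework path: because Llama Stack exposes OpenAI-compatible inference endpoints, any framework or client that speaks the OpenAI API can be pointed at the Llama Stack server. The endpoint path and model ID below are assumptions that may differ across Llama Stack versions:

```python
# Pointing a plain OpenAI client (pip install openai) at Llama
# Stack's OpenAI-compatible inference API. The base URL path and
# model ID are assumptions; check your Llama Stack version's docs.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",  # Llama Stack server
    api_key="none",  # placeholder; local servers typically need no key
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello from my own agent framework!"}],
)
print(completion.choices[0].message.content)
```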

We have prioritized flexibility in choice, tailoring support to meet you at every level, wherever you are in your AI journey.

What’s next?

AI agents are here, and they need a platform. Building high-impact agents requires more than stitching tools together; it demands a robust, flexible and enterprise-ready foundation. Red Hat OpenShift AI, featuring Llama Stack as its unified AI API server, will provide that essential platform with built-in integrations for RAG, agents and tools (with MCP), as well as the following key attributes:

  • Open source core: Rooted in Red Hat's open-source ethos, the platform provides sensible defaults for inference (vLLM/Model Serving), evaluations, safety (via TrustyAI), RAG, and agents. You can customize every layer of your stack with Bring Your Own (BYO) options and benefit from API contracts designed for interoperability, without enforcing rigid implementations.
  • True hybrid: Deploy consistently and manage seamlessly across public, private, or edge environments. Avoid vendor lock-in and infuse AI directly into your existing enterprise infrastructure wherever it is.
  • Model customization for best outcomes: Go beyond vanilla or basic fine-tuning with state-of-the-art techniques for skill-building, knowledge distillation, and synthetic data generation, striking the right balance between inherent model knowledge and specialized skills.
  • Enterprise-grade security and responsible AI: Build your AI systems and applications with confidence on a security-focused platform that prioritizes data control, compliance and responsible AI principles. 

Build with us and be part of the journey. We would love your feedback. 

  • Explore Llama Stack on OpenShift for RAG and agentic use cases, and start with our curated demos on GitHub.
  • Provide feedback on the demos and help us build for you through this short 2-3 minute survey.
  • Explore our interactive experiences.


About the author

Adel Zaalouk is a product manager at Red Hat who enjoys blending business and technology to achieve meaningful outcomes. He has experience working in research and industry, and he's passionate about Red Hat OpenShift, cloud, AI and cloud-native technologies. He's interested in how businesses use OpenShift to solve problems, from helping them get started with containerization to scaling their applications to meet demand.
 
