In our first post, "The Hope Phase," we shared how early wins with a structured content set gave us the confidence to scale. But optimism alone couldn’t carry us through the complexity ahead.
As we expanded into messy, dense, unstructured go-to-market content, the cracks showed fast. Retrieval quality slipped. Feedback loops stalled. Our assumptions about the bot’s capabilities, the flexibility of its architecture, and how to tune for relevance began to fall apart.
This phase wasn’t a misstep. It was friction showing us what wasn’t working—and what had to change before we could move ahead.
AI Challenge: Construct a chatbot that can leverage constantly changing, unstructured go-to-market (GTM) content to reduce sales friction by providing brief and accurate answers to seller questions as well as links to more detailed information.
The Build: We built this assistant on Red Hat OpenShift Platform Plus and Red Hat OpenShift AI, which gave us enterprise-grade model serving and deployment, with Granite as the core model. LangChain orchestrated the retrieval flow, and PGVector (an extension to the popular PostgreSQL database) handled vector storage. We used MongoDB to log interactions with the assistant. To preserve context from long-form documents, we used structure-aware tools like Docling and experimented with Unstructured’s Python libraries to pull speaker notes from slides. While that code didn’t make it into production, the experiment revealed just how crucial structure and formatting are to successful retrieval—lessons that now guide our preprocessing efforts.
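For readers who want to picture the plumbing, here is a minimal sketch of that retrieval flow. It assumes a Granite model served behind an OpenAI-compatible endpoint on OpenShift AI, the LangChain PGVector integration, and a pgvector-enabled PostgreSQL database; every URL, model name, and connection string below is a placeholder, not our production configuration.

```python
# Minimal RAG retrieval sketch (illustrative only, not our production code).
# Assumes: langchain-openai, langchain-community, a pgvector-enabled PostgreSQL
# database, and a Granite model behind an OpenAI-compatible endpoint.
from langchain_openai import ChatOpenAI
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.pgvector import PGVector
from langchain_core.documents import Document

# Embeddings used to index GTM content into PGVector.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Toy documents standing in for sales decks and datasheets.
docs = [
    Document(page_content="OpenShift AI pitch deck: value propositions for platform teams.",
             metadata={"asset_type": "pitch_deck", "product": "openshift-ai"}),
    Document(page_content="Ansible Automation Platform datasheet: key features and benefits.",
             metadata={"asset_type": "datasheet", "product": "ansible"}),
]

# Index the documents; the connection string points at a pgvector-enabled database.
store = PGVector.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name="gtm_content",
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/gtm",
)

# Granite served behind an OpenAI-compatible endpoint (placeholder URL and model name).
llm = ChatOpenAI(base_url="https://granite.example.com/v1",
                 api_key="placeholder",
                 model="granite-3-8b-instruct")

def answer(question: str) -> str:
    """Retrieve the top chunks, then ask the model to answer only from them."""
    hits = store.similarity_search(question, k=3)
    context = "\n\n".join(doc.page_content for doc in hits)
    prompt = (f"Answer the seller's question using only this context:\n{context}\n\n"
              f"Question: {question}")
    return llm.invoke(prompt).content
```

In our setup, each question-and-answer pair was also logged to MongoDB for later review; we’ve left that out of the sketch to keep it short.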
Phase 2: The crash
“Why is it doing that... and why can’t we fix it?”
Even in the misfires, we gained clarity. This gave us the context we needed to course-correct.
The chatbot did what bots do: it surfaced answers and recommended content. Unfortunately, it was the wrong content. The same decks, over and over. Outdated resources. Sometimes it pulled a niche version of a deck, not the best one available, or one built for an industry vertical the user never asked about. The answers technically made sense for the sources it retrieved, but they missed the mark on relevance.
We pulled every lever we thought might help—signal tuning, experimenting with metadata filters, taxonomy alignment, even question design. Each adjustment moved the needle slightly, but none cracked the core problem. We were working with enterprise content that was too dense, too similar, and too unstructured for a basic retrieval-augmented generation (RAG) setup to handle. That’s when it became clear: the architecture itself wasn’t built for this level of complexity.
Example – Making the case for metadata
At the time, even getting structured metadata into the retrieval logic—or onto the project roadmap at all—felt like an uphill battle. Technical teams were focused on model performance, so as the project manager I had to advocate hard for metadata-informed retrieval and taxonomy-driven organization. I even built a slide titled “Why Metadata Matters” to make the business case. Back in 2024, metadata tags weren’t commonly treated as part of the retrieval pipeline, let alone the chunking strategy. Now, approaches like taxonomy-based filtering and contextual chunking are becoming standard in RAG setups built for content-heavy environments like ours.
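To make the “Why Metadata Matters” argument concrete, here is what metadata-informed retrieval looks like in a setup like ours: taxonomy tags are attached to each chunk at indexing time, then applied as a filter at query time so the retriever only considers current, relevant assets. This is a hedged illustration using LangChain’s PGVector metadata filter; the tag names and values are invented for the example, not our actual taxonomy.

```python
# Taxonomy-filtered retrieval sketch (tag names and values are invented).
# Assumes the same langchain-community PGVector setup as the earlier sketch.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.pgvector import PGVector
from langchain_core.documents import Document

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Each chunk carries taxonomy tags: asset type, audience, and content year.
docs = [
    Document(page_content="2025 OpenShift sales play: discovery questions and positioning.",
             metadata={"asset_type": "sales_play", "audience": "seller", "year": "2025"}),
    Document(page_content="2023 OpenShift sales play (superseded).",
             metadata={"asset_type": "sales_play", "audience": "seller", "year": "2023"}),
    Document(page_content="OpenShift technical deep dive for architects.",
             metadata={"asset_type": "deck", "audience": "architect", "year": "2025"}),
]

store = PGVector.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name="gtm_content_tagged",
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/gtm",
)

# Without the filter, the superseded 2023 play competes with the current one on
# similarity alone. With it, retrieval only considers current, seller-facing plays.
hits = store.similarity_search(
    "How should I position OpenShift in a discovery call?",
    k=2,
    filter={"asset_type": "sales_play", "audience": "seller", "year": "2025"},
)
for doc in hits:
    print(doc.metadata, doc.page_content[:60])
```

The design point is that tags only pay off once they are wired into the retrieval logic; sitting unused in a content management system, they change nothing about what the bot surfaces.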
Handoffs across teams, shifting ownership, and a lack of shared AI experience meant no one had the full picture. Many of us were new—or very nearly new—to AI and chatbots. And we still had our day jobs. We were learning in real time, assessing, troubleshooting, and validating outputs while trying to understand the problem space. Without continuity or collective visibility, it was nearly impossible to connect the dots between architecture, tagging, source selection behavior, and user intent. What started as an asset type problem, then an assumed metadata problem, became a surfacing challenge, then a governance issue—until finally, we saw it for what it was: a system misfit.
We weren’t teaching the bot. We were dragging it into a use case it wasn’t designed to handle.
Lessons playbook: Phase 2 – When things break for a reason
- Retrieval ≠ summarization: Irrelevant output is usually a retrieval issue, not a summarization one. Engineering teams validated the summaries as technically accurate, but business teams flagged the source materials as wrong. Both were right. We just hadn’t separated our testing or metrics for each layer.
- LLM isn’t enough without structured retrieval: Even the best large language model (LLM) won’t deliver useful answers if chunking, embedding behavior, and retrieval design aren’t optimized. But too often, teams only focus on model choice, vector speed, or eval scores—missing what really makes retrieval for unstructured content at scale possible. Traditional RAG treats all chunks equally, so without role-awareness, filters, or intent signals, it can’t prioritize relevance. Tags alone don’t help unless they’re wired into the logic. We hit those limits early, when hybrid RAG and contextual retrieval were still emerging. Now there’s better language and tooling, but the lesson holds: smart chunking plus structured signals unlock retrieval, and that’s what lets the LLM actually deliver (see the chunking sketch after this list).
- Development efforts and tuning must match the use case: Tuning can make a huge difference, but only when you are tuning for the data sources you have. We wanted to try adjusting relevance scoring, popularity scoring, and other ranking levers, but our configuration wasn’t built to respond to those adjustments.
- Misalignment stalls progress: When systems cannot be tuned to your intent, morale suffers. Engineers saw green checks. Content owners saw wrong answers. That disconnect between technical validation and business expectations was a difficult gap to bridge. The distance between Engineering and SMEs quietly stalled progress.
- Everyone expected something different: We didn’t align early on what this chatbot was for and what its main function should be. Expectations varied across users, testers, stakeholders, and leadership. And when AI is the buzzword, everyone expects something different.
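Here is a rough sketch of what “smart chunking plus structured signals” can look like in practice: Docling converts a dense source document into Markdown that preserves its headings, and a header-aware splitter turns that into chunks that carry their section titles as metadata, giving the retriever structural signals to filter and rank on. This is an illustration of the pattern, not the pipeline we shipped; the file name and header keys are placeholders.

```python
# Structure-aware chunking sketch (illustrative pattern, not our production pipeline).
# Assumes the docling and langchain-text-splitters packages; the file path is a placeholder.
from docling.document_converter import DocumentConverter
from langchain_text_splitters import MarkdownHeaderTextSplitter, RecursiveCharacterTextSplitter

# 1. Convert a dense source document (PDF, DOCX, PPTX) into Markdown that keeps headings.
converter = DocumentConverter()
markdown = converter.convert("gtm-playbook.pdf").document.export_to_markdown()

# 2. Split on headings first, so every chunk remembers which section it came from.
header_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "section"), ("##", "subsection")]
)
sections = header_splitter.split_text(markdown)

# 3. Break long sections into retrieval-sized chunks; the header metadata rides along
#    and becomes the structured signal the retriever can filter and rank on.
chunk_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = chunk_splitter.split_documents(sections)

for chunk in chunks[:3]:
    # e.g. {'section': 'Competitive positioning', 'subsection': 'Discovery questions'}
    print(chunk.metadata, chunk.page_content[:80])
```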
Red Hat reflection
If we’d been able to better align on intent, architecture constraints, and what tuning could (and couldn’t) accomplish, and if we had separated validation across retrieval and summarization layers, we might have moved faster. But like many others, we were learning as the market evolved, navigating the complexities of applying RAG to unstructured enterprise content at scale.
Deliver AI value with the resources you have, the insights you own, and the freedom you need. Red Hat AI is engineered to help you build and run AI solutions that work exactly how your business does—from first experiments to full production.
In Phase 3: Iteration with Intention, we’ll shift focus, drawing from the hard-won clarity of this phase to evolve our content strategy, retrieval approach, and enablement design. What we learned didn’t just shape our next bot; it sparked a vision for a dedicated agent designed to manage this content domain, feeding into an AI-ready content lifecycle, smarter retrieval infrastructure, and the early framework of our enterprise knowledge model.
Resource: Get started with AI for enterprise: A beginner’s guide
About the author
Andrea Hudson is a program manager focused on making AI tools useful in the messy world of enterprise content. At Red Hat since 2022, she helps teams untangle complexity, connect the dots, and turn good intentions into working systems. Most recently, she helped reshape an AI chatbot project that aimed to surface the right go-to-market content—but ran into the chaos of unstructured data.
Her background spans product launches, enablement, product testing, data prep, and evolving content for the new AI era. As a systems thinker with early training in the U.S. Navy, she relies on what works in practice, not just in theory. She’s focused on building things that scale, reduce rework, and make people’s lives easier.
Andrea writes with honesty, sharing lessons from the projects that don’t go as planned. She believes transparency is key to growth and wants others to have a starting point by sharing the messy middle and not just the polished end.
When she’s not wrangling AI or metadata, you’ll find her tinkering with dashboards, learning low/no-code tools, playing on Red Hat’s charity eSports teams, recruiting teammates, or enjoying time with her family.