
TL;DR

The article details the evolution of Red Hat’s AI-driven support case summarization feature through its transition to Granite models. Initially built using a Mistral model, evaluations showed that Granite performed better using Red Hat’s production data, leading to an increase in case summarization use and user productivity. Future plans include evaluating multilingual support and higher context limits for more comprehensive summaries.

How we started 

In Red Hat’s global support organization, “Follow the Sun” (FTS) cases are recognized as a top priority, requiring smooth transitions of case ownership across different regions and time zones until an issue is resolved. For support engineers, a concise and accurate case summary is vital for quickly grasping the context, latest updates and customer impact related to the case. Manually creating a summary was often time-consuming and lacked a standard format.

We recognized this as a clear opportunity to infuse AI into our existing processes to make some clear and measurable improvements. This initiative began by collaborating with subject matter experts (SMEs) to more deeply understand user workflows, resulting in the development of a standardized summary template.

Following an evaluation of large language model (LLM) options and in partnership with Red Hat’s internal data science team, the Mistral-7b-Instruct-v0.2 model was initially selected, recognizing that this small language model (SLM) could deliver the targeted business outcome cost-effectively and resource-effectively without the overhead of a larger LLM. Deployed within Red Hat’s implementation of Salesforce, the solution allowed our support engineers to generate and review summaries directly alongside the rest of the existing case information.

Our initial rollout in August 2024 targeted FTS cases, with a strong focus on feedback-driven improvement. We achieved early success through the automation of case summaries, resulting in faster case resolution. This success led to expanding the case summary feature to all English language cases in December 2024.

Pivoting to a Granite model 

Case summarization powered by Mistral was providing adequate results, with a few areas for improvement identified through user feedback; for example, some users wanted specific details to always be part of a summary. This feedback showed us that our use case was adding value, so our focus pivoted to expanding on this success.

Of course, the AI space is moving very quickly, with new features, versions and technology to consider almost daily. The release of the Granite-3.1-8b-instruct model offered a strategic opportunity to evaluate its output accuracy for this use case, primarily due to its improved summarization, expanded context window and multilingual support.

The transition: A data-driven approach

Before fully adopting Granite-3.1-8b-instruct and making a production cut-over, we prioritized a rigorous evaluation. A real-world simulation was conducted by replaying actual production data through both the current Mistral model and the Granite model we were evaluating.

The actual production traffic we replayed included 250 AI-generated case summaries produced between February 15 and 17, 2025. The sample size was constrained due to limited GPU resources. The summaries produced by both models were compared using an “LLM as a Judge” method, with the prometheus-eval/prometheus-8x7b-v2.0 model serving as the evaluator. The LLM (acting as a judge) assessed generated summaries and rated them on a scale of 1 to 10, based on relevance, coherence and the level of detail provided in the case context for each record.
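To make the comparison concrete, here is a minimal sketch of this kind of LLM-as-a-Judge loop, assuming the Prometheus judge model is served behind an OpenAI-compatible chat completions endpoint (for example via vLLM). The endpoint URL, rubric wording, record fields and score parsing are illustrative assumptions, not our production evaluation pipeline.

```python
import re
import requests

# Hypothetical OpenAI-compatible endpoint serving the judge model (e.g. via vLLM).
JUDGE_URL = "http://localhost:8000/v1/chat/completions"
JUDGE_MODEL = "prometheus-eval/prometheus-8x7b-v2.0"

# Illustrative rubric; the production rubric wording is not reproduced here.
RUBRIC = (
    "Rate the candidate summary from 1 to 10 for relevance, coherence and "
    "level of detail with respect to the case context. "
    "Reply with a single line: Score: <number>."
)

def judge_summary(case_context: str, summary: str) -> float:
    """Ask the judge model to rate one generated summary against its case context."""
    prompt = (
        f"{RUBRIC}\n\n"
        f"### Case context:\n{case_context}\n\n"
        f"### Candidate summary:\n{summary}\n"
    )
    resp = requests.post(
        JUDGE_URL,
        json={
            "model": JUDGE_MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.0,
        },
        timeout=120,
    )
    resp.raise_for_status()
    text = resp.json()["choices"][0]["message"]["content"]
    match = re.search(r"Score:\s*([0-9]+(?:\.[0-9]+)?)", text)
    return float(match.group(1)) if match else float("nan")

def compare_models(records: list[dict]) -> dict:
    """records: [{'context': ..., 'mistral_summary': ..., 'granite_summary': ...}, ...]"""
    scores = {"mistral": [], "granite": []}
    for rec in records:
        scores["mistral"].append(judge_summary(rec["context"], rec["mistral_summary"]))
        scores["granite"].append(judge_summary(rec["context"], rec["granite_summary"]))
    # Average rating per model over the replayed sample.
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}
```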

Summary generation may involve multiple steps, depending on the LLM's context window. If the case context exceeds the context window, it is divided into multiple chunks, and a summary is generated for each chunk as an intermediate step. In the final step, the intermediate summaries are combined and a final summary is generated from them, as sketched in the example below.

Context window: 

  • Mistral-7b: 32K tokens
  • Granite-3.1-8b-instruct: 128K tokens
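As a rough illustration of this chunk-and-combine flow, here is a minimal sketch assuming a generic generate(prompt) helper that calls the model serving endpoint; the token estimate, budget figures and prompt wording are illustrative assumptions rather than our production logic.

```python
# Minimal sketch of chunked summarization; generate(prompt) is assumed to call
# the model serving endpoint and return the model's text output.

CONTEXT_WINDOW_TOKENS = 32_000   # e.g. Mistral-7b; 128_000 for Granite-3.1-8b-instruct
PROMPT_OVERHEAD_TOKENS = 2_000   # room reserved for instructions and the output

def estimate_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token for English text.
    return len(text) // 4

def split_into_chunks(case_context: str, max_tokens: int) -> list[str]:
    """Split the case context into pieces that fit the model's context window."""
    max_chars = max_tokens * 4
    return [case_context[i:i + max_chars] for i in range(0, len(case_context), max_chars)]

def summarize_case(case_context: str, generate) -> str:
    budget = CONTEXT_WINDOW_TOKENS - PROMPT_OVERHEAD_TOKENS

    # Single-step path: the whole case context fits in one prompt.
    if estimate_tokens(case_context) <= budget:
        return generate(f"Summarize this support case:\n{case_context}")

    # Intermediate step: summarize each chunk independently.
    intermediate = [
        generate(f"Summarize this part of a support case:\n{chunk}")
        for chunk in split_into_chunks(case_context, budget)
    ]

    # Final step: combine the intermediate summaries into the final summary.
    combined = "\n\n".join(intermediate)
    return generate(f"Combine these partial summaries into one case summary:\n{combined}")
```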

Since the Prometheus model has a 32K context window, it was unable to evaluate the intermediate summaries produced by Granite, whose chunks can exceed 32K tokens thanks to Granite's larger 128K window. For the Mistral model, we therefore averaged the intermediate and final ratings and compared that average to the final score for the summaries generated by Granite.

Observations 

We were looking for the overall rating to be higher than the existing 7.2 score, based on relevance, coherence and the level of detail provided in the case context.

| Rating/Evaluation step | Average rating for Mistral-7b | Average rating for Granite-3.1-8b-instruct | Note |
|---|---|---|---|
| Final_summary | 7.28 | 7.6 | |
| Intermediate_summary | 7.14 | N/A | Prometheus (32K context window) could not evaluate the intermediate summaries produced by Granite, whose chunks can exceed 32K tokens. |
| Overall rating | 7.21 | 7.6 | |

Conclusion: Based on the overall rating, the Granite-3.1 model outperforms the Mistral-7b model by a small margin (0.39 points on the 10-point scale).

Technical parameters 

When a support engineer is waiting on a summary in order to respond to a high-priority ticket, faster generation can directly speed up resolution time. This test measured how quickly and accurately different models could generate structured case summaries, focusing on both speed and how well they followed the required format—such as including the issue, steps so far and next steps.

| Parameter | Mistral-7b | Granite-3.1-8b-instruct |
|---|---|---|
| Average summary generation time | 39.17s | 24.86s |
| Structural adherence: summary has required sections | 99.99% | 99.98% |
| Structural adherence: summary has repeated sections | 0% | 5.41% |
Based on the above comparison, Granite took less time to generate a summary compared to Mistral and has comparable performance for structural adherence. One caveat is that in ~5% of the sample, the Granite-generated summary had repeated sections, which can impact readability.
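For illustration, checks like these can be automated with simple pattern matching. The sketch below assumes hypothetical section headings (Issue, Steps so far, Next steps) based on the format described earlier; the production template and parsing rules may differ.

```python
import re

# Hypothetical section headings; the production template may name them differently.
REQUIRED_SECTIONS = ["Issue", "Steps so far", "Next steps"]

def has_required_sections(summary: str) -> bool:
    """True if every required heading appears at least once in the summary."""
    return all(
        re.search(rf"^\s*{re.escape(name)}\s*:", summary, re.IGNORECASE | re.MULTILINE)
        for name in REQUIRED_SECTIONS
    )

def has_repeated_sections(summary: str) -> bool:
    """True if any required heading appears more than once (the readability caveat)."""
    return any(
        len(re.findall(rf"^\s*{re.escape(name)}\s*:", summary, re.IGNORECASE | re.MULTILINE)) > 1
        for name in REQUIRED_SECTIONS
    )

def adherence_report(summaries: list[str]) -> dict:
    """Aggregate both structural metrics over a batch of generated summaries."""
    n = len(summaries)
    return {
        "has_required_sections": sum(has_required_sections(s) for s in summaries) / n,
        "has_repeated_sections": sum(has_repeated_sections(s) for s in summaries) / n,
    }
```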

The faster generation time and slightly improved rating were encouraging signals to proceed with switching to the Granite model in our production solution. Before proceeding, we wanted to see if we could correct for the repeated sections, so we adjusted the prompts accordingly.

After this change, the evaluation pipeline showed no repeated sections in any of the summaries, the average summary generation time dropped to 19.62s, and the overall rating improved to 7.41.
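We are not reproducing the production prompt here, but an adjustment of this kind might add an explicit instruction against repeating sections, along the lines of the hypothetical template below.

```python
# Illustrative prompt template; the production prompt wording is not reproduced here.
SUMMARY_PROMPT = """You are summarizing a Red Hat support case.

Produce exactly one summary containing each of these sections once, in this order:
Issue, Steps so far, Next steps.
Do not repeat any section heading or restate a section you have already written.

Case context:
{case_context}
"""

def build_prompt(case_context: str) -> str:
    return SUMMARY_PROMPT.format(case_context=case_context)
```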

After model transition 

Since deploying the Granite model, we've seen its usage for case summarization increase by about 60%. This rise is partly due to leadership supporting the use of this new gen AI capability, believing it will improve the experience for both our support engineers and customers by providing faster and more effective incident resolution.

While strategic backing was important, the Granite model's performance and value for summarization exceeded our expectations, which played a key role in our decision to transition to Granite. In addition to improving case resolution time and quality, automating case summaries has given our support engineers more time to focus on more complex and strategic tasks.

Looking ahead: exciting innovations

As we look to improve the case summarization tool's performance and accuracy and further expand its use, there are several enhancements we are considering, including: 

  • Multilingual support: We hope to use more of the Granite model's capabilities to address gaps in the existing solution. Our roadmap includes incorporating the multilingual support available in newer versions of Granite, which will let us summarize cases in more languages and locales, expanding our reach and improving our service for the global customer and Red Hat support associate communities. This will also free up more of our support engineers' time, contributing to greater overall efficiency and productivity.
  • Increased context limit: Since the Mistral model had a smaller context window, our solution was initially unable to process the input for cases that exceeded 150K characters. As new versions of Granite are released, we are working to increase the context limit that Granite can process. This will allow the model to consider a broader range of information when generating summaries, leading to even more comprehensive and nuanced outputs, particularly for complex cases with extensive histories.

These planned enhancements underscore our ongoing investment in AI-driven solutions that directly address the evolving needs of our business and our customers.

Conclusion

By automating the case summarization process with AI, we saw improvements in operational efficiency and consistency. Manual effort was significantly reduced, and summaries now follow a uniform structure with complete, relevant information. Most importantly, this speeds up the delivery of case details to regional teams, enabling faster handoffs and improving overall customer experience. 

Learn more about how you can run inference on any open source model, including Granite models, across any accelerator and environment by visiting the Red Hat AI Inference Server product page.


About the authors

Prasanna is a seasoned business analyst with a demonstrated history in business analysis and product management, with a special focus on AI and analytics. He has been working at Red Hat since November 2021 and has been a key member of the XE AI & Data team, working on AI use cases such as case summarization and KCS drafting.


I am a Principal Software Engineer who has consistently delivered robust and scalable solutions. I focus on designing and driving AI-powered solutions that align with business goals and create tangible value. I specialize in identifying high-impact use cases, defining AI integration strategies, and collaborating with cross-functional teams to bring intelligent systems into production.
