The Metadata Assistant: How Red Hat is using generative AI to make web content easier to find and use

September 3, 2025 | Anna McHugh, Gail Vadia | 4-minute read

Most of us rely on good web metadata every day, perhaps hundreds of times a day, without ever realizing it. While we don’t all see metadata, it’s present on every webpage, providing details for categorizing content. Search and information retrieval software—from content management systems and search engines to generative AI (gen AI) chatbots—all depend on metadata to categorize, interpret, and display content to users.

In some ways, metadata is like the classification system librarians use to place books on shelves so that visitors can find them. Much like a library catalog, good web metadata makes content easier to find and use because it has consistent, meaningful labels.

At Red Hat, we maintain dozens of websites with tens of thousands of pages and millions of metadata selections. To help manage it all, we’ve developed a new internal gen AI tool: the Metadata Assistant.

Facing challenges with metadata accuracy

Despite our best efforts to stay organized, we struggle to tag content accurately and consistently. The overwhelming amount of content to classify can feel like a messy, towering pile of books instead of an organized library where you can browse the shelves by meaningful categories. When classifying their content, Red Hatters must consider more than 150 taxonomy choices across several parent categories, like products, topics, industries, regions, and partners. Our product portfolio is also complex, with products frequently being added, discontinued, or renamed.

Metadata best practices can be tricky for teams to keep up with. Meta titles and descriptions should be as specific as possible to help users select meaningful content. At the same time, strict character limits (which avoid truncation in search results) mean that titles and meta descriptions must also be concise and enticing. Balancing specificity and brevity is difficult, and we have a lot of content with too many taxonomy tags, vague titles, or mediocre descriptions.

At Red Hat, content creators tend to overtag, which floods our visitors with irrelevant information in 3 ways:

Populates search filters with irrelevant results: Too many tags mean search filters become cluttered with a huge number of results that aren't truly helpful.
Adds irrelevant cross-links and dynamic content: Taxonomy tags are used to dynamically match up and display related content on our webpages. For example, a topic page called Understanding DevOps should link to other content about DevOps—and a shared DevOps taxonomy tag is how the content management system determines which content to display. Overtagging causes irrelevant links to show up on pages as cross-links or conversion points, which is frustrating for visitors who want to expand their learning by exploring content that’s directly relevant to their interests.
Serves up confusing personalization experiences: Red Hat uses Adobe personalization software that identifies visitor affinities and interests based on taxonomy tags. Overtagging causes these personalization experiences to become anything but personal, because we cannot determine a visitor’s true interests and intent.
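To make the cross-linking problem concrete, here is a minimal sketch of tag-based matching. It is not Red Hat's content management system: the `Page` class, the `related_pages` function, and every page title other than "Understanding DevOps" are hypothetical, and the rule "share at least one tag" is an assumption about how such matching could work.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    tags: set[str] = field(default_factory=set)


def related_pages(current: Page, catalog: list[Page]) -> list[str]:
    """Return titles of other pages that share at least one tag with `current`."""
    return [
        page.title
        for page in catalog
        if page is not current and current.tags & page.tags
    ]


# Hypothetical pages for illustration only.
devops = Page("Understanding DevOps", {"devops"})
cicd = Page("What is CI/CD?", {"devops", "automation"})
# Overtagged: a storage datasheet that mentions DevOps only in passing.
datasheet = Page("Storage datasheet", {"storage", "devops", "cloud", "security"})
# The same datasheet, tagged only with its main subject.
datasheet_trimmed = Page("Storage datasheet", {"storage"})

overtagged_links = related_pages(devops, [devops, cicd, datasheet])
trimmed_links = related_pages(devops, [devops, cicd, datasheet_trimmed])
print(overtagged_links)  # includes the irrelevant datasheet
print(trimmed_links)     # only the page that is really about DevOps
```

In this sketch, one stray tag is enough to place the datasheet on the DevOps topic page, which is the kind of irrelevant cross-link that overtagging produces.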

Over the years, we built and used scripting solutions to try to address these issues. The scripted tagging solution used complex rules to update numerous webpages automatically. However, the scripted tagger couldn’t catch and fix every issue, and the rigid rules we originally set gave little insight into, or consideration for, the unique value of each individual piece of content.

How Red Hat's Metadata Assistant helps

In the fall of 2023, the Content Team formed a gen AI pilot working group and built experimental AI tools to support content creators in our Marketing organization. The team built a prototype of an internal gen AI tool called Metadata Assistant during a 3-week sprint. Since then, we have released numerous new versions of the application, improving the prompt structure, the UI, and the user feedback mechanisms, and refactoring the application to use open source technology and highly structured prompts.

The application evaluates Red Hat web content and marketing collateral using structured prompts, metadata best practices, and documentation for Red Hat writers and editors. It also collects and incorporates feedback from Red Hatters to suggest titles, meta descriptions, summaries, and taxonomy tags that are consistent with our writing standards and official taxonomy terms. In other words, the Metadata Assistant is helping us build our library, saving Red Hatters significant time.

The team designed and built the Metadata Assistant with a dual purpose: to reduce the time and tedium of tagging content for Red Hatters, and to declutter the content experience for our visitors by cutting the number of irrelevant tags.

How it works: Relieving Red Hatters from long forms and tedious tasks

The Metadata Assistant is a simple web application that takes a URL, PDF, or a plain text content draft and generates suggested metadata. At Red Hat, content strategists and writers complete a lengthy and wildly unpopular document called the web cover sheet. Despite being tedious, difficult, and time-consuming, the cover sheet is an important part of content creation because it contains all the metadata that needs to be added to each webpage. The Metadata Assistant uses the web cover sheet instructions, standardized tagging guidelines, and taxonomy tag choices to create proposed metadata in a matter of seconds.
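One way to picture how instructions, guidelines, and tag choices could be combined is a single structured prompt. The sketch below is an assumption, not the Metadata Assistant's actual prompt: the `build_prompt` function, the section headings, and the sample taxonomy values are all hypothetical.

```python
def build_prompt(content: str, taxonomies: dict[str, list[str]], guidelines: str) -> str:
    """Assemble one structured prompt from guidelines, tag choices, and the draft."""
    # One line per taxonomy, listing the only tags the model may choose from.
    taxonomy_lines = [
        f"- {name}: {', '.join(sorted(tags))}"
        for name, tags in taxonomies.items()
    ]
    sections = [
        "## Guidelines",
        guidelines.strip(),
        "## Allowed taxonomy tags (choose only one per taxonomy)",
        "\n".join(taxonomy_lines),
        "## Content",
        content.strip(),
        "## Task",
        "Suggest a meta title, a meta description, a summary, and taxonomy tags.",
    ]
    return "\n\n".join(sections)


# Hypothetical inputs for illustration only.
prompt = build_prompt(
    content="DevOps brings development and operations teams together...",
    taxonomies={"topics": ["devops", "automation"], "products": ["Red Hat OpenShift"]},
    guidelines="Use official product names. Avoid vague titles.",
)
print(prompt)
```

Keeping the allowed tags inside the prompt means a refreshed taxonomy changes the model's choices without any change to the code.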

The latest version of the Metadata Assistant is deployed on Red Hat OpenShift Service on AWS and uses the Mistral 7B Instruct v0.3 large language model (LLM). Mistral is a fully open source LLM, provided by Red Hat’s internal LLM-as-a-Service (LLMaaS) tools. To keep the tags up to date, the taxonomy is automatically updated nightly.

The Metadata Assistant performs 2 tasks:

Selects the most relevant taxonomy tags: The Metadata Assistant is instructed to select only one tag per taxonomy, which helps reduce overtagging. There are 7 taxonomies that contain predefined labels for things like products, topics, and partners.
Generates a first draft of descriptive metadata: This includes a title, meta description, and summary of the content. The Metadata Assistant generates a meta title tag (55-60 characters), a meta description (155-160 characters), and a brief summary that can be used as landing page text to encourage visitors to download assets like e-books, checklists, and overviews of Red Hat technologies.
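The two tasks above imply checks that could be applied to any suggestion before a person reviews it. The following is a minimal sketch under stated assumptions: the `validate_suggestion` function, the dictionary layout, and the sample taxonomy are hypothetical, and only the character ranges and the one-tag-per-taxonomy rule come from the article. Storing tags as a mapping from taxonomy to a single tag enforces the one-tag rule by construction.

```python
TITLE_RANGE = (55, 60)          # characters, as stated in the article
DESCRIPTION_RANGE = (155, 160)  # characters, as stated in the article


def validate_suggestion(suggestion: dict, taxonomies: dict[str, set[str]]) -> list[str]:
    """Return a list of human-readable problems; an empty list means it passes."""
    problems = []

    # Check the length of each descriptive field against its allowed range.
    for field_name, (low, high) in (("title", TITLE_RANGE), ("description", DESCRIPTION_RANGE)):
        length = len(suggestion.get(field_name, ""))
        if not low <= length <= high:
            problems.append(f"{field_name} is {length} characters, expected {low}-{high}")

    # Each taxonomy maps to exactly one tag, and that tag must be an official term.
    for taxonomy, tag in suggestion.get("tags", {}).items():
        if taxonomy not in taxonomies:
            problems.append(f"unknown taxonomy: {taxonomy}")
        elif tag not in taxonomies[taxonomy]:
            problems.append(f"unknown tag {tag!r} in taxonomy {taxonomy}")

    return problems


# Hypothetical taxonomy and suggestions for illustration only.
taxonomies = {"topics": {"devops", "automation"}, "products": {"Red Hat OpenShift"}}
good = {"title": "T" * 58, "description": "D" * 157, "tags": {"topics": "devops"}}
bad = {"title": "Too short", "description": "D" * 157, "tags": {"topics": "kubernetes"}}

good_problems = validate_suggestion(good, taxonomies)
bad_problems = validate_suggestion(bad, taxonomies)
print(good_problems)
print(bad_problems)
```

A check like this does not judge quality. It only flags suggestions that break the stated limits or use a tag outside the official taxonomy.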

Conclusion

The Metadata Assistant also has other potential uses, like large-scale automated taxonomy audits and updating existing pages with newly created taxonomy tags. It’s clear the tool has substantial potential to change and improve how Red Hatters create and look after the metadata that makes our content findable, meaningful, and relevant to a diverse global audience.

To understand how AI models are delivered as shared resources, learn more about Models-as-a-Service.

About the authors

Anna McHugh

Gail Vadia

Senior Content Strategist

Gail Vadia is a Senior Content Strategist at Red Hat. She is a writer and editor focused on standardizing content across Red Hat’s core web properties to deliver a cohesive user journey throughout our digital ecosystem.
