
Most of us rely on good web metadata every day, perhaps hundreds of times a day, without ever realizing it. Although most visitors never see it, metadata is present on every webpage, providing details that categorize its content. Search and information retrieval software, from content management systems and search engines to generative AI (gen AI) chatbots, depends on metadata to categorize, interpret, and display content to users.

In some ways, metadata is like the classification system librarians use to place books on shelves so that visitors can find them. Much like a library catalog, good web metadata makes content easier to find and use because it has consistent, meaningful labels.

At Red Hat, we maintain dozens of websites with tens of thousands of pages and millions of metadata selections. To help manage it all, we've developed a new internal gen AI tool: the Metadata Assistant.

Facing challenges with metadata accuracy

Despite our best efforts to stay organized, we struggle to apply tags accurately and consistently. The overwhelming amount of content to classify can feel like a messy, towering pile of books instead of an organized library where you can browse the shelves by meaningful categories. When classifying their content, Red Hatters must consider more than 150 taxonomy choices across several parent categories, such as products, topics, industries, regions, and partners. Our product portfolio is complex, and products are frequently added, discontinued, or renamed.

Metadata best practices can be hard for teams to keep up with. Meta titles and descriptions should be as specific as possible to help users choose meaningful content, yet they must also be concise and enticing while staying within strict character limits that prevent truncation in search results. Balancing specificity and brevity is difficult, and as a result we have a lot of content with too many taxonomy tags, vague titles, or mediocre descriptions.

At Red Hat, content creators tend to overtag, which floods our visitors with irrelevant information in 3 ways:

  • Populates search filters with irrelevant results: When content carries too many tags, search filters return a huge number of results that aren't truly helpful.
  • Adds irrelevant cross-links and dynamic content: Taxonomy tags are used to dynamically match up and display related content on our webpages. For example, a topic page called Understanding DevOps should link to other content about DevOps—and a shared DevOps taxonomy tag is how the content management system determines which content to display. Overtagging causes irrelevant links to show up on pages as cross-links or conversion points, which is frustrating for visitors who want to expand their learning by exploring content that’s directly relevant to their interests.
  • Serves up confusing personalization experiences: Red Hat uses Adobe personalization software that identifies visitor affinities and interests based on taxonomy tags. Overtagging causes these personalization experiences to become anything but personal, because we cannot determine a visitor’s true interests and intent.
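To illustrate why overtagging dilutes related content, here is a minimal Python sketch that ranks related pages by how many taxonomy tags they share with a given page. It is not Red Hat's content management system logic: the `Page` class, the `related_pages` function, and the sample pages are hypothetical, and a real system may weight or filter matches differently.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A webpage and its taxonomy tags (hypothetical model)."""
    title: str
    tags: set[str] = field(default_factory=set)


def related_pages(page: Page, catalog: list[Page]) -> list[str]:
    """Return titles of other pages sharing at least one tag with `page`,
    most shared tags first, ties broken alphabetically by title."""
    scored = []
    for other in catalog:
        if other is page:
            continue  # never link a page to itself
        overlap = len(page.tags & other.tags)
        if overlap > 0:
            scored.append((overlap, other.title))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in scored]


catalog = [
    Page("DevOps checklist", {"DevOps"}),
    Page("Zero trust overview", {"Security"}),
    Page("Edge e-book", {"Edge computing"}),
    Page("VM migration guide", {"Virtualization"}),
]

focused = Page("Understanding DevOps", {"DevOps"})
overtagged = Page(
    "Understanding DevOps",
    {"DevOps", "Security", "Edge computing", "Virtualization"},
)

print(related_pages(focused, catalog))
# ['DevOps checklist']
print(related_pages(overtagged, catalog))
# ['DevOps checklist', 'Edge e-book', 'VM migration guide', 'Zero trust overview']
```

With a single accurate tag, the Understanding DevOps page links only to DevOps content; with extra tags, every loosely related asset becomes an equally ranked cross-link candidate.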

Over the years, we built and used scripting solutions to try to address these issues. The scripted tagging solution applied complex rules to update numerous webpages automatically, but it couldn't catch and fix every issue, and the rigid rules we originally set gave little insight into, or consideration for, the unique value of each individual piece of content.

How Red Hat's Metadata Assistant helps

In the fall of 2023, the Content Team formed a gen AI pilot working group and built experimental AI tools to support content creators in our Marketing organization. The team built a prototype of an internal gen AI tool called Metadata Assistant during a 3-week sprint. Since then, we have released numerous new versions of the application, improving the prompt structure, the UI, and user feedback mechanisms, and refactoring the application to use open source technology and highly structured prompts.

The application evaluates Red Hat web content and marketing collateral using structured prompts, metadata best practices, and documentation for Red Hat writers and editors. It also collects and incorporates feedback from Red Hatters to suggest titles, meta descriptions, summaries, and taxonomy tags that are consistent with our writing standards and official taxonomy terms. In other words, the Metadata Assistant is helping us build our library, saving Red Hatters significant time.

The team designed and built the Metadata Assistant with a dual purpose: to reduce the time and tedium of tagging content, and to declutter the content experience for our visitors by reducing the number of irrelevant tags.

How it works: Relieving Red Hatters from long forms and tedious tasks

The Metadata Assistant is a simple web application that takes a URL, PDF, or a plain text content draft and generates suggested metadata. At Red Hat, content strategists and writers complete a lengthy and wildly unpopular document called the web cover sheet. Despite being tedious, difficult, and time-consuming, the cover sheet is an important part of content creation because it contains all the metadata that needs to be added to each webpage. The Metadata Assistant uses the web cover sheet instructions, standardized tagging guidelines, and taxonomy tag choices to create proposed metadata in a matter of seconds.
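The core idea, combining the cover sheet instructions, tagging guidelines, and taxonomy choices with the content itself, can be sketched as a structured prompt builder. This is a minimal illustration under assumptions: the section headings, the request for JSON output, and the `build_metadata_prompt` name are hypothetical and do not reproduce the Metadata Assistant's actual prompts.

```python
def build_metadata_prompt(
    content: str,
    cover_sheet_instructions: str,
    tagging_guidelines: str,
    taxonomies: dict[str, list[str]],
) -> str:
    """Assemble a structured prompt (hypothetical layout) from the
    cover sheet instructions, tagging rules, allowed terms, and content."""
    # One line per taxonomy, terms sorted so the prompt is deterministic.
    taxonomy_lines = "\n".join(
        f"- {name}: {', '.join(sorted(terms))}"
        for name, terms in sorted(taxonomies.items())
    )
    return (
        "### Instructions\n"
        f"{cover_sheet_instructions.strip()}\n\n"
        "### Tagging guidelines\n"
        f"{tagging_guidelines.strip()}\n\n"
        "### Allowed taxonomy terms (select only one per taxonomy)\n"
        f"{taxonomy_lines}\n\n"
        "### Content\n"
        f"{content.strip()}\n\n"
        "### Output\n"
        "Return JSON with keys: title, meta_description, summary, tags."
    )


prompt = build_metadata_prompt(
    content="Red Hat OpenShift helps teams build and deploy applications.",
    cover_sheet_instructions="Suggest a title, meta description, summary, and tags.",
    tagging_guidelines="Tag only what the content is primarily about.",
    taxonomies={
        "Products": ["Red Hat OpenShift", "Red Hat Enterprise Linux"],
        "Topics": ["DevOps", "Security"],
    },
)
print(prompt)
```

Keeping each input in its own labeled section is one way to make a prompt "highly structured"; it also lets the allowed-term list be regenerated whenever the taxonomy changes without touching the rest of the prompt.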

The latest version of the Metadata Assistant is deployed on Red Hat OpenShift Service on AWS and uses the Mistral 7B Instruct v0.3 large language model (LLM). Mistral 7B Instruct is an open source LLM released under the Apache 2.0 license, and Red Hat provides it internally through its LLM-as-a-Service (LLMaaS) tools. To keep tag choices current, the taxonomy is updated automatically every night.
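Because products are frequently added, discontinued, or renamed, a nightly refresh needs to detect what changed. A minimal sketch, assuming each taxonomy snapshot maps a taxonomy name to a set of term strings (the `diff_taxonomy` function and the product names are hypothetical), might compare yesterday's and today's snapshots like this:

```python
def diff_taxonomy(
    old: dict[str, set[str]], new: dict[str, set[str]]
) -> dict[str, dict[str, set[str]]]:
    """Report added and removed terms per taxonomy between two snapshots.
    Taxonomies with no changes are omitted from the result."""
    changes: dict[str, dict[str, set[str]]] = {}
    # Union of taxonomy names so new or dropped taxonomies are covered too.
    for name in sorted(old.keys() | new.keys()):
        before = old.get(name, set())
        after = new.get(name, set())
        added, removed = after - before, before - after
        if added or removed:
            changes[name] = {"added": added, "removed": removed}
    return changes


yesterday = {"Products": {"Product Alpha", "Product Beta"}, "Topics": {"DevOps"}}
today = {"Products": {"Product Alpha", "Product Gamma"}, "Topics": {"DevOps"}}

print(diff_taxonomy(yesterday, today))
# {'Products': {'added': {'Product Gamma'}, 'removed': {'Product Beta'}}}
```

In this sketch a rename shows up as one removed term plus one added term; a follow-up job could use that signal to flag pages that still carry the retired label.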

The Metadata Assistant performs 2 tasks:

  • Selects the most relevant taxonomy tags: The Metadata Assistant is instructed to select only one tag per taxonomy, which helps reduce overtagging. There are 7 taxonomies that contain predefined labels for things like products, topics, and partners.
  • Generates a first draft of descriptive metadata: The Metadata Assistant generates a meta title (55-60 characters), a meta description (155-160 characters), and a brief summary of the content that can be used as landing page text to encourage visitors to download assets like e-books, checklists, and overviews of Red Hat technologies.
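The constraints in these two tasks (one tag per taxonomy, official terms only, and the title and description length ranges) are straightforward to check after generation. The following sketch assumes a suggestion arrives as a dictionary with `title`, `meta_description`, and `tags` keys; that format and the `check_suggestion` helper are hypothetical, not the Metadata Assistant's actual interface.

```python
# Target length ranges from the article (inclusive, in characters).
TITLE_RANGE = (55, 60)
DESCRIPTION_RANGE = (155, 160)


def check_suggestion(suggestion: dict, taxonomies: dict[str, set[str]]) -> list[str]:
    """Return human-readable problems; an empty list means the suggestion passes.

    Hypothetical format: suggestion["tags"] maps taxonomy name -> list of chosen terms.
    """
    problems = []

    # Length checks for the title and meta description.
    for key, (low, high) in (("title", TITLE_RANGE), ("meta_description", DESCRIPTION_RANGE)):
        length = len(suggestion.get(key, ""))
        if not low <= length <= high:
            problems.append(f"{key} is {length} characters; expected {low}-{high}")

    for taxonomy, chosen in suggestion.get("tags", {}).items():
        if taxonomy not in taxonomies:
            problems.append(f"unknown taxonomy: {taxonomy}")
            continue
        # Enforce the one-tag-per-taxonomy rule.
        if len(chosen) > 1:
            problems.append(f"{taxonomy}: {len(chosen)} tags selected; only one allowed")
        # Reject anything outside the official term list.
        for term in chosen:
            if term not in taxonomies[taxonomy]:
                problems.append(f"{taxonomy}: '{term}' is not an official term")

    return problems


taxonomies = {"Topics": {"DevOps", "Security"}, "Products": {"Product Alpha"}}
suggestion = {
    "title": "x" * 58,
    "meta_description": "y" * 157,
    "tags": {"Topics": ["DevOps", "Security"], "Products": ["Product Zeta"]},
}
for problem in check_suggestion(suggestion, taxonomies):
    print(problem)
# Topics: 2 tags selected; only one allowed
# Products: 'Product Zeta' is not an official term
```

Checking the model's output against the same rules given in the prompt catches cases where the LLM ignores an instruction, such as returning two topic tags or inventing a product name.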

Conclusion

With further uses such as large-scale automated taxonomy audits and updating existing pages with newly created taxonomy tags, the Metadata Assistant has substantial potential to change and improve how Red Hatters create and maintain the metadata that makes our content findable, meaningful, and relevant to a diverse global audience.

To understand how AI models are delivered as shared resources, learn more about Models-as-a-Service.  


About the author

Gail Vadia is a Senior Content Strategist at Red Hat. She is a writer and editor focused on standardizing content across Red Hat’s core web properties to deliver a cohesive user journey throughout our digital ecosystem.

