Technically Speaking | How open source helps AI transparency


How open source can help with AI transparency

Technically Speaking Team
Artificial intelligence

This episode digs into the complexities of achieving transparency in artificial intelligence (AI) through open source practices. Red Hat CTO Chris Wright and guests JJ Ashgar, Richard Fontana, and Rob Geada help us explore the challenges of making AI more open, trustworthy, and accountable. We also look at how the open source project TrustyAI — built on top of Red Hat OpenShift AI and Open Data Hub — aims to address bias in AI and promote explainable AI (XAI). By exploring the challenges of transparency in AI, we can learn how open source development processes can help create a more open and accessible future for AI.

Transcript

00:00 - Chris Wright
AI has trust issues, and the lack of transparency threatens to undermine its potential benefits. Linux and containers initially faced skepticism too, but they eventually won trust through open source principles, community engagement, standardization, and demonstrated reliability. So could a similar path exist for AI?

00:21 - Title Animation


00:29 - Chris Wright
The open source software that we all know and love is pretty well established. We get full access to the source code written by programmers and a license that allows users to modify, enhance, and even redistribute the software according to their needs. This foundation of openness is what builds trust and drives innovation.

00:49 - Richard Fontana
Open source software definitionally requires that the source code for your software be available. What is the analog to that for AI? That is not clear. Some people believe very strongly that it means your training data has to be open. That means you have to provide the training data. That would actually be highly impractical for pretty much any LLM. So if that's the answer, it raises some difficult problems for open source and AI, because it suggests that at this stage of the game, open source AI may not be practical or possible. It may be a sort of utopian thing that we have to aim toward. And I think that's the view that some people have.

01:29 - Chris Wright
When I think about open source AI, it starts with the model and extends to the software stack. The core components include an open data set, open weights, and open source licensing of the resulting models so users can modify and share them. As for the software stack, the majority of it is already produced in the open, and we can certainly imagine a future with an entirely open stack. Now, that may be a bit of a utopian view, but we are currently taking steps to prioritize our efforts in reproducibility and transparency.

02:04 - Richard Fontana
What has been going on over the past few years is that machine learning practitioners and companies have commendably been releasing models to the public. We see the term open source used indiscriminately for any public release of a model, no matter how restrictive the license is. Many of the licenses that are being applied to public models that are being described as open source do discriminate. They discriminate against persons and groups. They discriminate against fields of endeavor. And yet people are calling them open source. So that's part of the situation we're in today.

02:44 - Chris Wright
But open source also conveys the sense that there's a community of contributors behind it, creating a system of checks and balances.

02:53 - JJ Ashgar
So when you look at InstructLab, it's important that it's open source, because the core value and the core draw is that it's a workflow that works on your laptop and can work in your data center. And it's a bunch of Python code that can build that workflow for you to get your downstream model and do the fine-tuning it needs. It's Apache 2.0 licensed. Sure, you can take it and build a proprietary system off of it, but there's no real value proposition in doing that, because we are building this in the open for the greater good of society.

03:32 - Chris Wright
Community involvement in AI model development ensures multiple validations, enhancing trust. InstructLab exemplifies this by enabling open source collaborations that refine AI models.

03:45 - JJ Ashgar
What is the old saying? The four-eyes rule of development? Where you need at least four eyes before you hit that merge button. That means for any knowledge that we're putting into the InstructLab ecosystem, there are four eyes saying, "Yes, this is okay."

04:02 - Chris Wright
As we work toward creating more trustworthy models through community involvement, it's equally important to know what data goes into the model and to understand its decision-making process. Which algorithm was used? Which data points were input? And how were those data points processed to reach the final result? This matters because AI is making decisions that have very real impacts on people's lives.

04:28 - Rob Geada
One useful question to ask yourself is: what would happen if my model was wrong every time? Let's say you're deploying a model that predicts something like loan acceptance for applicants, and it takes in a bunch of information about the people, like where they live, their demographic information, what they do for a job, et cetera. And from that it predicts whether or not you should give them a loan, based on how likely they are to pay it back or something like that. Now, what you might want to do once you've deployed that model is monitor how biased it is against different values of, say, that demographic information it's receiving.

05:02 - Rob Geada
You might notice a certain skew in how likely your model predicts that applicants of a certain race will pay back their loan. And odds are that's going to be unacceptable for you. And so you need to be able to see this information, understand how your model operates over these different demographic groupings, and understand how likely it is to give positive outcomes to each of them, to make sure that your model is operating fairly and treating all of your customers with equal opportunity.
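The kind of check Rob describes can be made concrete with a standard fairness metric. The sketch below is not TrustyAI code; it is a minimal plain-Python illustration of statistical parity difference (SPD), one common measure of the outcome skew discussed above. All data is hypothetical.

```python
# Minimal sketch of bias monitoring via statistical parity difference (SPD).
# Hypothetical data; a real deployment would stream predictions from the
# model-serving layer rather than use hard-coded lists.

def positive_rate(outcomes):
    """Fraction of applicants who received a favorable prediction (1 = approved)."""
    return sum(outcomes) / len(outcomes)

def statistical_parity_difference(privileged, unprivileged):
    """SPD = P(favorable | unprivileged) - P(favorable | privileged).

    A value near 0 suggests similar treatment of both groups; a common
    rule of thumb flags values outside roughly +/- 0.1 for review.
    """
    return positive_rate(unprivileged) - positive_rate(privileged)

# Hypothetical model outputs for two demographic groups: 1 = approved, 0 = denied.
group_a = [1, 1, 0, 1, 1, 0, 1, 1]   # 75% approved
group_b = [1, 0, 0, 1, 0, 0, 1, 0]   # 37.5% approved

spd = statistical_parity_difference(group_a, group_b)
print(f"SPD: {spd:+.3f}")  # prints "SPD: -0.375", well outside the +/- 0.1 band
```

An SPD this far from zero is exactly the "certain skew" the model owner would want surfaced; the hard part, as the episode notes, is deciding what to do with that information once the tooling reports it.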

05:37 - Chris Wright
We need to understand how the system arrived at a particular decision or prediction and whether the decision it's making is fair and aligned with our values.

05:47 - Rob Geada
TrustyAI is an open source set of responsible AI tools: things like bias monitoring, explainability, model guardrailing, and language model evaluation scripts. A whole bunch of responsible AI tools to integrate into AI workflows, with the idea being that if we can align it as closely as possible with AI deployments, we can make it as easy as possible to be safe, responsible, and ethical with your AI. The analogy I like is that of a seatbelt. It's great that you have a seatbelt in your car (you're required to), but it means nothing if you don't wear it. And that's the idea with responsible AI: you need to make use of those tools and know what you will do with the information they provide.

06:36 - Chris Wright
Today's efforts in making AI more open and transparent are just the beginning. Just as Linux and containers revolutionized their fields through open standards and community engagement, AI can follow a similar path. The journey to trustworthy AI is complex and filled with challenges, but it's a journey worth taking. By learning from the past and embracing open source principles, we can pave the way for a future where AI is not only powerful, but also trusted and transparent. Thanks for watching. We'll see you next time.

About the show

Technically Speaking

What’s next for enterprise IT? No one has all the answers, but CTO Chris Wright knows the tech experts and industry leaders who are working on them.