A lot of people are hoping for something called agentic AI. No, it's probably not built to spy on you. Agentic AI is the personal assistant who can take actions on your behalf. For Jamie Beckland, that means taking over a time-consuming task. I actually am using an agentic AI to send some emails on my own behalf, and it was a little bit scary. So I've tiptoed lightly into it, but early stage outbound prospecting emails, so kind of in a sales context to educate a new user about the benefits of the API Context as a product. It's been interesting because what I found is that having the personal touch is not that difficult in an email. I've been able to add prompts and instructions that sound like me and that are based on things that I have said before and that I would want somebody to know. Jamie’s agent is sending, reading, processing, and also responding to emails as if it were Jamie himself. With enough prompting, it even sounds like Jamie most of the time. There have been some mishaps where, you know, it used slang that I wouldn't use and things like that. So I've had to go in and correct it and sort of narrow the scope. But, you know, the net result has been very positive. I mean, it's touching many, many more thousands of people every single day than I could touch on my own. We may be at the start of a new era where AI can take actions on our behalf. How does that work? How do we make sure the AI agents act in the way that we want them to? And what are the consequences of getting it wrong? This is Compiler, an original podcast from Red Hat. I'm Johan Philippine. I'm Kim Huang. And I'm Angela Andrews. On this show, we go beyond the buzzwords and jargon and simplify tech topics. We're figuring out how people are working artificial intelligence into their lives. And in this episode, we're taking a look at agentic AI. So this is a relatively new term for me, agentic AI. Angela, Kim, is this something you've heard of before? Just recently. This has been the rage. I was on a panel, and one of the panelists was talking about it. I was on a stream, and my guest was talking about it. I've been hearing it, seeing it in my feed. Because I heard it so much, I had to Google it. So now I have a passing fancy about what it is. And of course, Jamie just gave us his example, which is kind of the example that other folks were giving as well. It's a little bit scary, but people are calling this a time saver. So is this the wave? Is this where we're going now? Apparently. Yeah. Okay. So this is kind of the stuff of movies and some recent advertisements, right? Where an AI can actually do something for you with minimal input, like what personal assistants were supposed to be able to do a few years ago. Right? Earlier, we heard from Jamie Beckland, he's the chief product officer at API Context, and he's been on the show before. His initial perspective on generative AI, based on previous experience with machine learning, was pretty skeptical. He didn't think that machines could take a task from start to finish by themselves. I think my expectation was that it was going to hit a couple of the high notes, more like an outline or more like a sketch or a thumbnail of what the finished product should look like. But it was really going to still take a lot of my own effort. So, I mean, that was the huge distinction for me, was seeing how polished large language models can produce generative AI outputs. So, you know, obviously we're dealing with language. 
But the ability for the AI to understand what I was asking for and then respond at the right altitude with the right nuance, that was really, really surprising. So his first experience with an actual large language model really completely shifted his perspective. And now, you know, a few months later, he's having an AI answer his emails for him. That's a pretty big shift. He drank the Kool-Aid. But getting to that point, even with the progress made in such a short period of time, still took some tweaking and human intervention to make sure the AI got it right, even if it doesn't always sound exactly like Jamie. Obviously, part of the opportunity for AI is to continue learning, continue growing, continue changing. And so the interaction when people are crafting their inputs, you know, you sometimes hear the notion of prompt engineering to get the AI to do exactly what you want. You need to ask it the question or give it the task in the exact right way. Right? You need to tell the AI exactly what you want it to do. I had a colleague once tell me something that stuck with me. He goes by Red Beard, and he was explaining to a group of us that a computer will do exactly what you tell it to do to the letter, which may not be the same thing as what you intend for it to do. That's right. Angela, have you kind of come across this in your own experience as a developer? Right? Like sometimes you'll write a command or a program and you want it to do this one thing, but it'll just go off in a direction that you're not expecting. But kind of looking back at it, it's actually, yes, exactly what I told it to do. Yes, I've written bad code. It has done things that I didn't expect it to do because I told it something totally irrational. And it's going to do exactly what you ask it to do. So, you know, beware of that. This is a point that we're going to come back to a couple of times in this episode because it's a pretty big one. But in the meantime, it pays to get better at "prompt engineering," which is a fancy way of saying "writing the exact commands for an AI so it'll do what you want it to." But the level of precision you need for that is changing as well. So the skills around manipulating AI to get the output that you want are something that humans have to understand now, but that's changing. That's becoming less and less. And the reason is because the AI is getting smarter. So the whole idea behind a large language model is that you don't just upload a static set of data, but it continues to consume the data, evaluate the output, and use the human feedback loop in order to improve. Right. So in order to do that, it needs to be consuming more and more data. Part of that comes through our interactions, and part of that comes from, you know, the rest of the web and other external repositories. So to understand the human, it must eat more of the human words. I was waiting for words. I was holding on there. Well, yeah. But then once the AI assimilates enough of the output, it can start doing things for us and do them right because it understands us better than we understand ourselves, right? I don't know about that. Okay. I'd say that's... I've seen this movie! ...I'd say that's way on the horizon. Yeah. Oh, we're going to talk about some movies in a little bit. Don't worry about it.
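A concrete way to picture the "prompt engineering" the hosts were just describing: the kinds of instructions Jamie mentions (sound like me, skip the slang, stay on one narrow task) typically end up in a system prompt that rides along with every request. Here's a minimal sketch in Python; the wording, the rules, and the helper function are all invented for illustration and aren't anything from the episode.

```python
# A toy sketch of the "prompt engineering" idea: the persona and scope rules
# live in a system prompt that gets sent with every request. Everything below
# is invented for illustration; these are not Jamie's actual prompts.

SYSTEM_PROMPT = """\
You draft outbound prospecting emails on behalf of Jamie.
Rules:
- Write in a plain, direct, friendly tone. No slang, no exclamation points.
- Stay on one topic: introducing the product to a new prospect.
- Keep it under 120 words and end with one low-pressure call to action.
- If you are unsure about a factual claim, leave it out rather than guess.
"""

def build_messages(prospect_name: str, context: str) -> list[dict]:
    """Assemble the messages a chat-completion style API would accept."""
    user_prompt = (
        f"Draft a first-touch email to {prospect_name}. "
        f"Relevant context: {context}"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]

if __name__ == "__main__":
    # In practice these messages would be sent to whichever LLM API you use;
    # here we just print them to show the structure.
    for message in build_messages("Alex", "They asked about API monitoring."):
        print(message["role"].upper(), "->", message["content"], sep="\n")
```

The point of pinning the rules down in one place is that when the agent slips, like the slang mishap Jamie mentioned, you correct the prompt once and narrow the scope, rather than fixing every email by hand.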
So I'd say that the AI understanding us and being able to act on our behalf, being able to do things for us, is still pretty much on the horizon. I don't see it right away, but it would appear that it's closer than that. The next frontier for AI is an agentic model where the AI can take an action on your behalf, like a travel agent would book a flight for you. We're working on capabilities for the AI to go out into the world and do things on your behalf, not just give you information, not just respond to you with text, but actually take actions for you on your behalf. Okay, so the root of this word is agent, right? Right. And in legalese, an agent is someone that can do things on your behalf. So if you are sending this agentic AI model out there on your behalf, anything that transpires from that, you could possibly be legally responsible for. Liable? Yes, you can be very liable. So that is interesting because that's one more thing you have to worry about. We always have to worry about ourselves. Now we have to worry about our agentic AI doing things on our behalf, hopefully the right way. Yeah. Hopefully doing what we expect them to do and not going haywire. And you know, given what we've heard on this show about hallucinations, about the complexity of making sure the AI understands what we want, and the mechanisms needed to make sure that the AI is actually performing the action as desired, I don't think I'm quite ready for that agentic AI to start doing things for me. But it turns out the technology agentic AI would use to do some of these tasks is already out there and in use all over the web. Agentic AI relies on APIs because the systems that you want to influence, or the systems where somebody wants to take an action, they require a command to be sent, and that command is sent through APIs. So a really simple example might be like booking a flight. If the AI understands what kinds of flight routes I like, what time of day I like to leave, how many bags I'm taking, what my relative budget is... It can go out and book a flight, but the way that it's going to do that is it's going to interface with the consolidated flight booking system that the whole travel industry uses. Yay! We can cross APIs off the bingo card. There it is. Yeah, the agentic AIs, they're going to work through APIs, which have set processes. Right. You can check out our episode on APIs with Jamie if you need more of a refresher on how they work and how important they are to how things work over the internet. Now, rather than having to, say, converse with a customer service representative on the telephone to book that flight, Jamie told us that the AI would make a request through an API call and then complete the action on the other side of that transaction. And for some tasks, that API availability puts the goal of agentic AI within reach. You can see how the framework of an API really becomes critical for AI systems to be able to operate. I mean, the whole point of AI systems is that they are taking queries and requests from real human users and other external applications. They're analyzing those requests, and they're turning around an output, so that the entire interface for AIs is driven by APIs. There's no AI without APIs. You need to interact with the AI model. You're going to send a query into a user interface, and that user interface turns it into an API call, and the response comes back over an API call. Exactly. I mean, you just ask. Didn't have to teach it.
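To make that concrete, here's a minimal sketch of the agent's side of that kind of transaction: the traveler's request and preferences get translated into one structured API call. The endpoint, field names, and helper function below are hypothetical, invented purely for illustration; they're not from the episode or from any real travel API.

```python
# A rough sketch of "the agent takes an action through an API." The endpoint
# and fields here are made up; a real travel API would differ, but the shape
# is the same: the agent turns a natural-language request into structured data.

import requests

BOOKING_API = "https://api.example-travel.test/v1/flights/search"  # hypothetical

def search_flights(origin: str, destination: str, date: str,
                   max_connections: int = 0,
                   latest_departure: str = "09:00") -> dict:
    """Translate the traveler's stated preferences into one API request."""
    payload = {
        "origin": origin,
        "destination": destination,
        "date": date,
        # The preferences the user would otherwise have to spell out in prose:
        "max_connections": max_connections,
        "earliest_departure": "07:00",
        "latest_departure": latest_departure,
        "sort_by": "price",
    }
    response = requests.post(BOOKING_API, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # An agent would fill these values in from the conversation, then act on
    # the cheapest result that still satisfies the constraints.
    print(search_flights("BOS", "SFO", "2025-06-12", max_connections=0))
```

Notice that the hard part isn't the call itself; it's knowing which constraints belong in the payload in the first place, which is exactly where the conversation goes next.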
It's just doing the thing that's already established, and the process is already set up. So it's not as complex, right? Right. Didn't have to teach it. Easy peasy. What could go wrong? Hmm... I'm guessing that there's... yeah, I'm guessing that there's a lot of things that could go wrong here. It's just technology, right? Nothing ever goes wrong. Obviously, AI is not perfect. We already know that. And I think we are also imperfect vessels for directing AI. Right. Because I need to be able to not only understand my own preferences and my own trade-offs, I need to be able to articulate that for some other system or some other agent. And, you know, humans are not the best at defining what we actually mean. We struggle with language. We struggle to articulate exactly what we mean. That's very true. Like, we think we're communicating in such a way that it is crystal clear, and we find out that someone or something has received it partway, halfway. And the AI is a reflection of ourselves, right? Isn't that something! And it appears that this technology is no different. Yeah, it's like in the early episodes we were talking about how there are context clues in communications from human to human that an AI just can't capture. And even if it captures it, it can't quantify it or respond to it in such a way because it just doesn't understand it. Yeah. So we struggle to say what we mean. I'm gonna say it again because it's really true. And I mean, I'm a writer, and that's still very much true for me in my writing. Remember what I said earlier, that the computer will do exactly what you ask it to do down to the letter? If you're not clear about your intentions, you might not get the outcome that you expect. So when I... a really simple example for flights would be just go find me the cheapest flight. Well, if the cheapest flight only saves me $1, but it leaves at five in the morning versus nine in the morning, and it has two connections versus zero connections, I would obviously prefer to spend that extra $1, sleep in, and have a direct flight, right? But I don't necessarily have the language to tell the AI that all at once, all perfectly from the first try. This is where talking to a person or doing it yourself still holds some advantages, right? I mean, you don't necessarily know all your preferences that you have to elaborate on to the AI to have all of that be clear to it. And, you know, having to add each preference in each iteration of your prompt would probably take a lot longer than just going to the website or picking up the phone and talking to a person... for now anyway. He could have just done it himself. I mean, offloading these types of tasks is a definite time saver. But when the result is something that you would have never anticipated, like a 5 a.m. flight with three connections, doesn't that give you a kind of pause, or does it make you more determined to make sure that your agentic AI is who you need it to be? We have to take some of the responsibility for this. Are we willing to make it work if we are going to be invested? And we might get there, you know. Right? Like it... right now it doesn't seem like it might be worth the effort. But in the future, it might be, right? We don't know. We don't know. Right. That's true. So something as personal as travel booking might take a while for agentic AI to get there. Even with the well-established processes and entry points, like an API for a travel site.
But Jamie is seeing its benefits now, even as the more complex stuff isn't quite there yet. So I think what we'll see in practice is that we'll tiptoe into agentic AI, you know, little by little, starting with very small tasks where if the AI gets it wrong, the stakes are very low. You know, so something like setting a calendar appointment where you can see the results on your calendar and you can tell very quickly if there was an error or a mistake. And then it's a training opportunity for the AI. And for me as the user of that AI, to say, oh wait, I forgot to designate the time zone. And that's why this meeting got booked for the wrong time. More training is needed to get to where we want to go. You got to dip your toe in. Before taking the plunge. Yeah. We know that the tech industry, that's how things work, right? We never... yeah, move fast. No, not at all. No. Well, I'm going to keep advocating for taking that extra time because when we return, we're going to explore what could go wrong when you give a computer a task without thinking through what the outcomes could be. What could go wrong? Okay. Shifting gears a little bit. In the first section, we talked about the possibility of AI agents doing stuff for us. For this segment, we're diving deeper into the "But what could go wrong?" side of that conversation. We're going to bring back Christopher Nuland from our AI 101 episode. He's the technical marketing manager for artificial intelligence here at Red Hat. And you may remember that he landed that job in part because of an AI model he built to play Double Dragon. And just to be clear here, this wasn't a generative AI model. This was one built on reinforcement learning. That means it's given a task, and it tries out lots of different possible paths to see which one or which method best accomplishes that task. Before starting this project with Double Dragon, he had joined a team that was trying to do the same thing, but for Pokémon, and for various technical reasons, he chose to leave that project and find another game to have his AI model play. And I thought Double Dragon was a great candidate because, A, it's just a fun game. It's cool. People played it in arcades. They played it on their gaming systems. And with it, I thought that there would be... with that linear nature, it would be one thing I could solve very quickly. And in all actuality, I think I had just as much complexity as the Pokémon group. And I learned a lot of stuff about AI through the process. All right, Kim, could you tell us a little bit about Double Dragon? What kind of game is it? What does he mean by this linear gameplay? What's the deal here? Yes. Jimmy and Billy. Double Dragon was a very popular arcade game. It is a side-scrolling beat 'em up. So, you go from one side of the screen to the other, and you don't go backwards. And you fight a bunch of enemies until you get to usually a boss at the end of a level. And then once you beat that boss, you go on and rinse and repeat. It's very linear. It's 2D. There's not a lot of... well, there is some kind of movement, but it's all very kind of linear. Think about it from... it's like a person moving from room to room in a straight line and never going backwards. What I think Christopher was talking about with Pokémon, the difficulty of using this to play through Pokémon, is that Pokémon is an RPG. So there's a lot of different branches where you can go, and then there's also a lot of backtracking.
So you go and then you go back to places you've been in the game before. The game design doesn't really lend itself very well to an AI playing it because there are just so many possibilities. Whereas Double Dragon is a little bit more straightforward. So to get the AI player integrated into the game and then to track its performance, Christopher had to do a little bit of hacking. He wanted to record the points gained to track the performance, right? He wanted to look at the health lost, the distance traveled, and in addition to all that, he also had to create the access point that the AI would need to actually play the game. Right, because he didn't have a robot actually pressing the buttons and all that. Correct. So he did all that work. He built his model, his reinforcement learning model, and it worked. The AI could play the game and it learned how to get the most points. My challenge to begin with was that the AI was very good at fighting the non-player characters and would always get to the end of the level, even early on, so I was really excited. Even at the beginning, it was like, man, this is working great. But I noticed that it would never finish the level. It had a lot of issues with the boss. That's not necessarily completely out of the ordinary, right? I mean, a boss is usually a big spike in difficulty, and it might just be that the model was having trouble finding the optimal method needed to beat the boss. Right? That sounds reasonable, right? Kim? Angela? Yeah. Yeah, that's definitely a Kim question. I've never played Double Dragon. What? Oh my gosh. Oh, such a fun game. The first couple of levels... it's kind of designed... Double Dragon, these types of games, are what we call quarter-munchers. It's designed to take as much money from you as possible. So, you'll see extreme spikes in levels of difficulty when you get to around, like, level three or level four. All of a sudden, the enemies hit harder, the bosses are harder. The environments are harder to traverse; there are more booby traps and things. So, you know, it's designed to take a lot of money from children. So, the very design of the game itself has a very large spike in difficulty very quickly. Yeah. So the difficulty spikes up, you lose a life, you put in more quarters to continue playing. That's the idea, got it. But something seemed off, even with those considerations in mind. So Christopher put on his detective's hat and he got back to work. What I found by recording these sessions... So, with the library that I built, I can take that data. I can then ingest it back into the emulator that's running the game, and I can basically have it replay that session, or I can even have it record the session in a .mp4 file. And if you go and actually look at the video, there are even times where I can do like a grid version where I can see all the sessions running together in a big grid, which is also helpful to see in a single training cycle what the AI is trying to attempt. And through that, I was able to see that it was basically giving up on the boss. It wouldn't fight the boss; it would actually kill itself at the end of the level before it got to the boss so that it could repeat the level and maximize its points. A computer will do exactly what you tell it to do. Makes sense. We're seeing that here.
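In reward-function terms, here's roughly what that looks like. This is a toy sketch of the objective being described, not Christopher's actual code: the agent is only ever told that more points is better, so a strategy that farms points by dying and replaying the level can score as well as, or better than, actually finishing it.

```python
# A toy illustration of a points-only objective for a reinforcement learning
# agent. Everything here is invented for illustration; it is not the code
# from the episode.

from dataclasses import dataclass

@dataclass
class Frame:
    """A tiny slice of game state the reward function gets to see."""
    score: int       # total points so far
    health: int      # remaining health
    x_position: int  # how far the player has advanced through the level

def points_only_reward(previous: Frame, current: Frame) -> float:
    """Naive objective: reward exactly the change in score, nothing else."""
    return float(current.score - previous.score)

if __name__ == "__main__":
    # The bot picked up 400 points while losing health and standing still near
    # the end of the level. This reward can't tell the difference between
    # making progress and farming points in a loop.
    before = Frame(score=12_000, health=3, x_position=2_450)
    after = Frame(score=12_400, health=1, x_position=2_450)
    print(points_only_reward(before, after))  # 400.0
```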
Christopher had instructed his AI to maximize the number of points, and the model learned that it could get more points by starting a level over and racking up more points through the level rather than defeating the boss and moving on to the next level. It's an artificial grind, basically. Exactly. Yeah. Just farming those points. Yeah. It was funny. He put on his detective's hat. Yeah, he didn't put on his Red Hat? Come on. It was right there, you know. I missed it. You got me. And he had the whole, like, Matrix screens or Minority Report screens with everything. Yeah, yeah. Mind you, when he was doing this, he didn't work at Red Hat yet. So this is what helped him get the job. That's fair. Now, of course, having the AI kill itself over and over again is not what Christopher wanted it to do. But the AI couldn't know that what it was doing went against the norms and rules of gaming that we kind of take for granted. We then talked about the famous thought experiment called the Paperclip Maximizer, the idea that an AI told only to make as many paperclips as possible, with no other constraints, would eventually turn everything it could reach into paperclips. Now, a few of you might have heard of this before, but we're going to go over it. We talk about this in Command Line Heroes, in the season we did on robotics. That's right. We did. Yeah. So, I mean, again, here, the idea is that a computer will do exactly what you tell it to until you tell it to stop. If you don't put enough guidelines and guardrails and ways to redirect it, if it starts doing things that you don't want it to, then, you know, bad things can end up happening. All right. So, back to Christopher and his Double Dragon bot. An emphasis on gaining points led to the bot playing the game "wrong." And this is the same type of situation that we ran into with Double Dragon. I overemphasized the points, and it felt like it needed to optimize points. Ultimately, what it needed to optimize was actually new experiences. I think that's so interesting. Yeah. The emphasis on new experiences, because that's kind of a foundational aspect of game design: exploration and discovery. For a human player, in order for them to progress in the game, like, you're encouraged in some of these games to explore in order to progress. And it's, you know, that's... I think, especially considering Double Dragon is a very linear game, it's interesting that he introduces that here. Yeah. I mean, part of the fun of games like Double Dragon is, you know, you beat the level and then you go to the next one and it's new enemies. It's a new background. It's a new setting. And you know, you got to find different ways to progress. Maybe some of the techniques you used earlier on don't work as well. So you need to figure out a different way to play. And that, that's what gives it the challenge. Right? But if you just want points, you might as well just grind over and over and over. And the AI is not going to get bored of doing that because it's just a computer. Yeah. Just fight Abobo a million times and you'll get as many points as you want. And that's not why we, you know, we the collective we, you don't really play for points. You play to get further and further and further. At least that's my... Some people and some games do play for points, but those are very... yeah, those are very different games. But in most games like this, you want to get to the end. You want to get... that's the crux of it. And AI doesn't know that, you know, we're these go-getters like we just want to keep going forward. And it's like, but you said... This is what you told me to do. So this is what I'm doing, right?
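The fix, as the story goes on to explain, is to change what the reward actually measures. Continuing the toy sketch from above, and again this is invented for illustration rather than the episode's actual reward, the idea is to pay the agent mostly for reaching new ground and only a little for points.

```python
# Continuing the toy sketch: shift the objective from raw points toward
# progress and new experiences. Invented for illustration, not the actual
# reward used in the episode.

from dataclasses import dataclass

@dataclass
class Frame:
    """Same tiny slice of game state as in the sketch above."""
    score: int
    health: int
    x_position: int

def progress_reward(previous: Frame, current: Frame,
                    furthest_x_seen: int) -> tuple[float, int]:
    """Reward exploring new ground; keep only a small weight on points and health."""
    reward = 0.0

    # Big bonus only for territory the agent has never reached before, so
    # replaying old ground earns nothing extra.
    new_ground = max(0, current.x_position - furthest_x_seen)
    reward += 1.0 * new_ground

    # Points and survival still matter, just much less than progress.
    reward += 0.01 * (current.score - previous.score)
    reward -= 0.5 * max(0, previous.health - current.health)

    return reward, max(furthest_x_seen, current.x_position)

if __name__ == "__main__":
    before = Frame(score=12_000, health=3, x_position=2_450)
    farming = Frame(score=12_400, health=1, x_position=2_450)     # point loop
    advancing = Frame(score=12_050, health=3, x_position=2_600)   # new ground

    print(progress_reward(before, farming, furthest_x_seen=2_450))    # (3.0, 2450)
    print(progress_reward(before, advancing, furthest_x_seen=2_450))  # (150.5, 2600)
```

Under an objective like this, grinding the same stretch of level simply stops paying.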
So instead of chasing points, the bot was then instructed to chase new experiences or progression through the levels. And that changed everything. What was actually optimal was having it explore and being able to go further into the level. There are times where it could even potentially ignore fights or have suboptimal ways of handling those fights. But because the AI was ultimately trying to explore more, it led to the most optimal way of actually going through the levels. So instead of grinding the points in a loop on the first level over and over and over again, the AI started to make its way through the game, just like Christopher had intended, and the reinforcement learning led to some other pretty cool developments too. And so once I adjusted it to not take points as high of a consideration, I then was able to beat that first level, beat subsequent levels. It was really cool because the AI eventually learned that doing this kind of backwards elbow technique was the most optimal way. I was actually talking to a speedrunner who said they actually do that for speedruns. So the AI actually, over time, was able to figure out the most optimal way for those first couple levels that even speedrunners have learned through their own trial and error with the game. Yeah. That's fantastic. Okay, there. There's my question. Yeah. What is a speedrunner? Oh, a speedrunner is a person. Typically, you find a lot of these videos on YouTube where a person takes a game and they try to complete it in the fastest time possible. You know, a lot of old school retro games like Double Dragon, they're done usually through emulation because obviously they're not at an arcade, they're not playing an arcade cabinet. These games are kind of downloaded and emulated. So people are playing them through emulation. But, you know, all kinds of different games. They're just uploading videos of people just doing the fastest runs that they can. Typically, these people have to employ certain techniques in order to finish the game quickly. Okay. Yeah. And you'll see some of these videos. They'll take a game that's supposed to take hours or tens of hours to complete, and they'll do it in a few minutes. Yep. Yeah, it's kind of ridiculous, but it's really fun. It's fun to watch. Something else that what he just said reminded me of is when DeepMind, I believe it was, started playing Go a few years ago, right? It started doing really well against Go masters and it started playing these strategies in Go that no one had ever seen before. They were wondering what it was doing. Then through the course of the gameplay, they realized that it was coming up with new strategies that were super efficient, that no one had ever seen before and were practically unbeatable. So this method of learning can really come up with things that turn out to be really, really good, really efficient, really optimal, things that people haven't even thought of yet. So the possibilities are kind of endless here. Yeah. Also, going back to the Double Dragon example, there are some enemies in that game that you're not supposed to fight or that if you fight them, they'll just keep spawning over and over and over. It's a never-ending spawn, so it's designed to trap you. Remember the quarter-muncher thing? It's designed to trap you and get your health down so that you'll die faster, and then you'll have to restart and put another quarter in.
But the AI learned here that those traps were traps and it avoided them. It solved those problems and met those challenges without fighting. So it preserved its health and was able to complete the game in record time. All right. So what does this have to do with agentic AI? Well, everything really. I mean, a Double Dragon bot letting itself die at a boss in a game isn't going to really cause any problems. It's not going to hurt anyone. But it does show us how AI, if not given the proper instructions or the proper understanding of our wants and needs, can very quickly do things that we wouldn't approve of. There are other pop culture examples that take this to its logical conclusion, right? I mean, you've got Skynet in The Terminator, right? Or you've got the I, Robot movie. So before we give AI the keys to the internet or to the world at large, let's just make sure that they know and value the same things that we do. Okay, so, Angela, Kim, what do you think about all these ideas? Agentic AI, AIs gone rogue. Are any of you hiring an AI system anytime soon? I'm definitely fascinated with the idea of giving an agentic AI something with very low stakes to kind of test the waters with. I think that's a really interesting concept. I think it's probably the way to go. Obviously, agentic AI is like, there's a whole realm of possibilities that can be dug into there. But I feel like there's definitely going to have to be some testing done, and that... yeah. And that testing can't be... at some point you have to kind of venture forth. I don't know if that testing can be limited to, like, a controlled environment. I think that we need to kind of introduce it, dip our toe in, and kind of introduce it with very low stakes involved in order to kind of work out the kinks and really get it to a place where we can trust it with the bigger things. You said it. It has to start somewhere. And it's usually something that's very low stakes. Nothing bad will happen if things go awry, there's no money involved, there's no chance of someone getting hurt or doing something that you could, again, you know, be liable for? I'm willing to give it a try. Yeah, there's nothing stopping us from at least seeing what it has to offer, how it can make even the smallest things simpler for us. I'm also curious to see how close it gets to a response or something that I myself would do. It makes it more game-like. How close can it get? How good can it get? How far can we go? So that's the fun side of it for me, and just teaching it and being that consummate student always trying to learn to help it perform better. I'm kind of looking forward to it. I'm definitely going to give it a try. I've heard it too many times just in the past month for me not to dip my toe in, as the kids say. Yeah. Yeah. What really stuck with me for this episode, I think it's pretty clear at this point, but it's the idea that we really need to be careful with how we ask AI to do things, making sure that we're very precise and that there's no room for misunderstanding. As a bit of a parting thought, Christopher pointed out that it's really not just robots who need to rethink their incentive structures. And this is applicable to our human nature, where a lot of times, what's the most optimal thing for us is that sense of exploration. It's growing. It's learning new things. It's discovering new things. A lot of times we get hung up on optimizing points or optimizing money or optimizing, you know, you name it.
In reality, in the long term, what's best for us is more of the journey. It's more of the exploration side. He said it. It's all about the journey, not about the destination. That's interesting, even for AI; it's very much like us in that way. So I enjoyed this episode. What about you? I loved it. Of course you did, because this is the AI series and we're so glad you've come along on this AI ride with us. You have to tell us what you thought of this episode. How are you using agentic AI? Have you seen some use cases where it could really make a difference in people's lives? You got to tell us. Hit us up on our socials at Red Hat, always using the #compilerpodcast. Tell us more, and we would love to hear it. And that does it for this episode of Compiler. This episode was written by Johan Philippine. Victoria Lawton is keeping her eye on those rogue AI agents. Thank you to our guests, Jamie Beckland and Christopher Nuland. Compiler is produced by the team at Red Hat with technical support from Dialect. Our theme song is composed by Mary Ancheta. If you like today's episode, please follow the show. Rate the show. Leave us a little note. Share it with someone you know. It really, really helps us out. Maybe get your agentic AI to write the review for you. I mean... Oh no, no, no, no, no. Thank you so much for listening, everybody. Until next time. Bye.