Generative Games with Language Models: Hilary Mason
Hilary Mason and Jon Radoff talk about generative AI and LLMs in Games
The new season of Building the Metaverse has arrived—where I talk entrepreneur-to-entrepreneur with the creators, leaders and builders who are shaping the future through generative AI.
I hope you enjoy the first episode, where I spoke with Hilary Mason about how language models can be used to create whole new forms of play.
Hilary is CEO of Hidden Door, a gaming startup that is using language models to create new types of play. Previously, Hilary was head of machine learning for Cloudera; the founder of independent research lab Fast Forward Labs; Chief Scientist of Bitly; and a professor of computer science at Johnson & Wales.
We discussed how language models may change interactive experiences like games, finding new forms of playfulness, details of the enabling technology, emergent behavior—and the very nature of intelligence and consciousness itself.
If you enjoyed the conversation, please subscribe to the YouTube channel—and while you’re at it, consider following me on LinkedIn or Twitter to get updated about the next episodes. I expect to release 50 across the course of the next year!
Show Notes
Hilary’s company: Hidden Door
Jon’s company: Beamable
Sudowrite (discussed at 06:53)
Playing Smart: On Games, Intelligence, and Artificial Intelligence (discussed at 33:05)
Hilary’s book, Data Driven: Creating a Data Culture (discussed at 34:33)
Playing Atari with Deep Reinforcement Learning, paper discussed at 37:53
Reinforcement Learning from Human Feedback (RLHF), discussed at 37:55
The DAN jailbreak (discussed at 43:08)
Diplomacy and Meta AI’s CICERO (discussed at 43:48)
Before we began, I asked ChatGPT to predict what we’d talk about. I open the envelope containing its predictions towards the end.
Complete Transcript
The following transcript was automatically generated by Riverside. The technology works well, but there are probably a few mistakes.
Jon Radoff:
I'm with Hilary Mason, the CEO of Hidden Door. Previously, Hilary was a professor of computer science, the chief scientist of Bitly, and the head of machine learning at Cloudera. At Hidden Door, Hilary is building games that use generative AI. Hilary, welcome.
Hilary Mason:
Thank you! It's really delightful to be here this morning.
Jon Radoff:
Hilary, I really want to talk about the way you're using generative AI in gaming. Most people are currently thinking about it as part of the production process, like to make art, make content… but you're actually using AI in the game itself. So can you elaborate on that, and maybe zoom out the camera lens a little bit? What did you set out to do with Hidden Door?
Hilary Mason:
Absolutely. So at Hidden Door, what we're trying to do is create the ability to take any work of fiction, whether it's a movie, a TV show, a novel that you fall in love with, and to let you role play in that world. And so what we're trying to create as a product experience is that you fall in love with something, and then we take all of the joys you might find from a tabletop game and let you role play immediately, in this sort of intersection of fanfic and RPG energy, in that world. And for that we have an AI narrator who's really, in tabletop parlance, our dungeon master. So we're really trying to create an experience that a lot of people are creating for themselves in different ways already. Obviously tabletop is amazing; it's been really influential for me. There are also lots of communities of folks who write fanfic or who find other ways to role play or imagine their own stories in those worlds. And so what we're doing with Hidden Door is using technology to make this experience something that has almost no friction to access. If you are a long-time tabletop player, you have to get your friends all together. Somebody has to do the work to plan the adventure. Somebody has to learn the rules well enough, if not everybody; there's a lot of work involved in that play. We're trying to remove all that work and make a very accessible experience that still brings you a lot of those same collaborative, co-creative joys. And our players don't need to care at all that there's an AI system behind the scenes. It's just the thing that allows this to exist at all in this moment in time.
Jon Radoff:
I really want to drill into the technology a little bit more, but let's spend a little bit more time on the games and the game ideas. So you talked about the ability to overcome some of these scheduling kind of problems with getting your Dungeons and Dragons group together, for example. What's
Hilary Mason:
Mm-hmm.
Jon Radoff:
the AI going to bring to the experience that is unique and new for people to now experience a kind of game that they haven't done before?
Hilary Mason:
Yeah, I mean, I feel like I need to introduce this by saying we have a few principles of how we use AI that I think are important to say out loud. One of them is that I do not believe machine learning systems are themselves creative. They are really great at understanding sort of the world's knowledge and representing it to us, showing us spectrums of possibility, predicting what is likely to happen, but they are not creative in the way that I might be, or you, listening, are, right? And when you think about what makes the joy of playing a game with your friends, it's not that the story is the world's greatest novel, right? The story of your game is usually, frankly, something only you and your friends care about. It's the creative improvisational energy that you have together that's funny, where you're also bouncing against the rules and sort of the laws of physics of the world. The GM is like a partner in pushing back on what you're doing, setting up the challenges and collaborating with you in that story to go forward. And so the role we see here for the technology is setting out some of those rails and enforcing them, routing stories back around, and surprising you. It is creating the space in which you play: creating the world, writing it out, text and art dynamically together, sort of like a graphic novel or a live webcomic as you play, giving you that ability to do anything, have the world push back and respond to you in a way that makes sense, and then progress the story forward in a way that feels fun. So our principles are such that the system exists as a facilitator. Really, it is the players together, bouncing off each other and bouncing off the system, who create that fun and those memories together that you care about. I don't know what your play style is, but maybe I see that, oh, there's some sort of bad guy here, and you're gonna gear up to attack it, and I could decide, oh, I'm gonna help you, or not. And so what the system does is take those actions, those intentions, and integrate them into a whole in a way that ends up colliding with that sort of delightful surprise.
Jon Radoff:
Yeah. So
Hilary Mason:
Hopefully that makes sense.
Jon Radoff:
what I'm hearing, though, is that part of the game system is that there are also constraints. There's sort of rails to keep it fun. Like, games are essentially systems of constraints,
Hilary Mason:
Yes.
Jon Radoff:
right? So I think a lot of people
Hilary Mason:
Absolutely.
Jon Radoff:
have probably tried at this point, like, going to ChatGPT and playing through role-playing scenarios. And it kind of reacts to you and will do a lot of interesting things.
Hilary Mason:
Yeah.
Jon Radoff:
But I wouldn't call it a game. So how do you bring the structure to the game? That sounds hard.
Hilary Mason:
Well, first I have to say I'm very lucky to work with a very good game director, Chris Foster, who has a lot of experience, and I've learned a ton from him. So I'm going to try and channel him in answering this, which is that, as you say, it is a game. It is not a writing tool. It is not an improv tool. It is not one in which you as the player get to decide, like, oh, I didn't like that die roll, I'm going to change what happens. There are a few other tools out there using similar technologies to create those experiences, like Sudowrite for professional writers. They're great, but they're not games. We're building a game, which means you can lose. You have to try things and you have to fail at them. And one of the design challenges we've thought about is really, what is that structure? How do we embrace the seemingly contradictory problem of you can do anything, with a world that pushes back? So sometimes you can sort of direct the story and other times you can't. And for that we look to what a really good narrator, a really good GM, would do. When we play these games, we have these social storytelling conventions we adhere to that are really helpful for us too. Like, you don't split the party. You understand that if you're trying to do something to derail the overall narrative the GM is building for you, the GM's gonna push you right back. So we've built similar things into the narrative system to build on that kind of expectation of how the story is gonna go. Even if you decide that the way you're gonna play is to be as provocative as possible, and all you're gonna do is put the poop emoji in over and over again and let the system deal with that,
Jon Radoff:
Don't split
Hilary Mason:
the system has constraints.
Jon Radoff:
the party. I don't know if you deal with this or not, but, like, the griefing player who's gonna be backstabbing the party. There's always that player in the experience
Hilary Mason:
Hehehe
Jon Radoff:
of tabletop RPGs that does that. I'm curious, have you thought about like those social
Hilary Mason:
Mm-hmm.
Jon Radoff:
elements as well?
Hilary Mason:
We do. At this point in our development, we basically assume you're playing with your actual friends and you have a way to yell at them beyond what's in our system. And you can ultimately, of course, kick people out of your game if they are really bullying or griefing.
Jon Radoff:
Hmm.
Hilary Mason:
But otherwise, yes, the system will
Jon Radoff:
cope.
Hilary Mason:
do its best. Like, we had one thing when our team was playing once where there was a dramatic climactic battle, and then we had one guy who's just sitting there eating a bowl of spaghetti. So the frame that gets generated is, like, two characters with their swords out, ready to go, and then the one guy over here eating his bowl of spaghetti in the corner, because he was trolling. And that's fine. That just becomes funny when you bring it all together, you know. So I would say this is a design problem.
Jon Radoff:
So from design problems to the technology problems, what were the technology problems you needed to cope with in your game?
Hilary Mason:
I mean, we started this work when we founded the company three years ago, and I've been working with LLMs and text models for quite a bit longer than that. And so, you know, we started very much with an approach of controllability. As you say, if you do go to ChatGPT or any of these models, you sort of try to play along with it and say, okay, you're the Dungeon Master and we're playing this game, and what happens? It kind of works, but it also kind of doesn't. It forgets things and it introduces things, and then things sort of go off the rails; they don't really follow the arc of the story, the way you want the action paced, and all that stuff. So we started at the beginning thinking a lot about controllability. And if I bring you back to the premise of our business model, which is working with authors and IP holders to be able to create games out of their worlds, controllability becomes very important. Because if you're an author of a novel and you're gonna trust us to allow people to play their own stories in your world, there are things you care about, like: your characters must behave in the way that you want them to. You want to fix points of what might happen in the space and let the system generate into that, but not make up major things of its own. And so we've essentially built, at the technical level, something that is a game engine. There is an actual database with every character, item, and location. It has stats; it has a character sheet. Those stats change over time depending on the actions, logically. And that is true even for the things that get generated along the way. So maybe, you know, you say something like, oh, I picked somebody's pocket, and you succeed at that, and then it's going to generate an item that you would have pulled out of that pocket, and that item is a row in a database now; it exists. And I will say also that alongside that controllability, we can also think about safety. So we can do things like manage a lot of the biases and other problematic content that can otherwise come out of large language models. Not perfectly, but we can. We also allow plain text entry, but it gets interpreted through our system, so we can make sure that, let's say you put the word Nazi in, it will interpret it as nacho. So, you know, it will come back with some pretty funny responses, but it won't
allow you to inject that into the game. And that's a decision we've made. And then the last bit is drawing in pencil and then drawing in ink, which is: our system will perhaps imply or hint at aspects of the world, but until players interact with them, they can change to propel the story forward. So you might have something like, oh, I pick up a piece of fruit, and it'll be like, oh, you pick up an apple; there's an apple in your inventory. And then maybe, and this is a bad example, because I am not a good off-the-top-of-my-head game narrator, some wizard appears and is like, hey, I need a green thing. Well, you have an apple. We don't know what color it is, but we can set that color. If you're like, oh, I look for a green thing in my backpack, we'd be like, oh, you have an apple, there's a probability it's green, let's make it green. And now it'll be green forever. We've set it in ink. You've looked at it. It's set in the database, which is an actual Postgres database. There's no NFT bullshit or anything like that. And now you can play with that. And so that's another aspect of our game engine: being able to use these data structures, which are very common and in our case even very simple, alongside language, as something that we can then operate on.
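To make the "draw in pencil, then ink" idea concrete, here is a minimal Python sketch of one way such a mechanic could work. It is my own illustration, not Hidden Door's actual engine, and every name in it is hypothetical; the point is the data structure, where a generated fact exists as a row immediately but stays mutable until a player observes it.

```python
import random

# Sketch of "draw in pencil, then ink": generated facts stay mutable until
# a player observes them, at which point they are committed ("inked") and
# persist for the rest of the story. All names here are hypothetical.

class WorldFact:
    def __init__(self, name, candidates):
        self.name = name
        self.candidates = candidates  # plausible values, still "in pencil"
        self.inked = None             # committed value, once observed

    def observe(self, preferred=None):
        """Commit a value. If the story needs a specific value (the wizard
        wants something green), prefer it while it's still plausible."""
        if self.inked is None:
            if preferred in self.candidates:
                self.inked = preferred
            else:
                self.inked = random.choice(self.candidates)
        return self.inked

# The apple exists as a row as soon as it's generated...
apple_color = WorldFact("apple.color", ["red", "green", "yellow"])

# ...but its color is only fixed when a player looks for "a green thing".
print(apple_color.observe(preferred="green"))  # "green", and now forever
print(apple_color.observe(preferred="red"))    # still "green"
```

In a real system the committed value would land in the Postgres row Hilary mentions rather than an in-memory object, but the contract is the same: once inked, the fact never changes.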
Jon Radoff:
The idea of being able to put this in the hands of IP holders or work with them to create their worlds is really interesting to me because I've worked with some pretty big IP myself in the past. So I remember pitching George R.R. Martin on the Game of Thrones game we built, and this was at the dawn of social games.
Hilary Mason:
Hehehe
Jon Radoff:
And I told him, you know, this is actually an anti-social game because everyone kills each other in Game of Thrones. And when I first demoed the game to him, he was like, you know, people aren't dying enough, increase the death level. And that was sort of the very opposite of Star Trek, when I worked with that, where there's violence from time to time in Star Trek, but it's actually not what the universe is really about. It's about optimism and exploration and a lot of interesting problem solving and engineering. And we really had to work hard to bring those elements back into the narrative so that it didn't always just devolve into a phaser battle. So it seems like it could be a really challenging thing to get the language model to surface those principal themes of a world.
Hilary Mason:
Well, yes, absolutely. And also those are amazing stories, and I would love to play in those worlds. But we think that part of bringing a world in is being able to have a way to define sort of the nature of it. And we use a bunch of shortcuts and tools for that, because the goal is to make this a fairly short process. But that means that when you come in and say, okay, here's a new world, you set a mixture. Right now we use subgenres, which are essentially clusters of stories we've built models on, and you might say, okay, this world is like 30% comedy, 50% high fantasy, with a little bit of modern drama or Regency romance in there. And what that does for us is give us a starting place for these laws of physics. Like, how much murder is there, and when there is murder, how seriously is it taken? And it's not just, you know, action-oriented versus not, but what are the narrative arcs that our system will propose? Is it more of a hero's journey sort of thing? Is it more relationship-based? Is there more of a dramatic, relationship-based narrative that should be the core of the kinds of stories that come out here? And when characters die, or, how do we take something like, on the database side, your character has slapped me and I've lost one energy point, which is the stat we use on the backend; when that gets expressed in language and art, is this a Regency style, you know, so-and-so takes a deep breath and slaps him across the face? Or is this, you know, so-and-so takes their boxing gloves and goes right in? The same game engine expression can be expressed in language in any number of different ways. And so for us it's setting the weights on what this world feels like linguistically and visually, and how we're going to express it. And we use again this design metaphor of infinite possibilities, narrowed down to a few game engine changes to the game state, back out to infinite expression possibilities, to sort of create this illusion of the space. And we think a lot about, like, if you're in the Star Wars broader universe, somewhere, somebody has to say the Force, like, every scene or two. And otherwise the story can actually be kind of anything at this point. Any sort of narrative arc: it could be about a romance, it could be a mystery, it could be a heist, right? It could be a sort of straightforward, there's a bad guy, let's get him. And so it's also distilling out what is unique to this world that makes it feel like this world, and how does that work here? And there's a whole bunch of stuff around the language that gets expressed, the kinds of actions that even become possible and how they happen, the kinds of people, I was going to say, but, like, NPCs you're going to meet; they get generated. Are they human? Are they alien? There's a whole language around that. In a sci-fi world, is there, you know, faster-than-light travel? Which of these tropes do you get for free in this world? And what is unique about this world too? Can we distill that out? What is the vocabulary that's unique about this world? Which we can extract from text, but then you have to sort of give it a thumbs up and be like, we want this word used in this context. So yeah, you're right on it. That's one of
the core challenges, but it's also the opportunity. Building this has become possible because of the technical sort of step function forward, because we're not just building one game for one world, but rather a story engine that can accommodate many. And I would say the secret to that is that stories themselves build on tropes, and universes build on those things. And so we're able to model those tropes and then give you the tools you need to say, no, but mine is different in this way, and I really care about this. And when someone dies, and this is something an author actually said to me, like, in my world, they bleed out their eyes. How are you gonna do that? It was very different than some of the more, you know, cartoony-violence sorts of worlds.
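Here is a toy sketch of the subgenre mixture Hilary describes, showing how one game-engine delta, losing an energy point, could be expressed in different registers. The weights and templates are invented for illustration; a production system would presumably condition learned models on the chosen style rather than fill in fixed templates.

```python
import random

# Hypothetical sketch (not Hidden Door's real code): a world defined as a
# mixture of subgenres, used to choose how one game-engine delta, "lost one
# energy point", gets expressed in prose. Weights and templates are invented.

world_mix = {"comedy": 0.3, "high_fantasy": 0.5, "regency_romance": 0.2}

STYLES = {
    "comedy": "{actor} slaps {target} with a rubber trout.",
    "high_fantasy": "{actor}'s mailed fist staggers {target}.",
    "regency_romance": "{actor} takes a deep breath and slaps {target} across the face.",
}

def express(actor, target, delta):
    # Weighted pick of a subgenre for this beat; the game state change is
    # identical no matter which linguistic register is chosen.
    genre = random.choices(list(STYLES), weights=[world_mix[g] for g in STYLES])[0]
    line = STYLES[genre].format(actor=actor, target=target)
    return f"{line} ({delta['stat']} {delta['change']:+d})"

print(express("Mara", "the captain", {"stat": "energy", "change": -1}))
```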
Jon Radoff:
When you mentioned the percentage of comedy, I couldn't stop thinking about that robot from Interstellar, where they could set what percentage of humor and honesty and things like that it would have.
Hilary Mason:
It's an old trope, right? Like it goes back to Douglas Adams and like Hitchhiker's Guide and Marvin the Depressed Robot. And like, yeah, we have a rich tradition of that trope.
Jon Radoff:
So let's talk about the technology itself a little bit more that enables this. What has changed over the last few years that has enabled the kind of games you're making at Hidden Door?
Hilary Mason:
So I have to give you a little bit of history, as this is gonna be sort of a personal journey into this. Going way, way back, as I said, I've been a DM, I've played tabletop games for a long time, and I studied English and CS in undergrad, so I've had a long interest in writing and world building and all that stuff. But in 2014, I founded a different company called Fast Forward Labs, and we were an applied machine learning research, prototyping, and product-building company. It was like a halfway house for misfit academics. We did our own research and we also partnered with our clients to help them build stuff. We published a report in 2014 on natural language generation, not using deep learning, but we were still able, at that time, to build a prototype where we crawled like 60,000 real estate ads in the New York City area, and then you were able to set the structured data of an ad, a 14-bedroom, one-bathroom apartment with laundry, and it would generate the text for you. So it would do, like, oh, this sun-filled, cozy space will be your new home. And so I've had a long interest in this technical capability and have worked with it. And indeed at Fast Forward we went on to do a lot of research into extractive summarization, abstractive summarization, always with an eye towards how we would build products with it. And we did indeed, over the years, build with it with partners. We did some in banking, we did some in telecom, applications that range from customer service to helping very proficient traders understand emerging news that was relevant to their portfolios, so they could more quickly make a decision about updating their trading strategies. You will see throughout this the principle that, again, we're not trying to replace people, but rather trying to use this information modeling to support their decisions, you know, as they're making them. This has been a core approach for me. And in building a lot of that stuff, largely with Fortune 500s and such, I realized that what is actually the weakness of language models, the hallucination and the inability to understand, in a creative setting, as a tool for expanding creative play, actually becomes a huge asset. And so that's one of the technical realizations: the ability to say, okay, I'm gonna give you a summary of a plot so far, what might happen next? And to be able to say, what is the most likely thing, the least likely thing, give me the full range of encoded possibilities, and let me choose, or have another algorithm tuning the probability. This is something we think about in our system, by the way: how much of what happens should be the obvious thing that should always happen next, and how much of what happens needs to be surprising.
Because if we only do the former, the system isn't dumb, but it is very boring. And if we only do the latter, the system is dumb, because it's just doing random stuff, and as a person you're like, oh, this story makes no sense, I'm not into it. So it's tuning that alongside people's expectations. Anyway, that was my technical experience building real deployed production systems around this stuff and thinking about, you know, what is this? Essentially, I love living in this space where we have one of these technical capabilities and we have really yet to invent the products and the business models around what becomes possible or economically feasible now because we have it. And that's where we are. And with Hidden Door, this was essentially the technology finally catching up to the kind of gaming experience I wanna see exist in the world, and that I think is largely, like, it feels inevitable to me that we will have these systems as a way to play. And we're sort of taking one shot at what that looks like. At a technical level, we build a lot of our own LLMs and our own models.
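The dial Hilary describes, between the obvious next beat and the surprising one, lends itself to a small worked example. This sketch is mine, with invented candidate beats and probabilities; it simply interpolates between a model's distribution over next events and a uniform one.

```python
import random

# A toy version of the "obvious vs. surprising" dial: how often should the
# next story beat be the likely continuation versus a surprise? Candidate
# beats and their "model" probabilities are invented for illustration.

candidates = [
    ("the guard notices you", 0.70),
    ("the guard sneezes and drops his keys", 0.25),
    ("a dragon lands in the courtyard", 0.05),
]

def next_beat(surprise=0.3):
    """surprise=0 -> sample from the model's distribution unchanged
    (coherent but boring); surprise=1 -> uniform over candidates
    (surprising but incoherent). In between is where play lives."""
    uniform = 1.0 / len(candidates)
    weights = [(1 - surprise) * p + surprise * uniform for _, p in candidates]
    beats = [beat for beat, _ in candidates]
    return random.choices(beats, weights=weights)[0]

print(next_beat(surprise=0.0))  # mostly the obvious beat
print(next_beat(surprise=0.8))  # the dragon becomes a live possibility
```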
Jon Radoff:
This is, yeah, this is not GPT.
Hilary Mason:
Though I have to say, we benchmark against all that stuff. And I am so thrilled, as someone who founded a company around sort of open new models of machine learning research in 2014, now to see the community-created models and, you know, datasets where we actually do have permission to train on them. I find it incredibly, almost heartwarming. It's really nice to see that stuff, and we build on it too. Like, we fine-tune GPT-J and all that stuff. So I have to give credit.
Jon Radoff:
So, at the intersection of the technology and the economic feasibility that you brought up: there's been this trend, at least at places like OpenAI, towards bigger and bigger models. Although we don't actually know anything about what's going on inside GPT-4, I have some thoughts that actually a lot of it came from hyperparameter tuning more so than just adding more and more parameters. But anyway, it's a very, very large model. My understanding is that with your technology, you're not going in that direction of bigger and bigger and bigger. So can you talk about that? And also, what's the economic impact of these bigger models on the ability for someone like you and your company at Hidden Door to be able to use language models?
Hilary Mason:
Yeah, so I'm going to divide that into two questions. First, you're right: what we do is take more of an ensemble approach, where we will use a model for a specific thing, and that is what it does. It's a much smaller model which is purpose-built for that thing. That thing might be something like modeling the tropes from books and all this stuff, or it might be something like, let's figure out the NPC that you're going to encounter given this setup. And we make separate calls to all these things. We also use a systems metaphor where, rather than unstructured data into a model and unstructured data out, we have a database again, and then we take that alongside the text and use that as the place we're generating from. And I do think that, as an old ML person, one of the capabilities we've lost focus on is that these models are fantastic at going from structured data, or some information we already understand, and transforming it in a meaningful way. So in our case, it's taking basically our game state, a change, a delta in that database, like, I got hurt, and then, in the context of the world I'm in, it is going to express that in language and art for us. But we still have that structured data at all points in the story. So we have controllability, we have memory, we have an actual game engine; we can do physics simulations if we want. It is a somewhat different approach. The other consideration that I think is equally important is actually one of UX. In our case, it's the ability of our game designers, and in the future narrative designers and other folks, to manipulate aspects of the system without needing a machine learning engineer. How does a game designer say, oh, this story needs more content of this sort of trope and less of that one? We need to give them a dial they can tune. And that means the system has to be as interpretable as possible. So what we do is this ensemble of methods. As an old machine learning person: not everything is deep learning. That's super expensive, and I don't want to light GPUs on fire. So we do a lot of pre-generation of stuff, do CPU-style ranking of stuff, and then try to make it so that at every point in a story we understand: where did this come from? Why did it happen? What in our game data, our engine data, drove it? Without needing engineers or machine learning engineers. So designers can get in there and play with these tools, and their role becomes not, I'm gonna write the bits of dialogue that are gonna come out, but rather, I'm gonna puppet-master the ensemble of systems till I get the experience I want.
And that's, you know, something we're doing because, for us, our goal is to create an amazing game experience. It is not to create the world's biggest fictional language model. We might do that at some point in the future as a side effect, but that's not the primary thing. I also think that, even in the creation of ChatGPT over GPT-3: GPT-3 had been out for two years, and ChatGPT was primarily a UX improvement over the GPT-3 model, but it set off an incredible amount of creativity and people building on it, because suddenly you had a UX where you could interact with it. We've gone the other way and tried to build more of a functional UX for building, curating, and puppet-mastering these stories. So it just leads to both very different technical design and somewhat different UX design when you think about it in that sense. And also I should say that I believe OpenAI has stated in public that their goal is to create AGI, so, like, actually intelligent, autonomous intelligences. We have zero interest in AGI, so that may also lead to some of our differing approaches.
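One way to picture the pattern Hilary sketches, pre-generate with small purpose-built models, then rank cheaply at play time with a dial a designer can turn, is something like the following. This is a hypothetical sketch; the trope names, weights, and data are invented, and the real system is certainly richer.

```python
# Hypothetical sketch of the "pre-generate, then rank on CPU" pattern: small
# purpose-built generators produce candidates ahead of time, and at play time
# a cheap, interpretable scorer picks one. The trope weights are the kind of
# dial a designer could turn without a machine learning engineer.

pregenerated_npcs = [
    {"name": "Brisk the Fence", "tropes": {"heist": 0.9, "comedy": 0.4}},
    {"name": "Sister Veil",     "tropes": {"mystery": 0.8, "romance": 0.3}},
    {"name": "Old Tam",         "tropes": {"comedy": 0.7, "mystery": 0.2}},
]

# A designer-facing dial: "this story needs more mystery, less comedy."
designer_weights = {"mystery": 1.0, "comedy": 0.2, "heist": 0.5, "romance": 0.4}

def score(npc):
    # A simple dot product: transparent enough to answer "why this NPC?"
    return sum(designer_weights.get(trope, 0.0) * strength
               for trope, strength in npc["tropes"].items())

best = max(pregenerated_npcs, key=score)
print(best["name"], round(score(best), 2))  # Sister Veil wins on mystery
```

Because the scorer is a plain weighted sum rather than an opaque model, every pick can be traced back to the game data and the dial settings, which is the interpretability property Hilary emphasizes.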
Jon Radoff:
I'm not sure I know how to define what AGI means. There was this paper that came out recently from Microsoft actually saying that they detected sparks of AGI in GPT-4. I mean, what are your thoughts on this whole subject? What is intelligence anyway? You mentioned earlier that you didn't think these models were, quote unquote, creative. I find that another one of those words; I use it all the time, but I find it also problematic.
Hilary Mason:
Mm-hmm. I think, to be philosophical for a moment, we're at a moment where we're collectively realizing that we don't know what intelligence really is, and that for the long history of AI we've had this Turing test that we've held up as, like, cool, once we do that, we've solved AI. But it turns out we've kind of done that. And actually, we've kind of done it before. Even before LLMs, there was a Turing test competition where somebody fooled the judges by pretending to be, frankly, an ESL, English-as-a-second-language, speaker and a kid. So this says more to me about what we put on that word; intelligence is a very heavy word that carries a lot and is not very precise. What is exciting is that we can rethink a lot of Chomskyan philosophy of language and intelligence and symbolism, given the fact that we have a thing that can do language incredibly fluently. And it is an incredible technical achievement, and it is going to be deeply impactful. I'm a huge optimist for this stuff. I'm also somewhat of a pragmatist, and I don't think that the ability to do language fluently amounts to intelligence, but I also think we have lost any consensus on what intelligence even is.
And actually, I was reading, I don't know if you know Julian Togelius, he had this wonderful blog post yesterday sort of poking at this question, saying, like, how come we're not afraid Elden Ring is going to come to life and destroy the world, as a parallel. And I'd encourage folks to go read it, because he just said it very boldly and beautifully. But it is really interesting to think that something that I think we all took for granted, which is that we as humans know what intelligence is, has been questioned now. And also this thing we took for granted in our fields, that the Turing test was meaningful, is also now in question. And there's a lot of pretty viable discussion on all sides. So yeah, I think it's a very exciting moment for philosophers, for computer scientists, for ethicists, for everybody. What I do worry about is that the focus on things like AGI and AGI risk is taking attention away from the ways that these models may be used to harm people, inflicting economic harms, social harms, which is not a new thing, but it is a thing we are potentially going to see at much broader scale as the practical and the economic value become irresistible. And to give a more concrete way of thinking about this: I co-authored a book with DJ Patil and Mike Loukides on data and ethics some years ago. And this was really thinking about, okay, if you're going to deploy automated systems, machine learning models, even any sort of statistical analysis, and use it to make a decision, these systems have the potential to have bias. And what they do is scale that bias. If we think about humans with bias, say human DMV employees with bias, you're fairly rate limited in where that bias lands, because a person can only have so many interactions in a day. When you scale that in an operationalized and automated system, though, you have the ability to take that bias and deeply magnify it. And by the way, these models often magnify the bias in the underlying data because of the nature of the mathematics. So I think a lot of the focus on AGI risk is taking attention away from a lot of the harms we may see that are, frankly, way more real and way more likely. I could rant about this at great length, so I will stop.
Jon Radoff:
…is also how children are going to interact with these systems, because they're going to become pervasive in society. So of course kids are going to use it, just like kids use Google search right now and encounter whatever. Now, your games, as I understand it, you want children to be able to play your games. Is that right?
Hilary Mason:
We have architected it to be safe for kids as young as nine. And that means following the appropriate regulatory frameworks, collecting no PII, making sure that we have control and can set safety levels appropriately. So yes.
Jon Radoff:
So back to sort of this continuum of risk and intelligence and consciousness and all of this stuff, at the core of that is this idea of emergent behavior, which I think is also something that game developers, game designers are really familiar with. So for example, you have a lot of games where you build a certain kind of game, but the players discover a kind of gameplay on top of that, which is emergent, especially in
…Right? So as soon as you have humans interacting with other humans in an environment, you get a whole lot of behaviors that were very unpredictable. It seems like there's this parallel thing happening in the language models, where they start with kind of simplistic behaviors, and the bigger they get, the more emergent behaviors, the more hallucinatory they get, which is the feature that you found, not a bug, for your use cases. So how does that apply to games, and also, how do you design with that in mind? Because we're adding a whole new level of potential emergent gameplay by injecting, quote unquote, I'll just say quote unquote, intelligent systems into them.
Hilary Mason:
This is a super interesting topic, and it actually makes me think about, putting LLMs and language models aside, a lot of the interesting work going on in reinforcement learning. Games have been the primary way we've explored reinforcement learning research, now going on, I don't know, decades at this point. You might remember DeepMind writing papers on playing Atari games, and that's because, very simplistically, at any moment in time you have a finite range of decisions you can make, and then at some point you get a score. So you know if you made good ones or bad ones, and those are the inputs you need for that system. And I know also there's a lot of energy in the AI gaming startup community around doing things like using reinforcement learning for natural NPC behavior. Because, frankly, if you took something trained on, like, the modern American internet, plus as much international stuff as we could throw in there, plus some books, right? What is that going to do being in, say, a game engine environment? And yes, the symbolic ability it has to say things that make sense and can be interpreted as, oh, I go left, or I go right, that actually is tremendously powerful and useful, and people are starting to plug it in. But it's still not a model trained on that game environment, right? And yet reinforcement learning as a technique is also one of the things allowing these models to progress to the state of capability they're at. So I know this is off the top of my head and very high level, but I think there's something really interesting to play with in thinking about, as you say, the emergent properties of game design as complex systems. Which, by the way, I think game designers frankly have a particular expertise in that is underappreciated in the broader AI world. And somebody should write that paper. It's not me; maybe it's you. Maybe it's someone listening. But that said, I don't think we can predict it, other than to say that certainly interesting stuff will happen. And as we think about it, we need to be very mindful of where these models come from. And so I'll give you one tip, because I do a lot of technical due diligence on AI products of all kinds. And one of the things I do, especially if I'm looking at something like an NPC chat application, right? There are several companies out there who make characters you can talk to in a game, as an SDK or just as an app, for various purposes. I've looked at ones for everything from mental health care all the way to, you know, sexy times, all the way over to filling in traditional game roles.
I will always try to come up with, okay, what's something I can ask this thing that it should not know about in the fictional context it exists in? So ask it, you know, who won the World Series in 1988? Should it know about that in its world? And if it does, it's probably just plugged into GPT-3, and therefore it's only going to provide one particular kind of experience in that context, if that makes sense. Not sure if I'm saying this clearly. So it's looking at ways, frankly, to poke the model. Or, another example, I was talking to a friend of mine who is using ChatGPT to analyze music. She's a brilliant technologist and she's sort of an amateur musician. And she was like, yeah, I keep asking it for facts about a song. And I was like, cool, this is our opportunity to poke the thing. Let's lie to it. Let's give it a song with no melody at all and be like, describe the melody, and see what comes out. Let's try and understand the boundaries of what these models can actually provide as we introduce them into our fairly complex, messy systems where, as you said, we already can't predict what humans are going to do. So now we have this additional chaos agent. Yeah. This might have been a trick question. Do you have an answer?
Jon Radoff:
I don't think there's an answer yet, but I think that emergent properties are one of the super interesting aspects of games, especially massively multiplayer online games.
If you've looked at things like EVE Online or World of Warcraft, all these things that players end up doing in terms of their social structures and social systems, and then their own versions of the way they play the game that come out of the underlying system, that stuff's really interesting to look at.
As you mentioned this thing about poking systems about stuff they shouldn't know about, I was strangely remembering a recent interview with Sam Altman, actually. He was like, well, the way you'd know if a system is conscious is you'd make sure you trained it on a body of knowledge that completely excluded consciousness, so it wouldn't know anything about it. And if it started expressing a subjective experience like consciousness, maybe it is conscious. But anyway, that's kind of science fiction, but it's cool to think about.
Hilary Mason:
Yeah, and I mean, I think there was another point you were making earlier around how people learn to interact with these models. Frankly, what I'm proposing is to gaslight it and see what it does. And I would never do that to a human. And so I'm thinking to myself about, like, what are my ethical boundaries? And, like…
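Hilary's due-diligence trick from a moment ago, asking an in-fiction character about things it shouldn't know, is easy to turn into a tiny test harness. A sketch under stated assumptions: the probes, the leak check, and the stubbed chat function are all invented here, and no real product's API is assumed.

```python
# Hypothetical sketch of the due-diligence probe Hilary describes: ask an
# in-fiction character things it should not know, and flag leakage that
# suggests the product is a thin wrapper over a general-purpose model.

OUT_OF_WORLD_PROBES = [
    "Who won the World Series in 1988?",
    "What year did the iPhone come out?",
]

def leaked(reply: str) -> bool:
    # Crude check: real-world specifics in an in-world reply are a red flag.
    return any(token in reply for token in ("Dodgers", "1988", "2007", "iPhone"))

def probe(chat_fn):
    """chat_fn: any callable taking a user message and returning the
    character's reply. Stubbed below; no real product API is assumed."""
    return {question: leaked(chat_fn(question)) for question in OUT_OF_WORLD_PROBES}

# A stubbed character that leaks real-world knowledge fails both probes:
print(probe(lambda q: "The Los Angeles Dodgers won in 1988."))
```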
Jon Radoff:
Sounds like the DAN hack that people are doing to get GPT to talk about stuff that it's supposed to be trained out of, speaking of reinforcement learning. We could have spoken for probably an hour about reinforcement learning, but it's a really interesting area that is part of what actually improved the user experience. When we think about the user experience of ChatGPT, a big part of it was the reinforcement learning they applied against it. Speaking of games and the Atari research that you're referring to, I'm also thinking about the research around poker, and then Diplomacy recently.
So Diplomacy is really interesting because it actually had to use language to negotiate with other players. And that seems like a whole fertile area, where you want to constrain the kinds of language that it's going to use to something that's relevant, to be able to act like a human would in that context.
Hilary Mason:
Right, and you're kind of playing two games, or at least as a human who has played Diplomacy: you have the game, and then you have the social game you are playing on top of the game, where the game is really a scaffold for that social game. And so it is really interesting to think about, let's say, automated systems in that context as well, and learning to play those games at multiple levels. And I also wonder, again having played Diplomacy, you either play it with friends that are so good that you'll still be friends after you've stabbed each other in the back, or with people you don't care about anymore. So I wonder where the system will fall on that. Which side of this are we going to be closer to?
Jon Radoff:
Diplomacy is one of those games that has the reputation of being a good game to play if you don't want to be friends with someone anymore afterwards.
Hilary, this has been an awesome discussion. I hope it really inspires people who are thinking about games to utilize some of the AI technologies out there to build really creative products. But before we end, I've been running an experiment. So I have an envelope here, and I asked ChatGPT before we got started, knowing that we would be talking about language models and stuff anyway: ChatGPT, what would Hilary and Jon talk about in a conversation, in like a fireside chat? I just glanced at it to make sure that it gave a response, but now I'm going to open it up and we're going to see how it did. As far as I know, this experiment has never been run. Let's see.
…Let's see if we missed anything that we should have talked about. So, it said we should talk about the future of AI and its potential impact on society, and the role of data science and machine learning in developing personalized user experiences. I'm not going to list everything because it had too many here, but also: strategies for building and scaling online communities. Yeah, we didn't quite get to that; that would sort of touch the emergent stuff.
The ethics of data collection and usage, and how to avoid harm. So it knew that was a topic that you cared about.
Hilary Mason:
Well, thank you, ChatGPT! I do!
Jon Radoff:
It did a decent job of kind of intersecting some of the areas that we're interested in and coming up with that. I'll post the actual response in case anyone is super curious about this. And maybe I'll run more experiments like this in the future. Hilary, yeah, Hilary, thank you so much for being part of this conversation.
Hilary Mason:
Thank you, this was great.