Can big data be open data?

Big data has finally come to the world of games. As designers now study players with more and more granularity, games themselves become mirrors of our own play preferences. But with big data come big questions: If games become personalized experiences, how will we have water-cooler conversations about them? Where is the boundary between collecting information to improve experiences and gaining a deeper understanding of the real you? Over the next few weeks, iQ and Kill Screen will explore these questions.

This past spring, in conjunction with a panel at PAX East looking back on the completion of the Mass Effect trilogy, BioWare released an infographic sharing, for the first time, a small set of aggregated user data from the third Mass Effect game. While some of the numbers were impressive (88.3 million hours played in the single-player storyline and 10.7 billion enemies killed in multiplayer), little of the information was particularly revelatory. A large majority of players preferred to play as a male Commander Shepard rather than his female counterpart (82% to 18%). Very few players (4%) completed the game on its most difficult “insanity” setting.

But tucked in amid all those stats was a particularly fascinating one: 39.8% of players earned a “Long Service Medal,” either by importing a saved character from Mass Effect 2 and then completing Mass Effect 3, or by completing Mass Effect 3 twice.

This is, of course, a dramatic number for an industry in which somewhere between 10 and 20% of players are expected to finish any given game, but it’s also a tellingly ambiguous piece of information. In a series like Mass Effect, which is built on branching storylines and promises a different experience to players who replay as a different class, gender, or alignment, it would be enlightening to know how many players actually complete a second playthrough, and how many of those players pursue a dramatically different experience rather than hewing closely to their original choices. By comparing this information with similar data from other branching narrative games like The Walking Dead or Heavy Rain, we could start to talk about how players interact with branching narratives: whether certain groups of players seem driven by an impulse to explore all the available possibilities, and whether others use multiple playthroughs to refine and perfect a particular performance.

BioWare’s infographic serves as a tantalizing hint that the data to answer such questions might be stored on one of the company’s servers. It’s also a clear expression of the fact that on the rare occasions when developers do choose to share data, they generally do so in a severely limited fashion as part of a marketing event, and almost never share the raw data itself. Many contemporary videogames generate huge amounts of data: some of it is shared with individual users on in-game player stat boards, such as those in Red Dead Redemption; some is shared semi-publicly in the form of player achievements and trophies; and some is simply stored in-game. As internet connectivity, even for single-player games, becomes the default for both console and PC gamers, more and more of that data can potentially be captured by developers.

A lot of that potential, however, remains to be realized. According to Ben Medler, a technical analyst for EA Games, it’s not yet standard for even AAA games to include the sort of robust data systems that could lead to useful internal analysis, much less eventual public sharing. “Developers always dream big, so at the beginning of projects you always hear promises to track every bit of data or to allow players to share everything. And it is not from a lack of trying and hoping that developers rarely deliver on these promises, it’s just that they run out of time. Deadlines slip, core game features take longer, engine re-writes are necessary, and all the extra features get cut or held back.” Even in cases where in-game data systems are constructed, there can be additional costs involved in formatting the data in a way that makes it usable outside the game, and even more in building an application programming interface (API) that could make the data more broadly accessible.

Dmitri Williams, Associate Professor at the University of Southern California and CEO of the game analytics firm Ninja Metrics, echoes Medler’s assessment. “We’re in an interesting transition where there’s all this data, and developers talk about wanting to be data-driven, but they really struggle to make it a priority. It’s just not a cultural norm, and it’s not usually a core competency of the teams. So they’ll often outsource it, or ignore it, or do it a little bit on their own. And it’s the very rare cases where developers devote the resources and do a good job of it on their own.”

Part of the challenge, Williams says, is the particular skill set required to draw useful information out of large datasets. “There aren’t a lot of PhDs who understand big data and player psychology and gaming. That Venn diagram is pretty small.”

The challenges may be substantial, but there are models for what big data sharing might look like, and what sort of benefits it could have for players, developers, and games as a whole. Since 2007, Blizzard has made World of Warcraft player information directly available to the public through its Armory website—4,500 variables on every active character, every day.

Nick Yee, a research scientist studying online games and virtual worlds, has spent years researching massively multiplayer online player behavior, work made possible in large part by the accessibility of Armory data. Before the Armory, Yee says, gathering information on MMOs was a matter of attempting to negotiate with individual developers, who often had concerns not only about sharing raw data, but also about what findings could be shared once research was completed. The Armory not only changed that dynamic (at least between researchers and Blizzard), it also had a huge impact on game studies as a whole. “The problem with studying online games up to World of Warcraft,” Yee says, “was that every game researcher was speaking to their own game, and their own game culture. And so there were a lot of papers and books on specific games but they were all different, and it was hard to compare. By releasing its data, World of Warcraft allowed academics to kind of have a lingua franca. Now suddenly everyone could speak WoW.”

Blizzard’s motivation in releasing World of Warcraft data, of course, had less to do with enabling academic study than with encouraging engagement with its player community and allowing players to develop their own modifications to use within the game. Player groups could use the Armory to set up their own rankings, find players to recruit, and perform basic “background checks” before admitting new members. New players could find out what gear players at higher levels were using. Popular user mods were sometimes incorporated by Blizzard into game code.

Blizzard took a risk in sharing player and functionality data (Yee suggests that many other developers have not been willing to do the same because it could allow competitors to reconstruct marketing and player retention data), but it also gave players the ability to improve the experience of the game as a whole. Massively multiplayer online games are essentially projects in ongoing worldbuilding, and World of Warcraft shows how making as much of that process as possible visible to users can drive participation. Dmitri Williams compares it to the sense of ownership players feel for games like Minecraft. “There’s an appetite for game data, and when you expose it, the players become extremely loyal, and extremely attached to it. It’s because the tools are now theirs. When you give people stuff, it really does pay off.”

Of course, the analogy between massively multiplayer online games like World of Warcraft and single-player, narrative-driven games like Mass Effect is less than perfect. “Modding” is often a dirty word in the world of single-player games, where the game itself functions as more of a closed environment. In single-player games too, however, player communities are constructed in chat groups and wikis, as well as in screenshots and gameplay videos where players show off costume mods, glitches, and narrative branches other players may not have experienced. Even the marketing value of the PAX East infographic comes largely from the way it functions as a community reinforcement exercise, allowing players to line up their choices with those of other players. It may not be terribly useful for academic study, but it’s great for adding new fuel to old player debates.

And it’s a step in the right direction, even if just a small one. Dmitri Williams suggests that large-scale data sharing won’t become common until developers observe enough positive feedback to justify devoting resources to it. The ongoing development of increasingly user-friendly data tools may help ease the way by allowing developers to more easily incorporate data collection and analysis into the development process, but developers also have to decide that the benefits of making game data public outweigh the risks. “There’s always a tension between the idea that data wants to be free and maintaining control, but I think there’s a trend toward accessibility,” he says. “It’s certainly paid off for everyone that I can think of who’s tried it.”