
A Tale of Two Data Sets – Data in Game Development

Filament Games | Educational Game Developer

It might surprise you to learn that working on the data of your game is usually about half the development effort – if not more. Whether your data is bits of dialogue, images, items, or locations of objects, all of that information has to be meticulously designed and organized. While the game might have some additional rules and presentation, the data design specifies what is possible, and the raw data is ultimately what the player experiences.

Before I break down the journey of two projects’ data sets, I would like to show that the data development effort is not unique to Filament Games or educational games. In 2014, a very successful narrative game called 80 Days was released for Android and iOS (and later for PC in 2015). Like the developers of most successful games, the team has given professional talks, including one by the content authors at the Game Developers Conference. In that talk, the lead writer explained that the game contains more than half a million words in its branching narrative, which for context is more than The Lord of the Rings trilogy. While award-winning, 80 Days is not considered a “big game” in the commercial game market; the team had fewer than 20 people.

The two stories I want to tell are taken from two of the largest projects Filament Games has completed. Both of these experiences are somewhat similar in that they had a central space where users could navigate to a number of minigames. They also overlap on target audience age range, and both are HTML5 projects supporting the same devices. One data set comes from TVO’s mPower product that teaches math and the other data set from Scholastic W.O.R.D. which has language literacy content. These are two of our most acclaimed titles – TVO mPower earned a Serious Play silver award in 2017, and Scholastic W.O.R.D. just won a Serious Play gold medal earlier this summer. These two examples are interesting not just because they are large, but also because, despite similar appearances, they are dramatically different.

It might be hard to picture what I mean by “data sets.” While there are multiple ways we work with data on a per-project basis, both of these projects used spreadsheets as editors. Through some custom programming, the content in the spreadsheets is exported as tables that each game can digest, but you can just think of them as data tables. There is a bit of art to data design, but we are primarily going to focus on two things: complexity (how interconnected the relationships between tables are) and volume (the number of tables that exist).

In both cases, creating and refining this much data takes a long time, which means development of the game and the game’s content is happening in parallel, and the data creation timeline usually stretches the entire project. 

While I am contrasting these projects, it is important to note that the data exists to service the design of your game. It is the implicit result of many other explicit decisions. Data design with complexity or volume isn’t a strategy decision, but you do need to be mindful of the direction your project is heading because it will impact development.

High Complexity

Learning quests drive users through TVO’s mPower experience. In what we would call the “core game loop”, you talk to a character to accept a quest that requires you to help another character by playing a minigame, and then, as you progress, rewards are unlocked that let you complete the quest. As you complete quests, more and more characters join your virtual town, and they offer even more quests. By the time we were done with the project there were:

  • 70+ quests
  • 50 unique characters
  • 260 unlockable items
  • 600+ lines of dialogue

The character town part of the game alone took 9 large tables. Each of the 14 math minigames added its own tables and dialogue to the mix.


In either data development scenario you are starting with nothing and looking for a path to scale up to completion. Complexity usually requires several major systems to all exist at once, because each piece of data has its fingers in all of your other data. TVO mPower required a lot of placeholders, cheat codes, and unenforced game rules to deal with missing data or game systems. Some examples:

  • We made a minigame that just had a win and lose button to stand in for games that didn’t yet exist.
  • Character portraits took a substantial amount of art time, and until a character’s portrait was done, each of the characters had a placeholder.
  • Since quests required every part of the game to be operational, we needed cheat codes to make sure the game progression rules functioned correctly.
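To make the placeholder idea concrete, here is a minimal sketch of the first two examples. It is not the actual TVO mPower code (that was an HTML5 project); the names `Character`, `portrait_for`, `stub_minigame`, and the placeholder path are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical path to a shared stand-in image.
PLACEHOLDER_PORTRAIT = "portraits/placeholder.png"


@dataclass
class Character:
    name: str
    portrait: Optional[str] = None  # stays None until the art is finished


def portrait_for(character: Character) -> str:
    """Return the finished portrait, or the shared placeholder if none exists yet."""
    return character.portrait or PLACEHOLDER_PORTRAIT


def stub_minigame(pressed_button: str) -> bool:
    """Stand-in minigame whose only inputs are a 'win' and a 'lose' button."""
    if pressed_button not in ("win", "lose"):
        raise ValueError(f"unknown button: {pressed_button!r}")
    return pressed_button == "win"
```

The point of stand-ins like these is that the rest of the game (quests, rewards, progression) can be built and exercised before the real art or minigame arrives.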

It was a very long time into development until you could play the game authentically the way an actual user would. These kinds of games do not coalesce until they are nearly finished. Because of this, content creators can have a hard time visualizing the end result – there is a lot of missing context when reviewing and adding content to the game. Similarly, playtesting can be very rough because if you test early when there is time to pivot, big pieces of the game are missing. If you playtest later, there is way less time to make any adjustments, particularly if you have already sunk resources into an expensive or time-consuming feature.

Even though I wrote it years ago, “How To Make A Learning Game” is as relevant now as when it was first written, particularly the takeaway that there are a lot of competing priorities in development. We definitely did not approach development on TVO mPower by making one quest at a time. In order to write a quest, all of the prerequisite parts that compose it need to exist: characters, dialogues, minigames, items, and rewards. We actually started with character assets because they were the most expensive part of our performance budget. Then there were some other technical problems that needed to be solved, but the other early content issue that needed focus was the dialogue system. We really needed a lot of lead time on our text content to find and record voice talent for so many characters.
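The dependency problem described above can be pictured as a quest row that points into several other tables. The sketch below is an assumption about what such a check might look like, not the project's real schema: the field names, table names, and `missing_prerequisites` helper are hypothetical.

```python
# Hypothetical mapping: quest field -> name of the table it references.
QUEST_REFERENCES = {
    "characters": "characters",
    "dialogue": "dialogue",
    "minigames": "minigames",
    "items": "items",  # rewards are treated as items in this sketch
}


def missing_prerequisites(quest, tables):
    """Return (field, id) pairs that the quest references but no table defines."""
    missing = []
    for field, table_name in QUEST_REFERENCES.items():
        known_ids = tables.get(table_name, {})
        for ref_id in quest.get(field, []):
            if ref_id not in known_ids:
                missing.append((field, ref_id))
    return missing


# Example data: the minigame and one reward item do not exist yet.
tables = {
    "characters": {"mayor": {}, "baker": {}},
    "dialogue": {"dlg_intro": {}},
    "minigames": {},
    "items": {"hat": {}},
}
quest = {
    "id": "q1",
    "characters": ["mayor", "baker"],
    "dialogue": ["dlg_intro"],
    "minigames": ["fractions"],
    "items": ["hat", "scarf"],
}

print(missing_prerequisites(quest, tables))
# [('minigames', 'fractions'), ('items', 'scarf')]
```

A single quest touching four tables is what makes this kind of data "complex": nothing can be authored end to end until every referenced table has at least some content.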

While part of Filament Games’ secret to success is onboarding client experts into the content creation process, when data complexity is high, Filament Games has to facilitate the relationship by translating and compiling client expertise into game content.

High Volume

Some fun facts about Scholastic W.O.R.D.:

  • 1,250 tables with a combined 20,000 entries
  • 63,000 audio files for voice overs
  • 9,500 images exist to service one minigame

On this kind of project, once we have designed a form to be filled in, clients are much more able to produce their own content. As you might expect, many of these tables have the same format, with a minigame loading a specific table depending on difficulty.

Scholastic W.O.R.D. gameplay


Before you duplicate a form 100+ times you really want to make sure that it is correct. On Scholastic W.O.R.D., we built much more of the game out before we were satisfied that the content design was final. For a very long time the game only functioned with one example table for each minigame. This is a really small sample size, and there was a real concern that special cases would be discovered late in development. It’s equivalent to forming an opinion about America having only visited West Virginia.

Once we were confident in the data design, it was like drinking from a fire hydrant as content developers worked in parallel populating data and sending us thousands of files at a time. Let’s assume that content was 99% accurate in the first edit. At our volume, that 1% error rate would mean something like 630 broken or missing audio files and 200 table entries needing correction.
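The arithmetic behind those numbers is simple enough to spell out. This is just a worked version of the estimate above, using the totals from the fun facts list; the function name is my own.

```python
def expected_corrections(total_entries: int, error_percent: int) -> int:
    """Estimate how many entries need fixing at a given whole-number error rate."""
    return total_entries * error_percent // 100


# 99% accurate content means a 1% error rate.
print(expected_corrections(63_000, 1))  # 630 audio files
print(expected_corrections(20_000, 1))  # 200 table entries
```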

The path to scaling up this project was pretty straightforward; it really became a project about managing the volume. Where TVO mPower had a set of “in-game” solutions, Scholastic W.O.R.D. needed a suite of external tools and scripts to support it through development. One example is a script we would run to validate content and generate a report containing mistakes and missing data.

The first version of the validator broke and ran out of memory, probably because early reports were over 500 pages long. We spent a week rewriting the validator, not only to add visual filters so people could focus on specific kinds of errors or content, but also to provide an informational breakdown of total error numbers. The validator regularly needed updates as we discovered new flavors of issues and needed to know how many existed and where they were located. This tool was like a micro project within a project.
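For a sense of what a validator with per-kind totals and filters involves, here is a small sketch. It is not the tool we built: the `Issue` record, the two checks, and the `audio` column name are assumptions for illustration. Yielding issues one at a time is one possible way to avoid holding a giant report in memory, and I am not claiming it is how the real rewrite solved that.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    kind: str    # hypothetical categories: "empty_field", "missing_audio"
    table: str
    row: int
    detail: str


def validate_table(table_name, rows, audio_files):
    """Yield one Issue per problem found, rather than building a full report."""
    for index, row in enumerate(rows):
        for field, value in row.items():
            if value in ("", None):
                yield Issue("empty_field", table_name, index, field)
        audio = row.get("audio")
        if audio and audio not in audio_files:
            yield Issue("missing_audio", table_name, index, audio)


def summarize(issues):
    """Count issues per kind, the 'informational breakdown' view."""
    return Counter(issue.kind for issue in issues)


def filter_issues(issues, kind):
    """Keep only one kind of issue, the 'filter' view."""
    return [issue for issue in issues if issue.kind == kind]


# Example: the second row has an empty word and an audio file that was never delivered.
rows = [
    {"word": "cat", "audio": "cat.mp3"},
    {"word": "", "audio": "dog.mp3"},
    {"word": "sun", "audio": "sun.mp3"},
]
issues = list(validate_table("level_1", rows, {"cat.mp3", "sun.mp3"}))
print(summarize(issues))
```

Adding a new "flavor" of issue in a design like this means adding one more check and one more `kind` string, which is roughly why the tool kept growing as new problems turned up.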

The cost of finding problems really escalates at volume, and for most problems we leaned on engineering to find solutions that avoided human review. One real problem we encountered was that on one particular browser, images were displaying as all black. We discovered that this was caused by our image library containing some images that were in a different format than their listed extension (e.g. a file was a .tiff file, but someone changed the filename extension to .png). Finding every instance of this would have been laborious in a sea of almost 10,000 images. We could have batch exported every image with a tool like Photoshop, but that had some risks, and the obvious downside of needing to be repeated every time this issue resurfaced. In the end, we found some image processing code libraries that could scan for this issue and generate a list of all of the problem files so we could make targeted changes and add another watchdog to our content updates.
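We used image processing libraries for this scan. As a simpler stand-in, the sketch below does the same kind of check with only the Python standard library, by comparing each file's leading "magic bytes" against its extension. The function names and the short list of formats are assumptions, and a real library would recognize many more formats.

```python
from pathlib import Path

# Leading byte signatures for a few common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"II*\x00": "tiff",  # little-endian TIFF
    b"MM\x00*": "tiff",  # big-endian TIFF
}

EXTENSION_FORMATS = {
    ".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg",
    ".gif": "gif", ".tif": "tiff", ".tiff": "tiff",
}


def sniff_format(header: bytes):
    """Return the format implied by the file's first bytes, or None if unknown."""
    for signature, fmt in SIGNATURES.items():
        if header.startswith(signature):
            return fmt
    return None


def find_mismatched_images(root):
    """List (path, expected, actual) for images whose contents disagree with their extension."""
    mismatches = []
    for path in sorted(Path(root).rglob("*")):
        expected = EXTENSION_FORMATS.get(path.suffix.lower())
        if expected is None or not path.is_file():
            continue
        with path.open("rb") as handle:
            actual = sniff_format(handle.read(8))
        if actual != expected:
            mismatches.append((path, expected, actual))
    return mismatches
```

Because it only reads the first few bytes of each file, a check like this is cheap enough to run on every content update, which is what makes it usable as a watchdog.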

There are a lot of general problems with being big regardless of what your data looks like. Scholastic W.O.R.D. takes roughly 100 hours to play through all of the content. TVO mPower’s playtime is more variable, but you are definitely not getting through it all in a day. As they are both HTML5 products, we needed to test them on multiple browsers as well. A big part of both products’ development story is the enormous quality assurance effort.

In addition to conveying the organizational challenges and effort needed to create game data, I hope I have illustrated how data affects development and developers. Beyond appreciating all the extra work that goes into facilitating content creation, it’s important to be mindful of all the work that is invisible when looking at the final product.


More development insights from Game Engineering Lead Stephen Calender:

Learning About Learning: Dog Training
How to Design the First Five Minutes of Your Game

How to Make a Video Game for Learning: Costs and Considerations
