Designing your own Game Servers for Asynchronous Play (Postmortem)

Making your own game servers is a fairly significant undertaking, so now that we’ve released our own game servers and started using them for our games – Hive, Khet 2.0, and Reversi – I thought I’d share a brief postmortem to help anyone else considering going this route at some point.

Why host your own Game Servers?

Obviously, a good starting point is to determine if it even makes sense to host your own game servers. Our first releases, “Hive” and “Khet 2.0”, both shipped using Steam’s networking. In fact, Hive had a previous version running on Xbox 360 which used Xbox Live.

Most common platforms have some built-in networking. Unfortunately, even if you use something like Unity, you’ll spend a lot of time cramming each platform’s different concept of networking into your game. More on that later.

All of these platforms take a while to integrate but significantly less time than writing your own servers from scratch. They’re usually free to use, and are built to scale to many users.

Due to the time-savings, my recommendation is: start off using the networking on the platform you plan to release on, unless you have a strong reason not to.

Games are not a sure thing, so don’t go investing in making your engine flexible before you’ve found out that you’ve made the right game(s)! If it turns out you were right & there actually is some demand (like there was for our board games), you can spend the months re-writing while players already have your game and are enjoying it. Also, you’re probably earning some income during this time which can help fund the creation of your game servers.

So what would be a good reason to write your own servers? Here are the main reasons that impacted my decision:

  1. Technical limitations: Neither Steam (nor Xbox Live, if I recall) had support for asynchronous play: allowing a game to persist even while neither player is online. This type of thing was really important for turn-based games like ours.
  2. Cross-platform capabilities: Our games were already cross-platform in the sense of being on Windows, Mac, and Linux… but they were still tied to a single gaming platform: Steam. Steam is currently only available on desktops / laptops / SteamOS consoles, but wouldn’t it be cool to be able to play from PC to mobile some day? In a game that doesn’t depend on physical coordination (like turn-based games), it is completely fair to play across these form-factors without giving either side an advantage.
  3. Minimizing re-writes during massive porting: We didn’t have to rewrite our code to make the Mac and Linux versions of our games; they all ran with Steam. However, if we expand to mobile in the future, or go back to consoles again, we’d have to re-write for iOS’s Game Center, Google Play, PlayStation Network, etc. With our own game-servers, not only will players be able to compete across those systems, but we won’t need to do a rewrite! As long as the system gives us access to make HTTP requests, we’re all set!
  4. More DRM-free: Many gamers are cautious about buying games with DRM because they’re afraid that petty corporate overlords are going to arbitrarily yank their access to the product they legitimately purchased. It has happened plenty of times in the past. While we haven’t added any DRM to our games (and even created a DRM-free mode to make the game run as smoothly as possible even without Steam), we were still bound to requiring Steam for Online Play. Now we don’t use their game servers. We currently still use Steam Lobbies, but once we write those out, we will have no requirement for any 3rd-party platform at all. We’ll be completely DRM-free. #feelsgoodman

How does it work?

This section will be a brief overview of how the system actually works.

Connecting Steam Players to BlueLine Cloud

It should be very simple for users to get playing. An experience that I enjoyed as a gamer was PlayFab’s signup for Planetary Annihilation. So I modeled our signup after that (it’s a single, small form that you only ever see once). Apparently, I oversimplified a bit by going this route (see: Mistakes section below), but it felt solid & was a decent starting point.

The form gives the option to add an email address and password. This keeps the data from being locked into your Steam account. Additionally, we’re keeping the door open for letting users have email alerts when it’s their turn (Steam Notifications are great, but you only see them when you’re on Steam).

After the initial setup, we use the user’s Steam authentication to automatically connect them in the future. Keep in mind: even if the user does not set an email/password to give themselves unlimited access later, we can still connect them.
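As a rough illustration of that flow (not the actual BlueLine Cloud code), here is a minimal Python sketch. It assumes the Steam ID has already been verified server-side, and every name in it (`Account`, `AccountStore`, `connect_steam_user`, `set_credentials`) is hypothetical. The point it shows is that the account exists as soon as the Steam user connects, and that the email/password pair is an optional extra that can be attached at any later time.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Account:
    account_id: int
    steam_id: str
    email: Optional[str] = None            # optional, never required to play
    password_hash: Optional[bytes] = None  # only set if the player opts in
    password_salt: Optional[bytes] = None


class AccountStore:
    """In-memory stand-in for an accounts table (hypothetical)."""

    def __init__(self) -> None:
        self._by_steam_id: Dict[str, Account] = {}
        self._next_id = 1

    def connect_steam_user(self, steam_id: str) -> Account:
        """Return the account for a Steam user, creating it on first contact.

        Assumes steam_id was already verified against Steam by the caller.
        """
        account = self._by_steam_id.get(steam_id)
        if account is None:
            account = Account(account_id=self._next_id, steam_id=steam_id)
            self._next_id += 1
            self._by_steam_id[steam_id] = account
        return account

    def set_credentials(self, steam_id: str, email: str, password: str) -> Account:
        """Attach an optional email/password to an existing Steam-linked account."""
        account = self.connect_steam_user(steam_id)
        salt = os.urandom(16)
        account.email = email
        account.password_salt = salt
        account.password_hash = hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), salt, 100_000
        )
        return account
```

With this shape, a player who clicks past the form still gets a working account keyed by their Steam ID, and calling `set_credentials` later simply fills in the two empty fields.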

Cloud Servers

Fortunately, we have a lot of web-background and have scaled large web-services before, so not having to learn all of that from scratch saved a ton of time.

We ended up structuring the system to run on Cloud Servers. That’s a fancy name meaning that you don’t own a specific piece of hardware, but rather you’re assigned a certain amount of resources & that can bounce around as some machines fail and others start up. They are easy to scale up, and in our case they happened to be the least expensive introductory option also – which is great! I’ve been pricing and re-pricing them since AWS’s early days, hoping the scales would finally tip!

We’re paying in the ballpark of $20/month to start out, and things are running smoothly. If things start to slow down, it’ll likely be another $20/month for more power, and so-on.

Update: Someone asked about the exact stack. It’s a private hosting company I’d worked with before, and my starting instance is 2 GB RAM / 2 CPUs / 1 dedicated IP (we need it for some SSL stuff we do). It’s not an “image” like in AWS; the server just spins up running CentOS, so I will configure new instances with normal bash scripts.

The actual stack of the game-servers is a REST-like PHP API that’s backed by a MySQL database.
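To make the asynchronous part concrete, here is a small sketch of what two handlers in such an API might do. It is written in Python with an in-memory SQLite database standing in for PHP and MySQL, so it is an illustration under those assumptions rather than the real server code. The `plyHistory` column names match the example data shown under Game Data; the routes mentioned in the docstrings and the function names are made up.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Dict, List


def open_store() -> sqlite3.Connection:
    """Create an in-memory database (a stand-in for the MySQL server)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE plyHistory ("
        " game_id INTEGER NOT NULL,"
        " plyNumber INTEGER NOT NULL,"
        " plyString TEXT NOT NULL,"
        " plyTime TEXT NOT NULL,"
        " PRIMARY KEY (game_id, plyNumber))"
    )
    return db


def submit_ply(db: sqlite3.Connection, game_id: int, ply_number: int, ply_string: str) -> Dict:
    """Body of something like POST /games/<id>/plies (hypothetical route).

    The ply is accepted only if it is the next one in sequence, so a stale
    or duplicated request cannot corrupt the history.
    """
    row = db.execute(
        "SELECT COALESCE(MAX(plyNumber), 0) FROM plyHistory WHERE game_id = ?",
        (game_id,),
    ).fetchone()
    expected = row[0] + 1
    if ply_number != expected:
        return {"status": 409, "error": f"expected ply {expected}"}
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    db.execute(
        "INSERT INTO plyHistory (game_id, plyNumber, plyString, plyTime)"
        " VALUES (?, ?, ?, ?)",
        (game_id, ply_number, ply_string, now),
    )
    db.commit()
    return {"status": 201, "plyNumber": ply_number}


def plies_since(db: sqlite3.Connection, game_id: int, after_ply: int) -> List[Dict]:
    """Body of something like GET /games/<id>/plies?after=N (hypothetical route)."""
    rows = db.execute(
        "SELECT plyNumber, plyString FROM plyHistory"
        " WHERE game_id = ? AND plyNumber > ? ORDER BY plyNumber",
        (game_id, after_ply),
    ).fetchall()
    return [{"plyNumber": n, "plyString": s} for n, s in rows]
```

Because the match lives entirely in the database, neither player needs to be online when the other moves: a client just asks for everything after the last ply it has seen whenever it next connects.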

Game Data

Instead of storing the data in some custom format, we wanted to make sure the data would be easy to use across very different platforms, and even publicly accessible at some point.

Therefore, we stored the “game settings” in JSON blobs, and the “ply histories” (the list of plays that the players have made in a match) are in the most standard format we could find for each game.

This makes the data human-readable in the database (easier for debugging) and more standardized so down the road we could open an API and people could make their own game-reviewing / visualization / statistical analysis programs easily.
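As a hedged sketch of why this pays off, the Python snippet below bundles a settings blob and a ply list into a single JSON document that a third-party tool could consume. The settings keys (`game`, `timeLimitHours`) and the function name `export_match` are invented for illustration and are not the real schema; the ply strings are Reversi moves in the same notation as the examples further down.

```python
import json
from typing import Iterable, Tuple


def export_match(settings_json: str, plies: Iterable[Tuple[int, str]]) -> str:
    """Combine a stored settings blob and (plyNumber, plyString) rows into JSON.

    settings_json is the blob exactly as stored; plies may arrive in any order.
    """
    settings = json.loads(settings_json)
    ordered = [ply_string for _, ply_string in sorted(plies)]
    return json.dumps({"settings": settings, "plies": ordered}, indent=2, sort_keys=True)


# Hypothetical settings blob; the real keys are not documented here.
stored_settings = '{"game": "reversi", "timeLimitHours": 24}'
stored_plies = [(2, "f6"), (1, "f5"), (3, "e6")]

exported = export_match(stored_settings, stored_plies)
```

Nothing in `exported` needs the game client to be interpreted, which is what makes a future public API or an analysis tool cheap to build.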

Here is some example data:

# Some Hive moves.
mysql> SELECT plyNumber,plyString,plyTime FROM plyHistory WHERE game_id=REDACTED LIMIT 5;
+-----------+-----------+---------------------+
| plyNumber | plyString | plyTime             |
+-----------+-----------+---------------------+
|         1 | bP1       | 2015-05-27 15:34:32 |
|         2 | wG1 bP1\  | 2015-05-27 15:53:26 |
|         3 | bQ1 -bP1  | 2015-05-27 18:43:32 |
|         4 | wP1 /wG1  | 2015-05-27 19:12:07 |
|         5 | bA1 bP1/  | 2015-05-27 20:30:12 |
+-----------+-----------+---------------------+
5 rows in set (0.00 sec)

# Some Reversi moves ("Hasegawa" notation).
mysql> SELECT plyNumber,plyString,plyTime FROM plyHistory WHERE game_id=REDACTED LIMIT 5;
+-----------+-----------+---------------------+
| plyNumber | plyString | plyTime             |
+-----------+-----------+---------------------+
|         1 | f5        | 2015-05-02 08:39:32 |
|         2 | f6        | 2015-05-02 08:39:32 |
|         3 | e6        | 2015-05-02 08:39:32 |
|         4 | d6        | 2015-05-02 08:39:32 |
|         5 | c7        | 2015-05-02 08:39:32 |
+-----------+-----------+---------------------+
5 rows in set (0.00 sec)

# Some Khet 2.0 moves (cw and ccw are rotations rather than movements).
mysql> SELECT plyNumber,plyString,plyTime FROM plyHistory WHERE game_id=REDACTED LIMIT 5;
+-----------+-----------+---------------------+
| plyNumber | plyString | plyTime             |
+-----------+-----------+---------------------+
|         1 | cw g5     | 2015-04-10 15:10:49 |
|         2 | e6-d6     | 2015-04-10 15:11:12 |
|         3 | cw e5     | 2015-04-10 15:11:34 |
|         4 | c5-d5     | 2015-04-10 15:11:46 |
|         5 | f3-g3     | 2015-04-10 15:12:04 |
+-----------+-----------+---------------------+
5 rows in set (0.00 sec)
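
Because the ply strings are plain text, a reviewing or analysis tool needs very little code to read them. Here is a minimal Python sketch of a parser for the Khet 2.0 rows above. The grammar is inferred only from those five sample rows plus the note that `cw` and `ccw` are rotations, so the real notation may well have forms this does not handle; `KhetPly` and `parse_khet_ply` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class KhetPly(NamedTuple):
    kind: str                 # "rotate" or "move"
    source: str               # square the piece starts on, e.g. "g5"
    target: Optional[str]     # destination square for moves, None for rotations
    direction: Optional[str]  # "cw" or "ccw" for rotations, None for moves


# Assumed forms, based only on the sample rows: "cw g5" and "e6-d6".
_ROTATE = re.compile(r"^(cw|ccw) ([a-z]\d+)$")
_MOVE = re.compile(r"^([a-z]\d+)-([a-z]\d+)$")


def parse_khet_ply(ply_string: str) -> KhetPly:
    """Parse one plyString value into a structured ply, or raise ValueError."""
    text = ply_string.strip()
    rotation = _ROTATE.match(text)
    if rotation:
        return KhetPly("rotate", rotation.group(2), None, rotation.group(1))
    move = _MOVE.match(text)
    if move:
        return KhetPly("move", move.group(1), move.group(2), None)
    raise ValueError(f"unrecognized Khet ply: {ply_string!r}")
```

Rejecting anything unrecognized with a `ValueError`, rather than guessing, keeps a tool like this from silently misreading notation it was never written for.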

Mistakes!

We rolled this out with Reversi first (a new game with a smaller audience), then fixed it a bit and rolled it out to Khet 2.0 (which is a few months older and has a bigger audience). After that we got more feedback and did more fixes before rolling it out to Hive, which is our oldest Steam game and has the largest built-up audience.

Fortunately, I think we ironed out most of the wrinkles before the Hive release, but here were our most notable errors.

Communicating about Accounts

When we updated Khet 2.0 to have async, we got some users really upset because they had mistakenly thought we added DRM.

This blindsided me a little because the dialog only asked for an email address and a password, which I thought were super-ubiquitous these days. I think it was mostly a visceral reaction to seeing a dialog box with the name of our game servers, “BlueLine Cloud”, on a game called “Khet 2.0”. If they didn’t notice the “BlueLine Game Studios” splash screen, that sounds like it might be a third party. For all the user knew, this third party might be sending an email-confirmation link to that address (so it would have to be valid) and then using it for nefarious purposes.

Keep in mind, this dialog was mostly well-received, but if the experience is super negative for some of your players to the point that they’d quit playing your game (which this user would have if we hadn’t been able to explain things better in the forum), it’s worth reworking. Not all of your players will be willing to go out of their way to express their disappointment when something is broken. So if you hear something from two users, it’s likely that many more had the same thought.

We reworked the entire signup so that you never have to give us an email/password if you don’t want to. Furthermore, if you change your mind, you can set up the email/password combination later-on.

Even though the underlying system is the same… we think that changing the wording and adding the “No Thanks” option will keep players from being scared, confused, or otherwise inconvenienced by the dialog.

Initial version:
[Screenshot: signup dialog, 2015-01-10 (simulated)]

Improved version:
[Screenshot: signup dialog, 2015-05-27]

The second one is much clearer that the email address really is optional and it’s only UNlocking your data, not adding additional locks.

Estimation!

Still in the “what we did wrong” section. You may hear “estimation” come up as a weak point in many post-mortems… but that’s something that we’re actually usually quite good at. I’m not being glib here: our side-business (that we originally made for internal use) is a Burndown Chart tool for Trello. I married a Project Manager / Scrum Master. …estimation is usually one of those things we really excel at – because we actually nerd out on Project Management a bit.

However, this one went way off the rails. In late January, here’s me embarrassingly announcing that it would be out in a couple of weeks. When I tease launch-dates I tend to build in some buffer for things going wrong (and I had done just that). In the end, I underestimated the hours for the initial release by a factor of three! Furthermore, there were so many finicky edge-cases that we ended up completely changing the rollout plan. We first rolled out on our new game Reversi (which has the simplest “plyHistory” format, so it was the safest), but then we saw that there were a bunch of things that needed to be changed or improved. It took weeks to finish those bugfixes & improvements and roll out on our next most-complex game: Khet 2.0… then we had external causes that made us delay making such a large (and therefore risky) update to Hive… so we polished it for a couple more weeks.

In the end, the Async that was teased to come out in “a couple of weeks” just came out May 26th, four months after that announcement. Someone fan me down, I’m blushing!

This experience hammered into me what might be a law of physics: when doing anything with networking, make a really conservative estimate, then triple it. Then plan for additional time for bugfixes after launch.

Xbox networking, Steam networking, and even writing our own game servers all took way longer than expected. These types of tasks aren’t the sum of their anticipated parts because you will fall into several rabbit holes where you can spend days debugging things that make no sense at all.

Furthermore, documentation always seems to be horrible for anything networking related. My guess is that there just aren’t many people who end up using it very deeply. People design the systems, do some Hello World apps while they write the docs, then the broad & wild internet makes a mess of our perfect theoretical world!

To reiterate: For anything related to networking: estimate conservatively, then triple it.

Conclusion

I hope that gave a good general overview of the types of things we had to do to switch to our own game servers, why we did it, and what this lets us do in the future.

If you have any questions, feel free to leave comments here or contact me through any of the other methods on the site (these days, I’m pretty accessible through Twitter too: @bluelinegames).

Cheers!