Designing your own Game Servers for Asynchronous Play (Postmortem)

Making your own game servers is a fairly significant undertaking, so now that we’ve released our own game servers and started using them for our games – Hive, Khet 2.0, and Reversi – I thought I’d share a brief postmortem to help anyone else considering going this route at some point.

Why host your own Game Servers?

Obviously, a good starting point is to determine if it even makes sense to host your own game-servers. Our first releases, “Hive” and “Khet 2.0” both shipped using Steam’s networking. In fact, Hive had a previous version running on Xbox 360 which used Xbox Live.

Most common platforms have some built-in networking. Unfortunately, even if you use something like Unity, you’ll spend a lot of time cramming each platform’s different concept of networking into your game. More on that later.

All of these platforms take a while to integrate but significantly less time than writing your own servers from scratch. They’re usually free to use, and are built to scale to many users.

Due to the time-savings, my recommendation is: start off using the networking on the platform you plan to release on, unless you have a strong reason not to.

Games are not a sure thing, so don’t go investing in making your engine flexible before you’ve found out that you’ve made the right game(s)! If it turns out you were right & there actually is some demand (like there was for our board games), you can spend the months re-writing while players already have your game and are enjoying it. Also, you’re probably earning some income during this time which can help fund the creation of your game servers.

So what would be a good reason to write your own servers? Here are the main reasons that impacted my decision:

  1. Technical limitations: Neither Steam (nor Xbox Live, if I recall) had support for asynchronous play: allowing a game to persist even while neither player is online. This type of thing was really important for turn-based games like ours.
  2. Cross-platform capabilities: Our games were already cross-platform in the sense of being on Windows, Mac, and Linux… but they were still tied to a single gaming platform: Steam. This is currently only available on desktops / laptops / SteamOS Consoles, but wouldn’t it be cool to be able to play from PC to mobile some day? In a game without coordination (like turn based games) it is completely fair to play across these form-factors without giving either side an advantage.
  3. Minimizing re-writes during massive porting: We didn’t have to rewrite our code to make the Mac and Linux versions of our games, because they all ran with Steam. However, if we expand to mobile in the future, or go back to consoles again, we’d have to re-write for iOS’s Game Center, Google Play, PlayStation Network, etc. With our own game-servers, not only will players be able to compete across those systems, but we won’t need to do a rewrite! As long as the system gives us access to make HTTP requests, we’re all set!
  4. More DRM-free: Many gamers are cautious about buying games with DRM because they’re afraid that petty corporate overlords are going to arbitrarily yank their access to the product they legitimately purchased. It has happened plenty of times in the past. While we haven’t added any DRM to our games (and even created a DRM-free mode to make the game run as smoothly as possible even without Steam), we were still bound to requiring Steam for Online Play. Now we don’t use their game servers. We currently still use Steam Lobbies, but once we write those out, we will have no requirement for any 3rd-party platform at all. We’ll be completely DRM-free. #feelsgoodman

How does it work?

This section will be a brief overview of how the system actually works.

Connecting Steam Players to BlueLine Cloud

It should be very simple for users to get playing. An experience that I enjoyed as a gamer was PlayFab’s signup for Planetary Annihilation. So I modeled our signup after that (it’s a single, small form that you only ever see once). Apparently, I oversimplified a bit by going this route (see: Mistakes section below), but it felt solid & was a decent starting point.

The form gives the option to add an email address and password. This keeps the data from being locked into your Steam account. Additionally, we’re keeping the door open for letting users have email alerts when it’s their turn (Steam Notifications are great, but you only see them when you’re on Steam).

After the initial setup, we use the user’s Steam authentication to automatically connect them in the future. Keep in mind: even if the user does not set an email/password to give themselves unlimited access later, we can still connect them.

Cloud Servers

Fortunately, we have a lot of web-background and have scaled large web-services before, so not having to learn all of that from scratch saved a ton of time.

We ended up structuring the system to run on Cloud Servers. That’s a fancy name meaning that you don’t own a specific piece of hardware, but rather you’re assigned a certain amount of resources & that can bounce around as some machines fail and others start up. They are easy to scale up, and in our case they happened to be the least expensive introductory option also – which is great! I’ve been pricing and re-pricing them since AWS’s early days, hoping the scales would finally tip!

We’re paying in the ballpark of $20/month to start out, and things are running smoothly. If things start to slow down, it’ll likely be another $20/month for more power, and so-on.

Update: Someone asked about the exact stack. It’s a private hosting company I’d worked with before, and my starting instance is 2GB Ram / 2 CPUs / 1 dedicated IP (need it for some SSL stuff we do). It’s not an “image” like in AWS, the server just spins up running CentOS, so I will configure new instances with normal bash scripts.

The actual stack of the game-servers is a REST-like PHP API that’s backed by a MySQL database.
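To make the "all you need is HTTP" idea concrete, here is a rough sketch of how a client might package up a ply for a REST-like API. The endpoint path, the JSON field layout, and the function name are all made up for illustration; this is not the real BlueLine Cloud API.

```javascript
// Hypothetical sketch: the "/games/<id>/plies" path and the body fields are
// assumptions for illustration, not the actual API of our servers.
function buildSubmitPlyRequest(baseUrl, gameId, plyNumber, plyString) {
  // Drop any trailing slashes so we don't end up with "//" in the URL.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: root + "/games/" + encodeURIComponent(gameId) + "/plies",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ plyNumber: plyNumber, plyString: plyString }),
    },
  };
}

// Usage: any platform with an HTTP client can send the equivalent request.
// const req = buildSubmitPlyRequest("https://example.invalid/api", 42, 3, "cw e5");
// fetch(req.url, req.options).then((res) => res.json());
```

Because the request is just a URL plus a JSON body, nothing in it is tied to Steam, a console, or a phone.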

Game Data

Instead of storing the data in some custom format, we wanted to make sure the data would be easy to use across very different platforms, and even publicly accessible at some point.

Therefore, we stored the “game settings” in JSON blobs, and the “ply histories” (the list of plays that the players have made in a match) are in the most standard format we could find for each game.

This makes the data human-readable in the database (easier for debugging) and more standardized so down the road we could open an API and people could make their own game-reviewing / visualization / statistical analysis programs easily.

Here is some example data:

# Some Hive moves.
mysql> SELECT plyNumber,plyString,plyTime FROM plyHistory WHERE game_id=REDACTED LIMIT 5;
+-----------+-----------+---------------------+
| plyNumber | plyString | plyTime             |
+-----------+-----------+---------------------+
|         1 | bP1       | 2015-05-27 15:34:32 |
|         2 | wG1 bP1\  | 2015-05-27 15:53:26 |
|         3 | bQ1 -bP1  | 2015-05-27 18:43:32 |
|         4 | wP1 /wG1  | 2015-05-27 19:12:07 |
|         5 | bA1 bP1/  | 2015-05-27 20:30:12 |
+-----------+-----------+---------------------+
5 rows in set (0.00 sec)

# Some Reversi moves ("Hasegawa" notation).
mysql> SELECT plyNumber,plyString,plyTime FROM plyHistory WHERE game_id=REDACTED LIMIT 5;
+-----------+-----------+---------------------+
| plyNumber | plyString | plyTime             |
+-----------+-----------+---------------------+
|         1 | f5        | 2015-05-02 08:39:32 |
|         2 | f6        | 2015-05-02 08:39:32 |
|         3 | e6        | 2015-05-02 08:39:32 |
|         4 | d6        | 2015-05-02 08:39:32 |
|         5 | c7        | 2015-05-02 08:39:32 |
+-----------+-----------+---------------------+
5 rows in set (0.00 sec)

# Some Khet 2.0 moves (cw and ccw are rotations rather than movements).
mysql> SELECT plyNumber,plyString,plyTime FROM plyHistory WHERE game_id=REDACTED LIMIT 5;
+-----------+-----------+---------------------+
| plyNumber | plyString | plyTime             |
+-----------+-----------+---------------------+
|         1 | cw g5     | 2015-04-10 15:10:49 |
|         2 | e6-d6     | 2015-04-10 15:11:12 |
|         3 | cw e5     | 2015-04-10 15:11:34 |
|         4 | c5-d5     | 2015-04-10 15:11:46 |
|         5 | f3-g3     | 2015-04-10 15:12:04 |
+-----------+-----------+---------------------+
5 rows in set (0.00 sec)
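Since the ply strings are plain text in a regular shape, a third-party tool could read them with very little code. Here is a minimal sketch that handles only the Reversi and Khet 2.0 shapes shown in the samples above (Hive's notation is richer and is left out). The function names and the returned object shapes are my own illustration, not something our servers provide.

```javascript
// Convert a square such as "f5" into zero-based column/row indexes.
// Which edge of the board row 1 sits on is left to the caller.
function parseSquare(square) {
  const match = /^([a-z])([0-9]+)$/.exec(square);
  if (!match) throw new Error("Unrecognized square: " + square);
  return {
    col: match[1].charCodeAt(0) - "a".charCodeAt(0),
    row: parseInt(match[2], 10) - 1,
  };
}

// Reversi plies in the sample are a single square, e.g. "f5".
function parseReversiPly(plyString) {
  return { type: "place", at: parseSquare(plyString.trim()) };
}

// Khet 2.0 plies in the sample are either a rotation ("cw g5" / "ccw g5")
// or a movement between two squares ("e6-d6").
function parseKhetPly(plyString) {
  const text = plyString.trim();
  const rotation = /^(cw|ccw) ([a-z][0-9]+)$/.exec(text);
  if (rotation) {
    return { type: "rotate", direction: rotation[1], at: parseSquare(rotation[2]) };
  }
  const move = /^([a-z][0-9]+)-([a-z][0-9]+)$/.exec(text);
  if (move) {
    return { type: "move", from: parseSquare(move[1]), to: parseSquare(move[2]) };
  }
  throw new Error("Unrecognized Khet ply: " + plyString);
}
```

For example, parseKhetPly("e6-d6") gives a "move" from column 4, row 5 to column 3, row 5 (both zero-based).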

Mistakes!

We rolled this out with Reversi first (a new game with a smaller audience), fixed it a bit, and then rolled it out to Khet 2.0 (which is a few months older and has a bigger audience). That brought more feedback and more fixes before we rolled it out to Hive, which is our oldest Steam game and has the largest built-up audience.

Fortunately, I think we ironed out most of the wrinkles before the Hive release, but here were our most notable errors.

Communicating about Accounts

When we updated Khet 2.0 to have async, we got some users really upset because they had mistakenly thought we added DRM.

This blind-sided me a little because the dialog only asked for an email address and a password, which I thought were super-ubiquitous these days. I think it was mostly a visceral reaction to seeing a dialog box with the name of our game servers “BlueLine Cloud” on a game called “Khet 2.0”. If they didn’t notice the “BlueLine Game Studios” splash screen, that sounds like it might be a third-party. For all the user knew, this third-party might be sending an email-confirmation link to that address (so it would have to be valid) and then using it for nefarious purposes.

Keep in mind, this dialog was mostly well-received, but if the experience is super negative for some of your players to the point that they’d quit playing your game (which this user would have if we hadn’t been able to explain things better in the forum), it’s worth reworking. Not all of your players will be willing to go out of their way to express their disappointment when something is broken. So if you hear something from two users, it’s likely that many more had the same thought.

We reworked the entire signup so that you never have to give us an email/password if you don’t want to. Furthermore, if you change your mind, you can set up the email/password combination later-on.

Even though the underlying system is the same… we think that changing the wording and adding the “No Thanks” option will keep players from being scared, confused, or otherwise inconvenienced by the dialog.

Initial version:
[Screenshot: 2015-01-10_simulated]

Improved version:
[Screenshot: 2015-05-27]

The second one is much clearer that the email address really is optional and it’s only UNlocking your data, not adding additional locks.

Estimation!

Still in the “what we did wrong” section. You may hear “estimation” come up as a weak point in many post-mortems… but that’s something that we’re actually usually quite good at. I’m not being glib here: our side-business (that we originally made for internal use) is a Burndown Chart tool for Trello. I married a Project Manager / Scrum Master. …estimation is usually one of those things we really excel at – because we actually nerd out on Project Management a bit.

However, this one went way off the rails. In late January, here’s me embarrassingly announcing that it would be out in a couple of weeks. When I tease launch-dates I tend to build in some buffer for things going wrong (and I had done just that). In the end, I underestimated the hours for the initial release by a factor of three! Furthermore, there were so many finicky edge-cases that we ended up completely changing the rollout plan. We first rolled out on our new game Reversi (which has the simplest “plyHistory” format, so it was the safest), but then we saw that there were a bunch of things that needed to be changed or improved. It took weeks to finish those bugfixes & improvements and roll out on our next most-complex game: Khet 2.0… then we had external causes that made us delay making such a large (and therefore risky) update to Hive… so we polished it for a couple more weeks.

In the end, the Async that was teased to come out in “a couple of weeks” just came out May 26th, four months after that announcement. Someone fan me down, I’m blushing!

This experience hammered into me what might be a law of physics: when doing anything with networking, make a really conservative estimate, then triple it. Then plan for additional time for bugfixes after launch.

Xbox networking, Steam networking, and even writing our own gameservers all took way longer than expected. These types of tasks aren’t the sum of their anticipated parts because you will fall into several rabbit holes where you can spend days debugging things that make no sense at all.

Furthermore, documentation always seems to be horrible for anything networking related. My guess is that there just aren’t many people who end up using it very deeply. People design the systems, do some Hello World apps while they write the docs, then the broad & wild internet makes a mess of our perfect theoretical world!

To reiterate: For anything related to networking: estimate conservatively, then triple it.

Conclusion

I hope that gave a good general overview of the types of things we had to do to switch to our own game servers, why we did it, and what this lets us do in the future.

If you have any questions, feel free to leave comments here or contact me through any of the other methods on the site (these days, I’m pretty accessible through Twitter too @bluelinegames).

Cheers!

Burndown for Trello gets flame decals!

Not literally… but we made the site faster! Our app that makes burndown charts for Trello has received a number of improvements in the past couple of weeks. A couple of days ago, we made the AJAX requests take about 1/7th of the time that they took previously. The enormous Trello board that we used for testing went from taking 14 seconds to load, to taking 1 to 2 seconds.

The changes that we made only took a couple of hours, so I figured I’d share a few quick tips so that you can get a Pareto-style improvement in your backend-performance too.

We started with a simple open source PHP profiler that I released a few years ago. The only catch is that our slow request was an AJAX call… so I added a small JavaScript function that can be used to wrap AJAX URLs, so that the URL parameter for profiling gets passed to those calls too.

/**
 * Makes the URL profilable by the same system as the pages:
 * if profiling was enabled via URL param on this run, adds the param to the URL.
 */
function profilable(url){
	if(getURLParameter('RUN_PROFILER')){
		var delim = ((url.indexOf("?") === -1) ? "?" : "&");
		url += delim + "RUN_PROFILER=" + getURLParameter('RUN_PROFILER');
	}
	return url;
}
// From http://stackoverflow.com/a/8764051/684852
function getURLParameter(name) {
	return decodeURIComponent((new RegExp('[?|&]' + name + '=' + '([^&;]+?)(&|#|;|$)').exec(location.search)||[,""])[1].replace(/\+/g, '%20'))||null;
}

Then anytime that you make an ajax call, just wrap the URL in “profilable()” like this:

$.post(profilable("./myAjaxEndpoint.php"), { someVar: 'someValue' }, callbackFunction, "json").fail(errorHandler);

That will pass the RUN_PROFILER=true URL parameter to the AJAX endpoint. The other half of the equation is to make sure the profiling info comes back in the AJAX response. As you will see from the original post about the profiler, to get the profiling output as HTML, just call profiler_printResults() somewhere that is safe to output HTML. If the profiling isn’t enabled (eg: by the URL parameters), there will be no output.

Fortunately our ‘ajax’ is not using XML, but rather JSON which contains some HTML… so we just called profiler_printResults() right in our AJAX endpoint and the HTML in the result gets injected into the page along with the rest of the result.
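On the browser side, that boils down to a callback along these lines. This is only a sketch: the "html" field name and the function name are assumptions for illustration, and the container here is a plain object standing in for a DOM element so the snippet runs anywhere.

```javascript
// Hypothetical client-side callback. `html` is an assumed field name, and
// `container` is any object with an innerHTML property (a DOM element in the browser).
function showAjaxResult(response, container) {
  // The endpoint's HTML, plus any profiler table appended to it, lands in the page together.
  container.innerHTML = response.html;
  return container;
}

// Simulated response: the normal markup followed by a stand-in for the
// profiler's table (which would simply be absent when profiling is off).
const boardHtml = "<div class='board'>chart goes here</div>";
const profilerHtml = "<table class='profiler'><tr><td>Total runtime</td></tr></table>";
const simulatedResponse = JSON.parse(JSON.stringify({ html: boardHtml + profilerHtml }));

const fakeContainer = { innerHTML: "" };
showAjaxResult(simulatedResponse, fakeContainer);
```

The nice part is that the page code doesn't need to know whether profiling is on; it just injects whatever HTML came back.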

To see the profiling in action, hit this URL:

https://BurndownForTrello.com/?RUN_PROFILER=true

The initial page is very light (all of the heavy lifting is done by asynchronous javascript), so there is only a small table on the first request. Once you “Connect to Trello” and then view one of your Trello boards, there will be quite a bit of detail below your board’s info. Just remember that the “Totals” row is the sum of all items, and since many functions are nested inside of each other, there will be a lot of double-counting… the actual time for the backend request is what comes after “Total runtime” below the table of data.
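If the double-counting seems odd, this toy example shows where it comes from. It is not the PHP profiler itself, just a minimal begin/end timer written in JavaScript with a hand-driven clock and made-up labels and timings, so the numbers are predictable.

```javascript
// Minimal begin/end timer with an injectable clock so the example is deterministic.
// This is an illustration, not the API of the PHP profiler mentioned above.
function createProfiler(now) {
  const startedAt = {};   // label -> time when begin() was called
  const inclusiveMs = {}; // label -> accumulated time, including nested calls
  return {
    begin(label) { startedAt[label] = now(); },
    end(label) {
      inclusiveMs[label] = (inclusiveMs[label] || 0) + (now() - startedAt[label]);
      delete startedAt[label];
    },
    report() { return Object.assign({}, inclusiveMs); },
  };
}

let clock = 0; // fake clock in milliseconds, advanced by hand
const profiler = createProfiler(() => clock);

profiler.begin("handleRequest");   // starts at 0
clock = 100;
profiler.begin("queryDatabase");   // nested inside handleRequest, starts at 100
clock = 400;
profiler.end("queryDatabase");     // 300 ms
clock = 500;
profiler.end("handleRequest");     // 500 ms, which already includes the 300 ms above

const report = profiler.report();
const summed = Object.values(report).reduce((a, b) => a + b, 0);
console.log(report, "sum of rows:", summed, "actual elapsed:", clock);
```

The rows add up to 800 ms even though only 500 ms passed, because the nested call's time is counted once on its own row and again inside its parent.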

It took only a couple of hours to pull in the open source PHP profiler, add the _begin/_end hooks to our code, write the javascript code to pass the variable along to the backend, and then actually use the profiling output to identify some easy-win hotspots and get an 85% reduction in runtime for that request.
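For anyone checking the math, the percentage falls straight out of the before/after load times. A quick sketch (the function name is just for illustration):

```javascript
// Percent reduction in runtime going from `beforeSeconds` to `afterSeconds`.
function percentReduction(beforeSeconds, afterSeconds) {
  return (1 - afterSeconds / beforeSeconds) * 100;
}

console.log(percentReduction(14, 2).toFixed(1)); // 14 s down to 2 s (1/7th of the time)
console.log(percentReduction(14, 1).toFixed(1)); // 14 s down to 1 s
```

Going from 14 seconds to 2 seconds is about an 85.7% reduction, and the 1-second loads work out to about 92.9%.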

I hope it has a similar return-on-investment for you! 🙂

“Burndown for Trello” now has a Pro version

About a year ago, we started building a scrum-like burndown tool for Trello to help us manage our own projects. After a few months of work, we made it so that anyone could sign up for the free burndown charts for Trello.

[Figure: Burndown for Trello – exponential user growth]

We thought that a decent number of people would find it useful, but we didn’t count on it growing as fast as it did! We started getting a lot of requests for additional features. Since we were busy releasing Hive for Xbox 360, it was hard to find time to add new things. Our solution was to create a paid account, so we can spend more time on this project, which seems to be in huge demand.

We want to do-right by our existing free users, so here are the basics:

  • The free account will remain free and will have all of the features it has today.
  • Anyone who signed up before the paid account was created is an “Early Adopter” and is eligible for a perpetual discount on the paid version (look for Early Adopter in the dropdown that lets you pick a plan).

At some point since the last blog-post, we added the average-velocity line which was requested by users.

There were some other changes that we made in the process of creating this paid account:

  • The name is now “Burndown for Trello”. We were told by someone at Trello that ‘Trello Burndown’ might create trademark issues.
  • The app is no longer just in a folder on BlueLine’s site… it’s at its own domain now: http://BurndownForTrello.com
  • All requests are now sent over https, so your company’s data is always encrypted.

We’ll be adding more and more to the paid account, but here are the main reasons to upgrade:

  1. Support us! More paid accounts == more time for us to make new features!
  2. Automatic daily updates of stats – even if you don’t visit the site in a given day, if you have a paid account we’ll pull the data from the Trello API for each board and store the stats. Until now, if you didn’t view a board in a specific day, then the next time you view that board, we just extrapolated (averaged) the data across all of the missing days.
  3. All new features will be added to the paid accounts. Free users get the app as it is now (with only minor upgrades, like site redesigns, bug fixes and global changes like adding https). All of the big stuff coming up is paid-only.
  4. MORE SOON! – We’ve had a ton of suggestions, and we’re hoping to add many cool features. Next up, we’re hoping to let you put your estimates in the titles of Trello cards, so you don’t have to visit Burndown for Trello to update your estimates.
  5. Update: March 8th, 2013 – we just finished & rolled out integration with the “Scrum for Trello” Chrome extension. This means that you can put estimates in the titles of the cards in parentheses like this: (2) whether you have the extension or not. If you do this, Burndown for Trello will automatically pull in the estimate. This was hands-down our most requested feature up to this point.
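Two of the items above are easy to picture in code. The sketch below is an illustration under assumptions, not our production code: it reads "averaged" as spreading the change evenly across the missing days, it accepts a parenthesized whole or decimal number anywhere in a card title, and the function names and data shapes are made up for the example.

```javascript
// Pull an estimate out of a card title such as "(2) Write signup dialog".
// Returns null when the title has no parenthesized number.
function parseEstimate(title) {
  const match = /\((\d+(?:\.\d+)?)\)/.exec(title);
  return match ? parseFloat(match[1]) : null;
}

// Fill the days between two known readings by spreading the change evenly.
// Each reading is a hypothetical shape: { day: integer day index, remaining: number }.
function fillMissingDays(earlier, later) {
  const gap = later.day - earlier.day;
  const filled = [];
  for (let i = 1; i < gap; i++) {
    const remaining =
      earlier.remaining + ((later.remaining - earlier.remaining) * i) / gap;
    filled.push({ day: earlier.day + i, remaining: remaining });
  }
  return filled;
}

console.log(parseEstimate("(2) Write signup dialog"));
console.log(fillMissingDays({ day: 1, remaining: 20 }, { day: 5, remaining: 12 }));
```

With readings of 20 on day 1 and 12 on day 5, the filled-in days 2, 3, and 4 come out as 18, 16, and 14. With automatic daily updates there is nothing to fill in, which is why the chart gets more accurate.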

If you want to get all of the new features as we add them, head over to Burndown for Trello and click on the upgrade button!