Scaling Dedicated Game Servers with Kubernetes: Part 3 - Scaling Up Nodes


This is part three of a multi-part series on scaling game servers with Kubernetes.



In the previous two posts we looked at hosting dedicated game servers on Kubernetes and measuring and limiting their memory and CPU resources. In this instalment we look at how we can use the CPU information from the previous post to determine when we need to scale up our Kubernetes cluster because we've run out of room for more game servers as our player base increases.



Separate Apps from Game Servers



Before we begin writing the code that will increase the size of the Kubernetes cluster, we first need to separate our applications - such as the matchmaker, the game server controller, and the soon-to-be-written node scaler - onto different nodes in the cluster from the nodes where the game servers will be running.



This has many advantages:



1. As they are running on different machines, the resource usage of our apps has no effect on the game servers. This means that even if the matchmaker has a CPU spike, there is an extra barrier to ensure it cannot unduly affect a dedicated game server in play.
2. It makes scaling capacity for dedicated game servers up and down easier, as we only need to look at game server usage across a specific set of nodes, rather than all potential containers across the entire cluster.
3. We can use bigger machines with more CPU cores and memory for the game server nodes, and smaller machines with fewer cores and less memory for the controller apps, since they need fewer resources. We are essentially able to pick the right size of machine for the job at hand, which gives us great flexibility while remaining cost-effective.



Kubernetes makes it relatively easy to set up a heterogeneous cluster, and gives us the ability to specify where Pods are scheduled within the cluster, using the power of Node Selectors.



It is worth noting that there is also a more advanced Node Affinity feature available in beta, but we don't need it for this example, so we'll ignore its extra complexity for now.



To get started, we need to assign labels to the nodes within our cluster. These are exactly the same labels you will have seen if you've ever created Pods with Deployments and exposed them with Services, but applied to nodes instead. Google Cloud Platform's Container Engine uses Node Pools to assign labels to nodes as they are created and to set up heterogeneous clusters. You can do the same thing on other cloud providers, as well as through the Kubernetes API and the command line client.



In this example, I added the labels role: apps and role: game-server to the appropriate nodes in my cluster. We can then add a nodeSelector to our Kubernetes configurations to control which nodes in the cluster Pods are scheduled onto.
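For example, labelling nodes by hand with the command line client would look something like this (the node names here are placeholders for whatever your cluster's nodes are called):

```
kubectl label nodes <app-node-name> role=apps
kubectl label nodes <game-server-node-name> role=game-server
```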



Here's an example configuration for the matchmaker app. Its nodeSelector is set to role: apps to ensure that its container instances are only created on the application nodes (those tagged with the "apps" role).
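A minimal sketch of what that looks like follows; the name, image, and port are placeholder values, and the nodeSelector is the part that matters:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: matchmaker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: matchmaker
  template:
    metadata:
      labels:
        app: matchmaker
    spec:
      nodeSelector:
        role: apps                                  # only schedule onto nodes labelled role: apps
      containers:
      - name: matchmaker
        image: gcr.io/myproject/matchmaker:latest   # placeholder image
        ports:
        - containerPort: 8080                       # placeholder port
```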



To the same end, we can also adjust the dedicated game server configuration from the previous article, so that all the dedicated game server Pods are scheduled only onto the machines we have specifically designated for them, i.e. those tagged with role: game-server:
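Again as a sketch, with the container details as placeholders rather than the exact values from the earlier posts:

```yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: game-server-
spec:
  restartPolicy: Never
  nodeSelector:
    role: game-server                             # only schedule onto the game server nodes
  containers:
  - name: game-server
    image: gcr.io/myproject/game-server:latest    # placeholder image
    resources:
      limits:
        cpu: "100m"                               # placeholder for the CPU limit from the previous post
```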



Note that in my sample code, I use the Kubernetes API to provide a configuration identical to the one above, but the YAML version is easier to understand, and it is the format we've been using throughout this series.



A Strategy for Scaling Up



Kubernetes on cloud providers tends to come with automated scaling capabilities, such as the Google Cloud Platform Cluster Autoscaler, but since these are generally built for stateless applications, and our dedicated game servers store the game simulation in memory, they won't work in this case. However, with the tools Kubernetes provides, it is not particularly difficult to build your own cluster autoscaler!



It also makes more sense to scale Kubernetes nodes up and down in a cloud environment, since we only pay for the resources we use. If we were running on our own premises, it might make less sense to change the size of the Kubernetes cluster; we could simply run one large cluster across all the machines we own and keep it at a static size, since adding and removing physical machines is far more onerous than in the cloud, and wouldn't necessarily save us money, as we own or lease the machines for much longer periods.



There are many potential strategies for determining when to scale up the number of nodes in your cluster, but for this example we'll keep things relatively simple:



- Define a minimum and maximum number of nodes for game servers, and make sure we stay within those limits.
- Use CPU resource capacity and usage as our metric for tracking how many dedicated game servers we can fit on a node in the cluster (in this example we're going to assume we always have enough memory).
- Define a buffer of CPU capacity for a set number of game servers to be available at all times in the cluster. That is, if at any point we couldn't add that number of game servers to the cluster without running out of CPU, add more nodes.
- Whenever a new dedicated game server is started, calculate whether we need to add a new node to the cluster because the spare CPU capacity across the nodes has dropped below the buffer amount.
- As a fail-safe, every n seconds, also calculate whether we need to add a new node to the cluster because the measured spare CPU capacity is under the buffer.
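Concretely, the buffer check boils down to a condition along these lines; a minimal sketch in Go, with illustrative names rather than those in the actual code:

```go
// shouldScaleUp reports whether more nodes are needed: the spare CPU across
// the game server nodes must always be able to fit `buffer` more dedicated
// game servers, each needing cpuPerServer millicores.
func shouldScaleUp(spareCPU, cpuPerServer, buffer int64) bool {
	return spareCPU < cpuPerServer*buffer
}
```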



Creating a Node Scaler



The node scaler essentially runs an event loop to carry out the strategy outlined above.



Using Go in combination with the native Kubernetes Go client library makes this relatively straightforward to implement, as you can see below in the Start() function of my node scaler.



Please note that I have removed most of the error handling boilerplate to make this clearer, but the original code is available if you want to see it.
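The full listing lives in the GitHub repository; what follows is only a cut-down sketch of its shape, with illustrative type and field names where they differ from the original:

```go
package nodescaler

import (
	"log"
	"time"
)

// Server is a cut-down stand-in for the node scaler. The real struct also
// holds the Kubernetes ClientSet and the cloud provider client.
type Server struct {
	tick   time.Duration
	events chan struct{} // stand-in for the gameWatcher's gw.events channel
}

// Start runs the node scaler's event loop: scaleNodes() runs whenever a game
// server Pod is added or removed, and also on every tick as a fail-safe.
func (s *Server) Start() {
	tick := time.Tick(s.tick)

	// ^^^ MAIN EVENT LOOP HERE ^^^
	for {
		select {
		case <-s.events: // a game server Pod was added or removed
		case <-tick: // the fail-safe timer fired
		}
		if err := s.scaleNodes(); err != nil {
			log.Printf("error scaling nodes: %v", err)
		}
	}
}

// scaleNodes applies the scaling strategy described above; see the next
// section for how the CPU capacity calculation works.
func (s *Server) scaleNodes() error {
	return nil
}
```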



Let's break this down, for those who aren't as familiar with Go:



- kube.ClientSet() - a small piece of utility code that returns a Kubernetes ClientSet, which gives us access to the Kubernetes API of the cluster we're running on.
- The gameWatcher - Kubernetes provides APIs that let you watch for changes across the cluster. In this particular case, the code returns a data structure containing a Go channel (essentially a blocking queue), specifically gw.events, which will return a value whenever a game Pod is added to or removed from the cluster. The full source of the gameWatcher can be found here.
- tick := time.Tick(...) - this creates a Go channel that blocks for a given duration, ten seconds in this instance, and then returns a value. Here is the reference for time.Tick.
- The main event loop is under the "// ^^^ MAIN EVENT LOOP HERE ^^^" comment. This code block contains a select statement, which declares that the system will block until either the gw.events channel or the tick channel (firing once every ten seconds) returns a value, and then executes s.scaleNodes(). This means a scaleNodes run is triggered whenever a game server is added or removed, and also every ten seconds as a fail-safe.
- s.scaleNodes() - runs the scale-node strategy outlined above.



Within s.scaleNodes(), we query the CPU limits set on each Pod, as well as the total CPU capacity of each node in the cluster, through the Kubernetes API. We can see the configured CPU limits in the Pod specification via the REST API or the Go client, which lets us track how much CPU each of our game servers is taking up, along with any Kubernetes management Pods that also exist on the nodes. Similarly, the Node specification exposes, via the Go client, how much CPU capacity each node has. From here it is a case of summing up the CPU used by the Pods, subtracting it from the capacity of each node, and then determining whether one or more nodes need to be added to the cluster, such that we can maintain the buffer of space for new game servers to be created in.
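A simplified sketch of that calculation, using the official Kubernetes Go client (the role=game-server label is the one we applied earlier; the function and variable names are illustrative, and the real code handles more edge cases):

```go
package nodescaler

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// spareCapacity returns the spare CPU (in millicores) across the game server
// nodes: the sum of node CPU capacity, minus the CPU limits of every Pod
// currently scheduled onto those nodes.
func spareCapacity(ctx context.Context, cs kubernetes.Interface) (int64, error) {
	// only look at the nodes we labelled for game servers
	nodes, err := cs.CoreV1().Nodes().List(ctx, metav1.ListOptions{
		LabelSelector: "role=game-server",
	})
	if err != nil {
		return 0, err
	}

	var capacity int64
	gameNodes := map[string]bool{}
	for _, n := range nodes.Items {
		capacity += n.Status.Capacity.Cpu().MilliValue()
		gameNodes[n.Name] = true
	}

	// sum the CPU limits of every Pod sitting on those nodes, which includes
	// both game servers and any Kubernetes management Pods
	pods, err := cs.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
	if err != nil {
		return 0, err
	}
	var used int64
	for _, p := range pods.Items {
		if !gameNodes[p.Spec.NodeName] {
			continue
		}
		for _, c := range p.Spec.Containers {
			used += c.Resources.Limits.Cpu().MilliValue()
		}
	}

	return capacity - used, nil
}
```

Feeding this result, together with the configured per-server CPU and buffer size, into a check like the shouldScaleUp() sketch from earlier gives us the decision on whether to add nodes.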



If you look at the code in this example, you'll see that we are using the Google Cloud Platform APIs to add new nodes to the cluster; specifically, the APIs for Google Compute Engine Managed Instance Groups let us add (and remove!) instances from the Nodepool that the Kubernetes cluster's nodes run in. That being said, any cloud provider will have similar APIs to let you do the same thing, and here you can see the interface we've defined to abstract this implementation detail in such a way that it could easily be modified to work with another provider.
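The actual interface lives in the repository; a sketch of its general shape (the name and method here are illustrative, not necessarily the originals) would be along these lines:

```go
// NodePool abstracts the cloud provider specific operations for resizing the
// group of machines backing the game server nodes, so the node scaler logic
// doesn't depend on Google Compute Engine directly.
type NodePool interface {
	// IncreaseToSize grows the underlying instance group to the given number
	// of instances; it should never shrink it.
	IncreaseToSize(size int64) error
}
```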



Deploying the Node Scaler



Below is the deployment YAML for the node scaler. Environment variables are used to set all the configuration options:



- which nodes in the cluster should be managed
- how much CPU each dedicated game server needs
- the minimum and maximum number of nodes
- how much buffer should be available at all times
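The exact file is in the repository; a sketch of its shape follows, where the image and the environment variable names and values are illustrative rather than the originals:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodescaler
spec:
  replicas: 1                      # only ever one node scaler running at a time
  strategy:
    type: Recreate                 # shut the old Pod down before starting the new one
  selector:
    matchLabels:
      app: nodescaler
  template:
    metadata:
      labels:
        app: nodescaler
    spec:
      nodeSelector:
        role: apps                 # the node scaler is an app, not a game server
      containers:
      - name: nodescaler
        image: gcr.io/myproject/nodescaler:latest   # placeholder image
        env:
        - name: NODE_SELECTOR      # which nodes to manage
          value: "role=game-server"
        - name: CPU_REQUEST        # CPU each dedicated game server needs
          value: "0.1"
        - name: BUFFER_COUNT       # game servers' worth of CPU to keep free
          value: "30"
        - name: MIN_NODES          # minimum number of game server nodes
          value: "1"
        - name: MAX_NODES          # maximum number of game server nodes
          value: "10"
```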



You may have noticed that we set the deployment to have replicas: 1. We did this because we always want to have only one instance of the node scaler active in our Kubernetes cluster at any given point in time. This ensures that we do not have more than one process attempting to scale up, and eventually scale down, our nodes within the cluster, which could definitely lead to race conditions and likely cause all kinds of weirdness.



We also set the deployment strategy to Recreate, so that if we update the node scaler, Kubernetes will destroy the currently running node scaler Pod before recreating the updated version, which again avoids any potential race conditions.



Seeing It in Action



Once we have deployed our node scaler, let's tail its logs and watch it work. In the video below, we see via the logs that when we have one node in the cluster assigned to game servers, we have capacity to potentially start forty dedicated game servers, and have configured a requirement of a buffer of thirty dedicated game servers. As we fill the available CPU capacity with running dedicated game servers via the matchmaker, pay attention to how the number of game servers that can be created in the remaining space drops, and eventually a new node is added to maintain the buffer!



This is one of my favourite things about Kubernetes: we can do all of this without having to build so much of the foundation ourselves. While we touched on the Kubernetes client in the first post in this series, in this post we've really started to take advantage of it. This is what Kubernetes is all about - an integrated set of tools for running software across large clusters, over which you have a great deal of control. In this instance, we haven't had to write code to spin up and spin down dedicated game servers in very specific ways - we could just leverage Pods. And when we want to react to and take control of events within the Kubernetes cluster itself, we have the Watch APIs to let us do exactly that. It's amazing how much utility Kubernetes offers out of the box - machinery that many of us have been building ourselves for years.



That all being said, scaling up nodes and game servers in our cluster is the comparatively easy part; scaling down is a trickier proposition. We'll need to make sure nodes don't have any game servers on them before shutting them down, while also ensuring that game servers don't end up widely fragmented across the cluster. In the next post in this series, we'll look at how Kubernetes can help in these areas as well!



In the meantime, as with the previous posts, I welcome questions and comments here, or you can reach out to me via Twitter. You can also see my presentation at GDC this year, as well as check out the code on GitHub, which is still being actively worked on!