This is part 5 of the multipart Trap Labs Code Design and Architecture Series.
Gaffer on Games has a great series on the inner workings of a networking engine. The author, Glenn Fiedler, goes into every component in way more detail than I ever would, so please read his articles before you read mine. Consider this article an addendum to his series. I'll specifically focus on design and architecture, and provide some insights on testability, as networking is one of the more difficult components to test.
So again, from this point forward, I am going to assume you already know the inner workings of a network engine. Otherwise this article won't make much sense to you.
The example Glenn gave in his series is a single threaded game loop that updates networking and the game at the same time. This is fine for most games. I took it a step further and improved it with a multithreaded engine.
Ideally, you want the networking component to send and receive packets as fast as possible. If the entire game is on a single thread, the core game loop will be the biggest bottleneck, because you can only send and receive packets as fast as your game does its processing. For example, if your game runs at less than 30fps and you need a send rate of 30 ticks/second, then your network is effectively bottlenecked by your game. And if your frame rate fluctuates a lot, it'll also affect your network stability and congestion. I created a multithreaded and asynchronous (Boost Asio async sockets) standalone networking library such that this bottleneck wouldn't exist. It was a good challenge and I hope it will be useful for you too.
Note that OSSockets and GameLoop are modules, not classes.
Including the GameLoop, this is a three-thread process: a receiver thread, a sender thread, and a game loop thread. The basic workflow goes like this:
1. Incoming packets are handed to ReceiverProcessor for deserialization and processing
2. ReceiverProcessor puts the deserialized data onto a command data queue and wakes the GameLoop thread (if sleeping)
3. The GameLoop thread takes this data queue and inputs it through the command engine described in Part 3
4. Outgoing data is generated by the GameLoop and put on an outgoing data queue
5. The GameLoop wakes the SenderProcessor (if sleeping)
6. SenderProcessor takes the outgoing data queue and creates the corresponding packets

I used a wait-free queue from Boost (single producer, single consumer) as the data queue that delivers the data between threads.
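To make the handoff concrete, here is a minimal sketch of one direction of that thread handoff, assuming Boost's lock-free SPSC queue and a hypothetical CommandData type. None of these names come from the actual library:

```cpp
// A minimal sketch of the receiver-to-game-loop handoff, assuming
// Boost's lock-free SPSC queue and a hypothetical CommandData type.
#include <boost/lockfree/spsc_queue.hpp>
#include <condition_variable>
#include <mutex>

struct CommandData { /* deserialized game commands */ };

boost::lockfree::spsc_queue<CommandData, boost::lockfree::capacity<1024>> commandQueue;
std::mutex wakeMutex;
std::condition_variable wakeSignal;

// Receiver thread: push deserialized data, then wake the game loop.
void onPacketDeserialized(const CommandData& data)
{
    commandQueue.push(data);  // wait-free push (drops if full in this sketch)
    {
        // Briefly take the lock so the wake can't slip between the game
        // loop's predicate check and its sleep.
        std::lock_guard<std::mutex> lock(wakeMutex);
    }
    wakeSignal.notify_one();
}

// Game loop thread: sleep until data arrives, then drain the queue.
void gameLoopWait()
{
    std::unique_lock<std::mutex> lock(wakeMutex);
    wakeSignal.wait(lock, [] { return commandQueue.read_available() > 0; });

    CommandData data;
    while (commandQueue.pop(data))
    {
        // feed data into the command engine (Part 3)
    }
}
```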
Note the clear separation of boundaries on OSSockets and GameLoop. All boundary dependencies are inverted using interfaces (remember DIP?). This way the networking module is agnostic of the operating system and the game, meaning it doesn't need to know about the game protocol or the transport protocol.
Something that worried me when I came up with this architecture was whether the context switching between threads might be too expensive. I've yet to benchmark anything, but the CPU usage on Trap Labs' dedicated server is only 3% on a debug build hosting a 4 player game (Intel i5 2500k @ 4.0GHz), and there was no perceived lag. I'd say that's pretty good.
The significance of Receiver and Sender being interfaces is that you can use any transport protocol you want. This allows you to use the networking library with either UDP or TCP (or something completely different). I implemented both TCP and UDP variants with Boost's Asio asynchronous sockets. I used UDP for the game loop and TCP for all lobby transactions.
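For reference, the boundary interfaces might look something like this. The interface names come from the article; the method signatures are my assumptions:

```cpp
// A rough sketch of the inverted boundary interfaces.
#include <cstddef>
#include <cstdint>

// Implemented by ReceiverProcessor; called by the transport (e.g. UDPReceiver)
// whenever raw bytes arrive.
class Receiver
{
public:
    virtual ~Receiver() = default;
    virtual void receive(const std::uint8_t* bytes, std::size_t size) = 0;
};

// Implemented by the transport (e.g. UDPSender); called by SenderProcessor
// to push finished packets out.
class Sender
{
public:
    virtual ~Sender() = default;
    virtual void send(const std::uint8_t* bytes, std::size_t size) = 0;
};
```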
The interesting bit of architecture worth noting is that the dependency on Receiver and Sender is different. OSSockets inherits Sender and references Receiver. Under UDP I have two implementations: UDPReceiver and UDPSender. UDPReceiver references the Receiver interface, which is implemented by ReceiverProcessor. The idea is that if you are not receiving any packets, you should be doing nothing. UDPReceiver waits for packets to come in (handled by Boost Asio), and when a packet is received it goes through ReceiverProcessor and wakes up the GameLoop through the Wakeable interface (Wakeable is usually implemented using a condition_variable in C++, in case you are curious).
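Here is a minimal sketch of a Wakeable built on std::condition_variable. Only the interface name comes from the article; the rest is my assumption:

```cpp
// A minimal Wakeable sketch using std::condition_variable.
#include <condition_variable>
#include <mutex>

class Wakeable
{
public:
    virtual ~Wakeable() = default;
    virtual void wake() = 0;
};

class ConditionWakeable : public Wakeable
{
public:
    void wake() override
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        condition_.notify_one();
    }

    // Called by the sleeping thread (e.g. the GameLoop); blocks until wake().
    void waitForWake()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        condition_.wait(lock, [this] { return signaled_; });
        signaled_ = false;  // reset for the next sleep
    }

private:
    std::mutex mutex_;
    std::condition_variable condition_;
    bool signaled_ = false;
};
```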
UDPSender, on the other hand, implements the Sender interface, because it is used by SenderProcessor to send the packets. And because SenderProcessor is woken by the GameLoop, the associations on SenderProcessor and Sender are different from those on Receiver.
Simply stated, receiving is passive and sending is active, and that's why the dependency is different from the networking module's perspective. I know it’s a bit difficult to see the reasoning behind this, but it should become clear once you try to implement this architecture.
To summarize:
Receiving in one direction (>>>): UDPReceiver uses the Receiver interface, which is implemented by ReceiverProcessor, which wakes up the GameLoop.

Sending in the other direction (<<<): the GameLoop wakes up SenderProcessor, which uses the Sender interface, implemented as UDPSender.
As their names suggest, ReceiverProcessor is responsible for taking received packets and processing them into usable data, while SenderProcessor takes game data and processes it into outgoing packets.
What's significant in my architecture is that most of the networking components reside within the SenderProcessor:
I’ll quickly go over the components from left to right:
The last one takes outgoing data from the GameLoop and generates the packet. If you need more information on what these modules do, you can visit Gaffer on Games for detailed descriptions of their behavior. The only component that is new is the firewall. It simply filters clients by endpoint (IP + port pair) and protocol ID.
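The firewall is simple enough to sketch. The class shape and types here are my assumptions, not the actual implementation:

```cpp
// A sketch of the firewall described above: filter clients by endpoint
// (IP + port pair) and protocol ID.
#include <cstdint>
#include <set>
#include <string>
#include <utility>

using Endpoint = std::pair<std::string, std::uint16_t>;  // IP + port

class Firewall
{
public:
    explicit Firewall(std::uint32_t protocolId) : protocolId_(protocolId) {}

    void allow(const Endpoint& endpoint) { allowed_.insert(endpoint); }

    // Drop the packet unless the sender is known and the protocol ID matches.
    bool accepts(const Endpoint& from, std::uint32_t packetProtocolId) const
    {
        return packetProtocolId == protocolId_ && allowed_.count(from) > 0;
    }

private:
    std::uint32_t protocolId_;
    std::set<Endpoint> allowed_;
};
```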
Since ReceiverProcessor and SenderProcessor are on separate threads, they communicate through a wait-free queue, the TimeStampedAckQueue. Whenever ReceiverProcessor receives a packet, it updates the queue with the sequence number and its timestamp. Whenever SenderProcessor is woken, it consumes the queue and updates its internal components like PacketRTT and CongestionAnalyzer.
Let me go through a typical receive and send process. From the ReceiverProcessor side:

1. ReceiverProcessor deserializes and processes the incoming packet
2. The resulting command data is queued for the GameLoop
3. The packet's sequence number and timestamp are pushed onto the TimeStampedAckQueue
4. The GameLoop is woken (if sleeping)
From the SenderProcessor side:

1. SenderProcessor consumes the outgoing data queue
2. It consumes the TimeStampedAckQueue (shared by ReceiverProcessor)
3. Acks from the TimeStampedAckQueue are matched against SentTimeStampedSequences
4. Packets that haven't yet been acked are saved in the TSAckQueueCache
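To illustrate how the ack plumbing could drive PacketRTT, here is a rough sketch. The data structures are simplified guesses; the article doesn't show the real internals:

```cpp
// A sketch of how PacketRTT might match acks against sent timestamps.
// SentTimeStampedSequences is modeled here as a simple map.
#include <chrono>
#include <cstdint>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

class PacketRTT
{
public:
    // Called by SenderProcessor when a packet goes out.
    void onPacketSent(std::uint16_t sequence)
    {
        sentTimes_[sequence] = Clock::now();
    }

    // Called when an entry from the TimeStampedAckQueue is consumed;
    // updates the smoothed RTT for that sequence, if known.
    void onAckReceived(std::uint16_t sequence, Clock::time_point ackTime)
    {
        auto it = sentTimes_.find(sequence);
        if (it == sentTimes_.end())
            return;  // duplicate or stale ack

        auto rtt = std::chrono::duration_cast<std::chrono::milliseconds>(
            ackTime - it->second);
        smoothedRtt_ = smoothedRtt_ * 0.9 + rtt.count() * 0.1;  // simple EWMA
        sentTimes_.erase(it);
    }

    double smoothedRttMs() const { return smoothedRtt_; }

private:
    std::unordered_map<std::uint16_t, Clock::time_point> sentTimes_;
    double smoothedRtt_ = 0.0;
};
```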
The reason most of the components reside inside SenderProcessor is that components like PacketRTT and CongestionAnalyzer need the acks from received packets in order to operate. In addition, PacketResender only resends based on timed-out packets reported by PacketRTT. So none of these components are useful until something is received. Having all of them within SenderProcessor is mostly for convenience of access. Logically it makes more sense to put PacketRTT and CongestionAnalyzer in ReceiverProcessor. However, that would force SentTimeStampedSequences and TSAckQueueCache to become wait-free queues so SenderProcessor could share them across threads. I didn't want that overhead, so simply repositioning the composition inside SenderProcessor eliminated it. This was one of the more interesting design decisions I had to make, and it was not immediately clear until refactoring.
Many of the design decisions you make for your software are about determining which implementation details should be decoupled from the module. It is almost intuitive to assume that the protocol and the underlying transport should be coupled to the networking module, because networking can't work without them… or can it? Always question your designs, because the intuitive design is often a bad design.
Let's look at the transport layer that implements the Receiver and Sender interfaces. The obvious choice here is to implement UDPSender and UDPReceiver. The TCP versions work exactly the same way; I simply added a flag for SenderProcessor to turn off the resending modules when using TCP. Thanks to this convenience, I used UDP for the game loop and TCP for the lobby system. I'm planning to use TCP for all in-game operations such as chat and pulling player stats, and UDP for all gameplay mechanics. I'll update this article in the future if I have new findings from using both protocols at the same time.
The true advantage of being transport protocol agnostic actually lies in testing. Not only does this allow you to mock the interfaces for unit testing, it also allows you to implement unstable versions of UDP and TCP to simulate network instability for playtesting.
In one of the sections of this GDC talk on the Halo: Reach networking architecture, the speaker talks about the costly network traffic shaping hardware they used to test the robustness of their networking architecture (you should watch the talk, by the way). But we indies don't have the money for such expensive equipment. What's the next best thing? Since Receiver and Sender are interfaces, we can implement our own traffic shaping at zero cost! And since networking is on a separate thread, I can literally do whatever I want to the network without affecting the game thread!
For example, I implemented a LossyUDPReceiver that deliberately drops a packet every x milliseconds. This was trivial to implement (simply do nothing when the timer is triggered). On top of that, I can play full games with the lossy variants to experience what gameplay is like on a bad network while on localhost. If I wanted, I could go one step further and implement a more complicated lossy algorithm that drops packets at random intervals and delays the release of packets. This is super convenient: it lets me test a variety of network issues over localhost and further optimize the gameplay. In the future I could even implement a full suite of lossy transports to simulate a myriad of realistic networking conditions.
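One way to get the same effect is a decorator over the Receiver interface from earlier. The drop-on-a-timer behavior matches the description above; the class shape and names are my assumptions:

```cpp
// A sketch of the idea behind LossyUDPReceiver: wrap a real Receiver and
// silently drop one packet every dropInterval milliseconds.
#include <chrono>
#include <cstddef>
#include <cstdint>

class Receiver  // as sketched earlier
{
public:
    virtual ~Receiver() = default;
    virtual void receive(const std::uint8_t* bytes, std::size_t size) = 0;
};

class LossyReceiver : public Receiver
{
    using Clock = std::chrono::steady_clock;

public:
    LossyReceiver(Receiver& inner, std::chrono::milliseconds dropInterval)
        : inner_(inner), dropInterval_(dropInterval), lastDrop_(Clock::now()) {}

    void receive(const std::uint8_t* bytes, std::size_t size) override
    {
        const auto now = Clock::now();
        if (now - lastDrop_ >= dropInterval_)
        {
            lastDrop_ = now;
            return;  // "drop" the packet: simply do nothing
        }
        inner_.receive(bytes, size);  // normal path
    }

private:
    Receiver& inner_;
    std::chrono::milliseconds dropInterval_;
    Clock::time_point lastDrop_;
};
```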
Of course I'm not saying this is better than using actual traffic shaping hardware. But until I can afford such expensive equipment, this is the best way for me to test and optimize for bad networks.
See, I just saved you tens of thousands of monies. How are you going to repay me? :)
To make the networking engine truly portable, it has to be game agnostic as well. This means that it cannot know about the game protocol. How can that be? Well, if you separate the network protocol from the game protocol, you can do just that. For example, the mandatory parts of the network protocol are really just 4 items:

1. Protocol ID
2. Sequence number
3. Ack (the most recent remote sequence number received)
4. Ack bitfield (acks for the preceding sequence numbers)
You can certainly add more fields as needed. The networking engine only deserializes the first chunk (the items stated above), and the game loop deserializes the latter chunk. So as long as your game uses the same network protocol, you can use any game protocol you like, making the networking engine completely game and protocol agnostic.
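As an illustration, assuming the header fields listed above, the network/game split could be modeled like this (struct names are mine, and the layout is illustrative):

```cpp
// A sketch of splitting the network protocol from the game protocol.
// The engine reads only the fixed header; the payload stays opaque and is
// handed to the game loop for game-level deserialization.
#include <cstdint>
#include <vector>

struct NetworkHeader
{
    std::uint32_t protocolId;   // filtered by the firewall
    std::uint16_t sequence;     // this packet's sequence number
    std::uint16_t ack;          // latest remote sequence seen
    std::uint32_t ackBitfield;  // acks for the prior sequences
};

struct Packet
{
    NetworkHeader header;              // deserialized by the networking engine
    std::vector<std::uint8_t> payload; // opaque game data; the game loop owns this
};
```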
Let's say your game is single threaded and uses non-blocking sockets. Regardless of whether you use UDP or TCP as your underlying transport protocol, your game will hang when there is packet loss. The way to reduce the visible hang is to continue updating your game loop even if you didn't receive a packet, and to use a custom reliability method over UDP to reduce the lag. So, just as Glenn explained, UDP is still the superior choice. In addition, the more computationally intensive your game loop is, the more it will affect your send rate over the network.
My multithreaded architecture eliminates this networking bottleneck completely by placing the game loop on a separate thread. This way the networking components can send and receive as fast as possible, and the game loop can continue to process without being affected by network conditions. Assuming there are ample computing resources, the game loop will never block the network, and the network will never block the game loop.
Regardless of what architecture you use, if there is packet loss your game will lag. What you can do as the software architect is design your game to compensate for the lag as best you can, so that to the player it is mitigated or unnoticeable.
I don't have any empirical evidence to show that my architecture is actually better than a single threaded variant. If I have the time and resources one day, I would like to run benchmarks against a single threaded variant to test their performance and scalability. But hey, it works great!
For some, this architecture might be considered over-design. In addition, the knowledge required to implement and test a multithreaded library is definitely not something for beginners, or even some intermediate programmers. And I would agree that for most indie games, a single threaded version like the one Glenn offered is more than sufficient.
Personally, I don't think this is over-design. I'm in it for the long run, and I hope you are too. I built this networking library with the intention of reusing it in the future for any real-time multi-client software. And as it stands, this networking engine is infinitely reusable as long as I respect the separation between networking and the software protocol. In addition, the fact that it's multithreaded means none of my future apps will be blocked by the network or the app core. I'm really happy it turned out well, and I'm proud to say it's one library I built that is worthy of the multi-core era.