
Everything you ever wanted to know about replication (but were afraid to ask)

From Unreal Wiki, The Unreal Engine Documentation Site

Replication is a mighty beast lurking inside the Unreal Engine that even seasoned UnrealScript programmers treat with a lot of respect. With this article I'll try to explain how replication works and hopefully get rid of some myths and misunderstandings on that topic.

This article is meant as technical documentation rather than a tutorial. It deliberately avoids code examples, because the different engine generations require expressing the same concept or feature in different ways. See Replication Idioms for actual examples of some common replication tasks.

A German version of this article is available over at UTzone.de.

Things to keep in mind while reading

We're on the Unreal Wiki, a site full of tutorials and reference documentation. I won't go into detail for every feature, because you can find that elsewhere. You should already have some experience with the language itself and know how to use it properly before you take on scary features like replication. If you still have a question, try looking it up on the wiki first. If you can't find an answer, you can of course still ask on this article's discussion page. For more complex questions, though, you might want to post on a forum instead.

One fact people probably don't expect is that the demo recording feature of the UT series of games is handled internally much like a network game. When you record a demo, your game acts much like a listen server, and when you play back a demo, it acts as a client. Even if you write an offline-only mod, you will have to deal with replication as soon as you want to support demo recording. The difference between demos and network play is that when recording a demo, the "server" doesn't expect the client to respond (network traffic is just dumped to a file), and the "client" in demo playback discards any data that is supposed to be sent to the server.

Another interesting case is UTV. A UTV server is basically a proxy server that looks like a special client to the game server, but acts as a server to its own clients. A UTV server's clients are spectators that can interact with each other via chat, but except for the primary client, their data stays on the UTV proxy and doesn't reach the game server. Additionally, the UTV proxy intentionally delays the game server's data so that players cannot use UTV clients to cheat.

Background - What you start with on the client side

I won't explain how to set up a server and connect to it with a game client; that part is covered in great detail elsewhere. This section is about what you start with after the game has loaded a map on the client.

In short: You start with most of what the mapper added to the level. In particular, all non-actor objects (sounds, textures, meshes) used in the map will be loaded. Some of the actors the mapper placed will be missing, though. To be precise, the engine deletes all actors that have neither bStatic nor bNoDelete set to True. Level geometry, most lights, movers, navigation points, keypoints, emitters, decorations and many other actors aren't affected by this. However, all Pawns (especially placed monsters, vehicles and turrets), Projectiles, triggers and, in UE1/2, also Pickups will be gone. All of the remaining actors will have their Role and RemoteRole values exchanged, except for ClientMovers and similar actors marked as bClientAuthoritative. Quite a lot of the static and non-deletable actors end up with Role set to ROLE_None here, but that doesn't mean they don't exist on the client. It only means they won't receive any property updates through replication.
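The cleanup described above boils down to a simple filter plus a role swap. The following Python sketch models it; it is not engine code, and the Actor record, client_map_load_cleanup() and the example role values are hypothetical. Only the flags bStatic, bNoDelete and bClientAuthoritative and the Role/RemoteRole exchange come from the text.

```python
from dataclasses import dataclass

@dataclass
class Actor:
    # Hypothetical stand-in for a mapper-placed actor, reduced to the flags above.
    name: str
    bStatic: bool = False
    bNoDelete: bool = False
    bClientAuthoritative: bool = False
    Role: str = "ROLE_Authority"
    RemoteRole: str = "ROLE_None"

def client_map_load_cleanup(level_actors):
    """Return the actors that survive map load on a network client."""
    survivors = []
    for actor in level_actors:
        # Actors that are neither bStatic nor bNoDelete are deleted on the client.
        if not (actor.bStatic or actor.bNoDelete):
            continue
        # The remaining actors get Role and RemoteRole exchanged,
        # except client-authoritative ones such as ClientMovers.
        if not actor.bClientAuthoritative:
            actor.Role, actor.RemoteRole = actor.RemoteRole, actor.Role
        survivors.append(actor)
    return survivors

level = [
    Actor("PathNode0", bStatic=True),
    Actor("Mover3", bNoDelete=True, RemoteRole="ROLE_SimulatedProxy"),
    Actor("ClientMover0", bNoDelete=True, bClientAuthoritative=True),
    Actor("Monster12"),  # placed Pawn: neither bStatic nor bNoDelete, so it is deleted
]
survivors = client_map_load_cleanup(level)
```

Note how PathNode0 ends up with Role set to ROLE_None: it still exists on the client, it just won't receive property updates.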

See What happens at map startup for what else happens before replication kicks in.

Replication basis - Actor replication

[Image: F6 network stats.]

So, how does the client get to know about them? They do show up when you play the game, right? The basic concept responsible here is actor replication. For each relevant actor, the engine creates an "actor channel" between the server and the target client. The number of active channels can be viewed via the stat net command, which is usually bound to the F6 key. Note that the number of channels listed there is the total number of channels. Most of these are actor channels, but the engine also has other channel types, e.g. for voice chat.

There are actually two flavors of actor replication, one for static and non-deletable actors (those that aren't deleted at map load on the client) and one for any other actors that were either deleted at map load or spawned on the server at runtime. As mentioned above, static or non-deletable actors already exist in the client world, so their flavor of actor replication just establishes a channel between the corresponding server and client instances. It should be mentioned that static actors can only be subject to replication if they were already marked as bAlwaysRelevant before map load.

The other version is for actors that are neither bStatic nor bNoDelete (let's call them "runtime actors" because they can be spawned and destroyed at runtime) and requires a bit more work, as the target actor does not exist on the client. The server basically tells the client which class of actor to spawn and where to spawn it. Because the actor is freshly spawned on the client, all its properties start at the class default values. See What happens when an Actor is spawned for details on how actors are initialized. The most important part here is that the actor gets its Role and RemoteRole values exchanged before any UnrealScript code is executed.

In case this isn't immediately obvious: Runtime actors placed by the mapper are deleted at map load and then possibly spawned again through actor replication. When they are replicated, all their properties are reset to class default values. Some of the changes made by the mapper might later be reconstructed through other means, but for now they are gone.

Network relevance - Which actors are replicated

So far, this article has mostly taken the network client's point of view. Let's switch to the server side for a while to discuss a very important topic: network bandwidth. Bandwidth is usually the most restricted resource in a network. Data is sent sequentially, so in addition to the time it takes for data to travel, some of the data needs to wait while other data is transmitted. This further increases response times ("ping"), which is undesirable in most games. Games can't reduce the actual travel time of the data; that's a fixed property of the underlying network architecture. They can, however, reduce the data's waiting time by reducing the overall amount of data to transmit. The Unreal Engine employs several tricks to reduce the overall amount of data, but the best way is always not to send any data at all.

To figure out which actors need to be replicated to a client at all, the engine performs several checks to see if the actor is relevant to the client. These checks can be summed up in the following rules that will be tested roughly in the given order:

  1. If the actor is bAlwaysRelevant, it is relevant to the client.
  2. If the client or its view target owns the actor, it is relevant.
  3. If the client is a UTV see-all spectator, the actor is relevant.
  4. If the client can hear the actor's ambient sound, it is relevant.
  5. If the actor is based on (or attached to the bone of) another actor, it is relevant to the client if the other actor is.
  6. If the actor is bHidden or bOnlyOwnerSee and neither blocks other actors nor has an ambient sound, it is not relevant to the client.
  7. If the actor is in a zone with distance fog and is further away from the client's view location than the distance fog end, it is not relevant to the client.
  8. If there is BSP geometry between the client's view location and the actor's center (!), the actor is not relevant.
  9. The server may decide to check whether the actor is behind some terrain and/or beyond its CullDistance if it needs to save bandwidth, which may result in the actor not being relevant to the client.
  10. The actor is relevant to the client.
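To make the rule order concrete, here is a small Python model of the list above. It is not engine code (the real checks are native); NetActor, Viewer and is_relevant() are hypothetical names, owner_client stands in for "the client or its view target owns the actor", and the geometric tests (hearing range, fog, BSP, terrain/CullDistance) are reduced to precomputed sets of actor names. Rule 5 is modeled as "the base actor's result decides", which is my reading of the text.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class NetActor:
    name: str
    bAlwaysRelevant: bool = False
    owner_client: Optional[str] = None   # client owning this actor (directly or via view target)
    base: Optional["NetActor"] = None    # actor this one is based on or attached to
    bHidden: bool = False
    bOnlyOwnerSee: bool = False
    blocks_actors: bool = False
    has_ambient_sound: bool = False

@dataclass
class Viewer:
    client: str
    utv_see_all: bool = False
    # Results of the engine's geometric tests, reduced to sets of actor names.
    hears: Set[str] = field(default_factory=set)
    beyond_fog: Set[str] = field(default_factory=set)
    behind_bsp: Set[str] = field(default_factory=set)
    culled: Set[str] = field(default_factory=set)

def is_relevant(actor: NetActor, viewer: Viewer) -> bool:
    if actor.bAlwaysRelevant:                      # rule 1
        return True
    if actor.owner_client == viewer.client:        # rule 2
        return True
    if viewer.utv_see_all:                         # rule 3
        return True
    if actor.name in viewer.hears:                 # rule 4
        return True
    if actor.base is not None:                     # rule 5 (assumed: the base decides)
        return is_relevant(actor.base, viewer)
    if ((actor.bHidden or actor.bOnlyOwnerSee)
            and not actor.blocks_actors and not actor.has_ambient_sound):
        return False                               # rule 6
    if actor.name in viewer.beyond_fog:            # rule 7
        return False
    if actor.name in viewer.behind_bsp:            # rule 8
        return False
    if actor.name in viewer.culled:                # rule 9
        return False
    return True                                    # rule 10

viewer = Viewer(client="Player1", behind_bsp={"Vehicle4"})
gri = NetActor("GameReplicationInfo", bAlwaysRelevant=True)
vehicle = NetActor("Vehicle4")
attachment = NetActor("FlagAttachment", base=vehicle)
weapon = NetActor("Weapon2", owner_client="Player1", bHidden=True)
trigger = NetActor("Trigger0", bHidden=True)
results = [is_relevant(a, viewer) for a in (gri, vehicle, attachment, weapon, trigger)]
```

The hidden weapon is still relevant because its owner is the viewing client, while the attachment shares the fate of the vehicle hidden behind BSP.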

Note that, while they usually prevent an actor from being rendered, anti portals do not affect network relevance, probably because testing every actor against every single AntiPortalActor for every client may be far too expensive.

Once an actor has become relevant, it will continue to be considered relevant until the above rules fail for more than a few seconds. The duration is configured via RelevantTimeout=... under [IpDrv.TcpNetDrv] in the server's main configuration file for UE1/2 or in the Engine.ini in UE3. The default value is 5 seconds, which strikes a good balance between getting rid of non-relevant actors and not having to restart replication too often for actors that frequently switch between being relevant and not relevant.
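The timeout acts as a simple hysteresis. The following Python sketch models it for one actor/client pair; ChannelState and its update() method are hypothetical, and only the "keep the channel open until the rules have failed for longer than RelevantTimeout" behavior is taken from the text.

```python
RELEVANT_TIMEOUT = 5.0  # seconds, the default RelevantTimeout mentioned above

class ChannelState:
    """Whether the actor channel for one actor/client pair is open (sketch)."""

    def __init__(self):
        self.open = False
        self.last_relevant_time = None

    def update(self, now, rules_say_relevant):
        if rules_say_relevant:
            # Opens the channel (spawning the actor on the client) or keeps it open.
            self.last_relevant_time = now
            self.open = True
        elif self.open and now - self.last_relevant_time > RELEVANT_TIMEOUT:
            # Closing the channel destroys the actor on the client.
            self.open = False
        return self.open

channel = ChannelState()
timeline = [(0.0, True), (3.0, False), (5.5, False), (6.0, True)]
states = [channel.update(t, relevant) for t, relevant in timeline]
```

At 3.0 seconds the rules already fail, but the channel stays open; only after more than 5 seconds without relevance is it closed, and the actor becoming relevant again at 6.0 seconds would be spawned on the client as a completely new actor.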

If a previously net-relevant actor is really no longer relevant, its channel to the client is closed and the actor is destroyed on the client. (See What happens when an Actor is destroyed for details.) If the actor becomes relevant again later, it will be spawned again as a completely new actor. If an actor is destroyed on the server, its channel is closed as well, causing the corresponding actor instances on all clients to be destroyed.

There are also two other ways to close an actor channel, which don't destroy the client instance. When that happens, the client takes over "simulation" of the actor behavior without any further help from the server. One way is the property bNetTemporary, which closes the actor channel immediately after the initial set of properties has been replicated (see below). This mode is used for most projectiles that don't change their movement after spawning, except for a potential influence of gravity. Projectiles that allow interaction other than the usual explode/bounce-on-impact logic usually don't use bNetTemporary. This includes projectiles that can be blown up (e.g. shock projectile, Redeemer or AVRiL rocket), that track down a target (e.g. seeking rockets or spider mines) or that simply stick to a target (e.g. bio goo or sticky grenades). bNetTemporary also has the advantage that the server doesn't need to remember which variable values it replicated to clients, but more on that later.

The other way is the bTearOff property, which also closes the actor channel, but it also swaps the Role and RemoteRole properties of the actor again so the client side instance becomes an "authoritative" instance. Unlike bNetTemporary, which can only be set in the defaultproperties, bTearOff is set on the server at runtime to "tear off" replication to all clients at the same time. On the clients the actor was relevant to, the event TornOff() is called for the actor. Once an actor is "torn off", it will no longer be replicated to new clients it might become relevant to.
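The four ways a channel can close differ only in what happens to the client instance. This Python sketch summarizes them; CloseReason and after_channel_close() are hypothetical names, and the role values passed in are the client instance's values before the close.

```python
from enum import Enum

class CloseReason(Enum):
    NOT_RELEVANT = "relevance timed out"
    DESTROYED_ON_SERVER = "actor destroyed on the server"
    NET_TEMPORARY = "bNetTemporary: initial bunch sent"
    TEAR_OFF = "bTearOff set on the server"

def after_channel_close(reason, role, remote_role):
    """Fate of the client instance when its actor channel closes (sketch).

    Returns (instance_survives, role_afterwards, tornoff_called).
    """
    if reason in (CloseReason.NOT_RELEVANT, CloseReason.DESTROYED_ON_SERVER):
        return (False, None, False)   # the client instance is destroyed
    if reason is CloseReason.NET_TEMPORARY:
        return (True, role, False)    # client keeps simulating, roles unchanged
    # bTearOff: roles are swapped back, so the client copy becomes
    # authoritative, and TornOff() is called on it.
    return (True, remote_role, True)
```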

Variable replication - Updating properties on the client

Alright, now that you know how actors are brought to existence on clients, it's time to think about how to get modifications across the network. Remember, when a replicated actor is spawned on a client, it starts with its class defaults and the only information from the server is the actor's location and potentially its rotation, if it matters. Any other properties are sent separately through what is called variable replication.

Replicated properties are always replicated from the server to all or a specific subset of the clients, but of course only to clients that have a channel for the actor. In Unreal Engine 1 it was also possible to replicate variables from the client owning the actor to the server, but that feature was dropped in favor of sending the values via replicated function calls. (More on those later.) One leftover of that two-way replication is that almost all variable replication conditions in stock code contain the term Role == ROLE_Authority.

Replication conditions

Wait, what's a "replication condition"? Well, as mentioned before, variable replication can be restricted to a specific subset of the relevant clients. The subset is selected via a bool-type expression known as the replication condition. Replication conditions are specified in a special area of the source code, the replication block. Each class may only contain one replication block. Inside there may be one or more replication conditions, each applying to one or more variables or functions. Only one condition may be specified for a variable or function and you are not allowed to specify replication conditions for members inherited from a parent class.

A typical replication block in UE2 might look as follows:

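replication
{
  // Sent only to the client that owns this actor.
  reliable if (bNetOwner)
    ThisVarOnlyConcernsTheOwner;

  // Sent only with the initial bunch, when the actor is first replicated to a client.
  reliable if (bNetInitial)
    ThisVarIsOnlyReplicatedOnce;
}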

In UE3 it would look similar, except without the "reliable" or "unreliable" keyword. That keyword doesn't have any effect on variable replication; it only exists because it affects the way function calls are replicated, but we'll get into that later.

Technically the boolean expression between the parentheses after the "if" is standard UnrealScript code, so you could call functions there if you want. In practice, however, nobody will do that because the time and frequency at which replication conditions are evaluated is unpredictable. Also, this is deep inside network code and should be as quick as possible. Because of that, some classes have their replication conditions implemented in native code, which is specified in the class declaration via the NativeReplication modifier. These classes still have a replication block so you can figure out when exactly the various properties are replicated. Also, NativeReplication only applies to variable replication, not to replicated function calls.

So, what kind of conditions can you use? Here are a few properties you may find useful:

bNetInitial
True only for the initial bunch of variables replicated in addition to the information for spawning the actor on the client.
bNetDirty (UE2/3)
True whenever variables changed on the actor. To be honest, I'm not entirely sure why this exists, as variables only ever get replicated if they differ from what the server thinks the client's value is.
bNetOwner
True only if the actor is owned by the client.
bDemoRecording
True if replicating to the demo recording driver instead of a "real" network connection.
bClientDemoRecording
True if the demo is being recorded on a network client, false if recording offline or on a server or not recording a demo at all.
bRepClientDemo
True on the server if the actor is owned by a client that currently records a demo.
Level.ReplicationViewer (UE2)
The PlayerController of the client currently being replicated to.
Level.ReplicationViewTarget (UE2)
The ReplicationViewer's current view target.
WorldInfo.ReplicationViewers (UE3)
A dynamic array with information about the PlayerController(s) on the target client, their view target, view location and view direction. It's an array because UE3 allows more than one player on a client if splitscreen mode is enabled.
Role
This actor's local network role. You only need to check it when replicating variables in UE1 or when replicating function calls in UE1/2.

If you look around in the replication blocks of stock classes, you may find other variables being used. For example an actor's Mesh is only replicated if the DrawType is DT_Mesh.
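Conceptually, a replication block is a list of conditions, each guarding a group of properties and evaluated per connection. The following Python sketch mirrors the UE2 example from earlier in this section plus the stock Mesh/DrawType rule; RepContext, REPLICATION_BLOCK and eligible_properties() are hypothetical, and the actor is modeled as a plain dict.

```python
from dataclasses import dataclass

@dataclass
class RepContext:
    # Per-connection flags named after the ones listed above.
    bNetInitial: bool
    bNetOwner: bool

# Each entry pairs one condition with the properties it covers,
# like one "reliable if (...)" line in a UE2 replication block.
REPLICATION_BLOCK = [
    (lambda actor, ctx: ctx.bNetOwner, ["ThisVarOnlyConcernsTheOwner"]),
    (lambda actor, ctx: ctx.bNetInitial, ["ThisVarIsOnlyReplicatedOnce"]),
    # Conditions may also test the actor's own state, like the stock Mesh rule.
    (lambda actor, ctx: actor["DrawType"] == "DT_Mesh", ["Mesh"]),
]

def eligible_properties(actor, ctx):
    """Names of properties whose replication condition holds for this connection."""
    names = []
    for condition, props in REPLICATION_BLOCK:
        if condition(actor, ctx):
            names.extend(props)
    return names

mesh_actor = {"DrawType": "DT_Mesh"}
initial_for_non_owner = eligible_properties(mesh_actor, RepContext(bNetInitial=True, bNetOwner=False))
```

Eligibility is only half of the story: an eligible property is only actually sent if its value differs from what the server believes the client already has, as described in the following section.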

What is replicated and when?

So, when exactly does variable replication happen? The short answer is "between world updates, if anything changed". But the server doesn't really check all actors after each tick. Each actor class has a NetUpdateFrequency, which tells how often per second the actor should be checked for changed replicated variables. The first check is of course done right when the actor becomes relevant to the client, and bNetTemporary actors won't get any further updates after that. For all other relevant actors, the engine repeats the check for changed variables about every 1.0/NetUpdateFrequency seconds. Usually there's only limited bandwidth available, so the engine needs to prioritize the various actors. This is done via the NetPriority property. The higher an actor's priority is, the more likely it will be updated. However, lower priority actors won't "starve", because the longer an actor has to wait for its update check, the more likely it will be updated during the next round of checks.
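The scheduling idea can be sketched in Python as follows. This is a model, not the engine's actual algorithm: Replicated and pick_updates() are hypothetical, the bandwidth limit is reduced to a fixed number of actors per round, and weighting NetPriority by the time since the last update is my assumption for how waiting increases an actor's chances.

```python
from dataclasses import dataclass

@dataclass
class Replicated:
    name: str
    NetUpdateFrequency: float   # update checks per second
    NetPriority: float
    last_update: float = 0.0

def pick_updates(actors, now, budget):
    """Pick at most `budget` actors to check for changed variables at time `now`.

    Assumption: NetPriority is multiplied by the time since the last update,
    so an actor's chances grow the longer it waits.
    """
    due = [a for a in actors if now - a.last_update >= 1.0 / a.NetUpdateFrequency]
    due.sort(key=lambda a: a.NetPriority * (now - a.last_update), reverse=True)
    chosen = due[:budget]
    for a in chosen:
        a.last_update = now
    return [a.name for a in chosen]

pawn = Replicated("Pawn", NetUpdateFrequency=10.0, NetPriority=2.0)
info = Replicated("Info", NetUpdateFrequency=1.0, NetPriority=1.0)
first = pick_updates([pawn, info], now=1.0, budget=1)
second = pick_updates([pawn, info], now=1.5, budget=1)
```

In the first round the higher-priority Pawn wins, but in the second round the Info actor has been waiting long enough to win the only slot.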

In Unreal Engine 1 you can't control at which time an update check for replicated variables happens. In Unreal Engine 2 you can force (well, at least strongly suggest) updates earlier by setting the NetUpdateTime to a value in the past, e.g. Level.TimeSeconds - 1. Unreal Engine 3 provides the property bForceNetUpdate, which can be set to True for an immediate update.

The server keeps track of what each client knows about the actors replicated to it and their replicated variable values. The initial assumption about what the client knows is based on the server-side class defaults, which include localized and configurable values read from the localization/config files. In other words, config/globalconfig properties might not initially get replicated, because the server thinks the client already knows their values. It is strongly recommended that you use separate properties for replicating configurable values. Similarly, if you edit class defaults at runtime and then spawn a new replicated actor, the server will not know you have changed the defaults and will simply assume the client already knows the new values.

Every time a variable is sent to the client, the server remembers its value for that client. This may use a good amount of memory, but it helps the server save bandwidth by not having to replicate the same value again. Consider the following scenario: The server replicated a certain value to the client, then the variable is modified on the server multiple times, eventually ending up the same as it was when the server replicated it. None of the changes have been replicated yet because they happened too quickly, but the server marked the actor as having changed properties. Now it's time again to check for properties to replicate. The server looks up what value it sent to the client last time and finds that the replicated property hasn't actually changed. To save bandwidth, the server won't send the property value again, because the client already knows about it.
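Both effects, the config pitfall and the "changed and changed back" case, fall out of keeping a per-connection copy of the last sent values. Here is a Python sketch of that bookkeeping; Connection and changed_properties() are hypothetical, and ConfigSpeed is a made-up config property whose server-side value differs from the client's.

```python
import copy

class Connection:
    """Per-client record of what the server believes the client knows (sketch)."""

    def __init__(self, server_class_defaults):
        # The initial assumption comes from the *server's* class defaults,
        # including config values the client may not share.
        self.shadow = copy.deepcopy(server_class_defaults)

    def changed_properties(self, current_values):
        """Return only values that differ from what was last sent, and remember them."""
        delta = {name: value for name, value in current_values.items()
                 if self.shadow.get(name) != value}
        self.shadow.update(delta)
        return delta

# ConfigSpeed=600 was read from the server's ini file; a client might have 440.
conn = Connection({"Health": 100, "ConfigSpeed": 600})
on_spawn = conn.changed_properties({"Health": 100, "ConfigSpeed": 600})
after_hit = conn.changed_properties({"Health": 75, "ConfigSpeed": 600})
# Health went 75 -> 50 -> 75 between two checks: nothing needs to be sent.
after_flicker = conn.changed_properties({"Health": 75, "ConfigSpeed": 600})
```

The spawn check sends nothing at all, so a client whose own config says 440 never learns the server's value of 600, which is exactly why separate, non-config properties are recommended for replicating configurable values.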

Value compression

As mentioned in the section about relevance, the engine has a few tricks to reduce the amount of data it needs to send. One of these tricks is that it compresses certain value types for transfer and decompresses them on the receiving side. This compression is lossy: it actually changes the value that arrives at the client. It doesn't apply to basic types, only to certain structs:

Vector
The components are rounded to the nearest integer and sent as integer data. This way small vectors only require several bits up to a few bytes, while the original three uncompressed float values would have required 12 bytes. If you need more than integer precision, multiply the vector by a scalar value before assigning it to the replicated variable, and divide by the same value on the receiving side.
Rotator
Only bits 9 to 16 of the components are transferred, which corresponds to the operation C & 0xff00. That way the required data amount is reduced from 12 to about 3 bytes. (Zero components seem to take up only a single bit, reducing the minimum size to 3 bits for the zero rotator.) The compression restricts replicated rotator values to rotations and makes them useless for rotation rates. To replicate a rotation rate, you could copy the rotator components to the components of a vector variable. Note that you shouldn't typecast the rotator to a vector, because that results in a unit vector, which not only discards the Roll component entirely, but is also heavily affected by vector compression.
Quat
Values are assumed to be unit quaternions, which allows the engine to drop the W component from replication entirely and calculate it from X, Y and Z on the client. As a result, Quat values require only 12 instead of 16 bytes for the remaining three float values.
CompressedPosition (UE2)
The struct consists of vectors for location and velocity and a rotator for rotation. The vectors are replicated as usual, but because this struct is used to pack a player position, the Roll component of the rotation is not replicated at all, while the Pitch and Yaw components receive the usual compression to byte size.
Plane
Components are rounded to signed integers in the range [-32768,32767]. That corresponds to a data size reduction of 50%.
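The effect of these compression rules on actual values can be tried out with the following Python sketch. It models only the value transformations described in the list, not the engine's bit-level encoding; the function names and SCALE are hypothetical, and for Quat I assume the sender flips the quaternion's sign when needed so that W is non-negative (q and -q describe the same rotation).

```python
import math

def compress_vector(v):
    # Each component is rounded to the nearest integer before sending.
    # (Python's round() breaks exact .5 ties towards even; the engine may differ.)
    return tuple(round(c) for c in v)

def compress_rotator(r):
    # Only bits 9 to 16 of each component survive, i.e. C & 0xff00.
    # Python's & treats negative ints as two's complement, matching the mask.
    return tuple(c & 0xFF00 for c in r)

def decompress_quat(x, y, z):
    # Assumption: the sender ensures W >= 0, so W = sqrt(1 - X^2 - Y^2 - Z^2).
    return (x, y, z, math.sqrt(max(0.0, 1.0 - x * x - y * y - z * z)))

def compress_plane(p):
    # Rounded and clamped to the signed 16-bit range [-32768, 32767].
    return tuple(max(-32768, min(32767, round(c))) for c in p)

# Fixed-point trick for sub-integer precision: scale before sending and
# divide by the same factor after receiving. SCALE is a hypothetical choice.
SCALE = 100.0
sent = compress_vector(tuple(c * SCALE for c in (0.123, -4.567, 8.0)))
received = tuple(c / SCALE for c in sent)
```

With a scale of 100 the received vector keeps two decimal places; a larger factor buys more precision at the cost of larger integers on the wire.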

Detecting replicated values on the client

Most of the time you just let values replicate so they are available on the client. Sometimes, however, you will want to react to certain property changes immediately. Depending on the engine generation you have different options to react to replicated variables changing.

In Unreal Engine 1 you're entirely on your own as there is no notification. You will have to keep a backup copy of the variable you are monitoring and frequently check the backup against the original, e.g. in Tick() or a Timer().

Unreal Engine 2 at least tells you that it received replicated variables, but it doesn't tell you which variables were replicated. You need to set bNetNotify to True on the client to receive a PostNetReceive() call when a new bunch of replicated variable values has arrived. It should be mentioned that if you only want to get a notification for a single, infrequent event, you can toggle the value of bClientTrigger. This will call the ClientTrigger() event as soon as the changed value arrives on the client.

Finally in Unreal Engine 3 you don't have to figure out which variable was changed, because the engine tells you. To get replication notifications, simply declare the corresponding variable with the modifier RepNotify and the engine will call the ReplicatedEvent() function with the variable's name as the parameter whenever a value for that variable is received.
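The two basic approaches, comparing against a backup copy (UE1, and what you still do inside PostNetReceive() in UE2) versus being told the variable's name (UE3), look like this in a Python sketch. PollingWatcher and receive_bunch() are hypothetical; issuing the notifications after the whole bunch has been applied is my assumption, not documented engine behavior.

```python
class PollingWatcher:
    """UE1/UE2 style: keep backup copies and compare them, e.g. from Tick() (sketch)."""

    def __init__(self, watched_names):
        self.watched = list(watched_names)
        self.backup = {}

    def check(self, actor_state):
        changed = [n for n in self.watched if actor_state.get(n) != self.backup.get(n)]
        for n in changed:
            self.backup[n] = actor_state.get(n)
        return changed

def receive_bunch(actor_state, bunch, rep_notify_vars, replicated_event):
    """UE3 style: apply received values, then notify by name for RepNotify variables."""
    actor_state.update(bunch)
    for name in bunch:
        if name in rep_notify_vars:
            replicated_event(name)   # stands in for ReplicatedEvent(VarName)

watcher = PollingWatcher(["Health"])
polls = [watcher.check({"Health": 100}), watcher.check({"Health": 100}),
         watcher.check({"Health": 80})]

events, state = [], {}
receive_bunch(state, {"Health": 50, "Score": 3}, {"Health"}, events.append)
```

Only Health is declared RepNotify here, so only it triggers a notification even though Score arrived in the same bunch.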

Note that variables are not always replicated immediately when they are changed. Usually the engine makes sure there are at least 1/NetUpdateFrequency seconds between variable updates for a single actor. Also, actors with a higher NetPriority are usually preferred when there's not enough space to replicate changed properties in all relevant actors. Actors with a lower priority may have to wait longer for their variables to replicate.

To get instant replication at the expense of being able to target more than one client, you can use replicated function calls instead.

Restrictions

Not all types can be replicated, others may only replicate properly under certain conditions. For example dynamic arrays cannot be replicated at all. Any variable's value must at least fit into a single network packet to be replicated, but if multiple values from the same actor are small enough to fit into the same packet, then they will be transferred together, saving some overhead.

Strings and structs can only be replicated as a whole, while the elements of a static array are treated as separate variables for replication. That means a static array with hundreds of relatively small elements may replicate just fine, while a long string or a very complex struct may fail. Note that static arrays in structs are subject to the "structs are replicated as a unit" rule, while dynamic arrays in a struct will be excluded from the struct replication data.
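The difference between "one unit per element" and "one unit for the whole value" is easiest to see in a Python sketch. All names are hypothetical, MAX_UNIT_BYTES is a made-up limit (the real packet size depends on engine and settings), and the byte sizes are rough estimates for illustration only. Tuples stand in for static arrays, lists for dynamic arrays, str for strings and dict for structs.

```python
MAX_UNIT_BYTES = 512  # hypothetical limit, for illustration only

def replication_units(name, value):
    """Split a property into the units that replicate separately (sketch)."""
    if isinstance(value, list):       # dynamic array: not replicated at all
        return []
    if isinstance(value, tuple):      # static array: element by element
        return [(f"{name}[{i}]", v) for i, v in enumerate(value)]
    return [(name, value)]            # strings and structs: one unit each

def unit_size(value):
    """Very rough byte estimate."""
    if isinstance(value, str):
        return len(value.encode("utf-8")) + 1
    if isinstance(value, dict):
        # Structs are sent as a whole; dynamic array members are left out.
        return sum(unit_size(v) for v in value.values() if not isinstance(v, list))
    if isinstance(value, tuple):      # static array inside a struct
        return sum(unit_size(v) for v in value)
    return 4

def oversized(name, value):
    return [n for n, v in replication_units(name, value) if unit_size(v) > MAX_UNIT_BYTES]
```

A static array of 300 ints produces 300 small units and fits, while a single 1000-character string is one unit that doesn't.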

Actor or object references are another thing where you need to pay attention. Actor references can only be replicated if the referenced actor is either bStatic or bNoDelete or is currently relevant to the target client. Non-actor object references, such as classes, sounds, textures or meshes, will only reach the client if the object wasn't created at runtime. Non-Actor objects (not a reference, but the object itself) are generally not replicated, so you always need an actor if you want to establish a "connection" between the server and a client.
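The reference rules can be summarized in a Python sketch. Obj and reference_value() are hypothetical; modeling an unreplicable actor reference as arriving on the client as None is my assumption, and whether or when the engine retries later is not modeled.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    is_actor: bool
    bStatic: bool = False
    bNoDelete: bool = False
    created_at_runtime: bool = False   # only meaningful for non-actor objects

def reference_value(target, open_channels):
    """What a client can receive for a replicated object reference (sketch).

    open_channels: names of actors that currently have a channel to this client.
    """
    if target is None:
        return None
    if target.is_actor:
        if target.bStatic or target.bNoDelete or target.name in open_channels:
            return target.name
        return None   # assumed: the client sees None instead
    # Non-actor objects (classes, sounds, textures, meshes) must not be runtime-created.
    return None if target.created_at_runtime else target.name
```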

It might be obvious from the article already, but just in case: There is no way to achieve direct replication between clients. Clients can only communicate with the server.

Function call replication - Sending messages between server and client

The word "messages" should be understood in a much wider sense than just text messages. UnrealScript functions can have up to 16 parameters, and each parameter can have one of many built-in and custom types. Replicated function calls can use almost the entire range of features you can imagine. For parameters the same restrictions apply as for variable replication, with two additions: The entire function call with all parameter values must fit into a single network packet, and only the first element of a static array parameter is replicated; the others are set to their corresponding null value. If you hit the upper data size limit, you may have to find a way to break down the data into separate calls. If you need to replicate a static array, wrap it into a struct. This also makes passing it around in other cases much easier, because structs can be copied as a whole, while static arrays cannot.

Ok, that said, let's look at how to replicate a function call. This differs between UE1/2 and UE3. In engine generations 1 and 2 you use the replication block to specify when to replicate the function call to the remote end. Usually you will include Role == ROLE_Authority for functions you want to send from the server to the client and Role != ROLE_Authority for functions the client should send to the server; other terms are extremely rare in the replication condition. Keep one thing in mind: Function replication always implies bNetOwner, i.e. function calls are only replicated if the executing actor is owned by a client, and the call will only be replicated to/from that owning client. (Being owned by a client means the actor is directly or indirectly owned by the client's PlayerPawn (UE1)/PlayerController. If walking up the "owner chain" does not end at a PlayerPawn/PlayerController that belongs to a client, then the actor is not owned by any client.)
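Walking up the owner chain is straightforward; here is a Python sketch of it. Actor and owning_client() are hypothetical names, and client is None for a PlayerController that doesn't belong to a remote client (for example the listen server's local player).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Actor:
    name: str
    Owner: Optional["Actor"] = None
    is_player_controller: bool = False
    client: Optional[str] = None   # remote connection of a PlayerController, if any

def owning_client(actor):
    """Walk up the Owner chain; return the owning client connection or None."""
    visited = set()
    current = actor
    while current is not None and id(current) not in visited:
        visited.add(id(current))   # defensive guard against an accidental owner loop
        if current.is_player_controller:
            return current.client  # None for a local (listen server) player
        current = current.Owner
    return None

pc = Actor("PlayerController0", is_player_controller=True, client="conn1")
pawn = Actor("Pawn0", Owner=pc)
weapon = Actor("Weapon0", Owner=pawn)
```

Calling a client function on weapon would be replicated to conn1 only; for an actor without such a chain the call isn't replicated at all.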

Unreal Engine 3 no longer uses the replication block to specify replicated function conditions. Instead it provides function modifiers to specify the replication direction. The modifier server means that if the function is called on the client owning the actor, the call should be replicated to the server, while the modifier client means the server should replicate the function call to the client owning the actor. Sensibly, the client modifier also implies the modifier simulated, to ensure the function can definitely be executed on the client when it arrives. Another modifier is demorecording, which means the function should be replicated to the demo recording driver.

Calling replicated functions

If a replicated function is called and its replication condition is met, the call and all parameter values passed to it will be sent to the remote side immediately. If the condition isn't met, the function will be called locally instead. That means, if the executing actor does not have any Owner or the owner does not belong to any clients, the function call is evaluated as if the function isn't defined to be replicated. For functions replicated from a client to the server this usually means the function call is ignored because it lacks the simulated keyword. Calls from the server that stay on the server don't have such a "failsafe switch" and will cause the function to be executed there.
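The decision made at call time can be modeled in Python as follows. dispatch_call() and its parameters are hypothetical; owned means "owned by a remote client" when calling on the server and "owned by the local player" when calling on a client, and the fallback uses the simulated rule explained further below.

```python
from enum import IntEnum

class Role(IntEnum):
    # UE1/2 order of the network roles.
    ROLE_None = 0
    ROLE_DumbProxy = 1
    ROLE_SimulatedProxy = 2
    ROLE_AutonomousProxy = 3
    ROLE_Authority = 4

def dispatch_call(direction, on_server, owned, is_simulated, local_role):
    """What happens when a replicated function is called (sketch).

    direction: "to_client" (server -> owning client) or "to_server".
    """
    sending_side = on_server if direction == "to_client" else not on_server
    if sending_side and owned:
        # Sent immediately; the caller gets a null return value and
        # unchanged out parameters.
        return "sent"
    # Otherwise it is an ordinary local call, still subject to the simulated rule.
    if local_role > Role.ROLE_SimulatedProxy or is_simulated:
        return "executed locally"
    return "ignored"
```

The asymmetry described above shows up directly: an unowned server-to-client call runs on the server (Role is ROLE_Authority there), while an unowned, non-simulated client-to-server call is dropped on the client.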

Note that while replicated functions are allowed to have a return type, the actual return value will be that type's null value if the function is replicated successfully. The code will not wait for the function to be executed and return a value; that's just not feasible for a game engine. If you want a replicated function to send back a value, you need to do that via a parameter of another replicated function that is sent in the other direction. Similarly, if a replicated function has out parameters, their values will not change if the function is replicated. On the remote side, any out parameters or return values will be discarded when the function has finished.

Note that the parameters of replicated functions are subject to the same variable compression strategies as mentioned in the section about variable replication. Additionally, any parameter that is a null value is omitted from the replication data to save bandwidth. This goes only for entire parameter values, not for individual members of a struct used as a parameter type.
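The two parameter rules, static arrays being cut down to their first element and null parameters being omitted entirely, can be sketched in Python like this. encode_call(), decode_call() and is_null() are hypothetical; lists stand in for static arrays and dicts for structs, and the receiver rebuilds missing values from each parameter's null value.

```python
def is_null(value):
    if isinstance(value, dict):   # struct: null only if every member is null
        return all(is_null(v) for v in value.values())
    return value in (None, 0, 0.0, False, "")

def encode_call(params):
    """params: list of (name, value) pairs in declaration order."""
    payload = {}
    for name, value in params:
        if isinstance(value, list):
            # Only the first element of a static array parameter is replicated.
            value = value[0] if value else None
        if not is_null(value):    # null parameters are omitted entirely
            payload[name] = value
    return payload

def decode_call(payload, param_defaults):
    """Rebuild the arguments on the receiving side from the parameters' null values."""
    args = {}
    for name, default in param_defaults.items():
        if isinstance(default, list):   # static array: first element, rest null
            args[name] = [payload.get(name, default[0])] + default[1:]
        else:
            args[name] = payload.get(name, default)
    return args

payload = encode_call([("Score", 0), ("Ids", [5, 6, 7]), ("Msg", "hi"),
                       ("Pos", {"X": 0.0, "Y": 2.0})])
args = decode_call(payload, {"Score": 0, "Ids": [0, 0, 0], "Msg": "",
                             "Pos": {"X": 0.0, "Y": 0.0}})
```

The struct parameter Pos is sent whole, including its null X member, while the elements 6 and 7 of the static array are lost, which is why wrapping static arrays in a struct is recommended.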

Reliability

Function call replication can be either "reliable" or "unreliable", which is specified by the keywords of the same name, either in the replication condition in UE1/2 or as an optional function modifier in UE3. If a function is marked as "reliable", the engine makes sure it is processed in the correct order in relation to other reliable network events, especially other reliable function calls. Opening and closing an actor channel are reliable events too. In other words, provided they are called after the actor channel is opened by the server, reliably replicated function calls are guaranteed to be processed on the client while the actor exists, and they are guaranteed to be processed in the same order as they were called on the server.

Why is the order important? I'll spare you the gory details, but we need to get a bit more technical to answer that. The Unreal Engine uses UDP to transmit its data. This protocol does not actually create a connection, but just sends packets to the target address. It doesn't even guarantee that the packets arrive, let alone that they arrive in the same order they were sent. Due to the way the internet works, different packets might take different routes and overtake each other. They may get dropped somewhere or even get duplicated.

Sounds like a nightmare, but the lack of checks also has a big advantage. The TCP protocol would implement guaranteed order and data integrity, but all of its checks cause a lot of overhead and slow down transfers. That might not be a problem for file transfers (HTTP, FTP and the various mail protocols are built on TCP), but for a game where low response times are crucial, this would be a catastrophe. Thus the engine swallows the bitter pill and performs its own checks for dropped, duplicated and out-of-order packets. These checks are only performed for important things like opening/closing actor channels or reliable replicated function calls. Note that even reliable function calls might get lost when there's packet loss, but the calls that do arrive are guaranteed to be executed in the correct order.
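The receiving half of such an ordering check can be sketched with sequence numbers, as in this Python model. ReliableReceiver is hypothetical and this is a generic technique, not the engine's actual implementation; how the engine recovers lost reliable data is not modeled.

```python
class ReliableReceiver:
    """Delivers reliable events in send order and drops duplicates (sketch).

    Each reliable event carries a sequence number; events that arrive early
    are buffered until the gap before them is filled.
    """

    def __init__(self):
        self.next_seq = 0
        self.pending = {}

    def receive(self, seq, event):
        delivered = []
        if seq < self.next_seq or seq in self.pending:
            return delivered   # duplicate: already delivered or already buffered
        self.pending[seq] = event
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

rx = ReliableReceiver()
early = rx.receive(1, "ClientSetHUD")    # overtook the channel open: buffered
burst = rx.receive(0, "OpenChannel")     # gap filled: both are delivered in order
dup = rx.receive(1, "ClientSetHUD")      # duplicated packet: dropped
```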

Unreliable function calls, on the other hand, might not even get sent if the connection is saturated. If they are sent, they are more likely to get lost, and they could be duplicated or be called out of the correct order. If the ordering gets really bad, they may even arrive after their channel is already closed or before it was opened on the client, in which case they are dropped. In stock code, unreliable functions are used for things like replicating sounds, less important visual effects and (this may be surprising) player input. If one player input packet is lost, this usually isn't a great problem, as the server extrapolates movement and the client has some freedom in correcting the server's extrapolation errors. Losing a jump or fire event may be a bit annoying, but the sheer amount of input packets gives unreliable replication a huge advantage over reliable replication, including better response times. Duplicated and out-of-order packets are caught by a timestamp value in the function call, which allows the server to discard any obsolete updates.
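The timestamp check for unreliable input can be modeled in a few lines of Python. InputReceiver is a hypothetical name; the point is that nothing waits for missing packets, and anything not newer than the last accepted update is discarded.

```python
class InputReceiver:
    """Server-side handling of unreliable, timestamped input updates (sketch)."""

    def __init__(self):
        self.last_timestamp = float("-inf")
        self.applied = []

    def receive(self, timestamp, move):
        if timestamp <= self.last_timestamp:
            return False   # duplicate or out-of-date update: discard it
        self.last_timestamp = timestamp
        self.applied.append(move)
        return True

server = InputReceiver()
accepted = [server.receive(0.10, "walk"),
            server.receive(0.30, "jump"),
            server.receive(0.20, "walk"),    # overtaken by a newer packet
            server.receive(0.30, "jump")]    # duplicated packet
```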

So when deciding whether to make a function reliable or unreliable ask yourself the following questions: Is it really that bad if the function call gets lost underway or isn't received in the correct order? And if so, would the advantages of making it reliable outweigh the response time penalty caused by the engine ensuring the correct order?

Ok, what's with that "simulated" keyword?

Ah yes, that weird function modifier. In fact, it can also be applied to states to affect state code in the same way. Remember the talk about Role and RemoteRole and how they are exchanged on the client up in the first few sections of this article? Well, the simulated keyword, or actually the lack of it, is related to the value of the Role property. Actor instances (as opposed to static functions and non-actor objects) will execute code in their functions and states only if the actor's Role is higher than ROLE_SimulatedProxy or if the function or state is marked as simulated (or native).

Offline and on a server all actors have ROLE_Authority as their Role value, and the same goes for "runtime actors" (remember? bStatic and bNoDelete both set to False) created on the client via the Spawn() function, i.e. not received through replication. Also bStatic or bNoDelete actors that are bClientAuthoritative don't get their roles exchanged on clients, and replicated actors that are "torn off" get their roles exchanged back to the original values, so these also have a Role of ROLE_Authority on the client.

Now the rule says "either simulated or Role higher than ROLE_SimulatedProxy", but ROLE_Authority is not the only role satisfying that rule. There's also ROLE_AutonomousProxy, which is used by the local PlayerController and its Pawn on the client. That is actually set as the RemoteRole value on the server, but replication magic downgrades it to ROLE_SimulatedProxy on other clients so it really only applies to the owning client.

On the other side, there are also ROLE_DumbProxy (at least in UE1/2) and ROLE_None. Remember how mapper-placed actors may end up with Role set to ROLE_None on clients? It just means you can't use replication on them, but nothing would prevent you from calling simulated functions on these actors, if they had any.
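The execution rule and the AutonomousProxy downgrade can be summarized in a Python sketch. The enum mirrors the UE1/2 role order (UE3 has no ROLE_DumbProxy); may_execute() and client_role() are hypothetical helpers, not engine functions.

```python
from enum import IntEnum

class Role(IntEnum):
    ROLE_None = 0
    ROLE_DumbProxy = 1
    ROLE_SimulatedProxy = 2
    ROLE_AutonomousProxy = 3
    ROLE_Authority = 4

def may_execute(role, is_simulated, is_native=False):
    """Whether an actor instance runs a function or state code (sketch of the rule above)."""
    return role > Role.ROLE_SimulatedProxy or is_simulated or is_native

def client_role(server_remote_role, is_owning_client):
    """Role a client instance receives; AutonomousProxy only reaches the owning client."""
    if server_remote_role == Role.ROLE_AutonomousProxy and not is_owning_client:
        return Role.ROLE_SimulatedProxy
    return server_remote_role
```

Because the roles are ordered, "higher than ROLE_SimulatedProxy" covers exactly ROLE_AutonomousProxy and ROLE_Authority, and a ROLE_None actor can still run simulated functions.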