Gah - a solution with more questions. – EntropicLqd

Legacy:Code Optimization

From Unreal Wiki, The Unreal Engine Documentation Site
Jump to: navigation, search

If your code runs slow or causes hitches on some computers, here are several optimization techniques you can apply to your code to make it run better.

You might be looking for...[edit]

Keep the log clean[edit]

Every line that is written to the log file takes time. Writing lots of them to the log may slow down your mod to a crawl and can inflate the game's log to several megabytes after a short time already.

Fix Accessed None and other log warnings 
Fix all Accessed None and other log warnings you find in the log after executing your mod. (You can safely assume that your code is responsible for any script warning you find in the log after executing it, even those that point to somewhere in Epic's code. Compare with the log when not playing your mod if in doubt.)
Remove debug logging 
Make sure to remove any debugging log statements in release versions of your mods. If you think you may need the log statements in future again, just comment them out.

Avoid iterators[edit]

Avoid using any iterators in frequently executed functions like Tick or PostRender because they are, for the most part, extremely slow.

Instead try using linked lists or dynamic arrays for the actors you work with. You could either fill these at the start of the match, e.g. with a single iterator loop, or maintain the list when spawning and destroying actors like the SpawnNotify in UT or the Interaction lists maintained by the InteractionMaster in UT2003.

Optimize iterator use[edit]

If you can't avoid using them, at least try to optimize them.

VisibleActors and VisibleCollidingActors are good examples of this. Every actor that is within the radius has a FastTrace called on it. Traces use a lot of CPU resources and if the radius is large you will be calling a lot of traces. You can spare yourself some traces by checking anything else than needs to be checked on top of the trace first.

For example, if you are looking for all visible pawns within 2000 units that have over 50 health, we can check the health before we check the trace – that way we don't bother doing the expensive visibility check if it was going to fail the health check anyway. For example:

// this way is slow
foreach VisibleActors(class'Pawn', P, Radius, Location, True)
  if (P.Health > 50)
    // do something
 
// this way is faster
foreach RadiusActors(class'Pawn', P, Radius, Location, True)
  if (P.Health > 50 && FastTrace(P.Location, Location))
    // do something

Another example of an even slower implementation that needs to check all actors, not just Pawns:

foreach VisibleCollidingActors(class'Actor', thisActor, Radius, Location)
{
    if (thisActor.bStatic || thisActor.Physics == PHYS_None)
        continue;  // skip this actor
    // do the actual stuff
}

Within a radius of e.g. 1500 you could easily find over 300 actors. The loop will execute a FastTrace (see Actor/Methods) for every single actor in the collision hash within the specified area. However, a lot of those actors are most probably static and/or have Physics == None. FastTrace requires much more time than the checks used within the loop, but those checks would catch almost as many actors. A simple optimization would be executing the FastTrace after the other two checks. This can be done by using the CollidingActors (still faster than RadiusActors with up to 2000 UU radius) iterator instead:

ForEach CollidingActors(class'Actor', thisActor, Radius, Location) {
    if ( thisActor.bStatic || thisActor.Physics == PHYS_None || !FastTrace(Other.Location, Location) )
        continue; // skip this actor
 
    // do the actual stuff
}

This will execute much faster, but why?

Imagine those 300 actors, let's say 200-250 of them are StaticMeshes placed in the map. Those are all static actors which will be caught by the first part of the if statement. Some other actors might be e.g. gibs lying on the floor. Those actors have Physics == PHYS_None and will be caught by the second part of the if statement. Typically over 90% of the actors will fail to pass those first two tests, leaving only about 30 actors for the FastTrace check in this example. This means we only have to do 30 FastTraces instead of 300. Now imagine you want to run this loop every Tick. A high number of FastTraces can slow down the game by 50% or even more, while about 10-30 of those Traces can only be noticed by checking the frame rate.

Disable engine events when you don't need them[edit]

Use the Disable function to deactivate certain engine events when you don't need them. (That only applies for events you provide an UnrealScript implementation for; if you haven't overwritten an engine event in your class, disabling it makes no difference.)

event PostBeginPlay
{
    // all events are enabled by default, so disable Tick event to start with
    Disable('Tick');
}
 
event Trigger(Actor Sender, Pawn Instigator)
{
    // tell engine to call Tick event from now on
    Enable('Tick');
}
 
event Tick(float DeltaTime)
{
    // do something -- executed only after the actor has been triggered
}

Using Disable and Enable is more efficient than using a bool variable and doing an UnrealScript-level check in the Tick function. (And it's more elegant as well.)

Re-use objects[edit]

Creating actors and objects is a relatively expensive operation. If you can, design your code so that you spawn an actor or object you need only once, save a reference to it and use it later again.

The object pool makes re-using non-Actor objects convenient and straightforward: Allocating an object of a given class either takes an existing one from the pool or automatically creates a new one if none exists yet; freeing an object doesn't destroy it but puts it into the pool.

Precache materials and static meshes[edit]

Precache any new materials or static meshes you use to avoid hitches when they're displayed the first time. Overwrite the UpdatePrecacheMaterials and UpdatePrecacheStaticMeshes functions (defined in Actor) to do that:

simulated function UpdatePrecacheMaterials()
{
    Super.UpdatePrecacheMaterials();
    Level.AddPrecacheMaterial(Texture'MyUserInterfaceTexture');  // hard-coded texture reference
    Level.AddPrecacheMaterial(MapperSpecifiedMaterial);          // specified by mapper in UnrealEd
}
 
simulated function UpdatePrecacheStaticMeshes()
{
    Super.UpdatePrecacheStaticMeshes();
    Level.AddPrecacheStaticMesh(StaticMesh'MyStaticMesh');
}

Optimize expressions[edit]

Place operators in an optimized order – this seems trivial but can be important for things that get called a lot, especially for replication statements.

Expressions in UnrealScript will terminate prematurely if applicable, so you can take advantage of this. Similar to the optimizations for iterators, if using "or" expressions, evaluate the most likely or least expensive things first. That way, if it is true, it doesn't have to waste time on things that usually will be false anyway. If it is an "and" expression, evaluate the least likely thing first – that way you won't pass one check only to get stopped by the second as often.

The only real exception to this is if it is necessary to avoid Accessed Nones – obviously it's more likely that a Controller's Pawn will have over 10 health than that a controller will not have a pawn at all, but for obvious reasons you need to confirm that the controller has a pawn before attempting to read a variable from it.

Avoid Nesting functions[edit]

Although it seems like it would use less memory to nest function calls, I've timed different variations on nested and non-nested calls, and a non-nested call consistently ran at twice the speed as a nested one:

// Runs relatively slow.
function int NestedFunction()
{
  return DoSomething( DoSomethingElse( DoEvenMore( 5 ) ) );
};
 
// Runs WAY faster.
function AFunction()
{
  local int Value;
 
  Value = DoEvenMore( 5 );
  Value = DoSomethingElse( Value );
  return DoSomething( Value );
};

I suspect this is an indication that the interpreter's "stack" is not used very efficiently.

Switch: With relatively simple functions I noticed something opposite - NestedFunction() was faster by about 10%. With relatively large functions I couldn't measure the difference. I was testing with stopwatch() in build 3355 commandlet.
Can someone post an example where the nested function calls are noticeably slower?

Use native functionality instead of UnrealScript code[edit]

Use native functions instead of scripted ones whenever possible.

UnrealScript runs a lot slower than the native functions do – it's usually better to use a native function instead of writing your own, even if the native function does a bunch of stuff you don't need. The wasted functionality is nothing compared to the added speed of native functions.

Execute code only as often as needed[edit]

Timeslicing less important calls in Tick or Timer can increase speed.

You could, for instance, use a boolean variable and store the previous deltatime so that a less critical function can be called only half the time, with the cumulative deltatime. You can also use an incrementing integer and a cumulative Deltatime float to call functions even less often. This makes the program look more complicated, unfortunately, but can decrease the strain on the CPU.

EntropicLqd: Under those circumstances couldn't you use the SetTimer(..) function to reduce the number of times the Timer() function is called?

Foxpaw: Yes, but you only have one Timer. If a superclass uses it you won't be able to use it unless your timing needs are the exact same as those of the superclass. Furthermore, you can only use it for one thing then. Say, for instance, you wanted to make your own physics system. You wanted to update location every tick, velocity every 3 ticks, and rotation every 5 ticks. Furthermore, you want to check collision only every 13 ticks. This would be a bit difficult to do in Timer.

Find bottlenecks by measuring execution time[edit]

You can use the Clock and UnClock functions (defined in Actor, moved to Object for DeusEx) to measure the time a part of your code spends executing, in milliseconds. Use this to find the sections of your code that require performance optimzation most urgently, and to compare different ways of doing something performance-wise.

function MyFunction()
{
    local float ExecutionTime;
 
    Clock(ExecutionTime);
    // do something
    UnClock(ExecutionTime);
 
 
    Log("Time spent executing something:" @ ExecutionTime @ "ms");
}

Clock is a very useful function when you know exactly what you want to measure. It is also handy because it returns the elapsed time to you, so you can use it to take averages, etc. You could also even use it for some kind of gameplay timer, though it sometimes "rolls over" which might not make it the best for that.

Clock uses CPU cycles rather than actual time. In most engines the script equivalent of clock\unclock automatically converts the value to seconds rather than CPU cycles. However you should never use clock\unclock for actual timing perposes, only use it for local performance testing (e.g. in case you change an algorithm and want to test if it's better or worse). CPU cycles roll over very often, thus it's useless for longer periods (a second is too much). Because of the over rolling the result after UnClock might be incorrect, so when you use it perform the test multiple times to get a proper value.

If you want to test a lot of code, adding a lot of clock, unclock, and log statements can be tedious. Enter Stopwatch. Stopwatch is a much more powerful function than Clock, but works very differently. The stopwatch is global, and it's either "on" or "off."

StopWatch( false ); starts the stopwatch. StopWatch( true ); disables and resets the timer. When you stop the timer, a line will be printed in the log stating something like: Time=41.768ms. This is the time that the stopwatch was at when it was stopped. This makes it appear to be very much like clock.. but the log statement that shows the stop time is not what you can really do with StopWatch.

The magic of stopwatch is that when it is running it timestamps log entries. Each is stamped with the time elapsed since the timer was started. It is great for finding out the time taken throughout a function, without writing a veritable pile of clock and unclock statements, as well as adding temporary variables for them, and the works! But I digress. Here's an example:

function SuperFantasticFunction()
{
  local int i;
 
  StopWatch( false );
 
  log( "Beginning Execution" );
  // ... some code goes here ...
  log( "Initialization Complete" );
  // ... more code ...
  log( "Precomputation Complete" );
  // ... yet more code ...
  log( "Entering Loop" );
  for (i=0;i<5;i++)
  {
    // .. something done in a loop
    log( "Iteration "$(i+1) );
  }
 
  log( "Loop Completed" );
  // ... some finishing code ...
  log( "Function Completed" );
 
  StopWatch( true );
}

Produces something like the following in the log: (except stopwatch actually has a few more digits of precision.)

  ScriptLog: 0.00 ms: Beginning Execution 
  ScriptLog: 0.12 ms: Initialization Complete
  ScriptLog: 0.45 ms: Precomputation Complete
  ScriptLog: 0.76 ms: Entering Loop
  ScriptLog: 1.02 ms: Iteration 1 
  ScriptLog: 1.35 ms: Iteration 2
  ScriptLog: 1.68 ms: Iteration 3
  ScriptLog: 1.92 ms: Iteration 4
  ScriptLog: 2.24 ms: Iteration 5
  ScriptLog: 2.24 ms: Loop Completed
  ScriptLog: 3.16 ms: Function Completed
  Time=3.16 ms

Unroll your loops[edit]

Setting up a loop can take extra time (especially if Mychaeel is correct and loop iterations are counted). In some cases, the number of iterations are known, and a loop doesn't need to be used at all:

for (int i = 0; i < 5; i++)
{
    //Some Code.
}

can be changed to

//Some Code.
//Some Code.
//Some Code.
//Some Code.
//Some Code.

Dante: What if the engine doesn't count loop iterations but counts every instruction ? Then you might save 3 called instructions with the unrolled loop. Sometimes it looks very ugly, leaving the for(i=0...5) solution the better one.

Sordith: We could second guess how the scripting engine works until we fill up the server's hard drive and not really get anywhere. I think I tested this one (along with all of the techniques I posted), but I can't remember the results. I'll test again later and post some sample numbers. You are correct that sometimes (read most of the time) it looks very ugly, and the time saved during execution may not be worth the time lost when working with the ugly code. I didn't mention that because the questions of what and when to optimize could easily take it's own page.

Speed comparison of loops[edit]

Different loops have different overhead times. Lets say we have following situation:

	// data, length = 100
	var array<vector> stuff;								
 
	// code to loop					
	stuff[i] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();

An obvious choice is for loop. Lets say that it takes 1 time unit to execute such function. For comparison function with unrolled loop takes 0.66 time units.

	function Test( int n )
	{
		local int i;
		for(i=0; i!=n; ++i)
		{
			stuff[i] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();
		}
	}

While loop takes 0.93 time units. That's probably because the i variable is accessed not three but two times in each loop iteration.

	while( i!=n )
	{
		stuff[i++] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();
	}

Until loop takes 0.90 time units. Equality operator or internal implementation may be faster.

	if( n!=0 )
	{
		do
		{
			stuff[i++] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();
		}until( i==n )
	}

What about a goto loop? 0.93 time units, same as while loop.

	loop:
		if( i!=n )
		{
			stuff[i++] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();
			goto 'loop';
		}

The fastest loop I tested was partially unrolled one - 0.76 time units. Choose how many times you want to duplicate the code (4-8 will do), loop couple times so the number of iterations is a multiply of chosen number and run the duplicated code in another loop.

	local int iterations, modulus;
 
	if( n != 0 )
	{
		modulus = n % 8;
		if( modulus != 0 ) 
		{
			do
			{
				stuff[i++] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();
			}
			until( --modulus == 0 )
		}
 
		iterations = n / 8;
		if( iterations != 0 )
		{
			do
			{
				stuff[i++] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();
				stuff[i++] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();
				stuff[i++] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();
				stuff[i++] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();
				stuff[i++] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();
				stuff[i++] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();
				stuff[i++] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();
				stuff[i++] = vect(1,0,0)*FRand() + vect(0,1,0)*FRand() + vect(0,0,1)*FRand();
			}
			until( --iterations == 0 )
		}
	}

Refine your algorithms[edit]

Different operations take different amounts of time. Generally speaking, addition, subtraction, assignment, and shifting take small amounts time. Multiplication takes a slightly longer time, while division takes the most time. Using floating point numbers also increases execution time. Integer multiplication and division by powers of 2 can be converted into shifts. Division by a floating point number can be converted into multiplication by a floating point number (x/0.5 == x * (1/0.5)).

Sordith: Don't like the way this one reads, but can't seem to spit it out more clearly.

Mychaeel: I don't believe UnrealScript's compiler optimizes constant subexpressions, so "1/0.5" does probably involve a division at run time. Also, while you're technically right of course, I don't believe that this technique has noticable impact on the execution speed of UnrealScript code – you have to assume that UnrealScript byte code is executed at least one order of magnitude slower than native code, and on top of that every operation in an expression translates to a function call with all of its overhead anyway (instead of being resolved to an inline operation).

Sordith: This is true. Optimising outside of a loop will rarely have noticable results. These techniques should be used where working with a large number of objects, or when doing calculations every frame, and probably not even then unless you need the extra time.

I was thinking more along the lines of:

local float mult = 1/0.5;
for (int i = 0; i < 1000; i++)
{
    a *= mult;
    b *= mult;
    c *= mult;
}

also, if you want to divide by a constant number, you can change it to multiplication by a constant number without the overhead of the first division or the temp variable.

The member selection operator has its price[edit]

By member selection operator I mean the dot used to access objects in other objects. Don't forget it's also a function and one that may slow down your calculations quite a lot. If the objects are constant from the loop point of view, you may cache them as mentioned above. You may also try to optimize by using additional calculations instead of the member selection operator.

In following example the member selection operator is used 900 times. Let's say that execution of this function took 1 time unit.

	var vector stuff[100];
 
	function Test( vector v )
	{
		local int i;
		for(i=0; i<100; ++i)
		{	
			stuff[i].x = v.x + v.x*(FRand()-0.5);
			stuff[i].y = v.y + v.y*(FRand()-0.6);
			stuff[i].z = v.x + v.z*(FRand()-0.7);
		}	
	}

Let's eliminate 600 uses of member selection operator (v members) by adding 3 local variables, 3 assignments, and 3 uses of member selection operator. At this point it takes 0.6 time units to execute the function.
The remaining 300 uses of member selection operator (stuff[i] members) can be eliminated by adding 300 multiplications by vector and 200 vector additions. After this optimization the execution time is 0.39 time units. That's over 2.5 times faster than the original example.

	function Test( vector v )
	{
		local int i;
		local float vx,vy,vz;
 
		vx=v.x;
		vy=v.y;
		vz=v.z;
 
		for(i=0; i<100; ++i)
		{	
			stuff[i] = vect(1,0,0) * (vx + vx*(FRand()-0.5))
					 + vect(0,1,0) * (vy + vy*(FRand()-0.6))
					 + vect(0,0,1) * (vz + vz*(FRand()-0.7));
		}
	}

Be careful with temporary variables[edit]

First of all, you should remove all unused variables from your functions. (The UT200x compiler will warn you about unused local variables.) They slightly increase package size and whether they also affect execution speed is yet to be proven. They definately make your code harder to read if you have lots of them.

Assignment is a pretty expensive operation in UnrealScript compared to reading variable values, especially with structs. If you are working with dynamic arrays and want to move an element to a new position with a higher array index you might come up with a solution similar to this if you are used to languages without dynamic arrays:

function MoveElementUp(int Source, int Destination)
{
  local int i;
  local float tmp;
 
  tmp = MyArray[Source];
  for (i = Source; i < Destination; i++)
    MyArray[i] = MyArray[i + 1];
  MyArray[Destination] = tmp;
}

Of source, this is a really bad implementation since it completely neglects the features of dynamic arrays. (You might have to use it for static arrays, though.)

A way better solution would be the following:

function MoveElement(int Source, int Destination)
{
  local float tmp;
 
  tmp = MyArray[Source];
  MyArray.Remove(Source, 1);
  MyArray.Insert(Destination, 1)
  MyArray[Destination] = tmp;
}

This implementation not only works for moving in both directions (up or down), but also completely drops the loop.

There's still a temporary variable though, which requires an additional assignment operation. Let's get rid of that as well:

function MoveElement(int Source, int Destination)
{
  if ( Source < Destination ) {  // move up
    MyArray.Insert(Destination + 1, 1);
    MyArray[Destination + 1] = MyArray[Source];
    MyArray.Remove(Source, 1);
  }
  else if ( Source > Destination ) { // move down
    MyArray.Insert(Destination, 1);
    MyArray[Destination] = MyArray[Source + 1];
    MyArray.Remove(Source + 1, 1);
  }
}

If the destination has a higher index than the source, the new element needs to be inserted at an index 1 higher than the desired index, because when removing the source element afterwards, all following elements' indices are decreased by 1. In the case of moving the element down, inserting the destination element increases the source element's index by 1.

Xian: I'll also assume that resetting temporary Object pointers to "None" will also help. Though I am not too sure about how UE handles memory management at such points.

Wormbo: Like the second paragraph in this section mentions: Assignment is a relatively expensive operation. Local variables (and thus any object references) cease to exist upon returning from the function they are declared in. The UnrealEngine 1 and 2 don't collect garbage at all during gameplay, only when the "Obj Garbage" console command is executed, which in regular gameplay only happens at mapchange and in UT2004 between Onslaught/Assault rounds. I don't think much was changed in UnrealEngine 3, but maybe the garbage collector can run asynchronously now. Anyway, it's probably still not worth the additional execution time to explicitly clear local object references. If you really have to create a lot of objects dynamically, consider using an object pool instead of creating temporary objects every time.

Related Topics[edit]