Gah - a solution with more questions. – EntropicLqd
Legacy:AssociativeArray
When learning UnrealScript I found the idea of operator overloading very enticing. These pages describe the design and implementation of an associative array, providing (I hope) some insight into how UnrealScript operators are overloaded.
Contents
Overview[edit]
An Wikipedia:associative array is an array that is indexed by something other than a sequence of small integers. C++ has std::map, Perl has its hash, and JavaScript has its structures. The power of these structures is that using them is just like using a regular array; assuming that AA is a map from string to string, the following C++ code should make sense:
AA["pet"] = "dog"; AA["car"] = "jaguar"; AA["programming language"] = "UnrealScript"; cout << AA["programming language"] << endl; // C++ output statement
Associative arrays are powerful data structures; much of their power stems from their "natural" notation. UnrealScript's operator overloading provides a chance to implement associative arrays with a "natural" notation (for an appropriately loose definition of natural).
This proof of concept focuses on implementing an associative array indexed by strings and containing strings. There are three phases of implementation:
- Selecting a datastructure to associate key/value pairs of strings.
- Selecting operators to wrap the data structure in "natural" notation.
- Making iterators to traverse the contents of the associative array.
The remainder of this page discusses the high-level design of the each phase; the copiously commented code for each phase is included in its own subpage.
Data Structure[edit]
An associtive array associates keys drawn from some domain with values drawn from some domain. Thus it is a mapping from key space to value space.
Depending on speed requirements and restrictions placed on key and vlaue types, such a mapping could be implemented with a hard-coded function, a linear array of pairs, a haash table of pairs, or a tree of pairs. This is a representative rather than an exhaustive list.
A function is a good, compact representation if it is easy to hard-code the relationship between key and value types. Not an option for associative arrays.
A linear list of pairs could be implemented as a UT2K3 dynamic array of structs or objects. An unsorted list is easy to insert into but costly to look things up in; a sorted list is faster for lookup but more costly for insertion. The overall cost is just a little bit too high; besides, this is a chance to play with UnrealScript and that seems too easy.
A hash table is fast for lookup and insertion but a good hash function is difficult to design. Also, a hash table makes in-order traversal of the data difficult without another data structure.
A binary search tree is a good compromise with fairly fast insertion, fairly fast lookup, and easy ordered traversal (assuming the tree is close to balanced). Other, more complex, trees can make tight guarantees on performance. This proof of concept uses a simple BST
For demonstration purposes, we use strings as both keys and values. The only real restriction on the types is that key values must be comparable with < and ----. The ordering requirement can be relaxed with a linear list or hash table.
Legacy:AssociativeArray/DataStructure presents the code that implements the BST nodes and BST. The decision was made that an associative array is single valued.
Operators[edit]
After implementing an association between keys and values, it is necessary to wrap calls to the impelemntation functions in UnrealScript operators.
Diversion: Picking Operators[edit]
By preference, overloading either [] or () would make the associative array notation most natural. UnrealScript does not treat either pair of symbols as an operator so they cannot be overloaded. The UT dynamic arrays implementation used the << operator for assignment and paired the <> operators for refering to elements in the array. Treating the < and > operators as angle brackets produced a pleasing appearance as in
if (best<"movie"> == "Road Warrior") log("Programmer is too old.");
The only thing missing is an assignment operator. Using the left-shift operator breaks the "naturalness" of using the associative array as an array and the assignment operator (=) is not available for redefinition. >= (greater-than equals) is available and it can make for some very "natural" assignment statements:
best<"icecream">= "vanilla"; best<"vegetable">= "califlower"; best<"stooge">= "moe";
There cannot be a space between the > and the = (it has to parse as a single symbol for the operator look up to trigger) but that is a small price to pay for something that looks like assignment to an element in an array.
Now for the hard part: best<"icecream can be the prefix to either an assignment statement or an evaluation of a value in the associative array. In computer science speak, we want best<"icecream to represent either an l-value (left-hand side of an assignment) or an r-value (right-hand side of an assignemnt statement). Note that UnrealScript has no way of knowing, parsing left-to-right, what the following context of the given prefix is.
In C++ (and in Perl, under the hood) the problem is solved using references; the [] operator typically returns a reference to the element in the "array" and the compiler can convert a reference (really an l-value) into an r-value at need. UnrealScript has no reference type so that solution is not possible.
Inspired by the Dynamic_Array for UT, this solution uses a proxy class. In Dynamic_Array the author uses a special type to represent the indices for setting values; here the prefix expression (the array, the angle bracket, and the key string) will return a type that can behave as either an l-value or an r-value.
AAProxy serves this role by keeping track of a location inside an associative array. If it is followed by a > it returns the value stored at that location. If it is followed by a >= it sets the value at that location to the value of the expression following the operator. Yes, this is far, far too clever to be considered good code; in my humble opinion, it is also far, far too clever to be considered bad code.
Diversion: Operator Precedence[edit]
Both Operators and Scripting_Operators cover operator precedence. Scripting_Operators does leave the impression that when scripting an operator, the compiler pays attention to the precedence number you set for the operator. This may be the case when defining operators that are not already part of the system. It is most definitely not the case when defining (or should that be redefining) operators that already exist. So the >= operator has precedence 24 as defined in Object.uc and any version I define will have that same precedence no matter what number I put as its precedence.
This became a problem when defining the >= operator. It is supposed to be an assignment operator so it should bind very loosely. In particular, it should probably bind more loosely than $ so that the following line assigns "golden-delicious" to the associative array entry "apple":
AA<"apple">="golden"$"-delicious";
Unfortunately, the line assigns the value "golden" to the entry, concatenates "golden" with "-delicious" and throws away the value of the expression. The following parentheses are necessary:
AA<"apple">=("golden"$"-delicious");
Legacy:AssociativeArray/OperatorOverloading presents the code for the AAProxy and the demonstration commandlet. UnrealScript restricts the use of overloaded operators to the decendents of the class defining them; without access to Object, the associative array operators are defined in the commandlet. This limits the usefulness of the associative array.
Iterator[edit]
It is not possible to write an UnrealScript iterator in script alone. Without access to the engine source code, the AssociativeArrayIterator was designed to emulate the iterators in C++'s STL. So the iterator is a class that behaves like a pointer at a location in the associative array. The "only" way to get an iterator is through a call to an iterator generating function of an associated array: AssociativeArray::begin() or AssociativeArray::end().
begin() returns an AssociativeArrayIterator that refers to the first element in the array (if there is one) and end() returns an iterator that is one past the last element in the associative array (this is C++'s standard asymetry). With appropriately defined ++ and != operators for iterators, the following loop should visit every element in the associative array:
local AssociatveArrayIterator it; for (it = best.begin(); it != best.end(); ++i) { // visit element referred to by it here }
We need some manner of "visiting" the elements. One solution is to use two functions, one to get the key and the other to get the value at the location referred to by the iterator. This is done here with first() and second() so to log the key/value pairs, the following code works:
for (it = best.begin(); it != best.end(); ++i) { log(it.first() $ "/" $ it.second(), 'AssociativeArray'); }
This is nice. Could we use operators? Sure. Assume you only need the value (not the key) associated with each element. Then the following notation works:
for (it = best.begin(); it != best.end(); ++i) { log("Value: " $ <it> , 'AssociativeArray'); }
But what if we wanted to append "'s" to every value. Could it be done? With the mechanics we have already developed, having <it return an AAProxy, it can use the same r-value/l-value determination used for regular array access with iterators. So you can add the "'s" with the following code:
for (it = best.begin(); it != best.end(); ++i) { <it>= <it> $ "'s"; }
Legacy:AssociativeArray/Iterator presents the code for the iterator and its supporting stack. The actual algorithm for traversing a BST is described in the embedded comments.
Extensions[edit]
It may be hard to believe after looking at the amount written about this silly proof of concept but there is more that could be done. The operations on the array could be extended (but notation would have to be found), iterators could be extended (forward and reverse, iterator invalidation), and it could easily be made more efficient.
The BST implementation could be extended with a mechanism for removing nodes. It would be nice to come up with an obvious, natural notation for deleting a node. There is also no way to check whether a given key is in the associative array (no way to differentiate between a stored empty string and a key that is not present).
The iterator could be more closely associated with the associative array in such a way that the iterator could be invalidated (or corrected, perahps) when the associative array changes. Invalidation could be accomplished with a list of active iterators (iterators on the array) that is checked when the array changes.
Using new is the standard way of allocating binary search tree nodes. In UnrealScript it would be more efficient to maintain a dynamic array of BST nodes; this would be even more advantageous when deletion is possible. The associative array could keep a free list threaded through the dynamic array or it could compact the array on occasion (iterators would have to be invalidated in this case, too).
Related Topics[edit]
Discussion[edit]
El Muerte TDS: Nice concept, too bad that this adds a lot of overhead. Usualy it's better to just create a Key-Value struct and a dynamic array of this struct. The engine does support associative arrays, it's called Map. But there are no unrealscript bindings for it.
Bcladd: Of course this adds a great deal of overhead; all the preparation is done so the loop for traversing the structure appears "natural". Only argument against the straight dynamic array of key/value pairs is the cost of lookup and insertion (both O(n)). Actually, the implementation of associative array (and iterator) could be changed very easily and the operator notation still used. Note that I wrote this implementation before I had ever seen the insides of the engine.
El Muerte TDS: The cost of lookup and insertion for a dynamic array is less than using object references and function calls in unreal script. Functions have to be lookup in each object before they are executed.
Bcladd: Point taken: the constant cost of multiple (recursive) function calls from script outweights the asymptotic performance benefits of the binary tree for any realistic number of entries. This really is just a write up of my playing with operator overloading from script. Since I got great pleasure out of the UT dynamic array code, I thought someone might find the hackery here interesting.
Mychaeel: El Muerte – Without wanting to go into details at this time (and without claiming intimate knowledge of the Unreal engine), I am rather sure that function invocations in UnrealScript are constant-time and rather efficient at that. I expect that after loading an UnrealScript class into memory all function references in function calls are directly resolved as pointers to the C++ "UnrealScript function" objects that are stored as such in UnrealScript code packages; involving any sort of hash or array lookup in calling an UnrealScript function on engine level would strike me as needlessly inefficient.
Bcladd – It's a real pity I don't have the time to look into your code right now, but from what I've seen at a first glance I'm pretty intrigued by your approach. I'll certainly have a closer look when I've got my normal share of free time back.