Cogito, ergo sum
Legacy:BruteForce
BruteForce is a programming language I (El Muerte TDS) specified. It's being compiled and executed from within the Unreal Warfare engine.
I thought it would be intresting if I explained how I designed it and how it works. Some things might be usefull for other people.
The latest source code of BruteForce is avilable from my CVS repository, module name: BruteForce
Or you can download a snapshot of the source code here: http://unreal.student.utwente.nl/Source/BruteForce-source.zip
Contents
Step 1: the specification[edit]
The most important part of a language, that can be a programming language or even a file format, is the specification. This is usualy done using EBNF format.
The EBNF specification defines how the source code should be written, if the source code doesn't match all the rules of the EBNF specification it can not be compiled.
Read more about the BruteForce EBNF specification
Step 2+3: the parser and compiler[edit]
Now you have to write the parser that checks if the input source conforms to your EBNF specification. The easiest way to do that is to use a tokenizer to retreive the significant blocks from the source file.
I've named by parser class compiler because it will also compile the source file to usefull data for the execution of the code. Usualy the parser will also compile the input source to an internal format, for later processing/execution. So that's why I will handle both at the same time.
The compiler will translate the input code to a Abstract Syntax Tree (Legacy:BruteForce/AST), the AST is a tree representation of the input code. This tree makes it easier to do something with the code. The main advatage of a tree over a stack machine is that you can execute tree nodes and always return to that node by reference, very usefull in while loops. Basically a tree works the same like a stack machine, but the jumps are just more friendlier to use than labels.
Your language has to be readable for humans, at least that is the nicest way. But you AST has to be easy to read for a machine. So what's the diffirent, well a machine wants to know what it has to do as fast as possible, it doesn't want to look ahead to see what it has to do, this is not efficient. This comes down to that your tree has to use a prefix notation, but mostlikely you designed your language to use infix.
x = 1 + 2 // infix = x + 1 2 // prefix
So during compiling you need to translate the infix notation to prefix.
The easiest way to create a parser for your language is to create a Wikipedia:Recursive descent parser, e.g. each non terminal is a new function. This might result in a lot of small functions or even function that just call another function, but that doesn't matter. Maybe in the future you want to extend your language and you will be happy that you did it this way.
Step 4: the checker (optional)[edit]
After you compiled your input source to a AST you may want to check if the content is correct, e.g. check for unresolved variables/functions/... or even do range checking or other content checking.
The parser/compiler only checks if the course is correctly used not if you didn't do stupid things. This is what the checker does.
The step is not required to do and I left it out in the current version. The only problem you get is that you program might crash or returns undesireable information.
The code of the checker is basically the same as that of the interpreter, it also walks through the AST, except that it doesn't interpret the code, it doesn't calculate the values. So run time errors (e.g. divide by zero) are not checked by the checker.
Step 5: the interpreter[edit]
When you have created the AST you can execute/interpret it and return a result. All the interpreter has to do is walk down the tree. For this it's also best to write a recursive descent parser, but note that it does not work the same way like with the parser/compiler. This one is much easier since you can see at the tree node where you have to go.
Unlike with the parser you do have to pass arguments to the diffirent functions, each function receives the current tree node to process. This way you can easily create a while loop by calling the same function with the same node.
Expressions return results, there are two common ways to pass these results up the tree to the node where they are used.
One way is to use a data stack where you push and pop the to/from. For this to work you have to create the logic for the stack administration (e.g. the push and pop). Using the static is what usualy is done in assambler languages (and finally in the CPU).
An other, modern, way is to return the results of a function, this might involve storing the results in local variables, but it's generaly easier to write and use (specially for debugging).
Step 5.1: the scope[edit]
While interpreting your AST you need to keep track of functions and variables used. You also might need to keep track of global and local definitions. For this you needs to create a class that keeps track of this. I've called it the scope. This stores the declarations and values and returns them when the interpreter requests it.
Step 6: combining it all[edit]
Now combine all seperate pieces into one thing that you can use.
Create a tokenizer ans pass it the source code. Create an AST and a Compiler and pass the compiler the tokenizer and AST. Create an interpreter and pass it the AST and your Scope.
Issues/things to think about[edit]
Type complete[edit]
I designed BruteForce to be type complete, this means that every type is treated equally, each type is automatically casted to an other type when needed.
Ofcourse this can result in intresting issues like what to do with operators. As a general rule the left side defines the type, ofcourse this will also be confusing sometimes.
// easy ones "some string" + "other string" = "some stringother string" "string" + 123.67 = "string123.67" 123 + "56" = 179 // a little bit difficult 123 + "weird stuff" = 123 + 0 = 123 // "weird stuff" is not a number so it's 0 true && "" = true && false // an empty string, or for int/float 0 equals to false // weird "string 1" - "string" = ?? // maybe " 1" "string" - 1 = ?? // maybe "strin" "string 1" * "string" = ?? // I've got no idea "string 1" * 2 = ?? // maybe "string 1string 1" "string 1" / "string" = ??
Function declarations[edit]
Function declarations are great, using the AST and Scope it's even very easy, when you reach a function declaration just add the declaration with type function and as value the node where the function begins.
When that function is called jou just lookup the function in the scope and execute the returned node.
Related articles[edit]
BruteForce documents[edit]
- Legacy:BruteForce/AST
- Legacy:BruteForce/Compiler
- Legacy:BruteForce/EBNF
- Legacy:BruteForce/Interpreter
- Legacy:BruteForce/Main
- Legacy:BruteForce/Scope
- Tokenizer
Other documents[edit]
- Wikipedia:Abstract syntax tree
- BNF and EBNF: What are they and how do they work?
- ANTLR - ANother Tool for Language Recognition
Discussion[edit]
Tarquin: very interesting. care to tell us what it's useful for – what advatages does this have over Uscript?
Mychaeel: Sounds like a case study to me, not anything with an actual intent or purpose. – How's the compiler's and interpreter's performance?
El Muerte TDS: Just started on it for fun, I am following a course about translaters and compilers at the uni at the moment and this makes it more intresting. As for the performance, the sample script on the Legacy:BruteForce/EBNF page takes about 50ms to compile and 170ms to execute (and calculate the first 10 dates). I think the most time is used by Legacy:BruteForce/Scope class, need to improve this somehow.
Zedsquared I think this is all great fun and doesn't need a reason at all :) El Muerte, why not go on to create a radical 3d debugging environment for code written in bruteforce seeing as how you've got this 3d engine sitting there doing nothing ATM... seems such a shame to restrict output to the log :)
One possible use for this: Write a 'core war 3D' mod where you can get together online, code up simple AI for some sort of combat actors in real time and get them battling... I was thinking along these lines for my GP stuff but the 'language' such as it is would only appeal to lisp hackers, bruteforce is much more friendly. All very geeky I know but I reckon the world needs more geeky games :)
Tarquin: Hey, I'm a mathematician! I know all about things which serve no purpose but are elegant! ;) ElM, do you want to write a short paragraph on this to submit to BU news?
El Muerte TDS: eek, an mathematician :D Uhm, this is news worthy ? Lemme use my excelent writing skills, ... uhm... uhm...
Tarquin: I can make something up if you like :) it's not newsworthy that I'm a mathematician... ;) but I think a language written in Uscrip is kinda cool... we should show it off!
hc: Could you set up a BruteForce download that doesn't require you to use cvs? (I'm behind a firewall...)
El Muerte: check the top of the page for a zip file containing a snapshot from the CVS
hc: Thanks!
MythOpus: Quick question: You've stated that it is being compiled and executed within the engine... Does that mean that one could write something up when you are running the game (whatever game/mod that would be) and compile it/execute it without getting out of the game. I hate how you have to write your code, compile it, start the game, exit the game, make changes, delete the .u file, re-compile and do it all over again. Would that be neccesary with this? And also, is BruteForce an alternative to UScript, in that you could write up a mutator and/or complete total conversion using BF for use with the Unreal Engine?
El Muerte: bruteforce is not an alternative to unrealscript. It's just a programming language written using unrealscript. You can not use it to write a mutator or total conversion (not unless you also add all required bindings between bruteforce and the unrealengine, but that's just a waste of time because you already have unrealscript).