I love the smell of UnrealEd crashing in the morning. – tarquin

Legacy:MD5Hash

From Unreal Wiki, The Unreal Engine Documentation Site
Revision as of 09:48, 19 November 2007 by Sweavo (Talk)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search
Object >> MD5Hash (custom class)

A custom utility class for calculating Wikipedia:MD5 hashes of strings and dynamic byte arrays. It cannot be used in UnrealEngine1 games due to their lack of dynamic arrays.

This implementation is specially optimized for UnrealScript and probably won't perform well when ported to Java, C/C++ or other languages. Important performance optimizations include the use of class-global variables and the lack of out parameters, because those are not "by reference" in UnrealScript!

Note that the bottle neck of this implementation – string to byte conversion – still is there and there's no real workaround for that yet. Passing large strings around takes some time, and they always need to be passed at least to the Mid() function.

Usage

For now the class can only be used statically:

local array<byte> SomeData;
local string Result;
 
// fill SomeData with byte values
 
Result = class'MD5Hash'.static.GetArrayHashString(SomeData);
local string SomeData;
local string Result;
 
// fill SomeData with a string
 
Result = class'MD5Hash'.static.GetStringHashString(SomeData);

A new version is being worked on that allows the programmer to create an instance of the class as "MD5 context" for more performance-friendly handling of more complex values.

Methods

Static Methods

GUID GetArrayHash(array<byte> InData) 
Returns the MD5 hash of the specified byte array as a GUID.
string GetArrayHashString(array<byte> InData) 
Returns the MD5 hash of the specified byte array as a string.
GUID GetStringHash(string InData) 
Returns the MD5 hash of the specified string as a GUID.
string GetStringHashString(string InData) 
Returns the MD5 hash of the specified string as a string.
string GetHashString(GUID Hash) 
Returns the string representation of the MD5 hash.
string LittleEndianToHex(int i) 
Returns the little endian string representation of the integer value.

Code

/******************************************************************************
MD5 hash implementation for UnrealScript by Wormbo.
Feel free to modify and optimize for your needs.
******************************************************************************/
 
 
class MD5Hash extends Object;
 
 
/** @ignore */
var private GUID StaticHashValue;
/** @ignore */
var private array<byte> StaticData;
 
 
//=============================================================================
// Instant hash functions - probably not suitable for long input data
//=============================================================================
 
static final function GUID GetStringHash(string In)
{
  local int StrLen, i;
 
  StrLen = Len(In);
  default.StaticData.Length = StrLen;
  for (i = 0; i < StrLen; i++) {
    default.StaticData[i] = Asc(Mid(In, i, 1));
  }
  StaticProcessChunks();
  return default.StaticHashValue;
}
 
static final function string GetStringHashString(string In)
{
  local int StrLen, i;
 
  StrLen = Len(In);
  default.StaticData.Length = StrLen;
  for (i = 0; i < StrLen; i++) {
    default.StaticData[i] = Asc(Mid(In, i, 1));
  }
  StaticProcessChunks();
  return LittleEndianToHex(default.StaticHashValue.A)
      $ LittleEndianToHex(default.StaticHashValue.B)
      $ LittleEndianToHex(default.StaticHashValue.C)
      $ LittleEndianToHex(default.StaticHashValue.D);
}
 
static final function GUID GetArrayHash(array<byte> In)
{
  default.StaticData = In;
  StaticProcessChunks();
  return default.StaticHashValue;
}
 
static final function string GetArrayHashString(array<byte> In)
{
  default.StaticData = In;
  StaticProcessChunks();
  return LittleEndianToHex(default.StaticHashValue.A)
      $ LittleEndianToHex(default.StaticHashValue.B)
      $ LittleEndianToHex(default.StaticHashValue.C)
      $ LittleEndianToHex(default.StaticHashValue.D);
}
 
static final function string GetHashString(GUID Hash)
{
  return LittleEndianToHex(Hash.A) $ LittleEndianToHex(Hash.B)
      $ LittleEndianToHex(Hash.C) $ LittleEndianToHex(Hash.D);
}
 
 
//=============================================================================
// Internal stuff for static instant hashing functions
//=============================================================================
 
static final function string LittleEndianToHex(int i)
{
  const hex = "0123456789abcdef";
 
  return Mid(hex, i >> 4 & 0xf, 1) $ Mid(hex, i & 0xf, 1)
      $ Mid(hex, i >> 12 & 0xf, 1) $ Mid(hex, i >> 8 & 0xf, 1)
      $ Mid(hex, i >> 20 & 0xf, 1) $ Mid(hex, i >> 16 & 0xf, 1)
      $ Mid(hex, i >> 28 & 0xf, 1) $ Mid(hex, i >> 24 & 0xf, 1);
}
 
static final function StaticProcessChunks()
{
  local int i;
  local int A, B, C, D;
  local int W[16];
 
  i = default.StaticData.Length;
  if (i % 64 < 56)
    default.StaticData.Length = default.StaticData.Length + 64 - i % 64;
  else
    default.StaticData.Length = default.StaticData.Length + 128 - i % 64;
  default.StaticData[i] = 0x80;
  default.StaticData[default.StaticData.Length - 4] = (i >>> 29);
  default.StaticData[default.StaticData.Length - 5] = (i >>> 21);
  default.StaticData[default.StaticData.Length - 6] = (i >>> 13);
  default.StaticData[default.StaticData.Length - 7] = (i >>>  5);
  default.StaticData[default.StaticData.Length - 8] = (i <<   3);
 
  default.StaticHashValue.A = 0x67452301;
  default.StaticHashValue.B = 0xEFCDAB89;
  default.StaticHashValue.C = 0x98BADCFE;
  default.StaticHashValue.D = 0x10325476;
 
  while (default.StaticData.Length > 0) {
    for (i = 0; i < 16; i++) {
      W[i] = (default.StaticData[i * 4 + 3] << 24)
          | (default.StaticData[i * 4 + 2] << 16)
          | (default.StaticData[i * 4 + 1] << 8)
          | default.StaticData[i * 4];
    }
 
    // initialize hash value for this chunk
    A = default.StaticHashValue.A;
    B = default.StaticHashValue.B;
    C = default.StaticHashValue.C;
    D = default.StaticHashValue.D;
 
    // round 1
    A += (D ^ (B & (C ^ D))) + W[ 0] + 0xD76AA478;   A = B + (A <<  7 | A >>>  -7);
    D += (C ^ (A & (B ^ C))) + W[ 1] + 0xE8C7B756;   D = A + (D << 12 | D >>> -12);
    C += (B ^ (D & (A ^ B))) + W[ 2] + 0x242070DB;   C = D + (C << 17 | C >>> -17);
    B += (A ^ (C & (D ^ A))) + W[ 3] + 0xC1BDCEEE;   B = C + (B << 22 | B >>> -22);
 
    A += (D ^ (B & (C ^ D))) + W[ 4] + 0xF57C0FAF;   A = B + (A <<  7 | A >>>  -7);
    D += (C ^ (A & (B ^ C))) + W[ 5] + 0x4787C62A;   D = A + (D << 12 | D >>> -12);
    C += (B ^ (D & (A ^ B))) + W[ 6] + 0xA8304613;   C = D + (C << 17 | C >>> -17);
    B += (A ^ (C & (D ^ A))) + W[ 7] + 0xFD469501;   B = C + (B << 22 | B >>> -22);
 
    A += (D ^ (B & (C ^ D))) + W[ 8] + 0x698098D8;   A = B + (A <<  7 | A >>>  -7);
    D += (C ^ (A & (B ^ C))) + W[ 9] + 0x8B44F7AF;   D = A + (D << 12 | D >>> -12);
    C += (B ^ (D & (A ^ B))) + W[10] + 0xFFFF5BB1;   C = D + (C << 17 | C >>> -17);
    B += (A ^ (C & (D ^ A))) + W[11] + 0x895CD7BE;   B = C + (B << 22 | B >>> -22);
 
    A += (D ^ (B & (C ^ D))) + W[12] + 0x6B901122;   A = B + (A <<  7 | A >>>  -7);
    D += (C ^ (A & (B ^ C))) + W[13] + 0xFD987193;   D = A + (D << 12 | D >>> -12);
    C += (B ^ (D & (A ^ B))) + W[14] + 0xA679438E;   C = D + (C << 17 | C >>> -17);
    B += (A ^ (C & (D ^ A))) + W[15] + 0x49B40821;   B = C + (B << 22 | B >>> -22);
 
    // round 2
    A += (C ^ (D & (B ^ C))) + W[ 1] + 0xF61E2562;   A = B + (A <<  5 | A >>>  -5);
    D += (B ^ (C & (A ^ B))) + W[ 6] + 0xC040B340;   D = A + (D <<  9 | D >>>  -9);
    C += (A ^ (B & (D ^ A))) + W[11] + 0x265E5A51;   C = D + (C << 14 | C >>> -14);
    B += (D ^ (A & (C ^ D))) + W[ 0] + 0xE9B6C7AA;   B = C + (B << 20 | B >>> -20);
 
    A += (C ^ (D & (B ^ C))) + W[ 5] + 0xD62F105D;   A = B + (A <<  5 | A >>>  -5);
    D += (B ^ (C & (A ^ B))) + W[10] + 0x02441453;   D = A + (D <<  9 | D >>>  -9);
    C += (A ^ (B & (D ^ A))) + W[15] + 0xD8A1E681;   C = D + (C << 14 | C >>> -14);
    B += (D ^ (A & (C ^ D))) + W[ 4] + 0xE7D3FBC8;   B = C + (B << 20 | B >>> -20);
 
    A += (C ^ (D & (B ^ C))) + W[ 9] + 0x21E1CDE6;   A = B + (A <<  5 | A >>>  -5);
    D += (B ^ (C & (A ^ B))) + W[14] + 0xC33707D6;   D = A + (D <<  9 | D >>>  -9);
    C += (A ^ (B & (D ^ A))) + W[ 3] + 0xF4D50D87;   C = D + (C << 14 | C >>> -14);
    B += (D ^ (A & (C ^ D))) + W[ 8] + 0x455A14ED;   B = C + (B << 20 | B >>> -20);
 
    A += (C ^ (D & (B ^ C))) + W[13] + 0xA9E3E905;   A = B + (A <<  5 | A >>>  -5);
    D += (B ^ (C & (A ^ B))) + W[ 2] + 0xFCEFA3F8;   D = A + (D <<  9 | D >>>  -9);
    C += (A ^ (B & (D ^ A))) + W[ 7] + 0x676F02D9;   C = D + (C << 14 | C >>> -14);
    B += (D ^ (A & (C ^ D))) + W[12] + 0x8D2A4C8A;   B = C + (B << 20 | B >>> -20);
 
    // round 3
    A += (B ^ C ^ D) + W[ 5] + 0xFFFA3942;    A = B + (A <<  4 | A >>>  -4);
    D += (A ^ B ^ C) + W[ 8] + 0x8771F681;    D = A + (D << 11 | D >>> -11);
    C += (D ^ A ^ B) + W[11] + 0x6D9D6122;    C = D + (C << 16 | C >>> -16);
    B += (C ^ D ^ A) + W[14] + 0xFDE5380C;    B = C + (B << 23 | B >>> -23);
 
    A += (B ^ C ^ D) + W[ 1] + 0xA4BEEA44;    A = B + (A <<  4 | A >>>  -4);
    D += (A ^ B ^ C) + W[ 4] + 0x4BDECFA9;    D = A + (D << 11 | D >>> -11);
    C += (D ^ A ^ B) + W[ 7] + 0xF6BB4B60;    C = D + (C << 16 | C >>> -16);
    B += (C ^ D ^ A) + W[10] + 0xBEBFBC70;    B = C + (B << 23 | B >>> -23);
 
    A += (B ^ C ^ D) + W[13] + 0x289B7EC6;    A = B + (A <<  4 | A >>>  -4);
    D += (A ^ B ^ C) + W[ 0] + 0xEAA127FA;    D = A + (D << 11 | D >>> -11);
    C += (D ^ A ^ B) + W[ 3] + 0xD4EF3085;    C = D + (C << 16 | C >>> -16);
    B += (C ^ D ^ A) + W[ 6] + 0x04881D05;    B = C + (B << 23 | B >>> -23);
 
    A += (B ^ C ^ D) + W[ 9] + 0xD9D4D039;    A = B + (A <<  4 | A >>>  -4);
    D += (A ^ B ^ C) + W[12] + 0xE6DB99E5;    D = A + (D << 11 | D >>> -11);
    C += (D ^ A ^ B) + W[15] + 0x1FA27CF8;    C = D + (C << 16 | C >>> -16);
    B += (C ^ D ^ A) + W[ 2] + 0xC4AC5665;    B = C + (B << 23 | B >>> -23);
 
    // round 4
    A += (C ^ (B | ~D)) + W[ 0] + 0xF4292244;     A = B + (A <<  6 | A >>>  -6);
    D += (B ^ (A | ~C)) + W[ 7] + 0x432AFF97;     D = A + (D << 10 | D >>> -10);
    C += (A ^ (D | ~B)) + W[14] + 0xAB9423A7;     C = D + (C << 15 | C >>> -15);
    B += (D ^ (C | ~A)) + W[ 5] + 0xFC93A039;     B = C + (B << 21 | B >>> -21);
 
    A += (C ^ (B | ~D)) + W[12] + 0x655B59C3;     A = B + (A <<  6 | A >>>  -6);
    D += (B ^ (A | ~C)) + W[ 3] + 0x8F0CCC92;     D = A + (D << 10 | D >>> -10);
    C += (A ^ (D | ~B)) + W[10] + 0xFFEFF47D;     C = D + (C << 15 | C >>> -15);
    B += (D ^ (C | ~A)) + W[ 1] + 0x85845dd1;     B = C + (B << 21 | B >>> -21);
 
    A += (C ^ (B | ~D)) + W[ 8] + 0x6FA87E4F;     A = B + (A <<  6 | A >>>  -6);
    D += (B ^ (A | ~C)) + W[15] + 0xFE2CE6E0;     D = A + (D << 10 | D >>> -10);
    C += (A ^ (D | ~B)) + W[ 6] + 0xA3014314;     C = D + (C << 15 | C >>> -15);
    B += (D ^ (C | ~A)) + W[13] + 0x4E0811A1;     B = C + (B << 21 | B >>> -21);
 
    A += (C ^ (B | ~D)) + W[ 4] + 0xF7537E82;     A = B + (A <<  6 | A >>>  -6);
    D += (B ^ (A | ~C)) + W[11] + 0xBD3AF235;     D = A + (D << 10 | D >>> -10);
    C += (A ^ (D | ~B)) + W[ 2] + 0x2AD7D2BB;     C = D + (C << 15 | C >>> -15);
    B += (D ^ (C | ~A)) + W[ 9] + 0xEB86D391;     B = C + (B << 21 | B >>> -21);
 
    // add this chunk's hash to result so far
    default.StaticHashValue.A += A;
    default.StaticHashValue.B += B;
    default.StaticHashValue.C += C;
    default.StaticHashValue.D += D;
 
    default.StaticData.Remove(0, 64);
  }
}

Implementation Notes

Only the 5 least significant bits of the second operand of UnrealScript's bitshift operators are used. That means, for example (C >>> -30) is the same as (C >>> 2).

The implementation avoids the unneccessary circular variable assignments and additional function calls the original MD5 implementation in RFC 1321 and on Wikipedia suggest. While Wikipedia's very efficient notation of the main loop (see below) is very small, the high number of assignments make it quite slow in UnrealScript.

   for i from 0 to 63
       if 0 ≤ i ≤ 15 then
           f := (b and c) or ((not b) and d)
           g := i
       else if 16 ≤ i ≤ 31
           f := (d and b) or ((not d) and c)
           g := (5*i + 1) mod 16
       else if 32 ≤ i ≤ 47
           f := b xor c xor d
           g := (3*i + 5) mod 16
       else if 48 ≤ i ≤ 63
           f := c xor (b or (not d))
           g := (7*i) mod 16

       temp := d
       d := c
       c := b
       b := ((a + f + k(i) + w(g)) leftrotate r(i)) + b
       a := temp

Another difference from the original implementation suggestions are the calculations on the first and second round. This algorithm implements the optimizations suggested by Wikipedia: d xor (b and (c xor d)) instead of (b and c) or ((not b) and d) for the first round and c xor (d and (b xor c)) instead of (d and b) or ((not d) and c) for the second round. This results in a lower number of operations without changing the result.

Related Topics

  • SHA1Hash – a similar class for calculating SHA1 hashes
  • Wikipedia:MD5 – Wikipedia article about the MD5 hash function