Difference between revisions of "User:Becky Pippen/Key Storage"
Becky Pippen (talk | contribs) (preliminary page in anticipation of script memory limits) |
Becky Pippen (talk | contribs) m |
||
Line 76: | Line 76: | ||
<lsl> integer hashSize = 6; | <lsl> integer hashSize = 6; | ||
llGetSubString(database, | llGetSubString(database, n*6, n*6 + hashSize-1)</lsl> | ||
In Mono, this uses 12 bytes of storage per avatar instead of 106 bytes per avatar if we stored UUIDs in a list. | In Mono, this uses 12 bytes of storage per avatar instead of 106 bytes per avatar if we stored UUIDs in a list. | ||
Line 89: | Line 89: | ||
integer hashSize = 6; | integer hashSize = 6; | ||
integer idx = llSubStringIndex(database, hashValue); | integer idx = llSubStringIndex(database, hashValue); | ||
if (idx % hashSize == 0) { | if (idx == -1) { | ||
return -1; | |||
} else if (idx % hashSize == 0) { | |||
return idx / hashSize; | return idx / hashSize; | ||
} else { | } else { |
Revision as of 12:57, 31 December 2009
Key/UUID Storage
Analysis of memory needed vs. memory used
Each key, or UUID, contains 128 bits of information which are stored in memory as 36 characters (32 hex digits plus four hyphens). If stored in a global list, there's an additional overhead of 12 bytes per element in classic LSO, or 34 bytes of overhead in Mono. That gives us a theoretical memory efficiency of 128/(8*(12+36)) = 33% for LSO, or 128/(8*(34+72)) = 16% for Mono.
Lossless key compression
If you need to store a lot of keys without losing any bits, see the suggestions at Key_Compression.
Lossy storage using hashes
If you need to store avatar names or keys just so the script can answer the question, "have I encountered this name or key before?" then consider storing only a hash of the name or key. Here is a convenient way to map all possible avatar names to 32-bit integers: <lsl> // Returns a 32-bit integer hash of the specified string
// integer hash(string s) { return (integer)("0x" + llGetSubString(llMD5String(s, salt), 0, 7)); }</lsl>
This uses the MD5 hash algorithm not for encryption or security, but for the way it evenly distributes all possible strings across the set of all possible integers.
Here is another hash function that maps the input string to a 32-bit hash space represented by a six-character base64 string:
<lsl> // Returns a six-character base64 hash of the specified string
// string hash(string s) { return llGetSubString(llIntegerToBase64((integer)("0x" + llGetSubString(llMD5String(s, salt), 0, 7))), 0, 5); }</lsl>
With any hashing method that reduces a large set of possible values to a smaller set, there is a chance of hash collisions. This is when two or more strings accidentally map to the same hash value. The chances of this happening is similar to birthday paradox and can be approximated as:
P(n) = 1 - e ^ -(n^2 / 2 * x)
where P(n) is the possibility of having at least one duplicate among n hash values where x is the number of possible hash values. For example, if we map avatar names or keys to the set of 32-bit integers, the probability of a duplicate among 1000 hashes is P(1000) = 1 - e ^ -(1000^2 / 2 * 2^32) = 0.00012. The probability increases like this:
For 32-bit hashes: n probability of a hash duplicate - ------------------------------- 1000 0.00012 10000 0.012 50000 0.25 100000 0.69
Storing the hashes as strings
Storing a bunch of hash values in an LSL list is not very efficient due to the overhead with every list element. Instead, consider concatenating a bunch of fixed-width hash values into a single string. For example, suppose you want to store avatar names and your hash function maps names to six-character strings as shown in the last hash function above:
<lsl> hash("Alli Avatar") returns "H2AyMw"
hash("Ally Avatar") returns "o+fBPA" hash("Bobby Avatar") returns "y1qWIw"</lsl>
Those can be concatenated and stored in a string variable:
<lsl>string database = "H2AyMwo+fBPAy1qWIw";</lsl>
For six-character hashes, the n-th hash can be extracted as:
<lsl> integer hashSize = 6;
llGetSubString(database, n*6, n*6 + hashSize-1)</lsl>
In Mono, this uses 12 bytes of storage per avatar instead of 106 bytes per avatar if we stored UUIDs in a list.
Searching the list for a hash value can be done in a loop. Or it can be done with a single call to llSubStringIndex() at the risk of accidentally matching on an unaligned boundary:
<lsl> // Returns logical index [0..n) of the specified hash value, or -1 if not found:
// integer findHashInDatabase(string hashValue) { integer hashSize = 6; integer idx = llSubStringIndex(database, hashValue); if (idx == -1) { return -1; } else if (idx % hashSize == 0) { return idx / hashSize; } else { // oops -- matched on parts of two hashes -- here we could // fall back to a slower loop or recover in some other way . . . } }</lsl>
For even better compression, write a hash function that generates 30-bit numeric hash values, then store them as two Unicode characters each as described in this page.
For more ideas about script memory usage, see this checklist of things you can do.