Difference between revisions of "User:Becky Pippen/Key Storage"
Becky Pippen (talk | contribs) m |
Miuku Karu (talk | contribs) m |
||
Line 18: | Line 18: | ||
encountered this name or key before?" then consider storing only a hash of the name or key. Here is | encountered this name or key before?" then consider storing only a hash of the name or key. Here is | ||
a convenient way to map all possible avatar names to 32-bit integers: | a convenient way to map all possible avatar names to 32-bit integers: | ||
< | <source lang="lsl2"> // Returns a 32-bit integer hash of the specified string | ||
// | // | ||
integer hash(string s) | integer hash(string s) | ||
{ | { | ||
return (integer)("0x" + llGetSubString(llMD5String(s, salt), 0, 7)); | return (integer)("0x" + llGetSubString(llMD5String(s, salt), 0, 7)); | ||
}</ | }</source> | ||
This uses the [[MD5]] hash algorithm not for encryption or security, but for the way it | This uses the [[MD5]] hash algorithm not for encryption or security, but for the way it | ||
Line 31: | Line 31: | ||
represented by a six-character base64 string: | represented by a six-character base64 string: | ||
< | <source lang="lsl2"> // Returns a six-character base64 hash of the specified string | ||
// | // | ||
string hash(string s) | string hash(string s) | ||
{ | { | ||
return llGetSubString(llIntegerToBase64((integer)("0x" + llGetSubString(llMD5String(s, salt), 0, 7))), 0, 5); | return llGetSubString(llIntegerToBase64((integer)("0x" + llGetSubString(llMD5String(s, salt), 0, 7))), 0, 5); | ||
}</ | }</source> | ||
With any hashing method that reduces a large set of possible values to a smaller set, there is | With any hashing method that reduces a large set of possible values to a smaller set, there is | ||
Line 65: | Line 65: | ||
function maps names to six-character strings as shown in the last hash function above: | function maps names to six-character strings as shown in the last hash function above: | ||
< | <source lang="lsl2"> hash("Alli Avatar") returns "H2AyMw" | ||
hash("Ally Avatar") returns "o+fBPA" | hash("Ally Avatar") returns "o+fBPA" | ||
hash("Bobby Avatar") returns "y1qWIw"</ | hash("Bobby Avatar") returns "y1qWIw"</source> | ||
Those can be concatenated and stored in a string variable: | Those can be concatenated and stored in a string variable: | ||
< | <source lang="lsl2">string database = "H2AyMwo+fBPAy1qWIw";</source> | ||
For six-character hashes, the n-th hash can be extracted as: | For six-character hashes, the n-th hash can be extracted as: | ||
< | <source lang="lsl2"> integer hashSize = 6; | ||
llGetSubString(database, n*6, n*6 + hashSize-1)</ | llGetSubString(database, n*6, n*6 + hashSize-1)</source> | ||
In Mono, this uses 12 bytes of storage per avatar instead of 106 bytes per avatar if we stored UUIDs in a list. | In Mono, this uses 12 bytes of storage per avatar instead of 106 bytes per avatar if we stored UUIDs in a list. | ||
Line 83: | Line 83: | ||
a single call to [[llSubStringIndex]]() at the risk of accidentally matching on an unaligned boundary: | a single call to [[llSubStringIndex]]() at the risk of accidentally matching on an unaligned boundary: | ||
< | <source lang="lsl2"> // Returns logical index [0..n) of the specified hash value, or -1 if not found: | ||
// | // | ||
integer findHashInDatabase(string hashValue) | integer findHashInDatabase(string hashValue) | ||
Line 96: | Line 96: | ||
// oops -- matched on parts of two hashes -- here we could | // oops -- matched on parts of two hashes -- here we could | ||
// fall back to a slower loop or recover in some other way | // fall back to a slower loop or recover in some other way | ||
. . . | // . . . | ||
} | } | ||
}</ | }</source> | ||
For even better compression, write a hash function that generates 30-bit numeric | For even better compression, write a hash function that generates 30-bit numeric | ||
hash values, then store them as two Unicode characters each as described in | hash values, then store them as two Unicode characters each as described in |
Latest revision as of 23:16, 10 June 2016
Key/UUID Storage
Analysis of memory needed vs. memory used
Each key, or UUID, contains 128 bits of information which are stored in memory as 36 characters (32 hex digits plus four hyphens). If stored in a global list, there's an additional overhead of 12 bytes per element in classic LSO, or 34 bytes of overhead in Mono. That gives us a theoretical memory efficiency of 128/(8*(12+36)) = 33% for LSO, or 128/(8*(34+72)) = 16% for Mono.
Lossless key compression
If you need to store a lot of keys without losing any bits, see the suggestions at Key_Compression.
Lossy storage using hashes
If you need to store avatar names or keys just so the script can answer the question, "have I encountered this name or key before?" then consider storing only a hash of the name or key. Here is a convenient way to map all possible avatar names to 32-bit integers:
// Returns a 32-bit integer hash of the specified string
//
integer hash(string s)
{
return (integer)("0x" + llGetSubString(llMD5String(s, salt), 0, 7));
}
This uses the MD5 hash algorithm not for encryption or security, but for the way it evenly distributes all possible strings across the set of all possible integers.
Here is another hash function that maps the input string to a 32-bit hash space represented by a six-character base64 string:
// Returns a six-character base64 hash of the specified string
//
string hash(string s)
{
return llGetSubString(llIntegerToBase64((integer)("0x" + llGetSubString(llMD5String(s, salt), 0, 7))), 0, 5);
}
With any hashing method that reduces a large set of possible values to a smaller set, there is a chance of hash collisions. This is when two or more strings accidentally map to the same hash value. The chances of this happening is similar to birthday paradox and can be approximated as:
P(n) = 1 - e ^ -(n^2 / 2 * x)
where P(n) is the possibility of having at least one duplicate among n hash values where x is the number of possible hash values. For example, if we map avatar names or keys to the set of 32-bit integers, the probability of a duplicate among 1000 hashes is P(1000) = 1 - e ^ -(1000^2 / 2 * 2^32) = 0.00012. The probability increases like this:
For 32-bit hashes: n probability of a hash duplicate - ------------------------------- 1000 0.00012 10000 0.012 50000 0.25 100000 0.69
Storing the hashes as strings
Storing a bunch of hash values in an LSL list is not very efficient due to the overhead with every list element. Instead, consider concatenating a bunch of fixed-width hash values into a single string. For example, suppose you want to store avatar names and your hash function maps names to six-character strings as shown in the last hash function above:
hash("Alli Avatar") returns "H2AyMw"
hash("Ally Avatar") returns "o+fBPA"
hash("Bobby Avatar") returns "y1qWIw"
Those can be concatenated and stored in a string variable:
string database = "H2AyMwo+fBPAy1qWIw";
For six-character hashes, the n-th hash can be extracted as:
integer hashSize = 6;
llGetSubString(database, n*6, n*6 + hashSize-1)
In Mono, this uses 12 bytes of storage per avatar instead of 106 bytes per avatar if we stored UUIDs in a list.
Searching the list for a hash value can be done in a loop. Or it can be done with a single call to llSubStringIndex() at the risk of accidentally matching on an unaligned boundary:
// Returns logical index [0..n) of the specified hash value, or -1 if not found:
//
integer findHashInDatabase(string hashValue)
{
integer hashSize = 6;
integer idx = llSubStringIndex(database, hashValue);
if (idx == -1) {
return -1;
} else if (idx % hashSize == 0) {
return idx / hashSize;
} else {
// oops -- matched on parts of two hashes -- here we could
// fall back to a slower loop or recover in some other way
// . . .
}
}
For even better compression, write a hash function that generates 30-bit numeric hash values, then store them as two Unicode characters each as described in this page.
For more ideas about script memory usage, see this checklist of things you can do.