Wizardry and Steamworks/Randomness, Entropy and Statistics

From Second Life Wiki
Jump to: navigation, search
KBnote.png Note: Some formulas used here are using the Mediawiki-maths system. They show up fine on the WaS wiki where we write our articles but, for now, they may not display properly on the Second Life wiki.

This work is done by Kira Komarov of the Wizardry and Steamworks group and signed [WaS-K].

Introduction

Information theory, especially applied to computer science, relies heavily on numerical methods. Most of the times, expressing a process or a simulation mathematically cannot be translated directly into code. Where mathematics is an abstract field, the programming field is a practical tool and the transition from abstract concepts to practical implementation falls back to logic and algorithms. However, that does not only apply to mathematics, but also to computer science itself: it is easy to formally say "let the be a b-tree" yet harder to implement the data structure itself, even if the formalities are clear.

This document will consist mainly in a collection of snippets or methodologies and will implement frequently-used and useful concepts using LSL and relying on the tools provided by Second Life.

PRNG vs TRNG

A pseudo-random number generator (PRNG) is an algorithm meant to create a series of numbers so that their order of appearance does not describe an easy to determine pattern (random). Pseudo is added at the start to indicate that the pattern can, in the worst case, be determined which makes the PRNG not a "true" random generator.

The concept of "true" is tricky for a random number generator: for example, since there is no periodicity established for transcendental numbers (such as pi or e), one valid source of pseudo-random numbers is simply calculating pi or e using an algorithm and extracting all the numbers, one after the other. Since pi is a already being calculated by super-computers, a whole bunch of numbers are known, so that does not make it a "true" random number generator. Another, more popular example, is using random generators that are proven to have a periodicity but with an immense period (for example, a popular one, the Mersenne twister has a periodicity of 2^19937-1. That means, that after you have extracted that amount of numbers, it will repeat itself. In contrast with the Mersenne twister, most common software packages, have a period of just 2^32. However, depending on your application, that is much more than you will ever need.

So what is a TRNG? A TRNG is an algorithm that gives you a number based on some phenomena (usually physical, chemical and so on) whose pattern cannot be predicted. If you do not believe that everything is everything else, and you do not believe that it is possible to predict how the wind blows, then one good example is perhaps an algorithm extracting numbers based on that. There are other advances done in quantum physics by measuring sub-atomic fluctuations that are said to not be predictable.

However, how would we implement a TRNG in Second Life. If you are a scripter, the last paragraph might have appealed to you and you might have gone directly to llWind with some great ideas. However, wait up a second: a TRNG is really a "true" random number generator and wind in Second Life is determined algorithmically - by some fluid dynamics, from what I understand. Thus, llWind might be a poor choice because the wind speed and even direction are given by a simulation itself.

Searching briefly through the API, we may stumble on a function called llGetRegionFPS and llGetRegionTimeDilation which are two functions that seem to be the most suitable to use for a TRNG entropy source. If you ever used the OpenSIM platform, osGetRegionStats is another OpenSIM-typical function that returns more than the Linden codebase allows us to about a simulator which may prove to be a good source of extraction for entropy. For now, let us go with what we have:

Let us draw three number using llGetRegionFPS and see what they contain. We use a simple script: <lsl> default {

   touch_start(integer num)
   {
       llOwnerSay((string)llGetRegionFPS());
   }

} </lsl>

and extract three values by touching the primitive the script is in three times:

45.019420 
45.004980 
44.916880

FPS may be influenced, as much as wind can be influenced by blowing into the detector or as quantum fluctuations can be influenced by dropping a large elephant over the particle. However, overall you cannot just decide how much FPS a region should have unless you cap the FPS to a certain value. This is currently the situation, all simulators are capped to have no more than 45 fps.

Now, we also know from the function prototype that the region FPS is capped at 45. One interesting remark is that it is feasible to assume that randomness increases as we go towards the right of the number. That is, the simulator's FPS will not vary greatly around the value "45": it might drop a bit over time but if you keep touching the object with the script above, you will notice that that value will stay almost the same. In fact, if you look at the numbers above, on the simulator that we have taken this data from, the 4 never changes, the 5 changes to a 4 and the next number after the float marker changes once out of three times.

Here is a sketch of what we mean:

        entropy decreases
      <------------

  4 5 . 0 1 9 4 2 0

  ---------->
     entropy increases

We use entropy in the sense of "more randomness", which, in fact, is the reverse of what entropy really is. Entropy, classically defined would mean "more information". However, since we are seeking randomness out we are not interested in any processes that generates any type of ordered data. We desire complete bogus data.

That is, the more you go to the right, the more un-predictable (and hence random) the numbers are bound to get.

We could chose to extract that 45 cap from the value of each draw if it really bothers us, but we decided to take some measurements first, by doing it the blatantly obvious way and we extracted 5000 numbers in the interval [0, 99) generated using our FPSrand:

<lsl gremlin="FPS rand"> ////////////////////////////////////////////////////////// // [K] Kira Komarov - 2011, License: GPLv3 // // Please see: http://www.gnu.org/licenses/gpl.html // // for legal details, rights of fair usage and // // the disclaimer and warranty conditions. // //////////////////////////////////////////////////////////

integer rounds = 0;

integer FPSrand(integer max) {

   integer r = (integer)(llGetRegionFPS() * 10000000.0) % max;
   if(max > 0) return r; else return -r;

}

default {

   state_entry()
   {
       llSetTimerEvent((1.0-llGetRegionTimeDilation())*10);
   }
   timer() {
       llOwnerSay((string)FPSrand(99));
       if(++rounds == 5000) {
           llSetTimerEvent(0);
           return;
       }
   }

} </lsl>

Since our brains are unable to tell how all those 5000 numbers are distributed, we used a PRNG such as llFrand to generate another 5000 random numbers in the (0, 99] range by using the trivial script:

<lsl> default {

   state_entry()
   {
       integer itra=5001;
       do {
           llOwnerSay((string)((integer)llFrand(99)));
       } while(--itra>1);
   }
   touch_start(integer total_number)
   {
       llSay(0, "Touched.");
   }

} </lsl>

Then, we went to the dreaded Excel and plotted a graph in order to compare the two random number generators. One of the most obvious thing to see here is that llFrand and FPSrand follow practically the same average distribution.

Figure 1: llFrand compared to FPSrand on 5000 draws in the [0,99) interval.

That is, if you look at the graph, you will notice that numbers are generated around the same height. On the graph, the height, the vertical axis on the left shows the number of times a certain number on the X axis was drawn. If you compare the two random generators, you will see that they both oscillate around the median value of 50: more precisely, if you draw a horizontal line at Y=50, you will cross both the llFrand and FPSrand curves the most number of times. There are no clusters or tendencies to prefer some value over the other, showing that FPSrand is a valid random number generator. The tendency line of FPSrand is almost congruent with the tendency line of llFrand showing no typical stray. This is the expected outcome.

So now, you are wondering why do both llFrand and FPSrand have a bigger density around the 50-times occurrence of each value. If you take a fair coin and flip it many times and note down the "heads" or "tails", you will notice that the average distribution is clustered around the 50 times median line. That means "heads" will occur 50% of the times and "tails" will occur the rest of 50% of the time.

Figure 2: Binary entropy function: tossing a coint, X axis represents the probability of a head or tail and the Y-axis represents one of the faces as 0 and the other as 1.

If you then graph your results based on the probability to score either "heads" or "tails", you will notice a peak right in the middle as you can see in Figure 2. The same result could be expected from throwing a 6-faced or similar die (plural of dice).

Coming back to our FPS-based TRNG, one thing to notice is that a TRNG cannot be used to generate random numbers in fast way. As you can see from the LSL code, we cap the draws from the [0,99) interval by limiting them to the time interval given by:

<lsl>

       llSetTimerEvent((1.0-llGetRegionTimeDilation())*10);

</lsl>

Avoiding the Bell

We do that to avoid poisoning the output data and requesting the region's FPS too fast and thereby creating repeated number entries. We have used this trick before to avoid capping systems and work out the minimal amount of wait-time before we can fire requests, instant messages and so on. These are all based on the probability distribution shown in Figure 2.

One of our ongoing projects requires sending instant messages and we want to send them as fast as possible but to also avoid the Linden hard-coded capping limits. Here is the problem that Linden gives us, taken from Limits:

The number of IMs an object can send in one hour is 5000.

So what is the minimum delay required so that I can send a message as fast as possible? Let us reformulate the problem: you have several sending instant messages and one object doing some synchronisation between them. In that case, how much must the synchroniser object wait before allowing an object to send an instant message? The answer, based on Figure 2, is simply a bit less (stress those capping system, huulaaa!) than:

<math> \frac{5000}{2} + \frac{500}{2} + \frac{50}{2} + \frac{5}{2} = 2777.5 </math>

times than the number of queued message.

More precisely, if you were to express this in LSL, using integers, one could use the following fast-recursive algorithm to compute the minimum amount of time required to pause between sending messages:

<lsl gremlin="Avoid capping systems - Integer calculation"> ////////////////////////////////////////////////////////// // [K] Kira Komarov - 2011, License: GPLv3 // // Please see: http://www.gnu.org/licenses/gpl.html // // for legal details, rights of fair usage and // // the disclaimer and warranty conditions. // //////////////////////////////////////////////////////////

// REQUIRES: integer, cap // a number representing the value of a certain cap // REQUIRES: integer, pad // some initial padding value you may want to pad with // PROVIDES: the random delay pool required to wait // between repeating a capped operation.

integer minCapDodgeTimeI(integer cap, integer pad) {

   if(cap < 1) return pad;
   pad += cap >> 1;
   cap /= 10;
   return minCapDodgeTime(cap, pad);

} </lsl>

Or, if you have some capping system that caps using floats: <lsl gremlin="Avoiding capping systems - Float calculation"> ////////////////////////////////////////////////////////// // [K] Kira Komarov - 2011, License: GPLv3 // // Please see: http://www.gnu.org/licenses/gpl.html // // for legal details, rights of fair usage and // // the disclaimer and warranty conditions. // //////////////////////////////////////////////////////////

// REQUIRES: float, cap // a number representing the value of a certain cap // REQUIRES: float, pad // some initial padding value you may want to pad with // PROVIDES: the random delay pool required to wait // between repeating a capped operation.

float minCapDodgeTimeF(float cap, float pad) {

   if(cap < 0.1) return pad;
   pad += cap / 2.0;
   cap /= 10.0;
   return minCapDodgeTimeF(cap, pad);

} </lsl>

And in order to call this function from the synchroniser, you could do something like:

<lsl> llSetTimerEvent(minCapDodgeTimeF(5000, 0) * queued_messages); </lsl>

where 5000 represents the cap value, 0 some initial pad and queued_messages the amount of messages waiting to be sent. The corresponding timer event handler could use llInstantMessage to send the messages.

Let us refer to our previous sketch and turn the world around a bit:

        no-errors
      <------------

  4 5 . 0 1 9 4 2 0

  ---------->
     errors

When dealing with capping systems, you will notice that the more precise your system is and the less precise their systems is, the more likely you are to trick the capping system into sending a message faster than what it is actually capped at. For example, during our tests we managed to send messages at:

<lsl> llFrand(2776) * queued_messages; </lsl>

That is, the calculated cap for sending messages would be calculated to be a pool of 2777.5 seconds. However, we do it at 27776 seconds without getting the cap system message. That is what call to play the precision of a system: the 1.5s error is within the error of the capping system.

Smashing FPSrand

One of the reasons that FPSrand is a true random number generator is that there is nothing pre-defined that would make the TRNG periodic. There is no obvious reason for a region to have a cyclic FPS - it is just some value of the moment.

llWind and llCloud are computed values - they are predetermined by a fluid dynamics simulation. You cannot make a TRNG based on computed values or you will get a PRNG instead. You have to have measured data. Additionally, we cannot inject PRNG values into a TRNG or the result will be a PRNG.

However, if we were to use osGetRegionStats and take a little bit from each one of these:

STATS_TIME_DILATION 0 STATS_IMAGE_MS 11
STATS_SIM_FPS 1 STATS_OTHER_MS 12
STATS_PHYSICS_FPS 2 STATS_IN_PACKETS_PER_SECOND 13
STATS_AGENT_UPDATES 3 STATS_OUT_PACKETS_PER_SECOND 14
STATS_ROOT_AGENTS 4 STATS_UNACKED_BYTES 15
STATS_CHILD_AGENTS 5 STATS_AGENT_MS 16
STATS_TOTAL_PRIMS 6 STATS_PENDING_DOWNLOADS 17
STATS_ACTIVE_PRIMS 7 STATS_PENDING_UPLOADS 18
STATS_FRAME_MS 8 STATS_ACTIVE_SCRIPTS 19
STATS_NET_MS 9 STATS_SCRIPT_LPS 20
STATS_PHYSICS_MS 10

Then an adversary would have to influence them ALL to mess up our entropy, and that practically reduces their chances to unplugging the computer - then again, with that sort of access, llFrand will not work either. An adversary will have to influence:

network AND all other scripts on the region AND the rendering capabilities AND pending uploads even AND time dilation AND number of active primitives AND agent related measurements AND any other parameter that we will be extracting entropy from.

Suppose all the inputs in the table above:

STATS_TIME_DILATION
STATS_IMAGE_MS
STATS_SIM_FPS
.. etc ...

are used to harness entropy in an equally weighted manner: that means, for example, we do not favour the value of STATS_TIME_DILATION over the value of STATS_SIM_FPS. Since there are 21 such inputs, we have some equally weighted combination of:

<math> I_{1}, I_{2}, I_{3}, ... ,I_{19}, I_{21} </math>

which represents our whole entropy. Now, let's assume that one of them can be compromised, say I_{2}. In that case, we consider that an adversary can induce an error by manipulating I_{2}. Thus, our final data has the error:

<math> \epsilon = \frac{1}{21} = 0.0476 </math>

that is, one input out of 21 inputs generates errors. If we multiply by 100, we get a 4.7%. A 4.7% error is insufficient to influence the overall outcome of the statistics.

Even for 1 input, we can take the curve from the frequency distribution above for FPSrand and apply a vertical bar on the y-axis to represent a 4.76% error. Even if we do that, even if we drag the whole curve down by 4.76%, the error falls well within the variance of the curve.

In that case, an adversary will have to be able to control networking, the rendering capabilities of the simulator, how many agents are on the current region, the activities that those agents perform, the number of objects at a given time in the region. We can already stop there, that is impossible, short off from turning the machine off.

In conclusion, FPSrand is definitely aperiodic and based of non-deterministic events, but it would be best combined with other entropy sources (the OpenSIM osGetRegionStats function would be a good place to start) so that no attacker could influence all the possible sources of entropy.

Based on FPSrand for Second Life, a trivial attempt to create an OpenSIM TRNG, without performing any testing, would thus be something like the following: <lsl> integer osTRNG(integer max) {

   list stats = osGetRegionStats();
   integer r = (integer)(llList2Float(stats, STATS_SIM_FPS) + 
               llList2Float(stats, STATS_PHYSICS_FPS) + 
               llList2Float(stats, STATS_TIME_DILATION) + 
               llList2Float(stats, STATS_ROOT_AGENTS) + 
               llList2Integer(stats, STATS_CHILD_AGENTS) + 
               llList2Integer(stats, STATS_TOTAL_PRIMS) + 
               llList2Integer(stats, STATS_ACTIVE_SCRIPTS) + 
               llList2Integer(stats, STATS_SCRIPT_LPS) * 10000000.0) % max;
   if(max > 0) return r; else return -r;

} </lsl>

The discussion is between deterministic systems and non-deterministic systems; for a compact answer: a physics simulation like llWind or llCloud is deterministic, even if at any point in time it may be difficult to describe what state it is in. No matter how difficult it is, numbers based off it will still be a PRNG. The number of avatars in a region is non-deterministic: we cannot vouch how many there will be at any point in time. It is not difficult, it is impossible. Even if we were to run a scheduled event, how could we know if 4, 10, 7 or 0 people will attend? We might assume there may be more than when an event is not scheduled, however it is impossible to determine how many there will be. It also may be that nobody shows up, or that the region is down or full.

Taking a counter-example, most random generators have a 2^32 period. That period is determined and inferred: more precisely, after 2^32 pulls off the generator, the numbers will repeat, whether you like it or not. That is a PRNG and also a deterministic system where we can most certainly infer that it will have a well-defined pattern.

Here's another example: for a high-traffic zone like an infohub, just counting the number of avatars (instead of using the FPS count) in the region is sufficient for a TRNG: it is non-deterministic. The number of pigeons in a city centre, at a certain point in time, during spring and summer, is also non-deterministic - although we may have to cut-off autumn and winter time periods if the birds migrate.

At a later time we will show how FPSrand can be bent at will, turning it temporarily into a periodic random number generator. We say temporarily because there is nothing wrong with a random number generator temporarily oscillating. A lot of processes oscillate in nature and even:

<lsl> list S = [ 0, 1, 2, 3, 4 ]; integer itra = 5; do {

 llList2Integer(S, llFrand(5));

} while(--itra>0); </lsl>

will give you at some point a set such as:

3, 1, 3, 1

for a certain number of pulls with: <lsl> do {

 llList2Integer(S, llFrand(5));

} while(--itra>0); </lsl>

Let us calculate how many iterations of (4-pulls) you will need in order to get the element:

<math> [ 3, 1, 3, 1] </math>

Suppose that the random generator is uniform (llFrand is, look at Figure 1), thus all the probabilities for the elements in the integer interval [0, 5) are equal:

<math> P(x) = \frac{100}{5} = 20% </math>

We know that the sample space is defined by the k-combination formula:

<math> \binom{ k } { n + k - 1 } = \frac{ (n + k - 1)! }{k!} </math>

Since we draw 4 numbers out of a set of 5 numbers, we obtain the sample space:

<math> \binom{ 5 }{ 4 } = \frac{( 4 + 5 - 1 )!}{4!} = \frac{40320}{24} = 1680 </math>

Thus, there are 1680 possible 3-combinations that we can draw, let us spell some of them out:

[ 0, 0, 0, 0 ]
[ 0, 0, 0, 1 ]
[ 0, 0, 0, 2 ]
[ 0, 0, 0, 3 ]
[ 0, 0, 0, 4 ]
[ 0, 0, 1, 0 ]
[ 0, 0, 2, 0 ]
...

Now, we want 1 outcome, notably:

<math> [ 3, 1, 3, 1 ] </math>

out of all the 1680 possible 4-combinations, the general probability to pull [ 3, 1, 3, 1 ] for just one round of: <lsl> do {

 llList2Integer(S, llFrand(5));

} while(--itra>0); </lsl> is given by:

<math> P(x) = \frac{1}{dim(4-S)} </math>

thus, substituting dim(4-S) with 1680 we get the probability of pulling [ 3, 1, 3, 1 ]:

<math> P ([3, 1, 3, 1]) = \frac{1}{1680} = 0.000595238095238 </math>

which is pretty grim. Nevertheless, that also means that one pull, out of 1681 pulls out of the set:

<math> S = ( 0, 1, 2, 3, 4 ) </math>

will be:

<math> [3, 1, 3, 1] </math>

which is pretty hopeful given that the computer can waste time on the iterations instead of us.

You might assume, if you were to see one particular isolated case such as <math>[ 3, 1, 3, 1]</math>, that our number generation algorithm does not work. The matter of fact is, it does up to at most 1680 iterations. If we happen to make a single pull, and get:

<math> [ 3, 1, 3, 1 ] </math>

that does not allow us to infer that our number generating algorithm is periodic just because we get a nicely ordered sequence [ 3, 1, 3, 1]. However, if we make 1681 pulls instead a single pull, we will observe that all the sub-sets repeat at some point:

[ 0, 0, 0, 0 ]
... later, after at least 1681 rounds, we will get a repetition of the same set:
[ 0, 0, 0, 0 ]
...

In the case of llFrand, it has a determined repetition period, just like our algorithm above except that for a period of 1681, it probably spans some 2^32 (which is a probably more than the distance from the earth than the distance to the closest star if that value represents kilo-metres). Coming back, after sufficient pulls, we will get repetitions.

In the case of FPSrand, we would not know where to start, since we measure the FPS off the region. There is nothing determined in that case.

One of the ideas to attack FPSrand would be to use an oscillator; some sort of pulse (periodic lag-bomb) that would generate a temporary significant decrease of the simulator's FPS with a precisely scheduled period between the pulses. By running such a pulsar over the simulator using FPSrand one could theoretically influence the periodicity of FPSrand. However, that might be harder to control than expected for sufficiently populated regions. Also, that would not make FPSrand generally periodic; it would be periodic just for the duration of the pulse experiment. And, as we have seen in our previous, personal, amazing 1681 period PRNG, an oscillating sequence such as [ 3, 1, 3, 1 ] is insufficient to allow us to infer anything.

Securing Entropy

As mentioned previously, the measured statistics are the best choice for a TRNG. However, it may be that you have concerns with large scale attacks on your TRNG. Here are some possible ways of dissimulating the possibilities of and adversary to influence the outcome of the TRNG, in case high security is a necessity:

  1. First precaution you should take, is to combine all the statistics in an equal ratio. For example, when combining the numbers between the statistics offered by: osGetRegionStats make sure that you do not favour one of them in particular. More than that, when combining the numbers, give favour to the statistics that seem the least influenceable.
  2. Based on the previous, you could use the LSL_Protocol/RestrainedLoveAPI perhaps to gather some entropy off your own viewer (in case the application is an attachment). In case an adversary were to influence the current region you are in, the entropy gathered off the viewer would still hold and the adversary would be able to control only half your entropy. For a full-control attack, where the adversary controls the outcome of your TRNG completely, the adversary would have to influence both the current region, as well as your own viewer. Given a strong privacy, it may be rather difficult for an attacker to find out your IP, let alone control your computer.
  3. Based on the previous, and as the fall-back case, some viewers are able to interact with your avatar more than what is exposed on LSL_Protocol/RestrainedLoveAPI. One example is the Phoenix bridge object that allows you to perform operations such as teleporting via double click with llMoveToTarget. Another example (our favourite) is a Singularity debug option, buried deep in the settings, which allows you to pretty much build your own LSL<->viewer bridge given some minor knowledge of C++ and LSL. In Singularity terms, they are the SHChat* debug options.