Difference between revisions of "Talk:Wizardry and Steamworks/Randomness, Entropy and Statistics"
Kira Komarov (talk | contribs) |
Kira Komarov (talk | contribs) |
||
(5 intermediate revisions by 2 users not shown) | |||
Line 111: | Line 111: | ||
Kira Komarov 04:53, 15 January 2012 (PST) | Kira Komarov 04:53, 15 January 2012 (PST) | ||
---- | |||
llCloud and llWind are both deterministic but I can't help thinking the FPS of our deterministic physics engine, will also be deterministic and worse, we have the ability to manipulate the inputs. If we are going to use a system that is predictable we should use one that is harder to manipulate and provides a haystack to tease data from, hence my suggestion of using llCloud or llWind. | |||
I wasn't really considering that it would be constantly twiddled but only when you wanted to influence the lower bits of the mantissa based on estimations of how long the frame would take to run. The goal is to reduce the range of possibilities for what the output will be after the modulo. I'm put in mind of the attacks done against roulette wheels where they used custom built computers to guess which quadrant of the wheel the ball would land in. | |||
"How would you know if x people will attend?" You don't need to be a passive observer, flood the event with bots, max out the region. Makes me think of [http://xkcd.com/221/] oddly enough. | |||
OS gives more metrics which could be leveraged to address this problem. While I haven't studied the problem, I wouldn't feel comfortable writing an algorithm that mixes entropy sources, I'd worry that I was creating an attack surface. | |||
What I guess I'm getting at is Trust. If you don't trust the inputs how can you get a trustworthy answer? | |||
Random Ideas of little importance: | |||
* You could use an untrusted TRNG to shift where you read the weather data from (this doesn't feel like a great idea). | |||
* I know that multiplying by a non-power of two will trash the lowest bit in the mantissa (It's why my [http://forums-archive.secondlife.com/15/28/28006/1.html Float2Sci] function works the way it does). What I don't know (without doing the math) is if it will effect the probability of that bit being 1 or 0 but I think it does. Regardless depending upon how FPS is determined it may already have it's lowest bits trashed (1.0 / last_frame_time == fps?). | |||
** Instead of using a multiply to shift the digits into an integer, use [[User:Strife_Onizuka/Float_Functions#Float_.3C-Union-.3E_Integer|fui]] (it could be simplified since FPS is never negative and exceedingly unlikely to ever be in the renormalized range). | |||
-- '''[[User:Strife_Onizuka|Strife]]''' <sup><small>([[User talk:Strife_Onizuka|talk]]|[[Special:Contributions/Strife_Onizuka|contribs]])</small></sup> | |||
---- | |||
'''llCloud and llWind are both deterministic but I can't help thinking the FPS of our deterministic physics engine, will also be deterministic and worse, we have the ability to manipulate the inputs. If we are going to use a system that is predictable we should use one that is harder to manipulate and provides a haystack to tease data from, hence my suggestion of using llCloud or llWind.''' | |||
First, are you mixing the "Physics FPS" with the "Sim FPS"? AFAIK, Linden claims they [http://lslwiki.net/lslwiki/wakka.php?wakka=llGetRegionFPS are not linked]. From what I understand, [[llGetRegionFPS]] returns the '''Sim FPS''', not the physics FPS. | |||
No. The frames per second is a measurement of how fast the region can render an object. The FPS value is not determined algorithmically as [[llCloud]] and [[llWind]] are, on the other hand, FPS is a a measured value. The problem is that your suggestion is broken on a ''theoretical'' level. All your numbers generated with [[llCloud]] or [[llWind]] ''can'' be reproduced at a later time - simply because it is deterministic. | |||
[http://wiki.secondlife.com/wiki/Category:LSL_Weather LSL_Weather] claims that the algorithm used is a 2D stable fluid simulation created by Jos Stam [http://www.dgp.toronto.edu/people/stam/reality/Research/pdf/ns.pdf SIGGRAPH article]. If you open the article and read '''Method of Characteristics''', you will see that he explains how he calculates the velocity at any arbitrary time. | |||
Do you understand now what I mean when I say that it is deterministic? An adversary would be able to calculate all your values generated by [[llCloud]] or [[llWind]]. | |||
Here's another interesting quote from [http://wiki.secondlife.com/wiki/Category:LSL_Weather LSL_Weather]: | |||
''There used to be a way to interact with the wind (push it around) but one of the other developers quietly disabled that feature sometime after launch in June 2003, and I've never got around to putting it back in. Originally we were very concerned about how many CPU cycles the wind would use so it is dialed down to a fairly low resolution (16x16) simulation with only a few steps a second. Someday it would be nice to revisit the wind simulation code and enhance it, however looking at my busy schedule it would be a year or two away at least.'' | |||
Might be a no-go after all, in the long run if manipulating inputs is your concern. Then again, I do not see why one would want to create such a complicated PRNG - do you know the period of [[llFrand]]? I have no clue and as I see it, it is only warranted to create a new PRNG if it is better than [[llFrand]]. | |||
'''"How would you know if x people will attend?" You don't need to be a passive observer, flood the event with bots, max out the region. Makes me think of [http://xkcd.com/221/] oddly enough.''' | |||
That comment makes me think of this: | |||
<videoflash>mlovuo-5RZs</videoflash> | |||
If you get a brand new device and you smash it to bits with a baseball bat, it does not mean that the IPad does not work. At best, it means that it was not designed for that kind of stress. I am pretty sure [[FPSrand]] can be influenced yes. Then again you can easily crash regions in Second Life entirely... However, you would never be able to calculate the simulator FPS at any given time (well, except that it is capped to 45.0fps). | |||
Following your argument though, [http://xkcd.com/221/]. In that picture, the number 4 is indeed a random number, for those that do not have knowledge about it. The algorithm is merely a random number generator with a period of 1 in a domain of values [1, 1] in N. | |||
A more practical example, if you were to privately message me and say "Kira, please give me a number you can think of.", and I would tell you "4", it is pretty much a random number as far as you are concerned (the new part, last part of the article explains that). | |||
The connection between [http://xkcd.com/221/] and [[FPSrand]] eludes me though, given that the behaviour of [[FPSrand]] has already been measured: | |||
[[File:LlFrandvsFPSrand.png|thumb|800px|center]] | |||
'''OS gives more metrics which could be leveraged to address this problem. While I haven't studied the problem, I wouldn't feel comfortable writing an algorithm that mixes entropy sources, I'd worry that I was creating an attack surface.''' | |||
No. That's wrong. Multiple points of failure are better than one. That is why Second Life is a grid, instead of being a single big server. If one region crashes, then it does not take the others along with it. The OpenSIM metrics are also several points of failure: by harnessing a bit of all of them, you in fact decrease the possibility of all of them being influenced. To be honest, I understand you might influence the simulator FPS, however I am unsure how you would influence network packets as well (short of a DoS), or the number of primitives on a region, or the number of avatars in a region, or... Sorry, I don't understand that argument... | |||
EDIT: Got hammered by a nasty stomach bug yesterday. | |||
Suppose all the inputs in the table above: | |||
<pre> | |||
STATS_TIME_DILATION | |||
STATS_IMAGE_MS | |||
STATS_SIM_FPS | |||
.. etc ... | |||
</pre> | |||
are used to harness entropy in an equally weighted manner: that means, for example, we do not favour the value of STATS_TIME_DILATION over the value of STATS_SIM_FPS. Since there are 21 such inputs, we have some equally weighted combination of: | |||
<pre> | |||
I_{1}, I_{2}, I_{3}, ... ,I_{19}, I_{21} | |||
</pre> | |||
which represents our whole entropy. Now, let's assume that one of them can be compromised, say I_{2}. In that case, we consider that an adversary can induce an error by manipulating I_{2}. Thus, our final data has the error: | |||
<pre> | |||
ε = 1 / 21 = 0.0476 | |||
</pre> | |||
that is, one input out of 21 inputs generates errors. If we multiply by 100, we get a 4.7%. A 4.7% error is insufficient to influence the overall outcome of the statistics. | |||
Even for 1 input, we can take the curve from the frequency distribution above for [[FPSrand]] and apply a vertical bar on the y-axis to represent a 4.76% error. Even if we do that, even if you drag the whole curve down by 4.76%, the error falls well within the variance of the curve. | |||
'''What I guess I'm getting at is Trust. If you don't trust the inputs how can you get a trustworthy answer?''' | |||
The article is written under the assumption that [[llGetRegionFPS]] and the OpenSIM metrics are trustworthy. | |||
'''* You could use an untrusted TRNG to shift where you read the weather data from (this doesn't feel like a great idea).''' | |||
No, I would not combine a TRNG and a PRNG. Bad, bad idea indeed. Half the entropy (depends how much of the wind generated PRNG) could then be calculated. | |||
You are indeed ''the'' optimisation fan! :P Indeed, I have not thought too much optimising the algorithms. It will be done at a later date, I am sure about that. I promise that I will have a look at your references. :-) | |||
Kira Komarov 16:00, 17 January 2012 (PST) |
Latest revision as of 00:02, 18 January 2012
"There is no obvious reason for a region to have a cyclic FPS"
I can think of one: a periodic services used to backup the simulator state. It should cause a dip. I worry about the quality and quantity of bits available from llGetRegionFPS, choosing the wrong max could skew results dramatically. -- Strife (talk|contribs) 12:02, 13 January 2012 (PST)
Not convinced it is that obvious: but would make a nice project for some volunteers to test this out!
That is true however, there are many other concurrent processes running in parallel with a backup process which may alter the FPS in un-predictable ways. Similarly, just the usage of the simulator by the avatars would induce fluctuations. Additionally, as I take it GPU operations are split off from CPU driven operations (after all, that would be the point for an auxiliary processor).
I am unsure whether:
- there is a back-up process running or its periodicity
- whether it would provoke a massive and cyclic (truly periodic) fluctuation on the lower mantissa of the reported FPS number.
- whether the cumulative: avatars and tasks on the simulator, backup process and additional processes that may run on simulator servers
- the database for a simulator is not centralised and backed-up or if the simulator itself performs the backing up.
would lead to some true cyclic induced error: careful here, we're talking about an well-defined T, not just some random fluctuations.
Either way, we find it pretty clear that even avatars, provided they have rez / scripting permissions over the land may inject some true T periodicity in the FPSrand method: in the later, coming up (, when we have more time) demonstration, we will use a lag-pulse generator to influence the entropy.
So far, llGetRegionTimeDilation and llGetRegionFPS are the "best we have" candidates for some source of true entropy given what Second Life exposes on the API. However, OpenSIM users, would giggle and jump up and down because they have that stats-function mentioned in the article that returns stuff such as: In/Out Packets, Network Lag and pretty much everything else from the View -> Show Statistics - which is pretty cool. :-) The mentioned OpenSIM API function could be used to harness additional entropy: sort of like selecting a little of each and decreasing the chances that an adversary (under some restrictions) could influence ALL the sources of entropy at will.
For Second Life though, some exhaustive search of the functionality that is exposed by Linden on the API relating to simulator statistics would be great. Perhaps there might be some other function out there that may offer a potential source of entropy... At least we know that stuff like llWind is out of the question...
EDIT: ah yeah, for got this: the TRNGs work correctly in ideal conditions, of course. The real-life TRNGs, some of them, gather data from wind. Of course, if you put a blow-dryer directly on the detector, it will not be worth much anymore... :-P Hence the need for several sources (like OpenSIM), or at least not some gimmick with a blow-dryer and a bad attitude. xD
Kira Komarov 12:42, 14 January 2012 (PST)
Events hosted in the region (or any of the other regions run on the same server) would also generate scheduled lag.
As for making small modifications, I was thinking about flipping an object's STATUS_PHYSICS on and off, it should have a very small impact (least significant bit sort of thing, which is the most significant bit with regards to modulo).
The big difference between this and the commercially available ones is ease of third party manipulation. Your hair dryer attack is valid but can't be done by someone in Asia. I would personally be more comfortable with using llWind (both are extracting data from simulations), my reasoning being that you can't manipulate llWind easily (you might have to terraform the ground to do so and that requires land mod rights, big spinning objects have no effect). llWind and llCloud both are black boxes that would worry me. -- Strife (talk|contribs) 20:38, 14 January 2012 (PST)
No, that won't work. llWind and llCloud are computed values - they are predetermined by a fluid dynamics simulation. You cannot make a TRNG based on computed values or you will get a PRNG instead. You have to have measured data. Stuff that generates lag is also ok as long as it does not induce a period - regardless if it increases or decreases from time to time.
Let me put it differently: You cannot inject PRNG values into a TRNG or you will get a PRNG. With llWind and llCloud, if you do it that way, you will get a PRNG, no better than llFrand (perhaps a little): you need something measured, not something computed. Stuff like FPS, in-out packets, hard-disk movements are measured - you may influence them, but they are not computed.
Also flipping the status on and off will practically make an oscillator out of your object - that is something you want to generically avoid. If you meant for an attack, a pulse-oscillator, flipping a single object's physical status will not have that much of an outcome for a region: it is just one object and usually there are countless other objects on a region that influence the FPS. You need something that has a drastic impact on FPS, to offset all the other objects on a region.
As you said, the blowdryer attack cannot be performed by somebody that does not have direct access to my detector. However, if you use osGetRegionStats and take a little bit from each one of these:
STATS_TIME_DILATION | 0 | STATS_IMAGE_MS | 11 |
STATS_SIM_FPS | 1 | STATS_OTHER_MS | 12 |
STATS_PHYSICS_FPS | 2 | STATS_IN_PACKETS_PER_SECOND | 13 |
STATS_AGENT_UPDATES | 3 | STATS_OUT_PACKETS_PER_SECOND | 14 |
STATS_ROOT_AGENTS | 4 | STATS_UNACKED_BYTES | 15 |
STATS_CHILD_AGENTS | 5 | STATS_AGENT_MS | 16 |
STATS_TOTAL_PRIMS | 6 | STATS_PENDING_DOWNLOADS | 17 |
STATS_ACTIVE_PRIMS | 7 | STATS_PENDING_UPLOADS | 18 |
STATS_FRAME_MS | 8 | STATS_ACTIVE_SCRIPTS | 19 |
STATS_NET_MS | 9 | STATS_SCRIPT_LPS | 20 |
STATS_PHYSICS_MS | 10 |
Then an adversary would have to influence them ALL to mess up your entropy, and that practically reduces their chances to unplugging the computer - then again, with that sort of access, llFrand will not work either. They will have to influence:
network AND all other scripts on the region AND stuff like frame time AND pending uploads even AND time dilation AND number of ACTIVE primitives AND AGENT related measurements... etc...
So they will have to be able to control network, your rendering capabilities, how many agents are on your land, the activities that those agents perform, the number of objects at a given time on the region... We can already stop there, that is impossible, short off from turning the machine off.
The discussion is between deterministic systems and non-deterministic systems; if you want a short answer: a physics simulation like llWind or llCloud is deterministic, even if at any point in time it may be difficult to describe what state it is in. No matter how difficult it is, it will still be a PRNG. The number of avatars in a region is non-deterministic: you cannot vouch how many there will be at any point in time. It is not difficult, it is impossible. Even if you run a scheduled event, how would you know if 4, 10, or 7 people will attend?
EDIT: here's another example: for a high-traffic zone like an infohub, just counting the number of avatars (instead of using the FPS count) in the region is sufficient for a TRNG: it is non-deterministic. The number of pigeons in a city centre, at a certain point in time, during spring and summer, is also non-deterministic - although you may have to cut-off winter time periods if the birds migrate.
Kira Komarov 04:53, 15 January 2012 (PST)
llCloud and llWind are both deterministic but I can't help thinking the FPS of our deterministic physics engine, will also be deterministic and worse, we have the ability to manipulate the inputs. If we are going to use a system that is predictable we should use one that is harder to manipulate and provides a haystack to tease data from, hence my suggestion of using llCloud or llWind.
I wasn't really considering that it would be constantly twiddled but only when you wanted to influence the lower bits of the mantissa based on estimations of how long the frame would take to run. The goal is to reduce the range of possibilities for what the output will be after the modulo. I'm put in mind of the attacks done against roulette wheels where they used custom built computers to guess which quadrant of the wheel the ball would land in.
"How would you know if x people will attend?" You don't need to be a passive observer, flood the event with bots, max out the region. Makes me think of [1] oddly enough.
OS gives more metrics which could be leveraged to address this problem. While I haven't studied the problem, I wouldn't feel comfortable writing an algorithm that mixes entropy sources, I'd worry that I was creating an attack surface.
What I guess I'm getting at is Trust. If you don't trust the inputs how can you get a trustworthy answer?
Random Ideas of little importance:
- You could use an untrusted TRNG to shift where you read the weather data from (this doesn't feel like a great idea).
- I know that multiplying by a non-power of two will trash the lowest bit in the mantissa (It's why my Float2Sci function works the way it does). What I don't know (without doing the math) is if it will effect the probability of that bit being 1 or 0 but I think it does. Regardless depending upon how FPS is determined it may already have it's lowest bits trashed (1.0 / last_frame_time == fps?).
- Instead of using a multiply to shift the digits into an integer, use fui (it could be simplified since FPS is never negative and exceedingly unlikely to ever be in the renormalized range).
llCloud and llWind are both deterministic but I can't help thinking the FPS of our deterministic physics engine, will also be deterministic and worse, we have the ability to manipulate the inputs. If we are going to use a system that is predictable we should use one that is harder to manipulate and provides a haystack to tease data from, hence my suggestion of using llCloud or llWind.
First, are you mixing the "Physics FPS" with the "Sim FPS"? AFAIK, Linden claims they are not linked. From what I understand, llGetRegionFPS returns the Sim FPS, not the physics FPS.
No. The frames per second is a measurement of how fast the region can render an object. The FPS value is not determined algorithmically as llCloud and llWind are, on the other hand, FPS is a a measured value. The problem is that your suggestion is broken on a theoretical level. All your numbers generated with llCloud or llWind can be reproduced at a later time - simply because it is deterministic.
LSL_Weather claims that the algorithm used is a 2D stable fluid simulation created by Jos Stam SIGGRAPH article. If you open the article and read Method of Characteristics, you will see that he explains how he calculates the velocity at any arbitrary time.
Do you understand now what I mean when I say that it is deterministic? An adversary would be able to calculate all your values generated by llCloud or llWind.
Here's another interesting quote from LSL_Weather:
There used to be a way to interact with the wind (push it around) but one of the other developers quietly disabled that feature sometime after launch in June 2003, and I've never got around to putting it back in. Originally we were very concerned about how many CPU cycles the wind would use so it is dialed down to a fairly low resolution (16x16) simulation with only a few steps a second. Someday it would be nice to revisit the wind simulation code and enhance it, however looking at my busy schedule it would be a year or two away at least.
Might be a no-go after all, in the long run if manipulating inputs is your concern. Then again, I do not see why one would want to create such a complicated PRNG - do you know the period of llFrand? I have no clue and as I see it, it is only warranted to create a new PRNG if it is better than llFrand.
"How would you know if x people will attend?" You don't need to be a passive observer, flood the event with bots, max out the region. Makes me think of [2] oddly enough.
That comment makes me think of this: <videoflash>mlovuo-5RZs</videoflash>
If you get a brand new device and you smash it to bits with a baseball bat, it does not mean that the IPad does not work. At best, it means that it was not designed for that kind of stress. I am pretty sure FPSrand can be influenced yes. Then again you can easily crash regions in Second Life entirely... However, you would never be able to calculate the simulator FPS at any given time (well, except that it is capped to 45.0fps).
Following your argument though, [3]. In that picture, the number 4 is indeed a random number, for those that do not have knowledge about it. The algorithm is merely a random number generator with a period of 1 in a domain of values [1, 1] in N.
A more practical example, if you were to privately message me and say "Kira, please give me a number you can think of.", and I would tell you "4", it is pretty much a random number as far as you are concerned (the new part, last part of the article explains that).
The connection between [4] and FPSrand eludes me though, given that the behaviour of FPSrand has already been measured:
OS gives more metrics which could be leveraged to address this problem. While I haven't studied the problem, I wouldn't feel comfortable writing an algorithm that mixes entropy sources, I'd worry that I was creating an attack surface.
No. That's wrong. Multiple points of failure are better than one. That is why Second Life is a grid, instead of being a single big server. If one region crashes, then it does not take the others along with it. The OpenSIM metrics are also several points of failure: by harnessing a bit of all of them, you in fact decrease the possibility of all of them being influenced. To be honest, I understand you might influence the simulator FPS, however I am unsure how you would influence network packets as well (short of a DoS), or the number of primitives on a region, or the number of avatars in a region, or... Sorry, I don't understand that argument...
EDIT: Got hammered by a nasty stomach bug yesterday.
Suppose all the inputs in the table above:
STATS_TIME_DILATION STATS_IMAGE_MS STATS_SIM_FPS .. etc ...
are used to harness entropy in an equally weighted manner: that means, for example, we do not favour the value of STATS_TIME_DILATION over the value of STATS_SIM_FPS. Since there are 21 such inputs, we have some equally weighted combination of:
I_{1}, I_{2}, I_{3}, ... ,I_{19}, I_{21}
which represents our whole entropy. Now, let's assume that one of them can be compromised, say I_{2}. In that case, we consider that an adversary can induce an error by manipulating I_{2}. Thus, our final data has the error:
ε = 1 / 21 = 0.0476
that is, one input out of 21 inputs generates errors. If we multiply by 100, we get a 4.7%. A 4.7% error is insufficient to influence the overall outcome of the statistics.
Even for 1 input, we can take the curve from the frequency distribution above for FPSrand and apply a vertical bar on the y-axis to represent a 4.76% error. Even if we do that, even if you drag the whole curve down by 4.76%, the error falls well within the variance of the curve.
What I guess I'm getting at is Trust. If you don't trust the inputs how can you get a trustworthy answer?
The article is written under the assumption that llGetRegionFPS and the OpenSIM metrics are trustworthy.
* You could use an untrusted TRNG to shift where you read the weather data from (this doesn't feel like a great idea).
No, I would not combine a TRNG and a PRNG. Bad, bad idea indeed. Half the entropy (depends how much of the wind generated PRNG) could then be calculated.
You are indeed the optimisation fan! :P Indeed, I have not thought too much optimising the algorithms. It will be done at a later date, I am sure about that. I promise that I will have a look at your references. :-)
Kira Komarov 16:00, 17 January 2012 (PST)