Difference between revisions of "User:Becky Pippen/LSL Performance"
Becky Pippen (talk | contribs) (added Mono CIL; reformatted; in preparation for relocation from User:Becky_Pippen) |
m (typo - just easier to use template) |
||
(One intermediate revision by one other user not shown) | |||
Line 1: | Line 1: | ||
With the move to virtualized class 7 servers and changes to the Mono virtual machine, these performance | |||
measurements made on old class 5 servers are no long very meaningful. Certainly the absolute timings are different now, | |||
and probably the relative timings are different too. The generated CIL code shown below might still be the same | |||
(I haven't checked), but that's about all that's left in this page that's relevant. So, this page is | |||
obsolete and will disappear soon. | |||
==LSL Timings and Rates== | ==LSL Timings and Rates== | ||
Here are some selected LSL function times and event rates comparing scripts compiled and run using the old LSL toolset (a.k.a LSO), and using Mono. | Here are some selected LSL function times and event rates comparing scripts compiled and run using the old LSL toolset (a.k.a LSO), and using Mono. | ||
Line 189: | Line 196: | ||
:Notes: verified and updated for server ver. 1.34.2.142087 | :Notes: verified and updated for server ver. 1.34.2.142087 | ||
=== | ==={{LSLGC|CSV}} [[string]] conversions=== | ||
:{{{!}} {{Prettytable|style=margin-top:0; font-size:100%}} | :{{{!}} {{Prettytable|style=margin-top:0; font-size:100%}} | ||
{{!}}- | {{!}}- |
Latest revision as of 17:28, 29 December 2011
With the move to virtualized class 7 servers and changes to the Mono virtual machine, these performance measurements made on old class 5 servers are no long very meaningful. Certainly the absolute timings are different now, and probably the relative timings are different too. The generated CIL code shown below might still be the same (I haven't checked), but that's about all that's left in this page that's relevant. So, this page is obsolete and will disappear soon.
LSL Timings and Rates
Here are some selected LSL function times and event rates comparing scripts compiled and run using the old LSL toolset (a.k.a LSO), and using Mono.
The basic test framework for these tests is a loop like shown below, where we take the difference in llGetTime() before and after the loop:
while (--i >= 0) { f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); f(); }
The loop is partially unrolled to make the loop control structure overhead negligible. We take several measurements in multiple Class 5 sims that are running consistently with a total frame time under 10 ms, where each test runs for 30 - 120 minutes, or longer if we fall asleep during the test. We measure several times during a 24-hour period until the numbers converge with a variance of 10% or less.
Measurements are taken in at least three regions, because we've occasionally seen regions with lots of reported spare time per frame, yet scripts seem to run inexplicably slow. If two out of three regions converge and agree, the anomalous region is ignored. Consider the numbers reported on this page to be maximums possible in cooperative regions.
In a Mono test, the result of the first run after script reset is ignored to eliminate the overhead of the VM initialization, JIT compilation, etc.
The Mono CIL shown for some of the LSL code below was obtained by modifying an SL viewer so that it calls lscript_compile() when saving a script. (Scripts are compiled on the simulators nowadays, but Linden Lab has kindly left the script compiler code in the open source viewer source code.) Comments and formatting were added manually.
Note that these are stopwatch times. How rapidly you can call a function or how often an event handler is invoked doesn't necessarily correspond to how much those functions and events lag a sim because of all the throttling and scheduling going on behind the scenes inside the run-time engine.
Class 7 hosts are starting to appear on the main grid as reported here and here. As Linden Lab moves toward classless regions, absolute timings like shown on this page will become less meaningful. Maybe we can make use of some sort of relative performance measurements in the future.
Functions, operators, & control structures
Floating point assignment
x = PI; where x is a global float
LSL: 0.11 ms/call *Mono: 0.0007 ms/call *Mono CIL: .field public float32 'x' // float x; . . . ldarg.0 // push 'this' pointer ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI stfld float32 LSL_script::'x' // pop and store 32-bit x ldarg.0 ldfld float32 LSL_script::'x' pop
- Notes: Verified and updated for server version 1.34.2.142087.
Call to empty f() { }
f(); where f() is defined as f() { }
LSL: 0.33 ms/call *Mono: 0.0026 ms/call *Mono CIL: Function definition:
.method public hidebysig instance default void 'gf'() cil managed { .maxstack 500 ret }
Caller:
ldarg.0 // push 'this' pointer call instance void class LSL_script::'gf'()
for loop overhead
- This test measures the overhead of a for loop that iterates an empty block just once.
for (i = 0; i == 0; ++i) { } LSL: 0.07 ms/loop *Mono: 0.001 ms/loop *Mono CIL: // loop init: i = 0 ldc.i4.0 // push 32-bit 0 stloc.s 0 // pop and store in i ldc.i4 0 // push 32-bit 0 dup // duplicate 0 on the stack stloc.s 0 // pop and store 0 in i pop // cleanup LabelTempJump0: // top of loop: test i == 0 ldc.i4 0 // push 32-bit 0 ldloc.s 0 // push i ceq // compare i and 0 brfalse LabelTempJump1 // branch if i != 0 // bottom of loop: ++i ldloc.s 0 // push i ldc.i4.1 // push 32-bit 1 add // pop i and 1, add, push result dup // duplicate result stloc.s 0 // save result in i pop // cleanup br LabelTempJump0 LabelTempJump1: ret
State change
state a; LSL: 0.14 ms/transition *Mono: 22.2 ms/transition *Mono CIL: ldarg.0 // push 'this' pointer ldstr "state2" call instance void class [LslUserScript] \ LindenLab.SecondLife.LslUserScript::ChangeState(string) ret
- Notes:
- State changes in Mono are tied to the sim frame rate of 45 fps (per Vektor Linden). The test code for state changes is an outer loop around 16 state changes like this:
- . . .
- state stateN { state_entry() { state stateN+1; } }
- state stateN+1 { state_entry() { state stateN+2; } }
- . . . etc.
Vector operations
v = <PI,PI,PI> + <PI,PI,PI> * PI; LSL: 0.43 ms/statement *Mono: 0.025 ms/statement *Mono CIL: ldarg.0 // push 'this' pointer ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI (vector x) ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI (vector y) ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI (vector z) ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI multiplicand call class [ScriptTypes]LindenLab.SecondLife.Vector \ class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \ 'CreateVector'(float32, float32, float32) call class [ScriptTypes]LindenLab.SecondLife.Vector \ class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \ 'Multiply'(float32, class [ScriptTypes]LindenLab.SecondLife.Vector) ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI call class [ScriptTypes]LindenLab.SecondLife.Vector \ class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \ 'CreateVector'(float32, float32, float32) call class [ScriptTypes]LindenLab.SecondLife.Vector \ class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \ 'Add'(class [ScriptTypes]LindenLab.SecondLife.Vector, \ class [ScriptTypes]LindenLab.SecondLife.Vector) // save result in v: stfld class [ScriptTypes]LindenLab.SecondLife.Vector LSL_script::'v' ldarg.0 // push 'this' pointer ldfld class [ScriptTypes]LindenLab.SecondLife.Vector LSL_script::'v' pop
llAbs()
llAbs(); LSL: 0.41 ms/statement *Mono: 0.012 ms/statement *Mono CIL: ldc.i4 0 // push the integer argument call int32 class [LslLibrary]LindenLab.SecondLife.Library::'llAbs'(int32)
- Notes: verified and updated for server ver. 1.34.2.142087
CSV string conversions
llCSV2List(llList2CSV(x)); LSL: 3.2 ms/call *Mono: 4 - 7 ms/call *Mono CIL: ldarg.0 // push 'this' pointer ldfld class [mscorlib]System.Collections.ArrayList LSL_script::'x' call string class [LslLibrary]LindenLab.SecondLife.Library:: \ 'llList2CSV'(class [mscorlib]System.Collections.ArrayList) call class [mscorlib]System.Collections.ArrayList class \ [LslLibrary]LindenLab.SecondLife.Library::'llCSV2List'(string)
- Notes: So far, the Mono results vary too much for any meaningful result.
llDetectedName()
llDetectedName(); LSL: 1.1 ms/call *Mono: 0.8 ms/call *Mono CIL: ldc.i4 0 // push argument call string class [LslLibrary]LindenLab.SecondLife.Library:: \ 'llDetectedName'(int32)
llGetInventoryName()
llGetInventoryName(INVENTORY_TEXTURE, 0); LSL: 0.52 ms/call *Mono: 0.21 ms/call *Mono CIL: ldc.i4 0 // push INVENTORY_TEXTURE ldc.i4 0 // push 2nd argument call string class [LslLibrary]LindenLab.SecondLife.Library:: \ 'llGetInventoryName'(int32, int32)
llGetObjectDetails
llGetObjectDetails(k, [OBJECT_OWNER]); LSL: 1.3 ms/call *Mono: 1.7 ms/call *Mono CIL: ldarg.0 // push 'this' pointer // push k: ldfld valuetype [ScriptTypes]LindenLab.SecondLife.Key LSL_script::'k' ldc.i4 6 // push OBJECT_OWNER box [mscorlib]System.Int32 call class [mscorlib]System.Collections.ArrayList \ class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \ CreateList() call class [mscorlib]System.Collections.ArrayList \ class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \ Prepend(object, class [mscorlib]System.Collections.ArrayList) call class [mscorlib]System.Collections.ArrayList \ class [LslLibrary]LindenLab.SecondLife.Library:: \ 'llGetObjectDetails'(valuetype \ [ScriptTypes]LindenLab.SecondLife.Key, \ class [mscorlib]System.Collections.ArrayList)
- Notes: Verified and updated for server ver. 1.34.2.142087
llGetPos()
llGetPos(); LSL: 0.39 ms/call *Mono: 0.025 ms/call *Mono CIL: call class [ScriptTypes]LindenLab.SecondLife.Vector \ class [LslLibrary]LindenLab.SecondLife.Library:: \ 'llGetPos'()
- Notes: Verified and updated for server ver. 1.34.2.142087
llList2Vector()
llList2Vector(x, 0); where x = [ ZERO_VECTOR ]
LSL: 0.55 ms/call *Mono: 0.012 ms/call *Mono CIL: ldarg.0 // push 'this' pointer // push list x: ldfld class [mscorlib]System.Collections.ArrayList LSL_script::'x' ldc.i4 0 // push 2nd argument: 0 call class [ScriptTypes]LindenLab.SecondLife.Vector \ class [LslLibrary]LindenLab.SecondLife.Library:: \ 'llList2Vector'(class \ [mscorlib]System.Collections.ArrayList, int32)
llMD5String()
llMD5String(s,0); where s is a 64-char string
LSL: 1.4 ms/call *Mono: 1.4 ms/call *Mono CIL: ldarg.0 // push 'this' pointer ldfld string LSL_script::'s' // push string s ldc.i4 0 // push 2nd arg: 0 call string class [LslLibrary]LindenLab.SecondLife.Library:: \ 'llMD5String'(string, int32)
llMessageLinked()
llMessageLinked(LINK_THIS, 0, "", NULL_KEY); One consumer script: LSL: 1.5 ms/call *Mono: 0.9 ms/call Two consumer scripts: LSL: 1.8 ms/call Mono: 1.2 ms/call Four consumer scripts: LSL: 2.4 ms/call Mono: 1.8 ms/call Eight consumer scripts: LSL: 4.4 ms/call Mono: 3.3 ms/call 16 consumer scripts: LSL: 9 - 20 ms/call Mono: 7 - 15 ms/call 32 consumer scripts: LSL: 22.2 ms/call Mono: 22.2 ms/call *Mono CIL: ldc.i4 -4 // push LINK_THIS ldc.i4 0 // push 2nd arg: 0 ldstr "" // push 3rd arg: "" ldstr "00000000-0000-0000-0000-000000000000" // push 4th arg as string // convert that last string to a key type: call valuetype [ScriptTypes]LindenLab.SecondLife.Key \ class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \ 'CreateKey'(string) call void class [LslLibrary]LindenLab.SecondLife.Library:: \ 'llMessageLinked'(int32, int32, string, valuetype \ [ScriptTypes]LindenLab.SecondLife.Key)
- Notes: The time it takes to call llMessageLinked() is dependent on the number of consumers of the message, up to about 20-30 consumer scripts. At that point, the time to send a link message becomes constant at one call per 22.2 ms, which is suspiciously similar to the sim's basic frame rate.
llParseString2List
llParseString2List(s,a,b); where s is 2000 random digits, a=["0","1"]; b=["2","3"];
LSL: 45 ms/call *Mono: 24 ms/call *Mono CIL: ldarg.0 // push 'this' pointer ldfld string LSL_script::'s' // push 1st arg: s ldarg.0 // push 'this' // push 2nd arg: list a: ldfld class [mscorlib]System.Collections.ArrayList LSL_script::'a' ldarg.0 // push 'this' // push 3rd ard: list b: ldfld class [mscorlib]System.Collections.ArrayList LSL_script::'b' call class [mscorlib]System.Collections.ArrayList \ class [LslLibrary]LindenLab.SecondLife.Library:: \ 'llParseString2List'(string, \ class [mscorlib]System.Collections.ArrayList, \ class [mscorlib]System.Collections.ArrayList)
llSay()
llSay(-2, "0123456789"); LSL: 0.7 ms/call *Mono: 0.2 ms/call *Mono CIL: ldc.i4 2 // push 2... neg // ...and negate it = -2 ldstr "0123456789" // push 2nd arg call void class [LslLibrary]LindenLab.SecondLife.Library:: \ 'llSay'(int32, string)
- Notes:
- The performance of llSay(-2, "0123456789") varies widely by region, although it can be pretty consistent within a region. In some nearly empty regions, llSay() slows to no better than 5 ms/call for reasons unknown. It's as if the throttling for this function is less deterministic than other LSL functions. But even in the worst case, llSay() can transmit faster than llListen() can receive.
- Verified and updated for server ver. 1.34.2.142087.
llSetLinkAlpha()
llSetLinkAlpha(link, alpha, side); LSL: 1.0 ms/call *Mono: 0.52 ms/call *Mono CIL: ldc.i4 1 // push 1st arg: link number ldc.r8 (00 00 00 60 b8 1e d5 3f) // push 2nd arg: float alpha ldc.i4 0 // push 3rd arg: side number call void class [LslLibrary]LindenLab.SecondLife.Library:: \ 'llSetLinkAlpha'(int32, float32, int32)
llSetText()
llSetText(s,c,a); where string s is 20 chars
LSL: 0.79 ms/call *Mono: 0.38 ms/call *Mono CIL: ldarg.0 // push 'this' pointer ldfld string LSL_script::'s' // push 1st arg: string s ldarg.0 // push 'this' // push 2nd arg: color: ldfld class [ScriptTypes]LindenLab.SecondLife.Vector LSL_script::'c' ldarg.0 // push 'this' ldfld float32 LSL_script::'a' // push 3rd arg: a call void class [LslLibrary]LindenLab.SecondLife.Library:: \ 'llSetText'(string, \ class [ScriptTypes]LindenLab.SecondLife.Vector, float32)
llSubStringIndex()
llSubStringIndex(s, "X"); where s is 10000 digits not containing "X"
LSL: 16 ms/call *Mono: 22 ms/call *Mono CIL: ldarg.0 // push 'this' pointer ldfld string LSL_script::'s' // push 1st arg: string a ldstr "X" // push 2nd arg: literal string call int32 class [LslLibrary]LindenLab.SecondLife.Library:: \ 'llSubStringIndex'(string, string)
Event handlers
dataserver() for llGetNotecardLine()
Maximum rate of dataserver() events using llGetNotecardLine() LSL: 6.4 events/sec Mono: 6.4 events/sec
link_message()
Maximum rate of link_message() events LSL: 42-45 events/sec Mono: 42-45 events/sec
- Notes: The results were highly variable over several tens of millions of link_message() events in three lightly loaded sims, but I wonder if the maximum rate is tied to the sim frame rate of 45 fps.
listen()
Maximum rate of listen() events any number of scripts per prim, any number of prims in linkset
1 listener per script: LSL: 14.7 events/sec Mono: 14.7 events/sec Two listeners per script: LSL: 11 events/sec Mono: 11 events/sec Three listeners per script: LSL: 8.9 events/sec Mono: 8.9 events/sec
sensor()
Maximum rate of sensor() events after llSensorRepeat()
LSL: 15 events/sec Mono: 22 events/sec
- Notes: The maximum rate of sensor() events seems to be independent of the sensor radius or number of things detected.
timer()
Maximum rate of timer() events LSL: 22 events/sec Mono: 22 events/sec
- Notes: About half the sim's basic frame rate of 45 fps.
touch()
Maximum rate of touch() events LSL: 22 events/sec Mono: 22 events/sec
- Notes: About half the sim's basic frame rate of 45 fps.