User:Becky Pippen/LSL Performance

From Second Life Wiki
< User:Becky Pippen
Revision as of 10:04, 2 September 2011 by Becky Pippen (talk | contribs) (These old timing measurements are obsolete and this page will disappear soon.)
Jump to navigation Jump to search

With the move to virtualized class 7 servers and changes to the Mono virtual machine, these performance measurements made on old class 5 servers are no long very meaningful. Certainly the absolute timings are different now, and probably the relative timings are different too. The generated CIL code shown below might still be the same (I haven't checked), but that's about all that's left in this page that's relevant. So, this page is obsolete and will disappear soon.


LSL Timings and Rates

Here are some selected LSL function times and event rates comparing scripts compiled and run using the old LSL toolset (a.k.a LSO), and using Mono.

The basic test framework for these tests is a loop like shown below, where we take the difference in llGetTime() before and after the loop:

while (--i >= 0) {
    f(); f(); f(); f(); f(); f(); f(); f();
    f(); f(); f(); f(); f(); f(); f(); f();
}

The loop is partially unrolled to make the loop control structure overhead negligible. We take several measurements in multiple Class 5 sims that are running consistently with a total frame time under 10 ms, where each test runs for 30 - 120 minutes, or longer if we fall asleep during the test. We measure several times during a 24-hour period until the numbers converge with a variance of 10% or less.

Measurements are taken in at least three regions, because we've occasionally seen regions with lots of reported spare time per frame, yet scripts seem to run inexplicably slow. If two out of three regions converge and agree, the anomalous region is ignored. Consider the numbers reported on this page to be maximums possible in cooperative regions.

In a Mono test, the result of the first run after script reset is ignored to eliminate the overhead of the VM initialization, JIT compilation, etc.

The Mono CIL shown for some of the LSL code below was obtained by modifying an SL viewer so that it calls lscript_compile() when saving a script. (Scripts are compiled on the simulators nowadays, but Linden Lab has kindly left the script compiler code in the open source viewer source code.) Comments and formatting were added manually.

Note that these are stopwatch times. How rapidly you can call a function or how often an event handler is invoked doesn't necessarily correspond to how much those functions and events lag a sim because of all the throttling and scheduling going on behind the scenes inside the run-time engine.

Class 7 hosts are starting to appear on the main grid as reported here and here. As Linden Lab moves toward classless regions, absolute timings like shown on this page will become less meaningful. Maybe we can make use of some sort of relative performance measurements in the future.

Functions, operators, & control structures

Floating point assignment

x = PI;

where x is a global float

LSL: 0.11 ms/call *Mono: 0.0007 ms/call
*Mono CIL:
   .field public float32 'x'         // float x;
   . . .
   ldarg.0                           // push 'this' pointer
   ldc.r8 (00 00 00 60 fb 21 09 40)  // push 64-bit PI
   stfld float32 LSL_script::'x'     // pop and store 32-bit x
   ldarg.0
   ldfld float32 LSL_script::'x'
   pop
Notes: Verified and updated for server version 1.34.2.142087.

Call to empty f() { }

f();

where f() is defined as f() { }

LSL: 0.33 ms/call *Mono: 0.0026 ms/call
*Mono CIL:

Function definition:

.method public hidebysig instance default void 'gf'() cil managed
{
    .maxstack 500
    ret
}

Caller:

    ldarg.0    // push 'this' pointer
    call instance void class LSL_script::'gf'()

for loop overhead

This test measures the overhead of a for loop that iterates an empty block just once.
for (i = 0; i == 0; ++i) { }
LSL: 0.07 ms/loop *Mono: 0.001 ms/loop
*Mono CIL:
   // loop init:  i = 0
   ldc.i4.0      // push 32-bit 0
   stloc.s 0     // pop and store in i
   ldc.i4 0      // push 32-bit 0
   dup           // duplicate 0 on the stack
   stloc.s 0     // pop and store 0 in i
   pop           // cleanup

LabelTempJump0:
   // top of loop: test i == 0
   ldc.i4 0      // push 32-bit 0
   ldloc.s 0     // push i
   ceq           // compare i and 0
   brfalse LabelTempJump1 // branch if i != 0

   // bottom of loop: ++i
   ldloc.s 0     // push i
   ldc.i4.1      // push 32-bit 1
   add           // pop i and 1, add, push result
   dup           // duplicate result
   stloc.s 0     // save result in i
   pop           // cleanup
   br LabelTempJump0

LabelTempJump1:
   ret

State change

state a;
LSL: 0.14 ms/transition *Mono: 22.2 ms/transition
*Mono CIL:
   ldarg.0            // push 'this' pointer
   ldstr "state2"
   call instance void class [LslUserScript] \
           LindenLab.SecondLife.LslUserScript::ChangeState(string)
   ret

Notes:
State changes in Mono are tied to the sim frame rate of 45 fps (per Vektor Linden). The test code for state changes is an outer loop around 16 state changes like this:

. . .
state stateN { state_entry() { state stateN+1; } }
state stateN+1 { state_entry() { state stateN+2; } }
. . . etc.

Vector operations

v = <PI,PI,PI> + <PI,PI,PI> * PI;
LSL: 0.43 ms/statement *Mono: 0.025 ms/statement
*Mono CIL:
   ldarg.0  // push 'this' pointer
   ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI (vector x)
   ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI (vector y)
   ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI (vector z)
   ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI multiplicand
   call class [ScriptTypes]LindenLab.SecondLife.Vector \
           class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \
           'CreateVector'(float32, float32, float32)
   call class [ScriptTypes]LindenLab.SecondLife.Vector \
           class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \
           'Multiply'(float32, class [ScriptTypes]LindenLab.SecondLife.Vector)
   ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI
   ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI
   ldc.r8 (00 00 00 60 fb 21 09 40) // push 64-bit PI
   call class [ScriptTypes]LindenLab.SecondLife.Vector \
           class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \
           'CreateVector'(float32, float32, float32)
   call class [ScriptTypes]LindenLab.SecondLife.Vector \
           class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \
           'Add'(class [ScriptTypes]LindenLab.SecondLife.Vector, \
           class [ScriptTypes]LindenLab.SecondLife.Vector)
   // save result in v:
   stfld class [ScriptTypes]LindenLab.SecondLife.Vector LSL_script::'v'
   ldarg.0  // push 'this' pointer
   ldfld class [ScriptTypes]LindenLab.SecondLife.Vector LSL_script::'v'
   pop

llAbs()

llAbs();
LSL: 0.41 ms/statement *Mono: 0.012 ms/statement
*Mono CIL:
   ldc.i4 0 // push the integer argument
   call int32 class [LslLibrary]LindenLab.SecondLife.Library::'llAbs'(int32)

Notes: verified and updated for server ver. 1.34.2.142087

string conversions

llCSV2List(llList2CSV(x));

where x is a list of 10 integers

LSL: 3.2 ms/call *Mono: 4 - 7 ms/call
*Mono CIL:
   ldarg.0     // push 'this' pointer
   ldfld class [mscorlib]System.Collections.ArrayList LSL_script::'x'
   call string class [LslLibrary]LindenLab.SecondLife.Library:: \
           'llList2CSV'(class [mscorlib]System.Collections.ArrayList)
   call class [mscorlib]System.Collections.ArrayList class \
           [LslLibrary]LindenLab.SecondLife.Library::'llCSV2List'(string)

Notes: So far, the Mono results vary too much for any meaningful result.

llDetectedName()

llDetectedName();
LSL: 1.1 ms/call *Mono: 0.8 ms/call
*Mono CIL:
   ldc.i4 0  // push argument
   call string class [LslLibrary]LindenLab.SecondLife.Library:: \
           'llDetectedName'(int32)

llGetInventoryName()

llGetInventoryName(INVENTORY_TEXTURE, 0);
LSL: 0.52 ms/call *Mono: 0.21 ms/call
*Mono CIL:
   ldc.i4 0    // push INVENTORY_TEXTURE
   ldc.i4 0    // push 2nd argument
   call string class [LslLibrary]LindenLab.SecondLife.Library:: \
           'llGetInventoryName'(int32, int32)

llGetObjectDetails

llGetObjectDetails(k, [OBJECT_OWNER]);
LSL: 1.3 ms/call *Mono: 1.7 ms/call
*Mono CIL:
   ldarg.0  // push 'this' pointer
   // push k:
   ldfld valuetype [ScriptTypes]LindenLab.SecondLife.Key LSL_script::'k'
   ldc.i4 6 // push OBJECT_OWNER
   box [mscorlib]System.Int32 
   call class [mscorlib]System.Collections.ArrayList \
           class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \
           CreateList()
   call class [mscorlib]System.Collections.ArrayList \
           class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \
           Prepend(object, class [mscorlib]System.Collections.ArrayList)
   call class [mscorlib]System.Collections.ArrayList \
           class [LslLibrary]LindenLab.SecondLife.Library:: \
           'llGetObjectDetails'(valuetype \
           [ScriptTypes]LindenLab.SecondLife.Key, \
           class [mscorlib]System.Collections.ArrayList)

Notes: Verified and updated for server ver. 1.34.2.142087

llGetPos()

llGetPos();
LSL: 0.39 ms/call *Mono: 0.025 ms/call
*Mono CIL:
   call class [ScriptTypes]LindenLab.SecondLife.Vector \
       class [LslLibrary]LindenLab.SecondLife.Library:: \
       'llGetPos'()

Notes: Verified and updated for server ver. 1.34.2.142087

llList2Vector()

llList2Vector(x, 0);

where x = [ ZERO_VECTOR ]

LSL: 0.55 ms/call *Mono: 0.012 ms/call
*Mono CIL:
   ldarg.0   // push 'this' pointer
   // push list x:
   ldfld class [mscorlib]System.Collections.ArrayList LSL_script::'x'
   ldc.i4 0  // push 2nd argument: 0
   call class [ScriptTypes]LindenLab.SecondLife.Vector \
           class [LslLibrary]LindenLab.SecondLife.Library:: \
           'llList2Vector'(class \
           [mscorlib]System.Collections.ArrayList, int32)

llMD5String()

llMD5String(s,0);

where s is a 64-char string

LSL: 1.4 ms/call *Mono: 1.4 ms/call
*Mono CIL:
   ldarg.0                       // push 'this' pointer
   ldfld string LSL_script::'s'  // push string s
   ldc.i4 0                      // push 2nd arg: 0
   call string class [LslLibrary]LindenLab.SecondLife.Library:: \
           'llMD5String'(string, int32)

llMessageLinked()

llMessageLinked(LINK_THIS, 0, "", NULL_KEY);
One consumer script: LSL: 1.5 ms/call *Mono: 0.9 ms/call
Two consumer scripts: LSL: 1.8 ms/call Mono: 1.2 ms/call
Four consumer scripts: LSL: 2.4 ms/call Mono: 1.8 ms/call
Eight consumer scripts: LSL: 4.4 ms/call Mono: 3.3 ms/call
16 consumer scripts: LSL: 9 - 20 ms/call Mono: 7 - 15 ms/call
32 consumer scripts: LSL: 22.2 ms/call Mono: 22.2 ms/call
*Mono CIL:
   ldc.i4 -4    // push LINK_THIS
   ldc.i4 0     // push 2nd arg: 0
   ldstr ""     // push 3rd arg: ""
   ldstr "00000000-0000-0000-0000-000000000000" // push 4th arg as string
   // convert that last string to a key type:
   call valuetype [ScriptTypes]LindenLab.SecondLife.Key \
           class [LslUserScript]LindenLab.SecondLife.LslUserScript:: \
           'CreateKey'(string)
   call void class [LslLibrary]LindenLab.SecondLife.Library:: \
           'llMessageLinked'(int32, int32, string, valuetype \
           [ScriptTypes]LindenLab.SecondLife.Key)

Notes: The time it takes to call llMessageLinked() is dependent on the number of consumers of the message, up to about 20-30 consumer scripts. At that point, the time to send a link message becomes constant at one call per 22.2 ms, which is suspiciously similar to the sim's basic frame rate.

llParseString2List

llParseString2List(s,a,b);

where s is 2000 random digits, a=["0","1"]; b=["2","3"];

LSL: 45 ms/call *Mono: 24 ms/call
*Mono CIL:
   ldarg.0                      // push 'this' pointer
   ldfld string LSL_script::'s' // push 1st arg: s
   ldarg.0                      // push 'this'
   // push 2nd arg: list a:
   ldfld class [mscorlib]System.Collections.ArrayList LSL_script::'a'
   ldarg.0                      // push 'this'
   // push 3rd ard: list b:
   ldfld class [mscorlib]System.Collections.ArrayList LSL_script::'b'
   call class [mscorlib]System.Collections.ArrayList \
           class [LslLibrary]LindenLab.SecondLife.Library:: \
           'llParseString2List'(string, \
           class [mscorlib]System.Collections.ArrayList, \
           class [mscorlib]System.Collections.ArrayList)

llSay()

llSay(-2, "0123456789");
LSL: 0.7 ms/call *Mono: 0.2 ms/call
*Mono CIL:
   ldc.i4 2           // push 2...
   neg                // ...and negate it = -2
   ldstr "0123456789" // push 2nd arg
   call void class [LslLibrary]LindenLab.SecondLife.Library:: \
           'llSay'(int32, string)
Notes:
  1. The performance of llSay(-2, "0123456789") varies widely by region, although it can be pretty consistent within a region. In some nearly empty regions, llSay() slows to no better than 5 ms/call for reasons unknown. It's as if the throttling for this function is less deterministic than other LSL functions. But even in the worst case, llSay() can transmit faster than llListen() can receive.
  2. Verified and updated for server ver. 1.34.2.142087.

llSetLinkAlpha()

llSetLinkAlpha(link, alpha, side);
LSL: 1.0 ms/call *Mono: 0.52 ms/call
*Mono CIL:
   ldc.i4 1                         // push 1st arg: link number
   ldc.r8 (00 00 00 60 b8 1e d5 3f) // push 2nd arg: float alpha
   ldc.i4 0                         // push 3rd arg: side number
   call void class [LslLibrary]LindenLab.SecondLife.Library:: \
           'llSetLinkAlpha'(int32, float32, int32)

llSetText()

llSetText(s,c,a);

where string s is 20 chars

LSL: 0.79 ms/call *Mono: 0.38 ms/call
*Mono CIL:
   ldarg.0                       // push 'this' pointer
   ldfld string LSL_script::'s'  // push 1st arg: string s
   ldarg.0                       // push 'this'
   // push 2nd arg: color:
   ldfld class [ScriptTypes]LindenLab.SecondLife.Vector LSL_script::'c'
   ldarg.0                       // push 'this'
   ldfld float32 LSL_script::'a' // push 3rd arg: a
   call void class [LslLibrary]LindenLab.SecondLife.Library:: \
           'llSetText'(string, \
           class [ScriptTypes]LindenLab.SecondLife.Vector, float32)

llSubStringIndex()

llSubStringIndex(s, "X");

where s is 10000 digits not containing "X"

LSL: 16 ms/call *Mono: 22 ms/call
*Mono CIL:
   ldarg.0                      // push 'this' pointer
   ldfld string LSL_script::'s' // push 1st arg: string a
   ldstr "X"                    // push 2nd arg: literal string
   call int32 class [LslLibrary]LindenLab.SecondLife.Library:: \
           'llSubStringIndex'(string, string)

Event handlers

dataserver() for llGetNotecardLine()

Maximum rate of dataserver() events using llGetNotecardLine()
LSL: 6.4 events/sec Mono: 6.4 events/sec

link_message()

Maximum rate of link_message() events
LSL: 42-45 events/sec Mono: 42-45 events/sec
Notes: The results were highly variable over several tens of millions of link_message() events in three lightly loaded sims, but I wonder if the maximum rate is tied to the sim frame rate of 45 fps.

listen()

Maximum rate of listen() events

any number of scripts per prim, any number of prims in linkset

1 listener per script: LSL: 14.7 events/sec Mono: 14.7 events/sec
Two listeners per script: LSL: 11 events/sec Mono: 11 events/sec
Three listeners per script: LSL: 8.9 events/sec Mono: 8.9 events/sec

sensor()

Maximum rate of sensor() events

after llSensorRepeat()

LSL: 15 events/sec Mono: 22 events/sec
Notes: The maximum rate of sensor() events seems to be independent of the sensor radius or number of things detected.

timer()

Maximum rate of timer() events
LSL: 22 events/sec Mono: 22 events/sec
Notes: About half the sim's basic frame rate of 45 fps.

touch()

Maximum rate of touch() events
LSL: 22 events/sec Mono: 22 events/sec
Notes: About half the sim's basic frame rate of 45 fps.