LLTrace

From Second Life Wiki
Revision as of 01:03, 16 October 2013 by Strife Onizuka (talk | contribs) (isn't it c++ really?)

What is it?

LLTrace is our system for capturing real-time statistics related to viewer performance and behavior and associating those statistics with certain actions or periods of time.

Most of the statistics that the viewer captures go through LLTrace. Examples include:

  • Frame rate
  • Bandwidth
  • Ping time
  • Object/Texture cache hit rate

In addition, LLTrace is used to capture detailed memory usage breakdown and time spent in various functions. By design, all of these metrics are available for run-time queries from the code, enabling detailed logging, in-client profiling, statistics reporting, and even self-tuning components and algorithms.

Usage

Declaration

First, you need to declare a statistic that you are going to track. For example, let's say you want to create a pedometer for your avatar. If you want to know how many footsteps your avatar takes, you would declare a *count* statistic for footsteps like this:

<cpp>
#include "lltrace.h"

static LLTrace::CountStatHandle<> sFootSteps ("footstepcount", "Number of footsteps I've taken");
</cpp>

This declares a handle to this particular statistic that you will use in all future reads/writes of that stat.

This object needs to be a global/static variable so that when we start running SL code, we have an accurate count of how many statistics we have. Support for dynamically adding/removing stats is under consideration for a future release.

The template parameter is used to specify the type of value the statistic is stored as. Here we are using the default, which is a double precision floating point value. All values are stored as double precision floats under the hood and converted to/from the requested type when reading and writing values, so don't bother optimizing for integers, etc. The first parameter is the name of the stat, used for tagging in output and stat lookup at runtime. The second parameter is a documentation string describing what this statistic means.

Generation

Next, you can generate your data. For a count-like statistic, this is done through the add method:

<cpp>
// I took a step
add(sFootSteps, 1);
</cpp>

Recording

Generating statistics is useless unless you can read the values back. This is where the concept of a Recording comes in. A recording is used to capture statistics during a specified period of time. To use a recording, simply create one and tell it to start().

<cpp>
LLTrace::Recording my_recording;
my_recording.start();

// do some complicated stuff that involves taking footsteps...

my_recording.stop();

// recording is now ready to read out values
</cpp>

Calling stop() on a recording will capture the results. If you do not call stop (or pause), then your attempt to read a value out will assert. This is necessary to ensure that all information is gathered into your recording so that you won't be given partial results. This means that stopping or pausing a recording can be a relatively heavyweight process (depending on how long lived it is relative to other active recordings), but individual stat collection remains blazingly fast.

Once a recording is in a stopped/paused state, you can ask various questions about the stat(s) you are interested in. For count-type stats, you can ask for a sum or a rate.

<cpp>
F64 num_footsteps = my_recording.getSum(sFootSteps);
F64 footstep_rate = my_recording.getPerSec(sFootSteps);
</cpp>

Recordings can be paused and resumed if you want to keep existing statistics and append any new ones, or stopped and restarted if you want to reuse them to gather new stats.
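The difference between pausing/resuming and stopping/restarting can be illustrated with a small toy model. This is only a sketch in standard C++, not LLTrace's implementation: the class ToyCountRecording, its restart() and advance() methods, and the choice to pass elapsed time in explicitly are all assumptions made for illustration.

```cpp
// Toy model of a recording that tracks a single count stat.
// Hypothetical class, not part of LLTrace. Time is advanced explicitly
// (instead of reading a clock) so the behavior is deterministic.
class ToyCountRecording
{
public:
    // Discard any previous results and begin gathering new stats.
    void restart() { mSum = 0.0; mElapsedSec = 0.0; mActive = true; }

    // Continue gathering, keeping what was already recorded.
    void resume() { mActive = true; }

    // Both pause and stop end the active period; values can then be read.
    void pause() { mActive = false; }
    void stop()  { mActive = false; }

    // Stat writes and elapsed time only count while the recording is active.
    void add(double amount)      { if (mActive) mSum += amount; }
    void advance(double seconds) { if (mActive) mElapsedSec += seconds; }

    double getSum() const    { return mSum; }
    double getPerSec() const { return mElapsedSec > 0.0 ? mSum / mElapsedSec : 0.0; }

private:
    double mSum = 0.0;
    double mElapsedSec = 0.0;
    bool   mActive = false;
};
```

In this model, anything added while the recording is paused is ignored, resume() appends to the existing totals, and restart() throws the old totals away before gathering new ones.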




That's LLTrace in a nutshell. For more information about different stat types, how the data is recorded and maintains thread safety, and some crazy recording tricks, read on...

Stat Types

There are 5 fundamental stat types: 3 general and 2 special purpose ones.

Count

A count stat is used to...count...things. Generally you would use it to measure the rate of something happening, such as packets arriving, triangles rendered, etc. As seen above, you declare a count stat with

<cpp>
#include "lltrace.h"

static LLTrace::CountStatHandle<> s_packets_received("numpacketsreceived", "number of UDP packets received");
</cpp>

You write the stat with

<cpp>add(s_packets_received, packet_count);</cpp>

and you can query via

<cpp>
recording.getSum(s_packets_received);
recording.getPerSec(s_packets_received);
recording.getSampleCount(s_packets_received);
</cpp>

This is the most straightforward and lightweight type of stat we provide.

Sample

The sample stat is used to measure a value that can fluctuate over time, such as texture count or window size. Declare a sample stat like this:

<cpp>
#include "lltrace.h"

static LLTrace::SampleStatHandle<> s_texture_count("texturecount", "number of textures in scene");
</cpp>

Write the stat with:

<cpp>sample(s_texture_count, getNumTextures());</cpp>

With a sample stat, you can ask more interesting questions than those for simple counts.

<cpp>
recording.getMin(s_texture_count);
recording.getMax(s_texture_count);
recording.getMean(s_texture_count);
recording.getStandardDeviation(s_texture_count);
recording.getLastValue(s_texture_count);
recording.getSampleCount(s_texture_count);
</cpp>

It is worth noting that the mean (and thus standard deviation) is a time-weighted average which assumes the value doesn't change between samples. So if you:

  • sample a value of 100...
  • wait 10 seconds...
  • sample a value of 0...
  • wait 1 second...
  • stop the recording and read out the mean...

you'll get:

<cpp> ((100 * 10s) + (0 * 1s)) / (10s + 1s) ≈ 90.9 </cpp>

This reflects that for the majority of the time you weren't gathering samples, the value was last known to be 100. If this estimate provides bad results, you need to be taking more samples. No matter how many samples you take, the values will be weighted by time between samples, not by the number of samples actually taken. This frees you up to not have to worry about how frequently a value is actually changing, but to just ensure you are sampling it reasonably often.

There is no getSum() provided for sample stats since it doesn't seem useful, and the result would have to be in the confusing unit of (original units*seconds) to match the time-based behavior of sample stats in general.

Event

The last type of general purpose stat is the event. This type of stat is used to measure both the existence of a specific event *and* a value associated with it. For example, you could use it to measure triangles rendered per frame. This has both an event (a frame was rendered) and an amount associated with it (how many triangles were in the frame).

Declare it:

<cpp>
#include "lltrace.h"

static LLTrace::EventStatHandle<F64Kilotriangles> s_triangles_per_frame("trianglesperframe", "Triangles rendered per frame");
</cpp>

Notice that there is a non-default template parameter. In this case, we are specifying that we want to record the value in units of F64Kilotriangles (thousands of triangles, stored with double precision). This unit type and how it works are described in LLUnit.

Use it:

<cpp> record(s_triangles_per_frame, mNumTriangles); </cpp>

And read it:

<cpp>
recording.getSum(s_triangles_per_frame);
recording.getMin(s_triangles_per_frame);
recording.getMax(s_triangles_per_frame);
recording.getMean(s_triangles_per_frame);
recording.getStandardDeviation(s_triangles_per_frame);
recording.getLastValue(s_triangles_per_frame);
recording.getSampleCount(s_triangles_per_frame);
</cpp>

Notice that we do support a sum here, as we're dealing with discrete events. In this case, the sum would represent the total number of triangles rendered over the life of the recording. Similarly, mean in this case represents the average value over all events. The timing of the events doesn't affect the mean.
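To contrast with sample stats, here is a minimal sketch of per-event statistics in standard C++, where every event carries equal weight regardless of when it happened. The EventStats struct is a hypothetical stand-in, not LLTrace's accumulator, and the use of the population standard deviation is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical accumulator, not part of LLTrace.
// Every recorded event is weighted equally; no timestamps are involved.
struct EventStats
{
    double sum = 0.0;
    double sum_of_squares = 0.0;
    double min = std::numeric_limits<double>::infinity();
    double max = -std::numeric_limits<double>::infinity();
    unsigned count = 0;

    void record(double value)
    {
        sum += value;
        sum_of_squares += value * value;
        min = std::min(min, value);
        max = std::max(max, value);
        ++count;
    }

    // Plain arithmetic mean over all events.
    double mean() const { return count ? sum / count : 0.0; }

    // Population standard deviation (an assumption of this sketch).
    double standardDeviation() const
    {
        if (count == 0) return 0.0;
        const double m = mean();
        const double variance = sum_of_squares / count - m * m;
        return std::sqrt(std::max(variance, 0.0)); // guard against rounding below zero
    }
};
```

Recording 2, 4, and 6 gives a sum of 12 and a mean of 4 whether those events arrive a millisecond or a minute apart.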

Fast Timers

Fast timers are a special type of stat used to measure time spent in various sections of code. They are activated via a macro in a given function/block of code to measure the amount of time spent in that code. They will also infer the hierarchical caller/callee relationship of various timed sections of code based on run-time behavior. The results can be viewed in the client using the fast timer display (Ctrl+Shift+9).
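The general idea of timing a block of code by scope can be sketched with a small RAII helper in standard C++. This is not how fast timers are implemented: ScopedTimer and TimerTotal are hypothetical names, and the sketch omits the caller/callee hierarchy tracking entirely.

```cpp
#include <chrono>

// Hypothetical accumulated result for one timed section, not part of LLTrace.
struct TimerTotal
{
    double seconds = 0.0; // total time spent inside the timed section
    unsigned calls = 0;   // number of times the section was entered
};

// Measures the lifetime of the enclosing scope and adds it to a TimerTotal.
class ScopedTimer
{
public:
    explicit ScopedTimer(TimerTotal& total)
        : mTotal(total), mStart(std::chrono::steady_clock::now())
    {
    }

    ~ScopedTimer()
    {
        // Elapsed time since construction, expressed in seconds.
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - mStart;
        mTotal.seconds += elapsed.count();
        ++mTotal.calls;
    }

    // A timer is tied to a single scope, so copying is disallowed.
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    TimerTotal& mTotal;
    std::chrono::steady_clock::time_point mStart;
};
```

Placing a ScopedTimer at the top of a function or block charges everything up to the closing brace to that section, including early returns, which is why a scope-bound object (or a macro that creates one) is a convenient way to instrument code.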

To declare a fast timer:

<cpp>
#include "llfasttimer.h"

static LLTrace::TimeBlock s_do_stuff_time("dostuff", "very important stuff");
</cpp>