LLTrace

From Second Life Wiki
Jump to navigation Jump to search

What is it?

According to Wikipedia: "In software engineering, tracing is a specialized use of logging to record information about a program's execution"

LLTrace is our system for capturing realtime statistics related to viewer performance and behavior and associating those statistics with certain actions or periods of time.

Most of the statistics that the viewer captures go through LLTrace. Examples include:

  • Frame rate
  • Bandwidth
  • Ping time
  • Object/Texture cache hit rate

In addition, LLTrace is used to capture detailed memory usage breakdown and time spent in various functions. By design, all of these metrics are available to query at run time in production code, enabling detailed logging, in-client profiling, statistics reporting, and even self-tuning components and algorithms. LLTrace is more than just a debugging tool.

Usage

Declaration

First, you need to declare a statistic that you are going to track. For example, let's say you want to create a pedometer for your avatar. If you want to know how many footsteps your avatar takes, you would declare a *count* statistic for footsteps like this: <cpp>#include "lltrace.h"

static LLTrace::CountStatHandle<> sFootSteps ("footstepcount", "Number of footsteps I've taken"); </cpp> This declares a handle to this particular statistic that you will use in all future reads/writes of that stat.

This object needs to be a global/static variable so that when we start running SL code, we have an accurate count of how many statistics we have. Support for dynamically adding/removing stats is under consideration for a future version.

The template parameter is used to specify the type of value the statistic is reported as. Here we are using the default, which is a double precision floating point value. All values are stored as double precision floats under the hood and converted to/from the requested type when reading and writing values. The first parameter is the name of the stat, used for labeling in output and stat lookup at runtime. The second parameter is a documentation string describing what this statistic means. You can access the name and description of a given stat via the getName() and getDescription() member functions.

Data Recording

Next, you can generate your data. For a count-like statistic, this is done through the add function: <cpp>// I took a step add(sFootSteps, 1); </cpp> Each type of stat has a distinctly named function for capturing data, to make it clear how the data is being consumed. In this case, we're using a count stat, so the function simply adds 1 to the count of footsteps. Other types of stats have distinct functions for gathering their data.

Trace Recording and Queries

Generating statistics is useless unless you can read the values back. This is where the concept of a Recorder comes in. A recorder is used to capture statistics during one or more periods of time. To use a recording, simply create a Recording object, and tell it to start(). <cpp>LLTrace::Recording my_recording; my_recording.start();

//do some stuff that involves taking footsteps...

my_recording.stop();

// recording is now ready to read out values </cpp> Once you have a recording in hand, you can ask various questions about the stat(s) you are interested in. For count-type stats, you can ask for a sum or a rate. <cpp>F64 num_footsteps = my_recording.getSum(sFootSteps); F64 footstep_rate = my_recording.getPerSec(sFootSteps); </cpp> Recordings can be paused and resumed if you want to keep existing statistics and append any new ones, or stopped and restarted if you want to reuse them to gather new stats.


That's LLTrace in a nutshell. For more information about different stat types, how the data is recorded and maintains thread safety, and some crazy recording tricks, read on...

Stat Types

There are 5 fundamental stat types: 3 general and 2 special purpose ones.

Count

A count stat is, as expected, used to count things. Generally you would use it to measure the rate of something happening, such as packets arriving, triangles rendered, etc. As seen above, you declare a count stat with

<cpp>

  1. include "lltrace.h"

static LLTrace::CountStatHandle sPacketsReceived("numpacketsreceived", "number of UDP packets received"); </cpp>

You write the stat with

<cpp>add(sPacketsReceived, packet_count);</cpp>

and you can query via

<cpp> recording.getSum(packet_count); recording.getPerSec(packet_count); recording.getSampleCount(packet_count); </cpp>

This is the most straightforward and lightweight type of stat we provide.

Sample

The sample stat is used to measure a value that can fluctuate over time, such as texture count or window size. Declare a sample stat like this:

<cpp>

  1. include "lltrace.h"

static LLTrace::SampleStatHandle sTextureCount("texturecount", "number of textures in scene"); </cpp>

Write the stat with:

<cpp>sample(sTextureCount, getNumTextures());</cpp>

With a sample stat, you can ask more interesting questions than those for simple counts.

<cpp> recording.getMin(sTextureCount); recording.getMax(sTextureCount); recording.getMean(sTextureCount); recording.getStandardDeviation(sTextureCount); recording.getLastValue(sTextureCount); recording.getSampleCount(sTextureCount); </cpp>

It is worth noting that the mean (and thus standard deviation) is a time-weighted average which assumes the value doesn't change between samples. So if you:

  • sample a value of 100...
  • wait 10 seconds...
  • sample a value of 0...
  • wait 1 second...
  • stop the recording and read out the mean...

you'll get:

((100 * 10s) + (0 * 1s)) / (10s + 1s) = 90

Even though you read two samples with an average value of 55, the result was much higher...this reflects that for the majority of the time you weren't gathering samples, the value was last known to be 100. No matter how many samples you take, the values will be weighted by time between samples, not by the number of samples actually taken. With this approach, you are free to only sample when the value is known to have changed, or when it is convenient, instead of worrying about sampling at regular intervals. Of course, if you sample too infrequently, your results can still be off.

There is no getSum() provided for sample stats since it isn't useful for time series data (not to mention the result would have to be in the confusing unit of units*seconds).

Event

The last type of general purpose stat is the event. This type of stat is used to measure both the existence of a specific event *and* a value associated with it. For example, you could use it to measure triangles rendered per frame. This has both an event (a frame was rendered) and an amount associated with it (how many triangles were in the frame).

Declare it:

<cpp>

  1. include "lltrace.h"

static LLTrace::EventStatHandle<F64Kilotriangles> sTrianglesPerFrame("trianglesperframe", "Triangles rendered per frame"); </cpp>

Notice that we have provided a type this time as a template argument. In this case, we are specifying that we want to record the value in units of F64Kilotriangles (thousands of triangles, stored with double precision).

Use it:

<cpp> record(sTrianglesPerFrame, mNumTriangles); </cpp>

And read it:

<cpp> recording.getSum(sTrianglesPerFrame); recording.getMin(sTrianglesPerFrame); recording.getMax(sTrianglesPerFrame); recording.getMean(sTrianglesPerFrame); recording.getStandardDeviation(sTrianglesPerFrame); recording.getLastValue(sTrianglesPerFrame); recording.getSampleCount(sTrianglesPerFrame); </cpp>

Notice that we support a sum here, as we're dealing with discrete events. In this case, the sum would represent the total number of triangles rendered over the life of the recording. Similarly, mean in this case represents the average value over all events. Unlike samples, the timing of the events doesn't affect the mean and standard deviation.

Block Timers

Block timers are a special type of stat used to measure time spent in various sections of code. They are activated via a stack-based object in a given function/block of code which measures the amount of time spent in that code. LLTrace will also infer the hierarchical caller/callee relationship of all timed sections of code based on run-time behavior. The results can be viewed in the client using the fast timer display (Ctrl+Shift+9).

To declare a block timer: <cpp>

  1. include "llfasttimer.h"

static LLTrace::BlockTimerStatHandle sBreakfastTime("eating_breakfast", "time spent eating breakfast which includes eggs, toast, and coffee"); </cpp>

To time a block of code, use the provided macro: <cpp> void eat_breakfast() {

   LL_RECORD_BLOCK_TIME(sBreakfastTime);
   eatEggs();
   eatToast();
   drinkCoffee();

} </cpp> This macro will create an object on the stack whose constructor and destructor are responsible for starting and stopping the timer, respectively. Block Timers *must* be stack allocated, as LLTrace requires that they have properly nested lifetime (timers are destructed in the reverse order of construction). This is enforced by requiring the macro to construct a block timer on the stack (putting this macro anywhere but in a function body will not work).

To get the time spent in a particular block, you use a recording as usual: <cpp> F64Seconds time_eating_breakfast = recording.getSum(sBreakfastTime); F64Seconds fraction_of_second_spent_doing_stuff = recording.getPerSec(sBreakfastTime); </cpp> Notice that the results are in explicit units. The resulting values reflect the amount of time spent in the time block, including any other time blocks that were entered as a result. If you want *just* the time spent in the specified block, excluding any other block, you need to use the selfTime facet. Imagine that the drinkCoffee() function had its own timer, and you wanted to know how much time was spent eating breakfast that was not covered by other timers. You would then call:

<cpp> F64Seconds time_just_eating_breakfast = recording.getSum(sBreakfastTime.selfTime()); F64Seconds fraction_of_time_just_eating_breakfast = recording.getPerSec(sBreakfastTime.selfTime()); </cpp>

A facet is accessed by calling a special method on the stat handle. This allows you to access a different form of the data being measured. For example, if you want to know how many times a given block timer was entered:

<cpp> S32 call_count = recording.getSum(sBreakfastTime.callCount()); F32 call_count_per_sec = recording.getPerSec(sBreakfastTime.callCount()); </cpp>

Notice that here the getPerSec() method returns a floating point value while the getSum() returns a whole number of calls as an integer.

Block Timer Hierarchy

LLTrace will infer the call hierarchy of functions that include block timers based on the usage pattern of the timers. This is an approximation of the actual call hierarchy, since if a block timer is entered in more than one context, LLTrace cannot currently deduce how the contexts shared the time spent in that common code. For example:

<cpp> void commonCode() {

   LL_RECORD_BLOCK_TIME(sCommonCodeTime);
   // some common code...

}

void foo() {

   LL_RECORD_BLOCK_TIME(sFooTime);
   
   commonCode();
   // other stuff...

}

void bar() {

   LL_RECORD_BLOCK_TIME(sBarTime);
   commonCode();
   // other stuff...

} </cpp>

In this case, the time spent in commonCode() is split between foo() and bar(), but LLTrace is optimized to do as little work as possible when processing stats, so it isn't able to measure that split (in general). It will do the next best thing and report commonCode() as a sibling to foo() and bar().

To traverse the call hierarchy, you can use LLTreeIterator and the associated helper functions in LLTrace to do depth first preorder, depth first postorder, and breadth first traversal. For example, to dump timer values in depth first order:

<cpp> // depth first traversal for(LLTrace::block_timer_tree_df_iterator_t it = LLTrace::begin_block_timer_tree_df(sMyTimer);

   it != LLTrace::end_block_timer_tree_df();
   ++it)

{

   std::cout << "Timer: " << (*it)->getName() << " " << recording.getSum(*it) << std::endl;

} </cpp>

The timer hierarchy is tracked per thread, so your results depend on the thread where the traversal is performed.

Mem Stats

Mem stats track the amount of memory allocated for a given class or subsystem. You can manually track memory by using a mem stat handle directly:

<cpp> static LLTrace::MemStatHandle sMySystemMem("MySystem", "Memory allocated by my system");

Object* gObject = NULL; F32* gBuffer = NULL; const S32 BUFFER_SIZE = 1000;

void createObject() {

   gObject = new Object();
   LLTrace::claim_alloc(sMySystemMem, gObject);
   gBuffer = new F32[BUFFER_SIZE];
   LLTrace::claim_alloc(sMySystemMem, sizeof(F32) * BUFFER_SIZE);

}

void deleteObject() {

   LLTrace::disclaim_alloc(sMySystemMem, gObject);
   delete gObject;
   delete[] gBuffer;
   LLTrace::disclaim_alloc(sMySystemMem, sizeof(F32) * BUFFER_SIZE);

} </cpp>

You can either claim/disclaim an object (pointers to objects can be used as well as references...in either case it is the size of the object being referenced/pointed to that is measured) or you can pass a specific integral value representing the amount of memory in bytes.

Alternatively, you may want to use the LLTrace::MemTrackable mixin to annotate a specific class as needing to have its memory usage tracked:

<cpp> class MyClass : public LLTrace::MemTrackable<MyClass> { public:

   MyClass() 
   : LLTrace::MemTrackable<MyClass>("MyClass") 
   {
       mObjectp = new Object;
       claimMem(mObjectp); // add sizeof(Object) to claimed memory, since I own that instance
       
       mOtherObjectp = getSomeOtherObject();  // since this other object is not considered owned by me, don't claim its memory
   }

private:

   Object* mObjectp;
   SomeOtherObject* mOtherObjectp;
   U8      mData[100];

};

MyClass* gMyClass = new MyClass; // overloaded new operator claims sizeof(MyClass) for MyClass stat </cpp>

By deriving from MemTrackable, you get:

  • a static MemStat specific to your class (that you name via the constructor call to MemTrackable on each instance)
  • the methods memClaim and memDisclaim that manage claimed memory for your particular class
  • automatic tracking of all memory claimed for a specific instance, which will be disclaimed for you when the object is deleted

In the above example, the call to new MyClass will claim sizeof(MyClass) for the "MyClass" MemStat. The constructor of MyClass will claim an additional sizeof(Object) for "MyClass." When gMyClass is deleted, all that memory will automatically be removed (disclaimed) from the "MyClass" stat by the MemTrackable destructor.

Since the MemTrackable class works by overriding operators new and delete, it will not work if you provide your own versions of those operators (when requiring aligned allocations, for example). Fortunately, the MemTrackable class takes a second template parameter that lets you specify the desired alignment:

<cpp> class MyAlignedClass : public LLTrace::MemTrackable<MyAlignedClass, 32> </cpp>

If your class has one or more containers as members, and it is assumed to *own* the objects in those containers, it is important that you track the elements stored in those containers as they are added and deleted. In addition, accounting for clearing a container would require a disclaimMem(sizeof(ItemInContainer) * mContainer.size()). It isn't practical to account for the overhead of the container nodes themselves, but that usually is a small percentage of your memory usage. If your accounting does get out of sync (you've disclaimed more than you claim, for example), the error will not accumulate but will be limited to the lifetime of the particular object being tracked, since the MemTrackable destructor will completely negate the memory tracking effect of each object instance.

To read the memory usage of a given stat:

<cpp> recording.getMin(sMySystemMem); recording.getMax(MyClass::sMemStat); recording.getMean(MyAlignedClass::sMemStat); recording.getStandardDeviation(sMySystemMem); recording.getLastValue(sMySystemMem); </cpp>

Note that the currently allocated memory is essentially a Sample stat.

There are also facets that allow you to measure allocations and deallocation counts and rates:

<cpp> // allocations recording.getSum(sMySystemMem.allocations()); recording.getPerSec(MyClass::sMemStat.allocations());

// deallocations recording.getSum(MyAlignedClass::sMemStat.deallocations()); recording.getPerSec(sMySystemMem.deallocations()); </cpp>

Recordings

You must have a trace recording in order to read back any trace values. Fortunately, there are some default ones provided for you, if all you want is to query per frame or application-lifetime values.

For example, you can usually get last frame's recording data this way:

<cpp> LLTrace::Recording& last_frame_recording = LLTrace::get_frame_recording().getLastRecording(); </cpp>

You can also access a recording that has been running for the application's lifetime via LLViewerStats:

<cpp> LLTrace::Recording& app_recording = LLViewerStats::instance().getRecording(); </cpp>

Now let's look at how you create and use your own recording. To record some trace data, simply create a LLTrace::Recording object and call start() on it (it will not be started by default). You can also use stop() and pause() to control the activity of the recording:

<cpp> void doStuff() {

   LLTrace::Recording important_stuff_recording; // not started by default
   important_stuff_recording.start();
   
   //do some important stuff that generates stats for LLTrace...
   // log the stat
   LL_INFOS() << "Stat sum: " << important_stuff_recording.getSum(sSomeStat) << LL_ENDL;
   important_stuff_recording.pause();
   //do some unimportant stuff that is not relevant...
   important_stuff_recording.start();
   //do more stuff related to my_recording...
   // done with important stuff, so stop the recording
   important_stuff_recording.stop();
   // you can always ask a stopped recording for any data it had captured
   LL_INFOS() << "Final sum: " << important_stuff_recording.getSum(sSomeStat) << LL_ENDL;

} </cpp>

Notice how the recording object itself is used to read back the value of a stat (in the case the sum). While you can read stats from a recording that is started, it is often more efficient to read from one that has been stopped. The act of stopping or pausing a recording ensures that all of its stats are up to date.

There are a handful of methods that let you control a recording, but the fundamental operations are start(), stop() and reset().

  • start() makes the recording active, such that any subsequent stats will be captured by that recording.  
  • stop() stops further stats from being included in the recording, but leaves existing statistics available for queries.
  • reset() zeros out all stats in a recording, but leaves the recording in its current started/stopped state. 

There are several other methods that provide context-sensitive control of the recording start/stop state, as well as the ability to pause a recording and resume it later so that statistics from multiple time spans can be accumulated into a single recording. Details of recording start/pause/stop behavior follows:


Method State
Stopped Paused Started
start() Started* Started Started
stop() Stopped Stopped Stopped
pause() Stopped Paused Paused
unpause() Stopped Started Started
resume() Started Started Started
restart() Started* Started* Started*
* stats cleared as side effect

Periodic Recordings

You will often be interested in a statistic that is recorded over multiple periods (frames, typically) and how the statistic varies from one period to the next. This will provide you with minimum, maximum, and mean values that pertain to the recording periods that you sampled. Note that this is different from the minimum, maximum, and mean values that you might get from a recording that was continually active over the time periods in question. For example:

Recording Min Max Mean
Frame 1 2 6 3
Frame 2 1 5 2
Frame 3 3 8 4
Frames 1-3 recording 1 8 3.697

Notice how the 3 frame recording spans the min and max of the individual frame, and the mean is not the average of the individual frames means, but the average of all the samples across the 3 frames. Since the frames will not always have the same amount of samples or same durations, it is possible for the mean of one frame to weight more heavily than another.

However, a periodic recording treats all periods (frames) as equal, and considers a single aggregate value from each period (mean or sum, depending) as a data point. So if could ask the periodic recording for the 3 frame min, max, and mean and you would get:

Min Max Mean
3 Frame Periodic Recording 2 4 3

To use a periodic recording, you must decide whether to give it a fixed number of periods that are recycled (ring buffer) or grow in an unbounded way.

<cpp> PeriodicRecording last_100_frames_recording(100); // record last 100 frames of stats PeriodicRecording all_frames_recording(); // record as many frames as we want </cpp>

A periodic recording acts like a normal recording: you can start, stop, pause, etc. You can also call nextPeriod() to advance to the next recording period.

<cpp> while(1) {

   idle();
   render();
   // end of frame
   all_frames_recording.nextPeriod();

} </cpp>

At any point you can get the recording for an individual period with getPrevRecording(periods_in_the_past) or the most recent recordings with getCurRecording() and getLastRecording().

The interesting methods on PeriodicRecording are the periodic statistics. For example, instead of getMean() or getMin() you can call getPeriodMean() and getPeriodMin(). These calls can be quite expensive, depending on the number of periods in the recording, as they will walk each recording and request its stat, and then generate the aggregate result on the fly. To avoid aggregating over all periods in the recording, you can pass a period count to any of the getters and the result will only use that many periods in its calculation:

<cpp> // calculate average framerate over last 10 frames F32 average_framerate = all_frames_recording.getPeriodMean(sFrameRateStat, 10); </cpp>

Extendable Recording

An extendable recording allows you to speculatively gather trace data and when you decide that the data is valid, commit the data to a running accumulation of accepted data. For example, imagine that you want to gather trace data during the loading of the contents of a region. There is no definitive way to know that a region has finished loading, except that no further object data arrives (assuming new objects are not being created all the time). To measure this, you start an extendable recording, and at any point object data arrives, you know the region is still loading, so you commit the pending trace data by calling extend(). Whenever you query the recording, you can reference the accepted contents of the extendable recording that contains all trace data up to the point you last called extend(), or the potential contents that include all trace data that has yet to be accepted. To access the recording object containing the data you've accepted, call getAcceptedRecording() and to access the recording with the remaining pending data, call getPotentialRecording().

The extendable recording also has a periodic variant that works just like a PeriodicRecording in that you can call nextPeriod() for each period of time (per frame, for example) and works like an ExtendableRecording because you can call extend() to accept all pending periods.

Threading

LLTrace is thread safe, provided you follow these rules:

  • Always stop/pause a recording on the same thread it was started.
  • Do not read from recordings that are active on another thread. Reading from a stopped/paused recording is fine on any thread.
  • Recording member functions are not intrinsically thread safe, but const methods are guaranteed to be read-only and can be executed concurrently with other const methods on different threads.
  • The MemTrackable mixin class is not thread safe, so its claimMem() and disclaimMem() methods must be used under a lock or only on one thread.

Finally, in order to collect trace data on any given thread (including your main thread), you will need to construct a thread recorder object in the context of each thread:

<cpp>

  1. include "lltracethreadrecorder.h"

LLTrace::ThreadRecorder gMainThreadRecorder; // constructed on main thread by global init

void workerThreadFunc() {

   // must be constructed in worker thread context, otherwise it will usurp the data meant for the main thread
   LLTrace::ThreadRecorder worker_thread_recorder(gMainThreadRecorder);

} </cpp>

Notice that the thread recorder on the worker thread takes a reference to the main thread recorder in an argument to its constructor. This establishes a parent/child relationship which is required for aggregating trace data from worker threads upstream to the main thread for querying. You can create any form of reporting hierarchy you want, and it will always take the form of a tree. If you do not do this, any data generated on the background thread will not appear in queries executed against recordings on the main thread. If you do not create a thread recorder *at all* then you will not be able to gather any trace data. However, any code that attempts to generate trace data by calling sample() or record(), etc. will continue to run. This allows you to call code that generates trace data as a side effect in a thread context that doesn't bother with LLTrace at all.

You shouldn't normally need to interact with a thread recorder object (except for pushing and pulling data, convered below), but if you need to access the thread recorder for the current thread, you can call LLTrace::get_thread_recorder().

Finally, in order to push data upstream to your parent context, you will need to periodically stage your data by calling LLTrace::ThreadRecorder::pushToParent() on the child thread and then eventually consume any staged data by calling LLTrace::ThreadRecorder::pullFromChildren() on the parent thread.