Profiling and Optimization
First, this article assumes that you're working with an existing and essentially complete piece of code or product that you want to optimize. However, if you are working on performance-critical code (rendering code falls into this category, for example), you should have started thinking about performance long before reaching this stage. A good article to read is this one: http://www.acm.org/ubiquity/views/v7i24_fallacy.html
Secondly, I would recommend reading the following article, if only to get an understanding of the terminology that will be used: http://en.wikipedia.org/wiki/Performance_analysis
First things first: ALWAYS profile before optimization.
Profiling
- Choosing a test case:
  - If possible, have a reproducible test case in your production environment.
  - A good test case may not be steady-state. Reproducing the production environment faithfully can matter more than having a perfectly repeatable run.
  - If you are performing micro-optimizations whose effects you can't reproduce in a realistic test case, write a synthetic test case that heavily exercises the code you're working on, and optimize against that (the first sketch after this list shows one way).
- Choosing the profiler:
  - If a profiler is slowing down your application, it is probably distorting your results. This is usually because the profiler is adding and running a significant amount of instrumentation, which is itself doing a lot of work. In addition, in network and graphics applications, slowing the application down can itself change its behavior - for example, the amount of network traffic received or the amount of time spent blocking on the GPU. (When full instrumentation is too heavy, a hand-rolled timer like the second sketch after this list is a lighter-weight fallback.)
  - For the above reason, I generally prefer time-based sampling profilers over instrumented call-graph profilers. The one disadvantage of most time-based profilers is that they don't generally have call-chain information (they only keep track of the function where CPU time was spent, not where it was called from), although at least one profiler (Shark on OS X) is capable of capturing it without performance degradation.
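A minimal sketch of such a synthetic test case, assuming the routine under study is something like the hypothetical transform_vertices() below: run it on a fixed, deterministic input in a tight loop so the profiler sees almost nothing else.

  #include <chrono>
  #include <cstdio>
  #include <vector>

  // Placeholder workload; substitute the real code under study.
  static void transform_vertices(std::vector<float>& verts)
  {
      for (float& v : verts)
          v = v * 1.0001f + 0.5f;
  }

  int main()
  {
      std::vector<float> verts(1 << 20, 1.0f);   // deterministic, fixed-size input

      auto start = std::chrono::steady_clock::now();
      for (int i = 0; i < 1000; ++i)             // enough iterations to dominate noise
          transform_vertices(verts);
      auto elapsed = std::chrono::steady_clock::now() - start;

      std::printf("%.1f ms\n",
          std::chrono::duration<double, std::milli>(elapsed).count());
      return 0;
  }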
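When even a profiler's lightweight instrumentation is too distorting, a hand-rolled scoped timer around a single suspect region is a low-overhead fallback. A minimal sketch (ScopedTimer and processFrame are hypothetical, not part of any profiler's API):

  #include <chrono>
  #include <cstdio>

  // Prints the wall-clock time spent in the enclosing scope when destroyed.
  class ScopedTimer
  {
  public:
      explicit ScopedTimer(const char* label)
          : mLabel(label), mStart(std::chrono::steady_clock::now()) {}

      ~ScopedTimer()
      {
          auto elapsed = std::chrono::steady_clock::now() - mStart;
          std::printf("%s: %.3f ms\n", mLabel,
              std::chrono::duration<double, std::milli>(elapsed).count());
      }

  private:
      const char* mLabel;
      std::chrono::steady_clock::time_point mStart;
  };

  void processFrame()
  {
      ScopedTimer timer("processFrame");  // destructor prints the elapsed time
      // ... work being measured ...
  }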
Optimization
- Always try to do algorithmic and architecture-based optimizations before code optimizations (the first sketch below shows the kind of win this can yield).
- Memory and cache considerations are often MUCH more important than instruction count. This is a relatively recent phenomenon: on some architectures, you can execute dozens of instructions or more in the time it takes to fetch data from main memory. Thus, when optimizing, you may want to pay more attention to how your algorithms access data than to the number of instructions they use (the second sketch below illustrates this).
- Don't write assembly unless you absolutely have to. It's not easy to make cross-platform, and unless you are REALLY good, compilers are often better than you are. If you need more speed, restructure the C/C++ code to make it clearer to the compiler what you are attempting to do (the third sketch below shows one such restructuring).
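First, a sketch of an algorithmic optimization (containsLinear and IdIndex are hypothetical names): replacing repeated linear scans of a std::vector with lookups in a std::set turns O(n) per query into O(log n), a win that no amount of instruction-level tuning on the scan would match.

  #include <algorithm>
  #include <set>
  #include <vector>

  // Before: O(n) per query.
  bool containsLinear(const std::vector<int>& ids, int id)
  {
      return std::find(ids.begin(), ids.end(), id) != ids.end();
  }

  // After: build the index once, then O(log n) per query.
  class IdIndex
  {
  public:
      explicit IdIndex(const std::vector<int>& ids)
          : mIds(ids.begin(), ids.end()) {}

      bool contains(int id) const { return mIds.count(id) != 0; }

  private:
      std::set<int> mIds;
  };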
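Second, a sketch of the data-access point (function names are hypothetical): both functions compute the same sum over a rows x cols matrix stored in row-major order, but the first walks memory sequentially while the second strides across it and will miss the cache badly on large inputs.

  #include <vector>

  // Cache-friendly: the inner loop walks contiguous memory.
  double sumRowMajor(const std::vector<double>& m, int rows, int cols)
  {
      double total = 0.0;
      for (int r = 0; r < rows; ++r)
          for (int c = 0; c < cols; ++c)
              total += m[r * cols + c];
      return total;
  }

  // Cache-hostile: the inner loop strides by 'cols' elements each step.
  double sumColMajor(const std::vector<double>& m, int rows, int cols)
  {
      double total = 0.0;
      for (int c = 0; c < cols; ++c)
          for (int r = 0; r < rows; ++r)
              total += m[r * cols + c];
      return total;
  }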
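Third, a sketch of restructuring C++ so the compiler can see your intent, assuming a simple scaling loop: in the first version the compiler may reload *scale on every iteration, because 'out' could alias it; hoisting the load into a local removes the ambiguity and typically lets the compiler keep the value in a register (and often vectorize the loop).

  #include <cstddef>

  void scaleAll(float* out, const float* in, const float* scale, std::size_t n)
  {
      for (std::size_t i = 0; i < n; ++i)
          out[i] = in[i] * *scale;   // *scale may be reloaded each iteration
  }

  void scaleAllHoisted(float* out, const float* in, const float* scale, std::size_t n)
  {
      const float s = *scale;        // load once; no aliasing ambiguity remains
      for (std::size_t i = 0; i < n; ++i)
          out[i] = in[i] * s;
  }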
A list of profilers that we've used at Linden:
- Shark (Mac) - my preferred profiler right now. A sampling profiler, but it still manages to provide call-chain information. Great UI. Unfortunately, it can't be used for everything, as it's Mac only, and graphics code is very different on the Mac vs. the PC.
- oprofile (Linux) - another sampling profiler. No UI and no call-chain information, but very easy to use on Linux.
- VTune (Windows) - a sampling and call-graph profiler. The UI is terrible, but there aren't many other sampling profilers available on Windows.
- GlowCode (Windows) - an instrumented call-graph profiler, but much faster than is typical, so it runs in near real time. If you want call-chain information on Windows, this may be the way to go.