Tag Archives: performance

Tips For Investigating Performance Issues

Performance issues usually take the longest to resolve and require a lot of patience, luck and testing iterations before a satisfactory outcome is reached.  Performance tuning is a part of most development projects that have a substantial code base or do a very CPU intensive job.

This article provides few basic tips that development and testing teams can perform to reduce the time required to resolve such issues.

Reproduce The Performance Issue

Well this is true for any bug but more so for performance issues.    Time reported in an original bug report may not necessarily match with that on a tester’s machine which in turn won’t necessarily match with that on a developer’s machine.   These numbers depend on the state of a machine, the hardware used and the version of the software being used to report the problem.

Therefore it is important to recalibrate the numbers with the latest sources and see whether the percentage of degradation reported matches with the original bug report or not.

Find A Build Or Change That Introduced The Issue

A performance issue does not necessarily require intense profiling to arrive at the cause of the issue.   At times it is important to be able to identify the build number or the change in source code that has caused the degradation.   Knowing a range of changes or the exact change that caused the degradation amounts to solving half the problem.

The maximum time taken to resolve performance problems is finding the bottleneck and the rest goes into solving the problem itself.    A profiler is not the only quickest means to find the bottleneck.   The testing team can easily identify the build where the problem was first introduced.  Likewise, the development team should try to narrow down on the exact source change that introduced the problem.

Use Release Builds Only

Do not compute performance results or perform investigations on a debug build. With optimizations turned off and extra debug code, numbers from debug builds are neither accurate nor reliable.

The reason many teams use the debug build is because the profiles generated using debug symbols look more meaningful to them as programmers. Use a release build with debug symbols instead.

Use The Same Machine

If you are comparing performance between two different versions, use the same machine to compute the numbers. This ensures that differences in processor speed, state of the machine, etc do not play a role in the difference in timings.

Catch Issues Early

The testing team should regularly run performance tests and report issues immediately. This is a huge time saver and helps narrow down the problematic code early.   All results and binaries used during the testing should be preserved so that if any issue if found late in the cycle, it should still be possible to narrow down the problematic build number.

Performance suites take time to evolve but the time spent in putting one together is worth the dividends it pays in the long run.

Profile And Know Your Profiler

Though it may sound simple to narrow down the source and solve a performance issue, it is pretty common practice to use a profiler to determine the bottlenecks in code.   The idea is not to jump to the step of profiling without trying to narrow down problems through other means.  Once you know you require a profiler, make sure you understand the technique your profiler uses behind the scenes.

Some profilers “instrument” code and insert extra code to collect the timing information. Profilers of this category are accurate for relative comparison but may crash if they do a bad job at instrumenting the code.

Some profilers “sample” the state of the program by collecting the call stack of the program at regular intervals. These profilers indicate the area of the code where the program spends the maximum time without modifying the running program. Here the sample interval and duration of sampling determine the usefulness of the generated profile.

It is important to understand how your profiler collects the data which further aids in its better interpretation.


Performance and profiling do go together but it helps to execute a few steps and to setup some processes in a development cycle that quickly and accurately help in narrowing down the source of the problem.

Profiling with procexp (process explorer)

While running your program in memory intensive workflows, you may often run into a situation where the low memory condition starts to thrash the system.  Such a program usually exhibits a performance problem as it has consumed most of its available virtual memory.  Developers often use their favorite profilers to figure out the performance bottlenecks though profilng such programs is difficult and sometimes impractical.  A lot of good profilers crash when collecting data in such situations.  They need additional memory and resources to collect and consolidate the data which makes the situation even worse.

The good thing about a program that is thrashing a system is that it tends to be in the slow portion of the code for a long period of time.  So when it is bringing down the system, it is mostly executing the code that is responsible for the situation and all we want is to take a peek at the call stack at that very moment.  An obvious choice is to use a debugger or a profiler but given the low memory condition of the system, one may get little help from such tools.  When debugging or profiling become painfully slow, people may get evil ideas of reformatting their system or may start comparing their state of the art machines with ones they had five years back :).  This article describes a light weight profiling trick where one can get the call stack of the unresponsive program without really loading the system further.

Procexp (Process Explorer) is a tool from sysinternals (now Microsoft) and both the download and documentation is available at the link here (Microsoft’s site).

The tool allows one to view the call stack of any running program on the system.  Below are the steps needed to display a call stack of any running program.

  1. Launch procexp.
  2. In the process tree, find your process that is thrashing the system.
  3. Right click on “Properties” and select the “Threads” tab.    

    procexp threads tab    





    procexp threads tab
  4. Sort the “Cycles Delta” (on Vista) or “CSwitch Data” (on Windows XP) column in descending order and select the topmost thread.  For some programs there might be just one thread.
  5. For the selected thread click on the stack button to see the current call stack of the program.  Do note that this is a snapshot of the call stack and does not change dynamically.      

    procexp call stack    





    procexp call stack

This call stack can provide a good insight of the area of the code that is causing the system to stall.  In the example above functiondothis() is where the thread is spending the most time.  Take more than one sample to reconfirm the findings.  This is a very unintrusive and light weight method of getting a call stack of a running program.  The same trick can be used to debug a hang but there a debugger works equally well. 

Sometimes you don’t need heavy debugging tools and sometimes you just can’t use them.  Procexp is a nifty debugging utility (in addition to being a process explorer) that a developer should download and keep handy for times when nothing else works.