Debugging – Divide And Conquer The Input Data

As a developer, the first step in solving any bug is to reproduce it, before the investigation actually begins. Once the issue is reproduced, the developer investigates it with the knowledge, tools and debugging skills at hand. The source of the problem is then determined, a fix is proposed, and it is checked in after unit testing and whatever other processes the team follows. This is routine work for any seasoned software developer and may sound simple, but unless the bug is trivial there are obstacles at every step.

One mistake developers often make is to start debugging sooner than they should. They fire up their favorite debugger, set breakpoints and start stepping through code when they should first have reduced the data set required to reproduce the bug.

The rule of debugging is very simple  –

“Every developer should strive to reproduce the bug by hitting the required set of breakpoints the least number of times. This allows faster and more efficient debugging sessions.”

So if you are hitting a breakpoint a hundred times more often than you should, then either you are not using the debugger efficiently or you have not done enough to reduce the data required to reproduce the bug.

The latter problem is more common: developers put little effort into reducing the input data and start debugging prematurely. This leads to longer debugging cycles in which a lot of time is wasted investigating code that is not even relevant to the problem being solved.

For example, if your program crashes while processing a 1 MB text file, you should first try to minimize the text file to the point where reducing it any further makes the crash go away. The text file obtained this way is the smallest input data set required to reproduce the crash. Once this goal is achieved, the program processes less data and executes less code, so the important breakpoints are hit fewer times.

So how does the developer go about reducing the input data set? There is no single method, but one that commonly works is to run a binary search on the input data. In the case above, the text file should be split into two files of half a MB each, and each one should be tested to see if it reproduces the crash. If one of them still crashes the program, you have halved the data set. That smaller file should then be split into two again, and the process repeated until a very small text file is obtained that still causes the crash. If neither half crashes on its own, the trigger probably straddles the split point and the file needs to be cut at a different place. Depending on what your program does, you may even be able to reduce the 1 MB text file to a single character file. Debugging your program with a single character file is much simpler than using the initial 1 MB file.
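
The halving procedure can be sketched in C++ roughly as follows. This is only a minimal sketch under a few assumptions: `reduce_input` and the `reproduces` callback are hypothetical names, the callback stands in for “run the program on this data and report whether it still crashes”, and the input is treated as a plain string that can be cut at any position.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Repeatedly halve `data`, keeping whichever half still reproduces the bug.
// `reproduces` is a hypothetical callback: it should run the program under
// test on the given data and return true if the crash still occurs.
std::string reduce_input(std::string data,
                         const std::function<bool(const std::string&)>& reproduces)
{
    while (data.size() > 1) {
        const std::size_t mid = data.size() / 2;
        const std::string first = data.substr(0, mid);
        const std::string second = data.substr(mid);

        if (reproduces(first)) {
            data = first;
        } else if (reproduces(second)) {
            data = second;
        } else {
            // Neither half crashes on its own: the trigger straddles the
            // split, so plain halving cannot shrink the data any further.
            break;
        }
    }
    return data;
}
```

For real files the callback would typically write the candidate data to a temporary file and launch the program on it. When the loop stops early, the remaining data can be cut at a different position by hand.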

Reducing The Data Set

The Caveat

Sometimes a large data set may have multiple issues. By reducing it as described above, one may solve part of the problem while other problems go unnoticed in the reduced set. Therefore, once a bug has been solved on a reduced input data set, the fix should be tested against the data provided with the original bug. This ensures that no other issue that should have been fixed got ignored in a bid to make debugging more efficient.

Final Note

Reducing the input data is essential before starting the debugging process and a great productivity aid too.  If possible, this should be a part of the bug reporting process for quality engineers or customers who often log the issue.    Reducing the data may not always be possible but it is certainly worth an attempt.

 

Debugging – Using Breakpoint Hit Count For Fun And Profit.

If you are familiar with hit count breakpoints already, you may want to skip ahead to the section “Using Hit Count For Fun And Profit” for the advanced tricks shared in this article.

What is the hit count of a breakpoint?

A debugger allows users to set a breakpoint at a specific line in code. When execution reaches that line, the breakpoint is said to have been *hit* and the execution of the program being debugged is suspended.

Internally the debugger also keeps a count of the number of times the breakpoint has been hit.  This is called the hit count of a breakpoint.  Debuggers allow users to set conditions based on the hit count of the breakpoint.  For example, you can specify that the execution of the program should only be suspended when the hit count is greater than or equal to 250. To put it in other words, the breakpoint will be skipped for the first 249 times it is hit.
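
The behaviour described above can be modelled in a few lines of C++. This is a toy model for illustration only, not how any real debugger is implemented; `HitCountBreakpoint`, `break_at` and `reached` are made-up names.

```cpp
// A toy model of a hit count breakpoint: every time execution reaches the
// line the count goes up, but the program is only suspended once the count
// is greater than or equal to `break_at`.
struct HitCountBreakpoint {
    unsigned long break_at;      // suspend when hit_count >= break_at
    unsigned long hit_count = 0; // number of times the line was reached

    explicit HitCountBreakpoint(unsigned long n) : break_at(n) {}

    // Called each time execution reaches the breakpoint's line.
    // Returns true if the debugger should suspend the program.
    bool reached() {
        ++hit_count;
        return hit_count >= break_at;
    }
};
```

With `break_at` set to 250, `reached()` returns false for the first 249 calls and true on the 250th. Note that the count keeps increasing even while the breakpoint is being skipped, which is what the tricks later in this article rely on.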

Being able to set a condition on the hit count of a breakpoint makes debugging faster, because the program is suspended only when it matters.

 

How can hit count based breakpoints be set?

Debuggers today have either a command line or a graphical user interface. Almost all debuggers provide a means to set hit count based breakpoints. Below are the steps for setting such breakpoints in some of the debuggers I have used.

Microsoft Visual Studio 2005

  1. Set a breakpoint at a line in your code.
  2. Right click the breakpoint and then click on “Hit Count”.  You can also go to Debug -> Windows -> Breakpoints  and right click on the breakpoint that was just created and select “Hit Count”.
  3. In the dialog that pops up, you can choose from four ways of controlling the breakpoint based on its hit count.  The default is to ignore the hit count and suspend the program always when the breakpoint is hit. It is good to take note of the other three options.
Hit Count Window In Visual Studio 2005

When the program is in suspended mode, one can see the current hit count of the breakpoint in the breakpoint window.  In the image below, the “Hit Count” column shows the current hit count of the breakpoints.

Visual Studio Breakpoint Window

gdb

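In gdb, the continue command resumes execution of the suspended program. When it is followed by a number N, gdb ignores the breakpoint at which the program is currently stopped the next N - 1 times it is reached, so the program is suspended there again only on the Nth time.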

(gdb) help continue
Continue program being debugged, after signal or breakpoint.
If proceeding from breakpoint, a number N may be used as an argument,
which means to set the ignore count of that breakpoint to N - 1 (so that
the breakpoint won't break until the Nth time it is reached).
(gdb) continue 20

gdb is available on Mac OS X, Linux, AIX, Solaris, HP-UX and Cygwin on Windows, among others, so this is one command worth learning by heart.

On Mac OS X, the Xcode IDE uses gdb internally and allows access to it through the menu (Debug -> Console Log). From that command line interface, continue can be used as described above.

In gdb the info breakpoints command can be used to view the current hit count of all breakpoints.

(gdb) info breakpoints
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x0040118a in main at try.cpp:6
breakpoint already hit 246 times
3   breakpoint     keep y   0x004011a5 in main at try.cpp:8

Visual Studio and gdb differ slightly in terminology. One allows setting breakpoints with a hit count and the other allows skipping a breakpoint a certain number of times. However, they are essentially the same feature, which spares the programmer from having to stop at a breakpoint every time. In the subsequent sections, the term “set a hit count breakpoint” is used instead of “skip the breakpoint n times”. It should be trivial to interpret the tricks in terms of skipping a breakpoint.

 

WinDbg

Here is how you set a hit count breakpoint in WinDbg.

  1. Go to the source view and set a breakpoint in the source code.  The shortcut F9 can be used to toggle a breakpoint.
  2. In the command window (alt + 1), list all breakpoints using the bl command.
  3. Take note of the breakpoint that you just set and copy the location of the breakpoint which is listed in the format of <module_name>!<function_name>+<offset>.  See example below.
  4. Now redefine the breakpoint with the bp command.  After the bp command paste the location that you copied in the previous step followed by the hit count.

0:000> bl

1 e x86 00000000`004113b2     0001 (0001)  0:**** test_project!wmain+0x42
0:000> bp test_project!wmain+0x42 2300
breakpoint 1 redefined
0:000> bl
0 e x86 00000000`004113b2     2300 (2300) 0:**** test_project!wmain+0x42

The hit count in the above example is set to 2300. The first of the two numbers is the current hit count; it is decremented each time the breakpoint is hit, and execution stops only when it is equal to 1. The number within the parentheses denotes the hit count that was originally set by the user.

Using Hit Count For Fun And Profit

Many developers set breakpoints without hit count conditions. There are a lot of nifty ways in which a hit count breakpoint can be used.

Below are some scenarios which developers will find useful while using hit count breakpoints:

Break In A Loop More Conveniently.

Setting an unconditional breakpoint in a loop (e.g. for, while, do-while) may break execution more often than needed.  If you know the iteration of the loop when you want to suspend execution of the program, you can set a hit count breakpoint.

For example, in the while loop below, if the intention is to break in the 21st iteration, a hit count based breakpoint will be more useful and simpler than a conditional one. Do note that in the loop below, the variable i does not increment by one, so a condition on i would first require working out its value in that iteration.

<code>int i = 1; /* start at 1: doubling 0 would never advance */
while( !flag &amp;&amp; i &lt; N )
{
/* some code */
i *= 2;
}</code>

Likewise, the for-loop below traverses an int vector using an iterator. If the intent is to break when the 10th element in the vector is being processed in the loop, then a hit count breakpoint will be more useful and easier to set.

<code>std::vector&lt;int&gt;::iterator iter;
for( iter = vec.begin() ; iter != vec.end() ; ++iter )
{
/* some code */
}</code>

 

Create A Quick And Dirty Profiler And Much More.

 

Part I

Profilers that instrument code log the time taken by a function and the number of times it is called. It is the latter where hit count breakpoints are very useful. The greatest advantage of tracking the number of times a function is called this way is that you don’t have to run the code through a profiler, yet you get the call count with the same accuracy. Moreover, profilers may crash at times, whereas debuggers are pretty stable when it comes to debugging code.

The trick here is to set a hit count breakpoint that will never be reached. For example, set the hit count to an impractically large value (say 1000000) and set a second breakpoint at program termination (for example at the end of the main() function).

When the program is run, due to the large hit count, the breakpoint will never be hit and only the breakpoint at the end of the program will be hit.  The debugger however has no knowledge that the breakpoint hit count is too large for it to be hit and therefore tracks the count whenever execution reaches the breakpoint.

At program termination, when the program gets suspended due to your second breakpoint, the debugger is waiting to tell you the current hit count of the first breakpoint. In other words, it tells you how many times that line of code was hit before the program terminated. That is exactly the kind of information a profiler would have given you. Voila, you have that quick and dirty profiler ready for use :-).

Maybe someday I will write about how a breakpoint works internally, and then you will be able to see what a debugger and a code instrumenting profiler have in common.

The above trick is explained in the C code snippet below.

<code>
void profile_me()
{
/* set hit count breakpoint here with a very large hit count */
/* function code */
}</code>

<code>
int main()
{
profile_me();
/* Set the second breakpoint here and when this is hit,*/
/* observe the hit count of the breakpoint set above */
return 0;
}</code>

Part II – Smart Breakpoints

Another use of hit count breakpoints is very similar to the quick and dirty profiler trick. When one encounters a crash in a loop or in a repeated function call, it may make more sense to start debugging a few iterations before the crash actually happens. For example, say a loop is processing tokens and a crash happens while processing the 2520th token. The crash itself may not make much sense once it has occurred, but it may help to know what happened in the 5 iterations prior to the crash. That way, the programmer can collect data for those iterations and then reach the crash condition, equipped with the relevant data needed to solve the crash at hand.

<code>while( token = get_token() )
{
/* some code */
switch( token )
{
case token_1: /* do code */ break;
case token_2: /* do code */ break;
/* more case statements */
}
}</code>

The trick here again is to set a very large hit count so that the breakpoint is never hit. Once the crash occurs, the hit count of the breakpoint is noted. Then the hit count of the breakpoint is reset to the count observed at the crash minus 5 (2515 in the example of the 2520th token above). From then on, whenever the hit count condition is met and the breakpoint is hit, the programmer will know that a crash is expected in 5 iterations. The data collected during those 5 iterations may be essential for resolving the crash.
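
The arithmetic is simple but easy to get backwards, so here it is as a small C++ helper. The function and parameter names are illustrative only, and the sketch assumes the breakpoint sits at the top of the loop body so that it is hit exactly once per iteration.

```cpp
// Given the hit count observed when the crash occurred, return the hit
// count to set so that execution stops `lookback` iterations earlier.
// The result is clamped to 1 so that it stays a valid hit count when the
// crash happens within the first few iterations.
unsigned long lookback_hit_count(unsigned long crash_hit_count,
                                 unsigned long lookback)
{
    if (crash_hit_count <= lookback) {
        return 1;
    }
    return crash_hit_count - lookback;
}
```

For the token example, `lookback_hit_count(2520, 5)` gives 2515, which is the value to enter as the new hit count.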

 

Part III – Matching calls.

Hit count breakpoints have yet another use in debugging: matching the call count for a pair of functions. For example, for every malloc call a free call should have been made in order to have zero memory leaks. Similarly, a constructor (for now assume there is only one) and a destructor of a class should be called an equal number of times. These calls have opposite effects, and their counts should match to ensure that resources don’t leak.

<code>
C::C()
{
/* constructor */
}</code>

<code>C::~C()
{
/* destructor */
}
</code>

The trick is to set two hit count breakpoints with very large values that will never be reached in both the constructor and destructor above.  Also a breakpoint should be set at the point of program termination (for example at the end of function main() ).  The two breakpoints in the constructor and destructor will not be reached due to the very large values.  When the program’s execution is suspended at program termination due to the final breakpoint, the hit count of the two breakpoints set in the constructor and destructor should be checked and hopefully their hit counts should match.  Here I am assuming the class C was not involved in creating global or static objects.  A mismatch of hit counts may suggest that not all objects of class C were destroyed and a possible resource leak should be looked into.

In summary, if there are two calls that should be called equally during the life span of a program, then this trick can be used to check that the call hit counts do indeed match.
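
When no debugger is at hand, the same check can be approximated in code with two counters. Note that this swaps the debugger trick for plain instrumentation, so it requires changing the class; it is shown only to make explicit what the two hit counts are being compared for. `Tracked` and its members are hypothetical names, and the sketch is not thread safe.

```cpp
// Counts constructions and destructions of a class, the in-code
// equivalent of reading two breakpoint hit counts at program exit.
class Tracked {
public:
    Tracked() { ++constructed(); }
    Tracked(const Tracked&) { ++constructed(); } // copies are constructions too
    ~Tracked() { ++destroyed(); }

    // Function-local statics hold the two counters.
    static int& constructed() { static int n = 0; return n; }
    static int& destroyed()   { static int n = 0; return n; }

    // True when every constructed object has also been destroyed.
    static bool balanced() { return constructed() == destroyed(); }
};
```

Checking `Tracked::balanced()` at the end of main() plays the role of the final breakpoint: a false result suggests that some objects were never destroyed. As with the breakpoint version, global or static objects of the class would still be alive at that point and would show up as a mismatch.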

 

Final Note

Hit count is a slightly underused feature of debuggers, but it can be used in many innovative ways to gain better control over debugging. It is not a replacement for a profiler, but it is a great tool when you do not have one at hand. The infinite-hit-count breakpoints are useful for keeping track of code workflow, as they are set with the intention of never being hit. Even so, the information that such breakpoints provide can be pretty useful and accurate.

Web Hosting – Blogging 101 : Signing Up With A Web Host.

Choosing a web host is like buying any other product online. One has a lot of choice and good competition owing to declining hardware costs.

A web host is a server that will physically host your web site for everyone to access.  A web host is usually chosen on the basis of reliability (high uptime), good customer service, bandwidth + hardware provided and cost effectiveness.  Each web host will have something or the other to offer over others.  The best way to choose one is to read reviews online and see comparison tables on unbiased sites.  Everyone will have different priorities be it cost, bandwidth, space provided, etc.

webhost shopping

Below are the top lessons learnt while choosing a web host:

  1. Shared servers are cheaper than dedicated ones (duh!). The shared servers are the cheapest to host one’s website especially if it is your first.  Not a bad choice when you are just starting out and are yet to learn the tricks of the trade.
  2. Get discounts if possible. Be on the lookout for coupons / offers before placing your order.  My “Top 5 Lessons Learnt” blog entry describes this.
  3. Browse a few of the sites hosted on the web host. This will give you an idea of the kind of response time the audience of your site can expect.  Check out http://www.myipneighbors.com/ to see the sites that are hosted by your web host in addition to yours.
  4. There is no free lunch. If some web host offers unlimited hard disk / bandwidth, just ignore the claims, as these sites are promising the world on the assumption that you will never really need that many resources.  If you actually do, you will be restricted on the basis of CPU usage and the word “unlimited” will start making a lot less sense.
  5. Understand the services provided. For example, if you plan on writing a blog, check the versions of PHP, MySQL and WordPress that the web host will provide.  If ssh is important for you, check to see if it is included in the package or not.
  6. Choose a web host that provides a 30 day cancellation policy. Try to sign up with a host who has a cancellation policy and read it before you actually pay the money.  Get started with your site from day 1 so that you know whether the service provided by the host is acceptable or not.
  7. There is no free dinner either.  Make sure you are not taking the domain name (usually a promised lifetime freebie) from the web host.  If you cancel, you will usually have to pay for the domain name at a much higher rate.  Not taking the domain name from the web host gives you a lot of freedom to switch as per your will.  Remember the longer you commit yourself with the web host, the cheaper the cost will be.  However, you will be losing out on freedom to switch at will.

I think most of the time spent while finding the right web host goes into reading reviews, visiting forums and getting feedback on the shortlisted web hosts.  Getting hold of coupons and filling in the sign up form on any web host should not take more than 20 minutes.  The research is always worth the effort but do remember that people still make mistakes and learn from them too.

BTW the image above is my failed attempt to depict 99.9% uptime, a case where only 20 MB of the unlimited hard disk space is used by a website, freebies offered by web hosts and your attempts to shop and search for something that suits your needs.

Web Hosting – Blogging 101 : Who Said Registering Your Domain Name Was Easy?

The first step while setting up a site is getting a domain name.

I think that the primary reason for not starting your web site is trying to get that name right.  And finally when you have it all figured, you realize that the name is already taken 🙂  And it doesn’t stop there.  The next 20 names (good or bad) that you can think of will be taken as well.  And then rather than searching for the domain name related to the one you initially had in mind, you start playing around to see if everything under the sun is taken.

It comes as a rude shock to the uninitiated that selling and buying of domain names is a trade in itself and a profitable one too.

Choosing the domain name is not easy.  SEO (Search Engine Optimization) gurus have a lot to share on how to choose a domain name.

?What Name?

In addition to what the pundits had to say, I introduced a few rules for myself (not necessarily applicable to everyone) to help me come up with a name I’d like to keep.

  1. Don’t choose a name with hyphens or numbers.  I have a personal dislike for them as they are difficult to remember.
  2. Don’t try to insert a city’s name in the domain name just because you live there.  Do it only if your blog covers a specific region.  To connect to a geography neutral audience, it may not always make sense to include the city / country in your domain name.
  3. Two words in a domain name are catchier than three (or four or five for that matter).  Also the name gets to be a bit shorter with two words.
  4. Don’t use words that others are bound to misspell or misinterpret.  mysiteforeveryone.com may be typed in as mysite4everyone.com and your potential readers may never visit your site.
  5. If you are not into domain name trading, don’t take time bound names. myeuro2008sitehere.com is not going to make much sense since Euro 2008 is now over.
  6. I for one am not very fond of remembering URLs.  Domain names should have an inherent bookmarking capability. The words contained in the URL should be memorable and a person typing them in Google should be able to reach the site with ease.  Remember the keywords techno+chakra for your future searches 😉 [update: the site technochakra.com is now mohit.io]
  7. Before making the payment for your domain name, break its name into words and put them up in Google’s search box.  If the search results bring up embarrassing or negative stuff which you wouldn’t like to associate your site with, then you may want to give the name a pass.

However, at the end with all the rules and the free advice at your disposal, you will end up taking the name that is available :-).

A free time saving tip: Search for domain + names + ajax in Google and you will come across sites that will do search-as-you-type lookups for domain names. Given that a lot of domain names are already taken, such sites make it faster for you to search for that name you have been looking for.

Web Hosting – Blogging 101 : Top 5 Lessons Learnt.

Getting started with web hosting is not always easy.

If you are the kind who likes to do the research before taking the plunge, the amount of information on this topic can be very overwhelming. This is true especially if this is your first time trying to set up something on a web host.

Confused

There will be those who are just trying to set up that one blog they have long been planning to write, and then there are those looking to set up complete sites.

The top 5 lessons I learnt while digging for information on blogging / web hosting are :

  1. You cannot learn all the steps before starting. Try to take one step at a time.  It does not matter if you do not know what you will be doing next.  If you are trying to register a domain name, you should conveniently delay learning everything else about web hosting.  That way you will actually make progress.
  2. Research takes time. Even the best search engine (read: Google) can throw data at you faster than you can possibly absorb it.  Don’t worry, eventually everything will start making sense.
  3. There are good authors out there who write really good stuff. If you know how to use Google efficiently, you will manage to get past the many commercial sites that rank very high in the search results but only want to sell you domain names, web space and dedicated servers.  Learn the art of Google if you haven’t already.
  4. The dilemma of choice. When you have decided to shell out money for a particular service provider (say a web host), you are bound to come across a post that criticizes that very service.  If you take each experience very seriously, you will end up rejecting every provider out there.  In practice, you will have to choose a provider that does not have too many bad reviews and that has not lost credibility since it was once rated well.
  5. There are discount coupons at every step of web hosting. Before paying money for a domain name or a web host, make sure you search for discounts available through coupons.  Just type the_name_of_site_you_are_going_to_pay + coupons in the search box and then let Google do its magic. Don’t be cheap, but saving money is absolutely fine.  I would suggest you first decide on a web host and then search for coupons, rather than the other way around.