Saturday, May 31, 2008

"If you didn't write it down, it didn't happen."

Cliff Stoll, in his enjoyable book Cuckoo's Egg, attributes to astronomers the maxim:
If you didn't write it down, it didn't happen.

A quick Google-search of the phrase attributes the same idea to:
  1. experimental scientists
  2. the US FDA
  3. chemists
  4. doctors
  5. and the Florida Academy of Physician Assistants
If the idea is good enough for astronomers, experiments, regulators, guys in labcoats and the Florida Academy of Physician Assistants, then we software developers might also want to have a look at it.

But following on the heels of my previous eulogy for Fred Fish, let me change it to:
If you didn't log it, it didn't happen.

Logging. Log everything. Don't think you'll need to log it? You will regret not doing so. By all means log errors, warnings, and rare events. But also log successes, decision points, iteration counters, enters and exits!

A few weeks ago I was working with a large group of people from various locations around Canada (Halifax, Ottawa, Toronto, and points between) who had come together for a networked experiment. In this experiment, several very large computer systems which handled data from diverse sources would, for the first time, be integrated into a single entity.

The central actor for this drama was a complex piece of translation software, whose main role was to enable the other machines to talk together. The translator would receive data in one dialect, and convert, transform, recast, fold, spindle and mutilate the data until it was in a form acceptable to the other machines.

Unsurprisingly, it didn't work very well. Messages were lost, never generated, duplicated, wandered away; day after day of laborious tracking and second-guessing what was happening. Although this is fairly standard when first integrating large heterogeneous systems, what was frustrating is that the translation software -- the only entity in the system which had the potential to know all that was happening -- had almost no ability to log what it was doing.

A few hundred lines of code, a few more switches and print statements, could have saved dozens of person-hours and thousands of dollars in travel time. Invest upfront to save long term.

Using Fred's approach enables extensive logging without extensive time-space penalties, as you can choose at run-time the statements you wish to enable. I have a real-time, embedded control architecture that measures time in microseconds, but it still has massive Fish-style logging implemented throughout. You can do it!

Log to debug. Log to test. Log to verify. Log to diagnose.

Remember, if you didn't log it, it didn't happen.

Friday, May 16, 2008

Fred Fish' wonderful idea

/*
* FILE
* dbug.h user include file for programs using the dbug package
*
* AUTHOR
* Fred Fish
* (Currently employed by Motorola Computer Division, Tempe, Az.)
* hao!noao!mcdsun!fnf
*/

Fred Fish was a well-loved Amiga programmer. Back in 1987 (yes, we did have computers back then),he wrote and released this marvelous dbug macro package.

(In case you're wondering, that list of names with exclamation marks between them is how we used to give our e-mail address. It's called a "bang path", and each entry is a machine that is capable of exchanging e-mail with the next machine. The last entry -- fnf -- would be Fred's e-mail id. I am very grateful for user@machine.domain addresses!)

Fred's idea was to incorporate the debugging print statements permanently in the code. That is nothing spectacular in itself, but the clever thing he did was allow the programmer to tag the individual debugging statements with text literal tags. For example, if you were monitoring the progress of a network connection in C, you might include a line such as:

  s = socket( PF_UNIX, SOCK_STREAM, 0 );
if ( s == -1 ) {
DBUG_PRINT( "sock", ("Main socket opening did not work!") );
perror( "socket" );
exit( -1 );
}


A method is provided for accepting a list of tags to activate via the command line. So, if you were having trouble with this program and its sockets, you could specify the "sock" debug flag (by convention, the argument was "-# d,sock") and all DBUG_PRINT statements tagged with "sock" would run. You could specify one or more tags to activate, allowing you to choose specific subsegments
of the system to test.

bash$ ./testprogram -# d,sock
sock: Main socket opening did not work!
socket: Connection refused


In addition to DBUG_PRINT, there was DBUG_ENTER( "module" ) and DBUG_RETURN which allowed run-time tracing of the program calls. Add a DBUG_PRINT attached to a "trace" flag, and you could also print out the arguments. DBUG_EXECUTE allowed you to connect an executable code fragment with a debugging tag.

I have used this concept, if not this implementation, in almost every major system I have constructed over the past 20 years. And every time I didn't use it, I regretted it. The ability to tap into logical streams within the program, for example, tracing network io, or memory allocation, or database connections, has been a lifesaver on more than one occasion. The second best approach -- using a numeric verbosity level -- is a pale imitation of this functionality.

Now, granted the original implementation needed to be extended. I've modified the system to accept and identify multiple threads, to permit multiple instantiation of objects, and to use symbolic flags instead of text literals, but the idea remains the same: logically group debugging statements under symbolic tags.

Whether small basic servers, or real-time microsecond-sensitive control systems, Fred's idea of logical debugging tags combined with run-time selection of those tags, has been a wonderful boon to me. And in researching (well, googling) other uses of his package, I was surprised and yet not surprised to find this 21 year old idea implemented in modern packages such as MySQL and dmake.

Learn a lesson from the Fish: symbolic, logical tagging of permanent debugging structures will provide great value to you and your users!

Thanks, Fred.