Saturday, May 31, 2008

"If you didn't write it down, it didn't happen."

Cliff Stoll, in his enjoyable book The Cuckoo's Egg, attributes this maxim to astronomers:
If you didn't write it down, it didn't happen.

A quick Google search for the phrase finds the same idea attributed to:
  1. experimental scientists
  2. the US FDA
  3. chemists
  4. doctors
  5. and the Florida Academy of Physician Assistants
If the idea is good enough for astronomers, experimenters, regulators, guys in labcoats and the Florida Academy of Physician Assistants, then we software developers might also want to have a look at it.

But following on the heels of my previous eulogy for Fred Fish, let me change it to:
If you didn't log it, it didn't happen.

Logging. Log everything. Don't think you'll need to log it? You will regret not doing so. By all means log errors, warnings, and rare events. But also log successes, decision points, iteration counters, and function entries and exits!
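To make that list concrete, here is a minimal sketch using Python's standard logging module. The function convert_batch, the logger name and the upper-casing "conversion" are all made up for illustration; the point is only where the log calls sit: on entry, on each iteration, at the decision point, on success, and on exit.

```python
import logging

# Hypothetical logger name, chosen only for this illustration.
log = logging.getLogger("translator_demo")


def convert_batch(records):
    """Upper-case each non-empty record and skip the empty ones."""
    log.debug("enter convert_batch: %d records", len(records))
    converted = []
    for i, record in enumerate(records):
        # Iteration counter: shows how far we got if something dies mid-loop.
        log.debug("iteration %d: record=%r", i, record)
        if not record:
            # Decision point: record which branch was taken, and why.
            log.warning("iteration %d: empty record skipped", i)
            continue
        converted.append(record.upper())
        # Success is worth logging too, not just failure.
        log.info("iteration %d: converted ok", i)
    log.debug("exit convert_batch: %d converted, %d skipped",
              len(converted), len(records) - len(converted))
    return converted


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG,
                        format="%(asctime)s %(levelname)s %(message)s")
    convert_batch(["abc", "", "def"])
```

Raising the level passed to basicConfig (say, to logging.WARNING) silences the chatty lines without touching the code that emits them.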

A few weeks ago I was working with a large group of people from various locations around Canada (Halifax, Ottawa, Toronto, and points between) who had come together for a networked experiment. In this experiment, several very large computer systems which handled data from diverse sources would, for the first time, be integrated into a single entity.

The central actor for this drama was a complex piece of translation software, whose main role was to enable the other machines to talk together. The translator would receive data in one dialect, and convert, transform, recast, fold, spindle and mutilate the data until it was in a form acceptable to the other machines.

Unsurprisingly, it didn't work very well. Messages were lost, duplicated, never generated, or simply wandered away, and we spent day after day laboriously tracking them down and second-guessing what was happening. This is fairly standard when first integrating large heterogeneous systems. What was frustrating was that the translation software, the only entity in the system with the potential to know everything that was happening, had almost no ability to log what it was doing.

A few hundred lines of code, a few more switches and print statements, could have saved dozens of person-hours and thousands of dollars in travel. Invest up front to save in the long term.

Fred's approach makes extensive logging possible without extensive penalties in time or space, because you choose at run time which statements to enable. I have a real-time, embedded control architecture that measures time in microseconds, but it still has massive Fish-style logging implemented throughout. You can do it!
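For a rough idea of what run-time selection can look like, here is a small sketch in Python. It is not Fred's library or its API, just the general idea: every log statement carries a keyword, and a set of enabled keywords is chosen when the program starts. The Tracer class, the keyword names and the TRACE_KEYWORDS environment variable are all assumptions invented for this example.

```python
import os
import sys


class Tracer:
    """Keyword-gated logger: only enabled keywords produce output."""

    def __init__(self, spec="", stream=None):
        # spec is a comma-separated keyword list such as "io,parse";
        # the single keyword "*" enables everything.
        self.keywords = {k.strip() for k in spec.split(",") if k.strip()}
        self.stream = stream if stream is not None else sys.stderr

    def enabled(self, keyword):
        return "*" in self.keywords or keyword in self.keywords

    def log(self, keyword, fmt, *args):
        # Cheap early exit: a disabled statement costs a set lookup,
        # and the message is never formatted.
        if not self.enabled(keyword):
            return
        message = fmt % args if args else fmt
        self.stream.write(f"{keyword}: {message}\n")


if __name__ == "__main__":
    # Hypothetical environment variable, read once at start-up.
    trace = Tracer(os.environ.get("TRACE_KEYWORDS", ""))
    trace.log("io", "read %d bytes", 512)
    trace.log("parse", "header ok")
```

Run with TRACE_KEYWORDS=io and only the "io" line reaches stderr; run with nothing set and the program stays silent. The log statements stay in the code either way, which is the whole point.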

Log to debug. Log to test. Log to verify. Log to diagnose.

Remember, if you didn't log it, it didn't happen.
