Logic vs Art in writing error messages

As I sleep under my cosy blanket on a beautiful winter morning, I get a call from my onshore teammate (who is in a timezone where it's still morning). My teammate is panic-stricken and screams, "There is something wrong with the production site, and we can't fix it!"

Still in bed, I ask: "What is the error?" I get a feeble response: "It simply says, 'Something went wrong. Please contact the site administrator!'"

And then the journey of debugging begins. It is like shooting in the dark and hoping (and eventually praying) that the arrow hits the bull's eye.

After hours of frustration and effort, one discovers the bug. The issue was that someone had removed the write permission from the folder, so the file could not be written.

Though the cause is simple, many a time we end up spending hours finding the issue or bug. This not only causes frustration and sleepless nights but also makes us lose face with the customer, and the customer lose face with their users.

So, how do we handle such situations?

The best answer could be: write systems such that they never have an error or bug, or test the system for all possible scenarios and fix every potential error. But that is like living in an idealistic world where aliens and humans are friends and meet daily to have breakfast together.

Given that we don't live in such an idealistic world, what do we do?

IMO, the answer is the exception messaging system. For any exception, the message logged or displayed should be clear enough to point the engineer in the right direction, and fast.

Let us break this discussion into two parts. 

Part One - What, where and when to capture errors. 
Part Two - How to display the error messages (after we capture them).

Part one is relatively simple. We put exception blocks at all places where the given logic can go wrong, and we collect all exceptions in a central location. The following principles guide the approach:

  • Centralised exception logging
  • View and search all exceptions across all servers and applications
  • Uniquely identify exceptions
  • Receive email alerts on new exceptions or high error rates
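
The first and third principles can be sketched in a few lines. This is only a minimal sketch, assuming Python's standard logging module; the names `fingerprint` and `report_exception` and the logger name are hypothetical, and a real setup would forward these records to a shared log store rather than a local logger.

```python
import hashlib
import logging
import traceback
from pathlib import Path

logger = logging.getLogger("central-errors")  # hypothetical logger name


def fingerprint(exc: BaseException) -> str:
    """Build a short, stable ID from the exception type and where it was raised."""
    frames = traceback.extract_tb(exc.__traceback__)
    if frames:
        last = frames[-1]
        # File name and function only, so the ID survives small line shifts.
        location = f"{Path(last.filename).name}:{last.name}"
    else:
        location = "unknown"
    raw = f"{type(exc).__name__}|{location}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]


def report_exception(exc: BaseException) -> str:
    """Log the exception with its fingerprint and return the fingerprint."""
    error_id = fingerprint(exc)
    logger.error(
        "error_id=%s %s: %s", error_id, type(exc).__name__, exc, exc_info=exc
    )
    return error_id
```

Because the same kind of failure raised from the same function gets the same `error_id`, the central store can group and count occurrences, which is what searching and alerting on error rates would build on.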

I am not going to spend a lot of time on part one because it requires more logic than art.

Part two of this discussion is where we have to worry about "how" to capture or write or display the error/exception. 

Before we get into the "how" part, let us figure out "where" we capture or display the error:

  • Logs & Audit trails
  • User interfaces (screens)
  • Email
  • SMS

And we can't apply one rule and show or capture the identical message everywhere.

Depending on the medium, we have to change the text and decide how verbose it has to be.

Hence, we can:

  • Know the audience (an engineer vs user)
  • Capture what happened and why it (the error) happened
  • Suggest next step 
  • Show the exception 
  • Automatically inform the right stakeholders. 
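
One way to picture these steps is a single error record that is rendered differently for each medium. The sketch below is illustrative only: `ErrorReport`, its fields, the three render functions, the reference `E-1042` and the 160-character SMS limit are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ErrorReport:
    """One captured error, described for more than one audience."""
    error_id: str   # reference shared by every channel
    what: str       # what happened, in plain words
    why: str        # why it happened, as far as we know
    next_step: str  # what the reader can do now
    exception: str  # technical detail, for engineers only


def for_log(r: ErrorReport) -> str:
    # Engineers get everything, including the raw exception.
    return f"[{r.error_id}] {r.what} Cause: {r.why} Exception: {r.exception}"


def for_screen(r: ErrorReport) -> str:
    # Users get what happened and what to do, plus a reference to quote.
    return f"{r.what} {r.next_step} (Reference: {r.error_id})"


def for_sms(r: ErrorReport, limit: int = 160) -> str:
    # SMS is short: ID and summary only, cut to the assumed limit.
    return f"[{r.error_id}] {r.what}"[:limit]


report = ErrorReport(
    error_id="E-1042",
    what="The report could not be saved.",
    why="The application has no write permission on the upload folder.",
    next_step="Please try again later or contact the site administrator.",
    exception="PermissionError: [Errno 13] Permission denied",
)
print(for_screen(report))
```

The engineer and the user read about the same incident, tied together by the reference, but only the log line carries the exception text.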

All of the above steps are logical. What requires more than logic, and calls for an artistic touch, is how we word the message for the place where it is displayed.

We can, however, keep in mind the following guidelines: 

  • Let the message not be too technical:
    • e.g. "Due to unhandled memory (0x001100) allocation, the bootstrapping is failing"
  • Let the message not be too simple:
    • e.g. "Something went wrong"
  • Let the message not be too childish:
    • e.g. "Oops! Sorry mate, something ain't right. TTYL"
  • Let the message not be too informative (sensitive):
    • e.g. "The access to folder ..\system\files\ is prohibited." Displaying the server path is not a good idea.
  • Let the message not be too stupid:
    • e.g. "The password you have entered is the same as that of user 486360. Please use a different password to create your account."
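
Applied to the folder-permission bug from the opening story, these guidelines might look like the sketch below. It is a hedged example, not a prescription: `save_file`, the logger name, the wording of the messages and the eight-character reference are all hypothetical.

```python
import logging
import uuid

logger = logging.getLogger("app")  # hypothetical logger name


def save_file(path: str, data: bytes) -> str:
    """Write data to path and return a message that is safe to show the user."""
    try:
        with open(path, "wb") as fh:
            fh.write(data)
    except PermissionError as exc:
        ref = uuid.uuid4().hex[:8]  # short reference the user can quote
        # Full detail, including the server path, goes to the log only.
        logger.error("ref=%s cannot write to %s: %s", ref, path, exc)
        # The user sees what happened and the next step, but no path.
        return (
            "We could not save your file because of a server permission problem. "
            f"Please contact the site administrator and quote reference {ref}."
        )
    return "Your file was saved."
```

The on-screen text stays calm and specific without exposing the path, while the engineer who searches the log for the reference finds the folder and the underlying exception straight away.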

Hence, while writing/capturing messages, try to empathise with the developer who is going to debug it and also with the user who is going to read it. 

Errors give rise to stressful and serious situations, and a silly tone would be inappropriate. So, keep calm and add that non-scientific touch when writing error messages.


