“This program has performed an illegal operation” – Why are error messages so bad?
Last week Thomas Fuchs wrote an excellent post on how to write a great error message. He shows plenty of examples of all-to-common terrible error messages, and has solid advice on how to do it better.
For me this sparked the question, why has the software industry been so bad at this, and for so long? When I was in grad school, I made money on the side teaching people (mostly middle-aged) how to use their home computers. When I went to visit one of my clients, she was visibly shaken as I walked in the door. She told me she just got a message saying she had performed an “illegal operation.” She was genuinely concerned that it might have been automatically reported to the police. I had to explain to her that “illegal” had a different meaning to programmers, and it had nothing to do with criminality.
As someone who’s been responsible for my own share of unhandled errors and poor error messages over the years, I’ll share my thoughts on why this happens, and what to do about it:
-
Errors lost by the wayside of the “happy path:” developers, project managers, and most everyone involved in developing an application are focused on how to deliver the features they want their customers to use. The desired series of actions we want users to take is the “happy path” through the application. In developing and testing, we get tunnel vision, tending to use the application the same way, over and over again. But actual users will do all kinds of things with an application that we developers never dreamed of, and unintentionally will come up with novel ways to break it.
Many years ago I had a formative experience as a junior developer: I was invited to a professional user testing lab, complete with one-way glass for watching participants. After months of working on the application being tested, and clicking through the same screen hundreds of times myself without incident, I was astonished to see a user completely crash our application in less than 60 seconds.
Also, we developers often make all kinds of implicit assumptions about the environment of the application: database connections, API dependencies, browser versions, etc. We often don’t provide good error handling for when dependencies in the environment fail or don’t behave as we expect.
- Lack of cross-functional teams: many organizations have tried to solve these problems by having dedicated testing teams. These teams are often great at finding errors, but then their reports are “thrown over the wall” back to the developers. The developers themselves may be divided into a database team, a back-end coding team, UI team, etc. A UI developer may be asked to add an error message, but this developer may be dealing with results from code created by a back-end developer that returns only “true” or “false,” indicating only whether the function worked out not. This leaves them with little useful information to communicate back to the user. A situation like this may very well be the story behind this Windows 10 error:
- No recognition of business value: this is the key issue. The quality of error handling will only be as robust as the weight its given in the cost/benefit analysis that goes into the prioritization of work. For many projects I’ve seen, error handling often doesn’t come up as a point of discussion in the planning process, as an area where time and money needs to be dedicated. Which browser versions need to be supported? How often and what kinds of user testing should we do? How should we handle an API outage? Questions like these often go unraised (now that I’m old and wise, I always make sure to raise them 😉 ).
Error handling is an especially important issue for a consulting company like ours. Nothing will shake a client’s confidence in your ability more than seeing the application you’re developing for them crash with a cryptic and unhelpful error message. How do we address this, and how do we do it without driving the budget for a project through the roof?
- Have Agile, cross-functional development teams: this removes the organizational barriers that prevent testers, developers and UI designers from working closely together. It allows their focus to go where it should: to the needs of the user, instead of being driven by the implicit incentives of organizational divisions. This approach doesn’t add cost, and may even decrease it.
- Have a standard design pattern for handling errors: dealing with ugly error conditions only when your boss notices or a customer complains is a recipe for inconsistent results and messy, hard to maintain code. A better approach is for the team to develop a standard for how error conditions will be reported up the stack (e.g. from the database, to the business code, to the UI). This facilitates making error handling a routine and consistent aspect of the development process. Fuchs also provides excellent advice on the front-end aspect of this: having good and consistent UI elements for displaying error messages, and using clear, human-readable language.
- Have standards for testing: you should have an automated test suite that confirms your error handling continues to function as expected as the application code evolves and changes (as it would be prohibitively expensive to manually and repeatedly test all the edge cases in the application). Usability testing with real customers is also important, but when and how to do this depends on several factors, such as whether the application is intended for use by the public.