Nielsen’s Heuristic Evaluation: Limitations in Principles and Practice

Recently, I was asked to review the usability of an audio configuration app and propose enhancements to its navigation design, workflows, workspace management, and overall usability. Because the business mandated a cost-effective, rapid evaluation method, I was compelled to choose Nielsen’s heuristic evaluation to assess the product’s ease of use.

Jakob Nielsen’s heuristic evaluation is a quick method to examine a user interface and identify usability issues. Since my prior experience with Nielsen’s heuristics had not been encouraging, I explored other well-established principles such as Bruce “Tog” Tognazzini’s First Principles of Interaction Design and Ben Shneiderman’s Eight Golden Rules of Interface Design. Tog’s principles have been around for years and were revised recently to account for the adoption of mobile, wearables, and internet-connected smart devices. They cover discoverability, readability, accessibility, learnability, latency reduction, and Fitts’s Law, which are some of the key principles for design success. Shneiderman’s guidelines include keyboard shortcuts, which can increase the pace of interaction and are often a lifesaver for advanced users.

Nielsen’s heuristics can reveal usability issues, but they fail to capture some of these important areas, which have emerged as technologies evolved. The problem lies not only with the guidelines themselves, but also with the way the heuristic evaluation is practiced.

What Constitutes Good Qualitative Research?

Well-formulated user research helps businesses drive their future strategies. No single research method can uncover the full array of a user’s unmet needs, so qualitative and quantitative techniques are combined to achieve a research goal. Over the years, one of the most debated topics has been how to judge the quality of qualitative research. Researchers have questioned the effectiveness of qualitative studies because of the likelihood of subjectivity and researcher bias. It is normal practice for a single interviewer to set up a questionnaire, talk to the end users, collect the data, and analyze it. Along the way, the researcher’s bias enters the study. One form of this is confirmation bias, in which a researcher tends to favor facts that support his or her own beliefs. The process is no different in the case of heuristic evaluation. What constitutes a good and trustworthy qualitative study is still a matter of critical debate.

In their paper “Ensuring Rigour and Trustworthiness of Qualitative Research in Clinical Pharmacy,” Muhammed Abdul Hadi and S. Jose Closs offer strategies to further evaluate the quality of qualitative research. They propose methodological techniques that include triangulation, self-description, member checking, prolonged engagement, audit trail, and peer debriefing.

  • Triangulation – A strategy that draws on two or more related data sources, data collection methods, or researchers to reduce the bias of a single source, method, or researcher.
  • Self-Description – This enables researchers to explain their position within the study and how their personal beliefs and past training have influenced the research.
  • Member Checking – Validating the data, the themes and categories derived from it, and the interpretations and conclusions with study participants, by formal and informal means.
  • Prolonged Engagement – A prolonged engagement with the user can help an evaluator bring out important issues. A user cannot learn the entire software in a short time frame. Over time the user becomes an expert and gets a deeper understanding of the entire anatomy of the software.
  • Audit Trails – In qualitative research, audit trails make it possible for others to see how analysts reached their decisions through point-by-point documentation of every part of the examination procedure.
  • Peer Debriefing – Discussing a research topic with a disinterested peer can help researchers illuminate newer angles of data interpretation and often act as a good basis for identifying possible sources of bias.

The Problem with Nielsen’s Heuristics

When Nielsen developed the heuristic guidelines with Rolf Molich in the 1990s, user interfaces were not considered as critical or complex in terms of navigation, workflows, aesthetics, and layout as they are today. Earlier-generation user interfaces undeniably suffered from major usability issues, but these went unaddressed because usability awareness was low and the interaction problems users faced on websites and in software applications were rarely recognized or reported. As mentioned by Randolph G. Bias and Deborah J. Mayhew in their book Cost-Justifying Usability: An Update for the Internet Age, enterprises used to put up strong resistance since they felt that usability science was disruptive, expensive, and time consuming. The real need for usability began to be felt as user interfaces became more complex. There was a noteworthy move from building simple static websites to creating dynamic, real-time, data-oriented web applications. At this juncture, the majority of enterprises integrating usability as a practice embraced Nielsen’s principles as a quick method to reveal usability issues. Nielsen’s heuristics continued to gain popularity over time.

However, since these heuristics are one-dimensional and were shaped with desktop applications in mind, one of the biggest challenges the guidelines face is scalability. They are less effective in the next-generation design ecosystem, where conversational user interfaces, multimodal interfaces, tangible user interfaces, and wearables are taking precedence. These interfaces are built for emerging users, with newer interaction rules that solve unique design problems.

In the article “Usability Expert Reviews: Beyond Heuristic Evaluation,” author David Travis points out that Nielsen’s heuristics, although widely used, have never been validated. There is no evidence that applying these heuristics in the design and development of a user interface will improve its usability. Dr. Bob Bailey, president of Computer Psychology, Inc., mentioned that a better, research-based set of heuristics was proposed by Jill Gerhardt-Powals. Created in 1996, these rules were designed around how humans process information. In the paper “Cognitive Engineering Principles For Enhancing Human-Computer Performance,” Gerhardt-Powals claims that a cognitively engineered interface is superior in performance, satisfaction, and workload when compared with interfaces that are not cognitively engineered. Research was carried out with 24 Drexel University students to support the hypothesis.

As discussed in the paper “Discount Usability Testing” from the University of Calgary, heuristic evaluations suffer from oversimplification. Starting from the original version of the usability evaluation method, which has more than a thousand entries, Jakob Nielsen created an optimized set to make the guidelines simple, easy, and fast to apply, which also made them generic. This generic nature often misleads designers and rarely helps them uncover usability issues of a specific nature.

A usability expert can recommend ways to improve the usability of an application using the heuristic evaluation, but measuring user satisfaction post-implementation is a step that is often omitted. As a result, user satisfaction scores are never integrated into the design process. Usability testing should be tied to the design process to validate whether the heuristic guidelines actually made a difference to the product.

Nielsen’s heuristics raise two distinct types of problems.

The way the heuristic evaluation is practiced

The main reason the heuristic evaluation is losing viability is that it is not practiced as the rule book prescribes. Nielsen states that a heuristic evaluation should be a group effort, insisting that no individual can find all the issues and that “different people find different usability problems.” Due to short timelines and budget constraints, heuristic evaluations are often conducted by a single evaluator, and the resulting researcher bias is completely ignored. The researcher does not describe his or her personal belief system or the setting in which the analysis was carried out. Since there is no shared mental model between the user and the evaluator, certain usability issues identified during the evaluation may turn out to be false positives. The severity ratings an evaluator assigns (a combination of the frequency, impact, and persistence of a particular interface-related problem) are subjective. The end users might have completely different pain points that the evaluator never even notices. An evaluator might flag an additional click as a core usability issue, but the end users don’t mind since they are used to it and it works well for them.

The guidelines prescribed by Nielsen have limitations

  • Visibility of the system status – In this guideline Nielsen talks about keeping users informed about what is going on through appropriate feedback within a reasonable time. Nielsen never stated that status information ought to be precise and easy to see. It is often the case while installing an update that the system shows a misleading estimate of the time the update will take. (When it says 5 minutes, it typically takes longer.) Sometimes no time estimate is available and the status bar simply keeps animating. Interface feedback should be prompt, meaningful, and easily understood so that users know the system has registered their actions. These key points are lost in this rule. Also, displaying active states in limited real estate is an impossible task. Today, even desktop apps are adopting concealed navigation patterns in the form of the hamburger menu. When the navigation is invisible, so is the active state.
  • User control and freedom – This guideline addresses the “undo” and “redo” actions and a clearly marked “emergency exit” to leave an unwanted state. The guideline is presented in an abstract form with no clear indication of how much control the user should have over the system. Should the control extend to modifying the anatomy of the software? Should the control be over the data or over the windows and dialog boxes? The guideline talks about undoing an action, but not to what extent. Will an undo command revert a stream of actions or just one action? As stated in a course lecture from MIT, performing an undo in an application with several concurrent users (such as a shared whiteboard where anyone can scribble) raises the question of whether undo should affect just a user’s own actions or everyone’s actions. The guideline never considers these intricacies. A good interface offers the user a platform to explore. Cancel, undo, and back buttons are essential in building an interface, but once a user decides which stream of actions to undo or cancel, the next question is how to divide the stream into units or chunks as desired by the user. For example, in a wizard-based navigation, if a user completes a number of steps and finds that the wizard is not serving the purpose, should an undo at that moment reverse just the previous action or roll back the entire wizard? The guideline fails to address such complexities. Take the example of choosing a password. A typical system is tied to certain mandates and guidelines and prompts the user to choose a lengthy and complex password that is difficult to memorize. In this instance, the purpose of the guideline is defeated since the system dictates the password rules, leaving users no room for freedom. The guideline may need to be expanded on a case-by-case basis.
  • Recognition rather than recall – In this guideline, Nielsen only scratches the surface of the ways to maximize recognition and minimize recall. He recommends removing visual clutter, building on common UI metaphors, and offloading tasks. But given the multifaceted nature of interface design, with newer interaction patterns being adopted, metaphors have short lifespans. Metaphors are either fundamental concepts or knowledge acquired in the past, which is then applied while interacting with something new. It is true that people use acquired knowledge of other things while using something else. The classic example is working with a word processor and mapping it to the experience of using a typewriter. A mobile user who has never used a multi-touch device won’t suddenly think of double-tapping to expand content; it’s not an obvious thing to do, and since there is no acquired knowledge, it is hard to recognize. An article titled “Why UX Designers Should Use Idioms Rather Than Metaphors” mentions that since metaphors depend on pre-existing knowledge, it is challenging for a designer to create a metaphor from a limited pool of objects and actions. The article further concludes that common metaphors are not permanent. Recognition requires only a simple familiarity decision. When a design is built using familiar items, it is widely accepted. However, as we move forward, UI design frameworks are updated with newer patterns, making the lifespan of familiar items shorter. This is particularly true for emergent users who have not experienced the progression of user interfaces and the evolution of interaction patterns. For them, familiarity is short-lived at best.
  • Flexibility and efficiency of use – While Nielsen mentions accelerators, the original guideline does not name keyboard shortcuts, which are a significant way to speed up interaction and minimize task time. Many tailored versions of the guideline address that issue, but adding shortcuts is just one aspect. Applying flexibility in the interface needs a broader look. For an interface to be flexible, a user should have more than one way of doing things rather than following a single linear path. For instance, there are four different ways to close a modal dialog box: clicking the cancel button, clicking the cross icon at the top right corner of the window, pressing the escape key, or clicking outside the modal window. The guideline overlooks multiple entry and exit points and non-linearity as ways to make a design flexible and efficient.
  • Aesthetic and minimalist design – This guideline fails to capture the essence of design and lacks depth. The guideline mentions only dialog boxes, while a full-fledged interface is far more than a dialog box. The definition of minimalist design is not clearly stated. With each designer interpreting minimalist design in their own way, the outcome becomes subjective. Does aesthetic and minimalist design mean an uncluttered interface, or a progressive disclosure of information at the cost of an extra click or an additional user input?
  • Human visual system – Nielsen’s principles do not address the human visual system and how it affects interface design. Placing items in a predictable place is a key design consideration. As mentioned in the article “F-Shaped Pattern For Reading Web Content” by the Nielsen Norman Group, an eye-tracking study conducted with 232 users revealed that the dominant reading pattern looks somewhat like an F. However, the study never disclosed the monitor resolution used in the experiment. As users move to different screen dimensions for web browsing, content sometimes becomes scattered or dense, and the eye travels in unpredictable directions to find it. In such a scenario, the F pattern is bound to break.
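
The shared-whiteboard question raised under “User control and freedom” (should undo affect only a user’s own actions or everyone’s?) can be made concrete with a minimal sketch. Everything here is hypothetical and for illustration only: the class and method names (SharedWhiteboard, undo_global, undo_own) are assumptions, not taken from Nielsen’s guidelines or the MIT lecture, and a real collaborative tool would also have to deal with concurrency and redo.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stroke:
    user: str   # who drew it
    label: str  # what was drawn (stand-in for real drawing data)


@dataclass
class SharedWhiteboard:
    """Hypothetical shared whiteboard with two possible undo scopes."""
    history: List[Stroke] = field(default_factory=list)

    def draw(self, user: str, label: str) -> None:
        self.history.append(Stroke(user, label))

    def undo_global(self) -> Optional[Stroke]:
        """Scope 1: remove the most recent stroke, whoever drew it."""
        return self.history.pop() if self.history else None

    def undo_own(self, user: str) -> Optional[Stroke]:
        """Scope 2: remove the most recent stroke drawn by `user` only."""
        for i in range(len(self.history) - 1, -1, -1):
            if self.history[i].user == user:
                return self.history.pop(i)
        return None


if __name__ == "__main__":
    board = SharedWhiteboard()
    board.draw("alice", "circle")
    board.draw("bob", "arrow")
    # Alice presses undo: with per-user scope her circle goes away and
    # Bob's later arrow is untouched; with global scope Bob's arrow
    # would have been removed instead.
    print(board.undo_own("alice"))
    print(board.history)
```

The two methods give different results for the same key press, which is exactly the design decision the guideline leaves open.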

Conclusion

In his 2013 “Mobile Usability Futures” lecture at Google, Nielsen stated, “Usability science reveals the facts of the world,” and “You’ve got to design for the way people actually are.” However, the facts of the world are mutable over time. They are replaced with newer thoughts, visions, and insights. Nielsen’s heuristic principles lack the ability to adapt to this reality. While the guidelines remain a standard assessment tool, they fall short on scalability and adaptability to the changing ecosystem of design. With the proliferation of smart devices, wearables, and mobile phones, the global design language is being updated with new rules.

The heuristic evaluation is a quick method to reveal usability problems. An evaluator must have domain knowledge, must have been trained on the software being evaluated, and must have clear direction from the business on what is expected of the study. In evaluating a web-based communication tool for nurse scheduling, a study conducted by Po-Yin Yen and Suzanne Bakken found that while usability experts detect general interface issues, end users are the ones who identify serious interface obstacles while performing a task. Therefore, leveraging empathy and establishing a shared mental model between end users and designers hold the key to understanding end-user problems better and routing them appropriately to design better systems.

How to Conduct a Better Heuristic Evaluation

  • Use multiple evaluators – Involving multiple evaluators will minimize researcher bias.
  • Severity ratings – Severity ratings are subjective. The best way to make them effective is to use multiple evaluators and take the mean of their ratings.
  • Understand business goals – Look for usability standards that best suit your product. A combination of multiple standards might be a better approach depending on the type of interface you’re evaluating. Look to Tog’s First Principles of Interaction Design and Shneiderman’s Eight Golden Rules of Interface Design to bridge the gap between Nielsen’s heuristics and the areas in which they have fallen short in adapting to today’s design ecosystem.
  • Domain knowledge – Get a solid understanding of the domain before studying a piece of software so that a shared mental model can be established between the evaluator and the end user.
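
The advice on severity ratings above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name aggregate_severity, the issue names, and the scores are all hypothetical, and the ratings are assumed to sit on a 0 to 4 scale. Alongside the mean, the sketch reports the spread between evaluators, since a large spread suggests a rating that deserves discussion rather than silent averaging.

```python
from statistics import mean, pstdev
from typing import Dict


def aggregate_severity(ratings: Dict[str, Dict[str, int]]) -> Dict[str, dict]:
    """Combine per-evaluator severity scores for each issue.

    ratings maps issue -> {evaluator: score}. Scores are assumed to be
    on a 0 (not a problem) to 4 (catastrophe) scale.
    """
    summary = {}
    for issue, by_evaluator in ratings.items():
        scores = list(by_evaluator.values())
        summary[issue] = {
            "mean": round(mean(scores), 2),      # consensus severity
            "spread": round(pstdev(scores), 2),  # evaluator disagreement
            "evaluators": len(scores),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical ratings from three evaluators.
    example = {
        "extra click to save": {"A": 1, "B": 3, "C": 2},
        "no progress feedback": {"A": 4, "B": 4, "C": 3},
    }
    for issue, stats in aggregate_severity(example).items():
        print(issue, stats)
```

Averaging reduces the influence of any single evaluator’s bias, but it does not replace checking the resulting priorities against what end users actually report.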

 

Ballav, A. (2017). Nielsen’s Heuristic Evaluation: Limitations in Principles and Practice. User Experience Magazine, 17(4).
Retrieved from http://uxpamagazine.org/nielsens-heuristic-evaluation/