Cognitive Interviewing: A Method to Evaluate Surveys

Acknowledgements

Sadly, my co-author, Jennifer Edgar, passed away before the publication of this article. She and I worked on a draft of the article together. I hope that I continued to capture her perspective in subsequent revisions and that she would be pleased with the final article.

I also want to acknowledge and thank our colleague Scott Fricker. He provided significant contributions to our early work before he passed away.

I am grateful for the opportunity to have collaborated with both Jennifer and Scott on this work. They both shared their deep understanding of survey methodology as we explored the similarities and differences in our methods.

Introduction

As user experience researchers, we often use surveys to collect data from participants easily and systematically. We may call them surveys, questionnaires, or even polls, but whatever we call them, we use them to collect feedback in a variety of situations, including research on users’ backgrounds and expectations, input on product features, and user satisfaction during a usability test.

We sometimes use surveys that are well-tested and proven to be reliable and valid (such as the System Usability Scale or SUS), but we often build our own, so we can address the specific issues that are most pressing for a particular project or usability test. These surveys may use a wide variety of question types, including open-ended, multiple choice (choose one or mark all that apply), or ranking. The questions can be displayed in a variety of formats, including individual questions vs. grids and drop-down lists vs. radio buttons. Although it can seem easy enough to write survey questions, the wording (and the response options provided) can unintentionally bias the responses and affect the decisions we make based on our findings. For example, asking people about shopping frequency may result in different responses, depending on whether they are thinking about shopping for food or shopping for clothing.

Therefore, as we do with any design project, ideally we work with design experts (e.g., survey methodologists or UX designers) and incorporate input from subject matter experts. We follow best practices as we create the product (in this case, a survey). However, we also conduct testing with users to be sure our final product does what we intend. This is especially important for surveys that serve as the primary source of information for a project.

Cognitive interviewing, also sometimes called cognitive testing, is one method we can use to help ensure our surveys are collecting the data we are really looking for. Survey methodologists pioneered cognitive interviewing as a way to evaluate all types of questions, across surveys on different topics and for different purposes.

The goals of cognitive interviewing are similar to those of usability testing, and in many ways the methods themselves are similar as well. In a cognitive interview, participants answer one or more survey questions, sometimes thinking aloud, then answer debriefing questions (what survey methodologists call “probes”) about their response process or experience. We then interpret their answers to the probes to learn how well the questions worked, whether they measured the intended information, and whether we should make any changes.

Although these two methods are similar in many ways, the differences are important for UX researchers and survey methodologists to understand. Running a traditional formative usability test on the survey will not get you the information you need to really understand how the questions are performing.

In this article, we discuss how cognitive interviewing and usability testing are similar and how they are different, so you can use cognitive interviewing to test your own surveys. For a more in-depth discussion of cognitive interviewing, see Gordon Willis’s classic book on the topic, Cognitive Interviewing: A Tool for Improving Questionnaire Design.

Cognitive Interviewing Basics

Cognitive interviewing can uncover potential problems in all the stages involved in answering a question. We can evaluate whether respondents understand the intended meaning of each survey item; whether they have, and can access, the information required to respond to the item; whether they can form a response; and how they select their final response given the response options provided.

Cognitive interviewing can identify a wide range of problems with surveys, such as the following:

confusing or vague question wording
inappropriate jargon or other unfamiliar language
inappropriate assumptions
complicated instructions
problems within the respondent’s response process, such as
- lack of knowledge of the required information
- inability to remember or find the information needed
- computation or estimation errors
- sensitivity to a specific topic
inappropriate or incomplete response options
complicated skip patterns (i.e., which questions respondents should skip over)

Finding and addressing problems such as these can help ensure that a survey is truly collecting the information we are looking for. Although we are focusing on using cognitive interviewing to evaluate survey questions in this article, you can also use the method to evaluate other types of products, such as print or online forms, and content that is primarily textual or graphic.

Cognitive Interviewing Specifics

To get into more specifics about cognitive interviewing, we compare it to usability testing, and we show how the two methods are similar and how they are different.

Goals

We use both methods to improve the product or survey we are testing. Sometimes we use the methods iteratively throughout the development process, incorporating findings from one round of testing and conducting another round to ensure that the changes addressed the issues uncovered and to identify further opportunities for improvement. However, each method has some additional goals as well. In some usability tests, we may collect quantitative measures to confirm that our final product meets our standards. Or we could compare our product to one from another organization. Likewise, with cognitive interviewing, we are often trying to improve a survey, but there may be other goals as well. For example, for a survey that is already in the field, we may want to explore respondents’ thought processes to help us interpret the data being collected.

Participants

With both methods, we aim to get participants who represent the target user groups. In a typical usability test, we may run about five to ten participants per user group. For a quantitative study, we would want more than that. For cognitive interviewing, although it is primarily a qualitative method, we might run 20 to 30 participants. It can take a while to recruit and run this many participants, then analyze the results, but this approach helps us collect information about a variety of situations and respondent experiences, so we can identify even subtle differences in how respondents understand the questions.

Tasks

In both methods, we give the participants something to do, then get feedback about their experience. In usability testing, we present participants with tasks to complete, sometimes with a scenario to provide some fictional background information and motivation for participants to consider as they complete the tasks. It is important to choose the tasks carefully, as you will only get feedback on the tasks included in the test. Often, we use tasks that are especially important, difficult, or common.

Likewise, in cognitive interviewing, it may not be possible to test the whole survey, so we need to pick the questions and/or sections we test carefully. We often pick questions that are especially important or potentially difficult (e.g., a screener question that affects the questions that follow or a question covering a particularly complicated concept), or we may test how the question format affects the respondents’ understanding and responses (e.g., a grid vs. individual questions).

On the other hand, we rarely use scenarios in cognitive interviewing because we are interested in the participant’s actual response process, which we may lose if we rely on a hypothetical situation provided in a scenario. We only use scenarios (which survey methodologists call “vignettes”) when we need to evaluate certain situations where it is difficult to find the appropriate participants. For example, we might ask respondents to imagine they had just been fired and then answer questions about their employment status. The vignettes provide all the information the participants need to answer the survey questions. We then have to interpret the findings with the understanding that the participant’s actual response process may have been influenced by the method.

Protocols

The general procedures followed in both methods are similar. With either method, a typical session might include the following steps:

Introduce the researcher and team members.
Describe the process for the session.
Have the participant complete an informed consent form.
Have the participant perform the tasks or complete the survey.
Ask questions about the participant’s experience.
Thank and (sometimes) pay the participant.

With both methods, we usually ask the participants questions to learn about their experiences and thoughts during the session. We can ask questions while the participants are completing the tasks (or answering a question), after completing individual tasks/questions, or at the end of the session.

However, the types of questions we ask in each method are different. While a participant is completing a task in a usability test, we might ask questions like the following:

What are you looking for?
What did you expect to happen when you clicked that button?

After they have completed the task, we might ask questions like the following:

How easy or difficult was the task to complete?
How confident are you that you completed the task correctly?

At the end of the usability test, we might ask questions about the experience as a whole. We could use an established, standard survey such as SUS or ask more general questions tailored to the specific product. The questions provide us with information to help evaluate the usability of the product, which we interpret along with any observations and metrics from the test.

In cognitive interviewing, the questions we use (the “probes”) and when we ask them are different. We are less likely to ask probes while participants are answering the survey questions. We are looking to understand their response process, which we could interrupt if we asked probes as they are responding. Instead, we can ask probes after an individual question, at the end of a group of questions, or at the end of the whole survey.

After the participants have answered a question or series of questions for the survey, we ask probes that will help us dig into their response process. For example, we might ask questions such as the following:

In your own words, what was this question asking? How would you ask this question?
What does the term/phrase _________________ mean to you?
How did you decide on your response?
Did you consider any responses that you eventually rejected? If so, what were they? Why did you decide to reject these responses?
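
For teams that keep their interview guides in a structured form, the timing options described above can be sketched in a few lines of Python. This is only an illustrative sketch, not an established tool: the class names, the timing labels, and the two shopping questions are all hypothetical. Each probe is tagged with when it should be asked, and the function lays out the order in which the interviewer would read the survey questions and probes.

```python
from dataclasses import dataclass, field


@dataclass
class Probe:
    text: str
    # Hypothetical timing labels: "after_question", "after_section",
    # or "end_of_survey".
    timing: str


@dataclass
class Question:
    section: str
    text: str
    probes: list = field(default_factory=list)


def build_session_script(questions):
    """Return an ordered list of (kind, text) steps for one interview."""
    script = []
    section_probes = []  # held until the current section ends
    final_probes = []    # held until the whole survey ends
    for i, question in enumerate(questions):
        script.append(("question", question.text))
        for probe in question.probes:
            if probe.timing == "after_question":
                script.append(("probe", probe.text))
            elif probe.timing == "after_section":
                section_probes.append(probe.text)
            elif probe.timing == "end_of_survey":
                final_probes.append(probe.text)
            else:
                raise ValueError(f"Unknown probe timing: {probe.timing}")
        # A section ends at the last question or when the next section differs.
        last_in_section = (
            i + 1 == len(questions)
            or questions[i + 1].section != question.section
        )
        if last_in_section:
            script.extend(("probe", text) for text in section_probes)
            section_probes = []
    script.extend(("probe", text) for text in final_probes)
    return script


# Hypothetical survey questions, paired with probes like those listed above.
questions = [
    Question("shopping", "How often do you go shopping?", [
        Probe("In your own words, what was this question asking?",
              "after_question"),
        Probe("What does the term 'shopping' mean to you?", "after_section"),
    ]),
    Question("shopping", "How much did you spend on shopping last month?", [
        Probe("How did you decide on your response?", "end_of_survey"),
    ]),
]

script = build_session_script(questions)
for kind, text in script:
    print(f"{kind}: {text}")
```

In a moderated session, a script like this is only a starting point, since the interviewer may add or adapt probes based on what the participant says.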

For both methods, it is a good idea to run several pilot participants through the process to be sure that the protocol runs smoothly. The pilot participants should ideally be members of the target audience to ensure that you uncover issues with the protocol, such as confusing language in the tasks, probes that are difficult to answer, or lack of knowledge about key concepts.

Another consideration for both methods is whether to ask participants to think aloud. We can use a think-aloud approach with either method, but we need to consider the context of the test. Thinking aloud can give some rich qualitative information about the participant’s experience, but it can also affect their behavior.

Analyzing Data

In usability tests, we evaluate both qualitative and quantitative data, looking at all the data together to get a full picture of a product’s usability and how to improve it. For qualitative data, we record our observations and the participants’ comments during the session. We may also record the steps the participants took to complete the tasks. We analyze these qualitative data, looking for common themes and trends within and across participants. From these common themes, we identify potential usability problems.

In addition to the qualitative data, a number of quantitative measures are also useful in usability testing, such as task time, success rates, error rates, and satisfaction ratings (taken after each task and/or at the end of the session). Some usability tests also use eye tracking or other biometric measures. With small sample sizes, we have to limit our analysis of the quantitative measures, but with larger samples we can do more detailed statistical analyses.

On the other hand, cognitive interviewing is primarily a qualitative method. The principal source of data is the participants’ responses to the probes. The participants’ comments from thinking aloud can also be insightful. We may analyze the survey responses to help explain the participants’ response processes and experiences. Overall, however, there are not as many objective measures in cognitive interviewing. It may be difficult to measure task success or error rates because it is often impossible to determine whether a participant’s response is actually the “true” or “right” response to the question. Instead, we focus on systematic analyses of the responses to the probes, looking for trends of interpretation or response problems within and across participants.
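
As one possible way to look for trends across participants, the short Python sketch below tallies how many different participants showed each type of problem on each question. It assumes the research team has already reviewed the probe responses and assigned a problem code to each note. The participant IDs, question IDs, and code labels are hypothetical, and a tally like this supports the qualitative analysis rather than replacing it.

```python
from collections import Counter, defaultdict

# Hypothetical coded notes: (participant, question, problem code).
# The codes loosely follow the stages of the response process:
# comprehension, recall, and selecting a response option.
coded_notes = [
    ("P01", "Q1", "comprehension"),
    ("P01", "Q3", "recall"),
    ("P02", "Q1", "comprehension"),
    ("P03", "Q1", "response_options"),
    ("P03", "Q3", "recall"),
    ("P04", "Q1", "comprehension"),
    ("P04", "Q1", "comprehension"),  # same issue noted twice in one session
]


def participants_per_problem(notes):
    """Count the distinct participants who showed each (question, code) problem."""
    seen = defaultdict(set)
    for participant, question, code in notes:
        seen[(question, code)].add(participant)
    return Counter({key: len(people) for key, people in seen.items()})


summary = participants_per_problem(coded_notes)
for (question, code), count in summary.most_common():
    print(f"{question} {code}: {count} participant(s)")
```

Counting distinct participants, rather than individual notes, keeps one talkative participant from making a problem look more widespread than it is.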

Remote and Unmoderated Sessions

We often run usability tests in a lab, led by a test administrator. However, we also have tools to run usability tests remotely, which allow us to better accommodate more diverse user populations. Web conferencing tools may be enough for some testing needs, but there are also specialized tools that are tailored for usability testing. Remote usability tests can be moderated (where a test administrator observes the session via web conferencing) or unmoderated (where participants access the tasks over the internet and complete the tests on their own).

Cognitive interviews are generally moderated sessions. Before the pandemic, they were mostly in-person sessions, but there are now more tools to run cognitive interviews remotely. These include improved web conferencing, as well as tools designed specifically for remote interviewing. The one-on-one conversation is important because it allows the interviewer to tailor the probes during the course of the session. Unmoderated cognitive interviewing is rare, but it is gaining some followers. In this approach, the probes are built into the online survey.

Final Thoughts

Because surveys are a common way to collect data for UX work, we need to ensure that our respondents can understand and answer our questions. Although it may seem easy to write good survey questions, we cannot always predict how our users will react. Therefore, as with any design effort, even when we follow design best practices, we still want to test with users before deploying. Testing survey questions before collecting data helps ensure that we are collecting the information that we want and that we base decisions on sound data. This is especially important in efforts that rely primarily on the survey results.

Cognitive interviewing is similar to usability testing in many ways, but there also are some important differences, including the types of data collected, the use of scenarios, and the types of debriefing questions (or probes) used. These differences are important to remember, but the similarities between the methods suggest that most user experience professionals should have the skills necessary to conduct their own cognitive interviews.

Because it helps us make sure that the surveys we use are collecting the data we want, cognitive interviewing is a great addition to any UX toolbox.

Jean Fox

Jean Fox is a UX Researcher at the US Bureau of Labor Statistics, which collects much of its data through surveys administered to businesses and individuals. Jean has been in UX for over 30 years and has led a wide variety of UX research efforts, including many usability tests. Twitter: @jeanharrisfox

User Experience