
User Research for Non-Researchers: How to Get User Feedback Without a Dedicated Researcher

We’ve all been there. The product team needs data to make a decision, the user research team is fully engaged on other projects, and there’s no budget or time to hire a consultant. So, the product team has to decide between doing no research and doing the user research themselves.

Everyone on the product team should know the basics of research for two reasons. First, if others on the product team can conduct basic research sessions themselves, then the internal research team can focus on more strategic or methodologically difficult research projects. Second, those non-researchers who can do the basics themselves become better consumers of research. They better understand the challenges of finding appropriate participants, designing a good study, and collecting unbiased data.

Possibly most beneficial to the research team, non-researchers who can do some basic studies themselves better understand the labor and time involved, and as a result have more realistic expectations of the research team. This article is a primer that user researchers can provide to others in their organization that will teach them the basics.

How to Get User Feedback Without a Dedicated Researcher

In a nutshell, you need to:

  • Define your goals and decide on an approach
  • Recruit and screen participants
  • Gather data
  • Analyze the data
  • Share it with the team
  • Make decisions

Forming Your Goals and Deciding on an Approach

Setting research goals is the step we see skipped most often by non-researchers. How do you set your goals?

  • Start with the end in mind. Ask yourself, “What decision do I need to make?” Work backwards to figure out what information you need in order to make that decision.
  • Focus on objectives, not questions. Often people start with “What do I want to know?” but that will result in a long list of questions that may not have any real impact if you answer them. For each question, ask yourself, “What will I do with this information?” If it won’t put you closer to a product decision, leave it out.
  • Write down your goals and decisions to be made. You WILL forget, so make sure these are recorded. This also makes it easier to get the team to align on a strategy and to write a report.

Once you’ve identified your goals, now you need to decide how you’re going to address them. In other words, if your questions can’t be answered with a Google search (which may actually be what your team needs), you need to decide which research method you are going to use.

A detailed explanation of methods is beyond the scope of this article, but there are many tools out there to help you decide (such as Laura Klein’s UX for Lean Startups).

We’ll use the overly simple distinction:

  • Why or how question → qualitative approach. If you want to know why users do (or would be interested in) a thing or how they interact with something, go with qualitative research. Common methods include focus groups, interviews, and usability tests. These can be in person or remote (for instance, by telephone or video conference) depending on whom you want to recruit and what resources you have (renting a research facility, for example, will require a budget).
  • How many or how much question → quantitative approach. If you want to know how many or how much, go with quantitative research. Common methods include surveys, A/B testing, web analytics, and summative (benchmark) studies.

In this article, we’ll focus on one-on-one interviews (often called in-depth interviews or IDIs) and usability studies because they are the most foundational types of studies that a novice should be able to learn quickly and conduct independently with little guidance from the research team. There are, however, a number of situations when you should bring in help from experts (see Figure 1), including when you know you need any method other than an interview or a usability study. In those cases, you should talk to your in-house researchers or seek the help of an expert; methods like focus groups and surveys are always more complicated than they seem.

Figure 1. You may need to call the experts when logistics get complex (users outside the US, sending physical prototypes to users), your users are specialized (low-vision ATM users, IT administrators who work with DevOps teams in healthcare), or your question requires an advanced approach to answer (pricing surveys, co-creation sessions, or Kano modeling).

Recruiting & Screening Participants

You can do this yourself or hire an agency that specializes in recruiting research participants. Either way, the steps are the same:

  • Write a screening questionnaire
  • Decide on what, if any, incentive to offer
  • Find participants who meet your screening criteria
  • Schedule the participants and send them reminders to help ensure they show up
  • Send a thank-you note and incentive quickly after the session (if it wasn’t in person)

If you hire a recruiting firm, you can skip the last three steps, as the firm will take care of them for you.

Who to recruit as participants

Who you recruit will depend on current goals for your product or designs. Depending on the complexity of the product, you likely have multiple types of target customers. It’s usually helpful to think about the situations in which your product or feature will be used, and then recruit people who are in those situations regularly.

In addition, there are other characteristics you should consider, depending on your research goals:

User Profiles. Your company may have personas, or at least marketing segments, defined. If so, use them in your recruiting. For example, is your product targeted at a certain age group or other broad group such as “moms” or “students?” Is your product targeted at business users at a certain company size, such as start-ups or enterprises? Do users’ businesses fall within a certain vertical, such as “financial services”?

Knowledge. Are you trying to attract new customers or make the initial experience very intuitive? If so, you’ll likely want to recruit people for your study who have never used your product before.

Are you trying to increase your current customers’ usage of your product or change their product mix? Then you’ll want to recruit current customers with at least a moderate amount of experience with your product.

Are you designing features that make users more efficient at repetitive or frequent tasks? If so, then you’ll want to recruit heavy users of the features being designed.

Role. People play roles in both their personal and professional lives. For example, if your product is an ecommerce website, you’ll usually need to recruit people who are the purchase decision makers as well as purchase influencers. However, if your product is a piece of software or a web app, you’ll usually be less interested in the purchase role and more interested in those who are the actual hands-on end users.

How to screen

You will need a screening questionnaire even if you just interview your own customers. (This is a short list of questions you ask prospective participants to make sure they are your target users.) When screening participants, the most important questions to ask concern role and knowledge. You may also want to make sure that you get good representation across some common demographic variables, such as age and gender for consumer products, or industry, company size, and job tasks for business products.

When writing your screening questions, keep in mind that you want to avoid giving away the “right” answer to the potential participant. Avoid yes/no questions and use multiple choice questions wherever possible.
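To illustrate the mechanics, here is a minimal, hypothetical sketch of a multiple-choice screener item and its qualification logic, using the movie-ticket app example that appears later in this article; the question, options, and function are illustrative only, not a recommended tool.

```python
# Hypothetical screener item: the qualifying answer is hidden among plausible
# distractors rather than asked as an easy-to-game yes/no question.
SCREENER = [
    {
        "question": "Which of the following have you done in the past month?",
        "options": [
            "Bought movie tickets in a mobile app",   # the qualifying answer
            "Bought movie tickets at a box office",
            "Streamed a movie at home",
            "None of the above",
        ],
        "qualifying": {"Bought movie tickets in a mobile app"},
    },
]

def qualifies(answers):
    """Return True if every screener item received at least one qualifying answer."""
    return all(
        set(selected) & item["qualifying"]
        for item, selected in zip(SCREENER, answers)
    )

# A prospect who picked the qualifying option plus a distractor passes the screener.
print(qualifies([["Bought movie tickets in a mobile app", "Streamed a movie at home"]]))  # True
print(qualifies([["None of the above"]]))  # False
```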

Where to find participants

Finding participants differs depending on whether or not you are recruiting current customers and have access to a customer list.

Working With a List. If the study calls for current customers, you may be able to simply pull a list of customer contacts narrowed down by any criteria for which you already have data (for example, company size or how much the customer spends with your company).

Before reaching out to current customers you will also want to check with, or at least inform, anyone in your company who has a relationship with the customer, such as a sales rep or a customer support contact. This is both a professional courtesy and a safeguard against reaching out to customers who shouldn’t be contacted—like those who are currently so dissatisfied that they are on the brink of leaving. Make sure you screen this list against your company’s “do not call” list.
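As a small, hypothetical sketch of that screening step, the snippet below filters a pulled customer list against a do-not-contact list; the file names and column headings are assumptions about your own data, not a real system.

```python
import csv

# Hypothetical inputs: a pulled customer list and the company's do-not-contact list,
# both CSV files with an "email" column.
with open("do_not_contact.csv", newline="", encoding="utf-8") as f:
    blocked = {row["email"].strip().lower() for row in csv.DictReader(f)}

with open("customer_pull.csv", newline="", encoding="utf-8") as f:
    candidates = [row for row in csv.DictReader(f)
                  if row["email"].strip().lower() not in blocked]

# Write the cleaned recruiting list for outreach.
if candidates:
    with open("recruiting_list.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=candidates[0].keys())
        writer.writeheader()
        writer.writerows(candidates)
```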

Working Without a List. If you don’t have a list of customers or if the study calls for non-customers, there are a number of options to pursue, but the process can be labor-intensive. You may be able to use a site intercept tool such as Ethnio to invite website visitors or app users to participate in your study. You may also be able to reach both customers and non-customers through social media, such as Facebook, LinkedIn, and Twitter.

If you have a social media team, be sure to reach out to them for guidance and assistance. There may often be other online sources available, such as forums for people who fit your profile. There may also be in-person Meetups, clubs, organizations, conferences, or user groups that can be a source for recruiting. Sometimes, these types of groups will expect you to “sponsor” their meeting in the form of refreshments in exchange for being allowed to recruit for your study.

Pro tip. There are likely other groups in your organization whose job it is to engage with customers and prospects. In the long run, establishing partnerships and joint events with the marketing, sales, or customer support organizations can be fruitful. Keep in mind, however, that other groups have different overall goals than the UX team. For example, marketing may have a goal to convince prospects that the product’s features are useful and desirable. As the researcher, you may have a goal to understand whether those features truly are useful and desirable. Therefore, care must be taken to clearly separate “research” activities from other activities, such as a “marketing pitch,” or you can end up with biased data.

Write a Discussion Guide

Whether you’re conducting an interview or a usability study, you must write a discussion guide to make sure you cover the key things you’re looking to understand during the session.

Why write a discussion guide? Because you’re human. The act of writing the guide (and sharing it with your team members for review) gets your questions outside your head and helps you better understand the problem. It also allows you to have a repeatable and consistent structure to the sessions. By keeping the sessions largely consistent across participants, you can detect patterns in what participants say and do.

For both interviews and usability studies, the session should be ordered in an hourglass format, that is, general/broad to specific/narrow to general/broad.

Introduction

  • In both interviews and usability studies, introduce yourself and ask the participant to “tell me a little about yourself.”
  • Explain “what to expect from the session.”
  • Ask for participants’ approval to record the session and discuss confidentiality (you may need them to sign a non-disclosure agreement (NDA) or a Consent to Record).
  • You probably want to start each session with a few background questions about the participant to provide context for the participants’ answers and behaviors during the session. For example, if you were building an app to help people buy movie tickets, perhaps you’d want to know what kinds of movies they like best, how often they go to the movies, and with whom they go to the movies. These initial questions help the participant warm up and get used to the environment of a research session, and allow you to build rapport with the participant.

General Usage

  • In both interviews and usability studies, move to general questions about how they usually use similar products or conduct usual activities with regard to your product category. For example, if you were building an app to help people buy movie tickets, you might ask questions like, “Tell me about the last time you went to the movies. [Pause for answer.] [If they didn’t spontaneously tell you] Tell me about how you got your tickets that night.”

Specific Usage and Feedback

  • In an interview, ask specific questions about your product or concepts that tie in with what you wanted to learn in your research.
  • In a usability study, ask people to complete typical tasks using your product or prototype.
    • Scenarios – When writing tasks for a usability study, you’ll want to have an introductory scenario that sets the stage. You may have one such scenario that sets the stage for all tasks or you may have multiple scenarios. These scenarios typically include motivation. In the movie ticket app example, it might be something like, “You and your friends Bob and Jane have decided to go out this Saturday evening.” Next, have the participant complete tasks such as find a movie and buy tickets. Later in the session, you might set up future tasks with the scenario, “Bob has realized he can’t make it after all.” This might be followed by tasks such as return Bob’s ticket or find another friend to transfer Bob’s ticket to.
    • Tasks – Write your tasks so that they are commands that represent real user goals, such as, “Find a movie that starts around 7:00 p.m. this Saturday evening.” Tasks should not tell participants how to do a task, such as “Click the ‘movies’ button.”
    • Think Aloud & Follow-up Questions – You should ask your participants to “think aloud” as they complete tasks. In addition, prepare a short set of probing or follow-up questions for each task to help you understand what needs to be improved. A typical question after a task is, “What do you think about how that worked?”

Wrap-up

  • At the end of interviews and usability studies, ask more general questions, like overall feedback on the product/prototype or concept, blue-sky brainstorming about how things could be improved in the future, and final thoughts. A good question to end sessions with is, “Is there anything I didn’t ask you that I should have?”

The guide is there to help structure the session. Keep in mind it really is a guide, not a script. You won’t necessarily ask every question exactly as written. You should aim to keep things conversational. If a participant answers a question you were planning to ask later, you probably won’t need to ask that question again. Not every participant will complete every task; some take longer or are more verbose in answering questions. Let them go where their own thoughts take them, so long as they are not going too far off-topic, keeping in mind your goals for the research and the time available for each session.

A good time to revisit goals is after you’ve written the first draft of your discussion guide. Check each question and task against your original goals. Prioritize them based on how important each is to achieving your goals. Then, review your draft with the rest of the product team to make sure you are all aligned on the goals and that they will be achieved.

Pro tip. Usually you have more questions than you can ask or more tasks to be completed than most participants can get through. Highlight the most important ones and make sure you ask those of everyone. Ask the others if you have time.

Running a Session

Legal Considerations

If you’re planning to record a session, you must get the participant’s consent. You should have a policy on whether the consent needs to be in writing or whether it can simply be verbal. If you’ll be using any information, designs, ideas, mock-ups, or concepts that are not publicly available, you’ll need to get an NDA in place between you and the participant. If you don’t already have standard forms for these, you’ll need to work with your legal department to have forms drawn up. You’ll also want to check with your company’s legal department to see if they have any policies about where and for how long the recording consent and NDA documents must be stored.

Logistics

It’s often helpful to have at least two people: a moderator and a note-taker. It’s difficult for one person to do both of these jobs well at the same time.

Decide on your dress code. Some teams believe that it’s important to blend in with the participants, and thus, try to dress like they do. Some teams believe it’s important to project a professional image and therefore have a more professional dress code. This is something you’ll need to decide upon and make sure everyone follows. You don’t want your moderator showing up in a suit and your note-taker showing up in jeans and a t-shirt.

If you’re able to video and audio record, do so. There are many smartphone apps that let you easily create audio recordings that will suffice for interviews. If you can, though, it is preferable to make video recordings. It’s more engaging for the other members of the product team to see video clips than to just read quotes in your final report.

Whether you’re conducting your sessions in-person or remotely, the easiest way to record video sessions is to use one of the many web-based video conferencing solutions such as WebEx, GoToMeeting, or Zoom. In addition to recording, these tools will allow the product team to observe the sessions in real-time and also allow for remote or in-person participants. For in-person recording, invest in a decent omnidirectional microphone or an external webcam with an omnidirectional microphone; otherwise, your recordings may only pick up one person clearly.

Note Taking

Taking good notes for research is a little different from taking notes in class or a meeting. One way to facilitate good note taking is to prepare a note-taking template for both data collection and analysis. This type of template should have your discussion guide questions pre-populated so you can easily fill in responses. You must be prepared to jump around in your note-taking template because participants may jump around in their responses.

A different way to take notes is to construct a coding system that allows you to write more free-form notes, but quickly tag the notes in real-time. For example, you could have codes that represent tasks (#T1, #T2), interview questions (#I), comments from participants (#C), observations of what participants did (#O), great quotes (#Q), great video (#V), and your own interpretations of insights or findings (#F). This keeps your notes in chronological order, but allows you to quickly scan and search them during analysis.
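As a hypothetical illustration of how such coded notes can be scanned later, the short Python sketch below indexes free-form notes by their tags; the codes and note text are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical coded notes: free-form lines tagged in real time with codes such as
# #T1 (task 1), #I (interview question), #C (comment), #O (observation), #Q (quote).
notes = [
    "#T1 #O Scrolled past the showtimes filter twice before noticing it",
    "#T1 #C 'I expected the date picker to be at the top' #Q",
    "#I #C Goes to the movies about twice a month, usually with friends",
    "#T2 #O Abandoned the refund flow at the confirmation screen",
]

# Index notes by code so they can be scanned quickly during analysis,
# while the original list keeps everything in chronological order.
by_code = defaultdict(list)
for line in notes:
    for code in re.findall(r"#\w+", line):
        by_code[code].append(line)

print(by_code["#T1"])  # every note tagged against task 1
print(by_code["#Q"])   # candidate quotes for the report
```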

Remember, in a usability study, you want to take notes not only on what participants say, but also on what they do during the tasks. It’s just as important to observe behavior as to listen to their verbal responses.

Make sure your notes include:

  • Participant’s name
  • Session time
  • Task or question asked
  • As complete a statement as possible (verbatim is best)
  • What the respondent was doing at the time

Pro tips. Consider taking notes in Excel and use Excel’s built-in time functions to time-stamp each line. That makes it easier to compare notes to videos and audio recordings later.

Write down EVERYTHING. Don’t just wait for “relevant” or “interesting” comments. You may not be able to tell which answers are important until after you collect all your data. For interviews, consider getting transcripts of your recordings. Transcription services are quite affordable, and you may find it easier to get verbatim quotes from your transcribed audio than to try to capture verbatim quotes in your notes in real time.
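If Excel isn’t handy, a few lines of code can serve the same purpose. The sketch below is a minimal, hypothetical alternative that timestamps each note as you type it; the file name and format are assumptions.

```python
# A tiny note logger: each line you type is written out with a timestamp,
# so notes can be matched against the video or audio recording later.
from datetime import datetime

with open("session_P01_notes.tsv", "a", encoding="utf-8") as log:
    print("Type notes and press Enter; enter an empty line to stop.")
    while True:
        note = input("> ")
        if not note:
            break
        log.write(f"{datetime.now().isoformat(timespec='seconds')}\t{note}\n")
```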

Asking questions during the session

What makes a question “good?” What gets you usable, accurate data? Asking questions during a session is not like having a normal conversation, so you may be uncomfortable at first. Practice will help. When probing and following up on the questions in your discussion guide, keep these tips in mind:

  • Questions should ask about only one thing: “What do you notice first?” NOT “What’s the first thing you notice on this page and where would you go next?”
  • Questions should be open-ended—not easily answered with yes or no.
  • Avoid leading questions: “Is this easy?” Instead, ask, “What do you think about this?”
  • Ask questions you think you know the answer to. You might learn something new or realize your basic assumptions were wrong.
  • Ask questions that seem obvious. Don’t assume you understand what the participants “mean” without them actually saying it.
  • Don’t provide an answer within a question: “Are you trying to find your order history?” Instead, ask, “What are you trying to do here?”
  • Wait for an answer. This is harder than it sounds. If the participant doesn’t answer immediately, keep waiting. Do not offer an answer or ask another question. Remember, this is not a normal conversation; you do not need to avoid silence. Let them think and answer.
  • Ask follow-up questions: “Why?,” “Tell me more about that,” “Why do you say that?,” “Help me understand what you’re thinking.”
  • Do not correct answers. If participants misunderstand your product, ask questions to figure out what is causing the confusion. Do not tell them they are wrong.

Pro tip. Remember: Build rapport, but be professional. Too much small talk can be confusing. Too little can be rude. This should be conversational, but targeted.

Analyzing Data

Now that you have those nicely recorded notes, analysis should flow from your observations.

In the case of interviews, printing your notes out and pulling out a highlighter and/or sticky notes is useful. If you didn’t organize your notes by discussion guide question or task, you can take the time to do that now.

Review your notes for patterns, focusing on behaviors displayed, issues encountered, and comments made by multiple participants, and look for common themes across participants.

If something is interesting but is only supported by a comment or observation by a single person, then note it. It’s something that may need further investigation, but not likely something that can be used to immediately affect your design.

Pro tip. Count how many people displayed a behavior. Counts can help you avoid confirmation bias, but can be difficult for novices to interpret. Usually it’s a bad idea to report counts as percentages because you have too few respondents for a percentage to be meaningful. Use counts to help guide a qualitative analysis and to double-check your assumptions and memories (for instance, “I remember lots of people had a problem here; oh, but I see it was only three out of 12.”). When in doubt, ask someone on the research team for a quick consult.
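For example, a quick tally like the hypothetical sketch below counts how many distinct participants hit each issue, which keeps the emphasis on counts rather than percentages; the participant IDs and issues are invented.

```python
from collections import defaultdict

# Hypothetical observation log built from session notes:
# (participant, issue) pairs recorded during a 12-person usability study.
observations = [
    ("P01", "missed showtimes filter"),
    ("P03", "missed showtimes filter"),
    ("P07", "missed showtimes filter"),
    ("P02", "confused by refund wording"),
    ("P02", "confused by refund wording"),  # duplicate note for the same person
]

# Count unique participants per issue; report counts, not percentages.
participants_per_issue = defaultdict(set)
for participant, issue in observations:
    participants_per_issue[issue].add(participant)

for issue, people in participants_per_issue.items():
    print(f"{issue}: {len(people)} of 12 participants")
```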

Sharing Your Findings

Make sure to document what you did: your study design and methodology, the details about who you recruited, and when you conducted the sessions. If you’re testing prototypes or concepts, you’ll also want to note the version or sprint because these tend to evolve quickly over time. Naturally, the bulk of your report should document findings and insights that came out of the analysis of your observations. Documenting both what you did and what you found are important so that future team members understand your rationale and don’t retread the same ground.

Pro tip. In addition to documentation, we strongly encourage a presentation to the team or a working session to talk through what was found and what to do about it. This ensures that everyone on the team takes away the same things from the project.

Making Decisions

Now is the time to decide what changes to make to a product or experience design based on feedback from users. The real discipline comes in waiting until research is complete and the findings are shared and digested. It’s tempting for product team members observing research sessions to see one participant struggle or hear one participant make a suggestion and then jump into “fixing it.” You don’t want to do this; if only 1 out of 10 participants had an issue with something, that means 9 out of 10 didn’t have an issue. You want to make product and design decisions based on those 9 out of 10 participants, not on the 1 participant who was different from the rest. Be sure to prioritize what you’re going to do and record those things in your product backlog or put them on the roadmap for future releases.

Now you have a process for defining your goals, recruiting and screening participants, gathering and analyzing data, sharing findings with the team, and making decisions based on those findings. These basics should get you well on your way to conducting straightforward interviews and usability studies with only minimal assistance from your research team. Sometimes you will still need to bring in an expert, but now you’ll likely find you can do a lot on your own.

Trust in Mobile Shopping: An Empirical Study

Mobile commerce (m-commerce) is changing the way people shop. Millions of people purchase and download apps for their phones, and companies are anticipating an ever-increasing revenue stream from m-commerce. For example, by 2016, PayPal expects to see up to $10 million in mobile payments per day, and Forrester Research predicts m-commerce revenue reaching $31 billion by 2016. The potential wielded by four billion mobile phones worldwide is continuously referenced in marketing meetings across the world.

At Apple’s Worldwide Developer Conference 2012, Apple reported that its App Store had more than 400 million user accounts (with credit card numbers attached), 650,000 apps available for download, and a total of over 30 billion apps downloaded by consumers to date. These numbers point to an urgent need for user experience practitioners and designers to understand and enhance the m-commerce user experience. While past research has clearly stated that m-commerce is not simply an extension of electronic commerce, we simply can’t ignore that e-commerce is the closest indicator of what we might expect from this new area. Traditionally, we have seen trust cited as a major obstacle in e-commerce activity. In 2000, the Better Business Bureau publicly stated the necessity of “promoting trust and confidence on the Internet,” and claimed that a lack of trust was a major reason why users did not buy online. This is usually the case for new technology, as familiarity is a well-documented precondition for trust.

Many researchers have developed models of trust to explore how trust can be achieved for commerce. Notable is Lynne Zucker’s model of trust, which outlines three ways in which companies can gain trust with consumers: by showing similarities (for example, in lifestyle and goals) between the company, its products, and the consumers (character-based trust); by creating a history of trustful transactions with consumers (process-based trust); and by presenting a public presence that is respected and shows integrity (institution-based trust). These trust mechanisms have been applied to e-commerce in the past, and our team builds on them as we work with m-commerce.

Studying Mobile Shoppers

To help explore this new form of commerce and what role trust plays within the mobile commerce space, we conducted a diary and interview study with m-commerce shoppers. Participants kept diaries about their mobile shopping activities, including details about what they shopped for, what they ended up purchasing, and what activities led them to their shopping and buying. An overwhelming number of participants reported shopping for “deals,” as well as clothing, hotels, general accessories, shoes, cellphone accessories, toys, and pet products.

We also looked closely at trust to understand if consumers had issues with trust and how these were being mitigated. The ways in which consumers developed trust in companies often took on a form specific to m-commerce when compared to e-commerce or traditional retail shopping. Overall, we found that people have few trust concerns when it comes to m-commerce because of brand recognition, social recommendations from friends, and minimal perceived risk in many of the purchases made from a mobile device.

Brand Recognition

The largest factor in developing strong consumer trust in m-commerce was “brand recognition.” Participants regularly stressed their trust in the brands they were buying. In all shopping instances, people reported having a past experience with the vendor. In addition to trust in store and product brands themselves, participants mentally transferred their trust from the large companies (for example, Apple) that approved m-commerce applications to the applications themselves. That is, app marketplaces like Google Play (formerly Android Market), Amazon’s Marketplace, or Apple’s iTunes were highly successful in transferring trust from their well-known company label to the product being purchased. For example, when people shopped using an app on their mobile device, the trust they had in the large company that approved the app transferred to the app itself, regardless of which company actually made it, and the app was deemed trustworthy.

Some participant quotes:

  • “Everything is prescreened in the [Apple] app store, so there is no worry about [trust].”
  • “It just feels like a more cohesive thing when it is under that one umbrella company of Apple…[not using the app store] just feels like you are opening up your phone to all the Internet and random companies.”

Trust in m-commerce largely hinges on the respected, public presence of large companies and their associated marketplaces. In e-commerce, such institution-based trust is commonly established through third-party guarantors, membership in associations with professional codes of conduct, and so on. Historically, this type of trust did not extend to distribution models such as app marketplaces. Yet app marketplaces have essentially played the role of third-party guarantors in m-commerce. The often stringent approval process that Apple mobile apps must go through before they are even placed in the hands of consumers acts as a guarantor of the services and products acquired through the marketplace.

The implication for vendor companies is that the more a product can be tied to a larger company and its marketplace, the more likely people will trust the product and purchase it. Consumers should certainly be cautious, though, because app marketplaces may not, in fact, provide the best screening of trustworthiness related to shopping.

Social Recommendations

We also found that our m-commerce participants had few trust concerns because many of their shopping or purchasing activities were based on recommendations from close friends or family members. Similar to the transfer of trust with known brands and app marketplaces, users would also assume vendors and products recommended by their family or friends were trustworthy, whether they actually were or not. In most cases, social recommendations came from close family members, but sometimes they came from strangers if there appeared to be a consensus among them.

For example, one participant talked about getting a recommendation from a friend about a particular pizza delivery place. Even though they had never used the website before, they made the purchase based on the recommendation and thought little about trust. Other participants described eBay seller ratings. They would indicate that with high positive feedback, trust was simply not an issue for them. However, if the feedback was neutral or negative, they would not purchase from the seller.

Stepping back, we can see that social recommendations relate to character-based trust. This is the idea that trust can be developed by showing similarities between a company and its consumer. In a mobile context, social similarities were most prominent between people and not companies. That is, the act of having friends or family suggest a product, or, to a lesser extent, social networks, meant that people felt comfortable with the product because they recognized their friends or family had similar values to theirs. The implication is that m-commerce companies are more likely to be successful if they are able to tie their products to social sites that allow people to easily recommend shopping items to others and act on these recommendations.

Lack of Risk

Finally, our study revealed that participants felt there was little actual risk in their mobile commerce experience, regardless of the true risk. Other traditional e-commerce trust concerns, such as loss of personal information, tracking of one’s purchases, and poor service quality, were rarely raised by users. In e-commerce, providing an email address, shipping information, and a credit card number most often happens at the time of purchase. However, this is not often the case for m-commerce.

Our users instead made most of their purchases through app marketplaces, which meant that purchase information went through the larger trusted brand provided by the marketplace and not necessarily at the actual time of purchase. This type of “automatic” payment eliminated perceptions of vulnerability. The implication, again, is that trust can be improved with companies if they are willing to tie their payment mechanisms into larger trusted brands that have marketplaces with information stored prior to purchase.

Mistrust concerns did arise but were very rare, occurring in only 7 percent of the reported shopping instances. Reasons for mistrust included negative social recommendations, poor brand experience, and usability issues with a company’s app or site.

Conclusion

Overall, participants had few concerns, which we attribute to several factors that map at a high level to trust mechanisms established for both retail and e-commerce. In each case, though, m-commerce provided unique nuances in how those trust mechanisms were applied and thought about by users. Because purchases were made on mobile devices rather than personal computers, they tended to be from companies that already had strong relationships with users, built through previous mobile transactions, purchases made in other mediums, or strong referrals by friends or contacts in a social space.

Compared to e-commerce, m-commerce seems to be more an extension of previous brand experience and less an introduction to a brand. Our findings suggest that the more m-commerce applications tie to existing friend networks or established and known brands, the more likely people will trust them.

Perhaps the most fascinating difference between e-commerce and m-commerce activities and notions of trust was the heavy use of application stores and apps designed by specific companies. The regular use of these applications is non-existent within the e-commerce space. Of course, we are now beginning to see companies migrate many strategies from m-commerce to the e-commerce domain, in which computer-based shopping and purchasing can be performed in app marketplaces just as they are on mobile devices. For example, the Apple App Store can now be used on a Mac computer for buying software programs and games. We believe this blending of e-commerce and m-commerce will only continue in the future.

It is clear that mobile commerce is in its infancy. Decades of future research and development are needed to fully understand the role trust will have in mobile commerce, and how this new form of commerce will affect current forms. Nations must not delay in starting this conversation, or more agile countries and industries will reap the benefits of early adoption and a place among the technology elite.


Although still in its infancy, mobile commerce (m-commerce) is rapidly making an impact on the North American and global economies. How does trust in m-commerce compare with trust in its well-established counterpart, e-commerce? How can we better design the mobile experience to help promote trust?

Games UX Testing with Artificial Intelligence

Digital gaming is the largest entertainment industry in the world, growing at a rate of around 15% per year and reaching $137.9 billion (USD) in annual revenue worldwide. The composition of this industry is incredibly varied, ranging from independent developers working on small mobile games to major commercial releases larger than Hollywood films in terms of production budgets and revenue. Over the past decade, the games industry has advanced substantially in terms of technology, design, and business practices. From the introduction of new platforms (e.g., mobile, virtual reality, augmented reality), to new publishing models (e.g., free-to-play, season passes), and the inclusion of new audiences (e.g., kids, older adults), developers are rapidly adapting to provide engaging experiences for players across these new design frontiers.

In focusing on the creation of fulfilling user experiences, design techniques that involve players in the development process (e.g., user-centric and participatory design) have been receiving more attention from game designers. These efforts helped to establish the field of games user research (GUR) among academic and industry practitioners. GUR aims to understand players (users) and their behavior to help developers optimize and guide user experience closer to their design intent (see my previous article on GUR).

GUR fundamentally relies on iterative design and development processes, where prototypes created by game developers are evaluated through testing with players drawn from the game’s target audience. Insights gained from this process help developers improve their designs until the desired player experience is achieved. User testing (also often referred to as play testing in the game industry) is one key evaluation approach whereby researchers aim to understand player behavior, emotions, and experience by collecting and analyzing data from players interacting with a prototype or pre-released version of a game.

Repeatedly creating high-fidelity prototypes (game builds) suitable for user testing is not only time consuming but also expensive. Moreover, recruiting users and conducting evaluation sessions are labor-intensive tasks. These challenges are especially pressing in game evaluation, where designers may wish to evaluate many alternatives or rapidly measure the impact of many small design changes on a game’s ability to deliver the intended experience. How might we lessen this burden while improving our ability to conduct more representative and comprehensive testing sessions? What if we developed a user experience (UX) evaluation framework driven by artificial intelligence (AI)?

In this article, I highlight our learnings from working on projects focusing on the development of game evaluation tools using customizable AI players.

AI, Games, & UX

Artificial intelligence research has always had a strong relationship with games, often serving as a testbed for the development of novel algorithms. In the 1950s, some of the earliest AI algorithms were developed to play games, where continual advancement eventually led to the development of systems capable of defeating world-class human champions, such as IBM’s famed Deep Blue defeating Garry Kasparov in chess and the Othello-playing program Logistello defeating Takeshi Murakami. In time, the complexity and applicability of AI increased beyond the bounds of playing games to game development, providing reductions in human labor and resource requirements.

Within UX evaluation in games, AI extends the limits of what researchers can accomplish, augmenting human analysis through intelligent interfaces and helping build player models from large data collections. AI supports complex and difficult data analysis tasks, such as recognizing players’ emotional responses.

Investigations of user behavior often rely on extracting insights from massive collections of player telemetry data, such as movement in a game’s world, time to complete tasks, and character deaths. With data from thousands of players, manual analysis may prove impossible. AI helps to improve the efficiency of many of the tedious tasks associated with such complex analyses, like processing large and disparate datasets and predicting behavior. Based on existing data, what future user actions can we predict? Such predictions can be valuable for projecting and evaluating user experience, for example when creating adaptive game interfaces.
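To make this concrete, here is a minimal, hypothetical sketch of the kind of aggregation such an analysis might start with; the event format and level names are illustrative assumptions, not data from a real game.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical telemetry events: (player_id, level, seconds_to_complete, deaths).
telemetry = [
    ("p1", "level_1", 95, 0), ("p1", "level_2", 240, 4),
    ("p2", "level_1", 110, 1), ("p2", "level_2", 300, 6),
    ("p3", "level_1", 88, 0), ("p3", "level_2", 275, 5),
]

# Aggregate per level to spot where players struggle; a real pipeline would feed
# features like these into a model that predicts churn or future player actions.
per_level = defaultdict(lambda: {"times": [], "deaths": []})
for _, level, seconds, deaths in telemetry:
    per_level[level]["times"].append(seconds)
    per_level[level]["deaths"].append(deaths)

for level, stats in per_level.items():
    print(level, round(mean(stats["times"])), "s on average,",
          mean(stats["deaths"]), "deaths on average")
```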

AI has also been used as a proxy for human users in UX evaluation and quality assurance (QA) testing, but questions surrounding user experiences are far more subjective than their QA counterparts. In UX evaluation, the task is often focused on creating agents that “think like humans.” Through the creation and observation of AI agents that obey a game’s rules and/or approximate player behavior, game designers try to better understand how different players will respond to a given game event. The concept of evaluation through deploying agent-based testing has already demonstrated promise as a means to reduce the resource and labor requirements of game testing. Another testing application focuses on evaluating playability, especially the objective features of a game’s content, such as whether it is possible to complete a given level or how many possible solutions exist for a given puzzle.
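As a simple illustration of objective playability checking (not drawn from any particular tool), the following sketch uses a breadth-first search over a hypothetical grid level to test whether the exit is reachable from the start.

```python
from collections import deque

# Hypothetical level layout: '#' = wall, 'S' = start, 'E' = exit, '.' = open floor.
LEVEL = [
    "S..#....",
    ".#.#.##.",
    ".#...#E.",
    ".####...",
]

def completable(level):
    """Return True if the exit can be reached from the start via open tiles."""
    rows, cols = len(level), len(level[0])
    start = next((r, c) for r in range(rows) for c in range(cols) if level[r][c] == "S")
    seen, queue = {start}, deque([start])
    while queue:
        r, c = queue.popleft()
        if level[r][c] == "E":
            return True
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and level[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False

print(completable(LEVEL))  # True for this layout
```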

PathOS

In our lab (UXR Lab, Ontario Tech University), we focus on the development of UX research frameworks and tools that directly support the work of both academic and commercial GUR practitioners in the collection and analysis of gameplay data. One of our current projects is PathOS, an open-source Unity tool and framework that explores the utility of AI-driven agents in game user testing. We are interested in modeling player navigation in games with an independent, configurable agent intended to imitate the spatial (navigational) decision-making of human players. Our goal is to make a system capable of simulating player navigation for a diverse population of AI-controlled agents (users), standing in for players in early-stage user testing aimed at evaluating level and game-stage design. In particular, we focus on producing customizable models of human memory, reasoning, and instinct pertaining to spatial navigation.

The PathOS framework consists of two key layers designed to work atop any navigational data system (i.e., navmesh or navigation mesh) contained within a given game engine or development framework. These two layers are the non-omniscient player model, analogous to a “brain” or decision-making machine, and an intermediary communicator, analogous to the sensory organs (see Figure 1).

Figure 1. Logical layers of the PathOS framework: the player model layer (the “brain,” comprising memory, reasoning, and instinct) connects to the communicator layer, which in turn connects to the game system.

The core functionality of PathOS resides within the brain of the system. While traditional navigation modeling focuses on the generation of optimal paths, we are especially interested in modeling fallible components of human perception and reasoning to simulate issues with player navigation, such as getting lost, missing key areas, or losing track of game objectives.

As such, our memory component will track past actions (e.g., turning) and “visible” information to construct a simulated mental model of player surroundings. Information contained within “user” memory will be mutable and forgettable based on factors such as information volume, action speed, and simulated experience level. Simulated player reasoning will be based on patterns of action available in memory (e.g., having traveled in the same direction for several paces), information given about objectives (e.g., in-game compass indicating goal is to the north), and spatial cues (e.g., placement of walls). And finally, the instinct component aims to help model how personality and past experience might influence navigational behavior by emulating, for instance, an inexperienced player’s tendency to favor investigating safe, open spaces as opposed to potentially dangerous narrow corridors.

The communicator filters information available to the brain from the navigational data, based on player point-of-view visibility. The brain maintains the artificial player’s memory and makes navigational decisions based on information from memory, characteristic play style, and “sensory information” from the communicator. Decisions the brain makes feed into the communicator, which in turn manipulates an in-game agent and logs actions accordingly.

In any given game navigation scenario, players may have a number of factors to consider before making their next move, such as current objectives, hazards or enemies, and a desire to explore. Simulated agents should therefore evaluate available navigational alternatives based on heuristics such as alignment with game objectives (i.e., might this pathway lead to an in-game goal?), potential for danger (i.e., are enemies visible along this pathway?), potential for discovery (i.e., is this new territory?), and number of previous traversals.
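The sketch below is a hypothetical, much-simplified illustration of that kind of heuristic scoring, not the actual PathOS implementation or API: each candidate path is scored against weighted factors, and different agent profiles weight the factors differently.

```python
from dataclasses import dataclass

# Illustrative navigation options an agent might face; names and weights are assumptions.
@dataclass
class PathOption:
    leads_to_objective: bool
    enemies_visible: int
    unexplored: bool
    previous_traversals: int

# Two example agent profiles: a cautious novice and a curious explorer.
CAUTIOUS = {"objective": 1.0, "danger": -2.0, "discovery": 0.3, "repeat": -0.2}
EXPLORER = {"objective": 0.6, "danger": -0.5, "discovery": 1.5, "repeat": -0.4}

def score(option: PathOption, weights: dict) -> float:
    """Combine the navigation heuristics into a single weighted score."""
    return (
        weights["objective"] * option.leads_to_objective
        + weights["danger"] * option.enemies_visible
        + weights["discovery"] * option.unexplored
        + weights["repeat"] * option.previous_traversals
    )

options = [
    PathOption(True, 2, False, 1),   # guarded route toward the goal
    PathOption(False, 0, True, 0),   # safe, unexplored side area
]

for profile_name, weights in {"cautious": CAUTIOUS, "explorer": EXPLORER}.items():
    best = max(options, key=lambda o: score(o, weights))
    print(profile_name, "picks", best)
```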

What’s Next

Integration of agent-based testing into a commercial development cycle would vary significantly depending on the nature of a given project. For PathOS, we envision the framework as having the greatest utility early in the development process to give developers a coarse indication of how players might experience the virtual world. After creating an initial level design, the framework could be deployed to identify likely navigation patterns of players within that design. Unintended or unexpected features of these patterns may then be used to identify and remedy issues related to world design. For instance, if a developer notes that an intended objective is not reached by the majority of agents, the developer can review a simulated playtrace (movement) to identify whether the unmet objective is caused by poor visibility, a lack of points of interest (POI) in the surrounding area (see Figure 2), or some other factor (e.g., a high density of hazards discouraging cautious players from exploring a given region). Based on the insight gained, the design can then be iteratively refined and retested to validate the intended experience.

Figure 2. Example visualization of the paths taken by different AI agents and their engagements with points of interest (such as enemies and collectables) in a PathOS test level: a fenced green space with buildings, trees, and rocks (created by Atiya Nova, UXR Lab).

Identifying basic issues early in the development process prevents these issues from propagating to later testing with human users. For example, if the goal of a late-stage user test is to assess players’ experience with a game’s narrative, but players miss interactions with key characters due to basic issues with level design, then the ability of the test to accurately investigate the core research question has been compromised. Testing early with artificial agents to identify these basic issues could help prevent these situations. Furthermore, the pursuit of continuous, informed improvement from a very early stage of development may contribute to overall improved product quality and a better experience for the end user.

Acknowledgments and Further Reading

Recent projects at UXR Lab motivated me to write this article about exploring AI applications in games UX evaluation. I would like to thank and acknowledge the research students who were involved with these projects: Samantha Stahlke, Atiya Nova, and Stevie Sansalone. This article summarizes the key lessons we learned from these projects.

Web 2.0 for an Older Population: Exploring the Limits

The personal computer is, in many ways, a rather inaptly named device. From the earliest days, computers have been about sociality and community. The staples of everyday computer use, email and the Internet, are intimately social. As the Internet evolves, its uses become ever more implicated in relationships between people.

Indeed, one of the most widely used pieces of web jargon at present, Web 2.0, refers to designing the Web for, and developing with, intensively social uses in mind. As Danah Boyd, a Fellow at Harvard Law School’s Berkman Center for Internet and Society puts it, the term Web 2.0 expresses a shift to the Web as “an information ecology…organized around friends.” In recent years, socially oriented websites have become hot properties, attracting both significant user communities and venture capital. Facebook, MySpace, and many others have become must-see destinations on the information superhighway.

For many younger users, these sites define what an internet experience is all about. At the same time, the developed world has begun to experience the rapid aging of its population. National governments and care systems have become aware of the health and other challenges associated with this aging trend. In 2003, 10 percent of the world population was over age 60. By 2050, that figure will be 21 percent, nearly two billion people.

Figure 1. Main page of the Eons website showing available activities.

Technology is capable of playing a role in rising to the challenges that such demographic change will bring. This may be in the form of telehealth devices or other tools for enabling independent living. Yet this older population is diverse in nature. Some analysts regard the older population as more diverse than any other in terms of lifestyle, attitudes, and outlooks. The differences between a 60-year-old and a 90-year-old are enormous, and we cannot expect this group to have a shared sense of technology literacy or confidence. However, the way we talk about, and sometimes design for, older populations suggests that the younger-old and the older-old are one homogeneous group.

In response to this aging trend, social networking or social media sites for older users are becoming a recognizable subset of the overall social web genre. However, these sites tend to cater to the younger-old: boomers in their 50s and 60s. But in many countries, the older-old, those 80-plus years of age, is the segment growing most rapidly. It is this group, unlike the more youthful boomers, that is unlikely to have used computers during their working lives. Yet it is at this point in life, as bereavement and other life-course events start to shrink one’s social networks and social isolation, loneliness, and depression become real concerns, that social networking sites and social media might start to have some relevance: as a means of maintaining networks of friends, family, care providers, and others providing support, and as channels for sharing interests and concerns.

In short, can an online world increasingly focused on supporting rich social experience provide realistic avenues for dealing with some of the threats and opportunities that an aging world presents? How can social networking tools be used to support successful aging? How can we use Web 2.0 to support the sort of experiences that older people want to have online and offline?

A recent article in the New York Times titled “New Social Sites Cater to People of a Certain Age” (2007) made it clear that social media sites aimed at older populations are a discernible trend. Eons, Boomertown, Multiply, and Boomj are a few sites that aim to profit from the fact that, as the author put it, “older people are sticky.” By offering tools for sharing media such as photos, making and maintaining friendships, and finding information relevant to their place in the life-course, these sites hope to help older people.

Precise figures on the actual use of these sites are not easy to come by, but Facebook’s advertising rate data reveal just how few people 65 years or older in the U.S. and major European countries actually use the site.

Figure 2. Number of Facebook Users age 65 and over by country.

Friend or Foe

Of course, Facebook is not a site that is specifically designed for, or targeted at, older users; there are specific sites aiming at those groups. But all such social media sites share some basic features: Friends, Networks, Contact Lists, Feeds, Profiles, and Privacy Settings. Newcomers to the world of Web 2.0 will be faced, soon enough, with terminology of this type. While the words are probably familiar, the actions and interactions they invite might not be. For example, the idea of a “Friend” should be obvious enough, but used in the context of social media it can refer to someone you have never met who happens to be in your network (another term in need of decoding): a person to whom you can send certain types of content and with whom you can interact in ways that are not available to people who are not Friends. A Friend will also likely have access to your Profile in a way that someone who isn’t a Friend will not.

The apparent simplicity of these terms, taken for granted by long-term users of social media platforms, obscures their potential to confuse and alienate. The building blocks of Web 2.0 can become stumbling blocks. By making assumptions about the media and technological literacy, and the life worlds, of older populations, Web 2.0 platforms risk diminishing their accessibility. Indeed, the view of accessibility we should be using is one that includes not just usability, but also how well the concepts behind apparently simple terms like “Friend” resonate with the lives of older users.

Conceptual understanding should precede functional benefits. If the former does not exist, the latter is unlikely to materialize. Something that is confusing to the user is unlikely to become of value to them.

If Boyd is right to attest that the web has become “an information ecology…organized around friends,” then it should be little surprise that many Web 2.0 platforms focus on Friends or Contact Lists. To a large degree, the initial stages of interacting with such a platform involve populating these lists with members of one’s social network. Friends either have to be invited onto the system or found within it, for example with the use of a “Friend Finder” tool. While this is now a common activity for the Web 2.0 generation, it is less certain that it is either a straightforward or meaningful activity for those with more limited prior technical exposure, and possibly diminishing social networks.

The research Intel’s Digital Health Group has conducted in more than thirty countries suggests that, as an activity without a clearly articulated benefit, friend finding or friend filling is not likely to make a great deal of sense. One of the benefits of having found and added friends on Web 2.0 platforms is that “Friends” become special types of social objects (and subjects). Users can perform actions with their friends: invite them to events, send them messages and recommendations. They might also be able, as they are on Facebook, to narrowcast information about themselves, their movements, thoughts, or emotions to friends or even friends of friends.

Such “Feeds” offer tools for the ongoing construction of an online identity: through the gradual accumulation of such self-representation, networks become visible and online personalities become discernible. For one type of user, this sort of “always on,” highly performative form of activity and interactivity has become a norm of online life. I question the extent to which a population new to computing responds to, and interacts in, this way. If we are designing technology to support or augment the social experience of older populations, we need to focus on a series of alignments. We need to think about the ways in which the concepts and practices our terms suggest might (or might not) match the worlds and practices of the intended users. For example, what is a friend, and what do friends do with, or to, each other online? Research in Ireland on older users and their social networks revealed that for many, the term “network” itself had little resonance even outside of conversations about technology; the term “community” was more meaningful. Imagine how such people would respond to the idea of “Friend Finding” to fill their “Networks.”

text advertisement for BOOMj
Figure 3. BOOMj member start page.

Using User Profiles

Much of the potential of Web 2.0 platforms comes from the existence of rich profiles for users. Profiles are the engines of socially oriented Web 2.0 platforms. Profiles can be used to tailor the content, services, or “events” that are pushed to the user. Profiles can be explicit (as on Facebook) or inferred from past actions and activities (as on sites such as Amazon). Explicit profiles require the user to understand how such information might be used and how it might be visible to different types of system users. Can we assume that older users will be comfortable creating and managing an online identity in this way if they have limited online or PC experience?

One further assumption that lies behind Web 2.0 is the “always on” user who might access the platform from several different devices. There are two aspects to this assumption. The first is whether “always on” and “always connected” are relevant ways to think about the use of technology by older people, or whether they are more relevant to younger users managing online identities. Our research with older populations suggests that online tools are often seen as exactly that: good ways to augment or enable offline activity, not a substitute for it. The research indicates that socializing and activities offline can be more highly valued than immersive, continuous online experiences; less “Twitter stream” and more conversation over a cup of tea. But technology has a role to play in enabling these social interactions for older people.

Secondly, the broader question of connectivity needs to be examined. In the U.S., 32 percent of adults over 65 years of age use the Internet. Meanwhile, in the UK, 49 percent of people over 65 years old do not have access to an internet-connected PC. Such statistics demonstrate that any assumptions about access to, and use of, the Internet might need to be re-evaluated. Moreover, “narrowband” practices, such as turning off the PC between uses, are likely to militate against the sort of always-on, highly connected experiences implied by Web 2.0 services.

user profile
Figure 4. BOOMj user profile.

Getting 2.0 Right

If we are to address the challenges and opportunities that a rapidly aging population presents, and to enable the sort of connected, social experiences that younger users of the Web demand, what do we need to do differently? If social inclusion, information and service provision, and social opportunities for older people are to be enabled through technology, the philosophical and architectural approaches of Web 2.0 will be invaluable. But our research suggests that we need to be cautious about instantiating ideas imbued with Web 2.0 features for this older population.

A first step is to thoroughly research this group of users to ensure that designers understand their technological literacy and their sense of ability and confidence with technology. It is important that the online and offline practices of older people are addressed so that the technologies we develop strike the right balance between replacing and augmenting offline activity. For example, do we really need older users to replace or duplicate their pocket calendar, or the diary on the refrigerator, with an online version built into a Web 2.0 platform? How will our ideas work if they resist such transposing and duplication?

Secondly, researching the intended user is no substitute for their continued involvement in the design process. Those little additional features that we think make our inventions attractive might easily repel or complicate. What do these additions do to the user experience, and how do they add to the cognitive burden of use? What can we strip out, and how can we pare down the features we offer to give the optimal experience? Ultimately, these questions are not best answered by designers and technologists but by older people, the intended users. Too often, user research ends just when its impact has the most potential to keep our innovations grounded.

Finally, we need to think of accessibility as a holistic concept, taking into account not just usability but all aspects of a user’s life. The concepts embedded in our technologies need to resonate with the life worlds of older people. In that context, research and testing need to be imagined as something that occurs over a longer period of time, naturalistically. It should be more than a short test of feature usability within a lab environment. We need to allow people to grasp concepts and the associated features as they appear on a system. This means giving people longer to absorb the platforms and take them onboard, and giving ourselves time to see whether they fit with their lives and needs.

Collaborating Across Cultures: Designing UX Studies for Japan

Years ago, when I came to Japan to teach English, it was the first time I had ever left the United States. I was on my own and I didn’t really know anything about the local culture or language. In my head, I thought it couldn’t be much different than America; after all, Japan was the third largest economy in the world, and surely a very international place.

I was aware, almost immediately, of how wrong I had been. Even so, it took me a long time to fully adapt. The way that Japanese people behaved and communicated was completely different from anything I had previously experienced. It was also not as internationally diverse or as English-friendly as I was expecting. For example, in the first town I lived in, out of about 300,000 people, there were only a handful of non-Japanese and very few Japanese who could converse in English. In fact, Japan remains one of the most homogeneous countries in the world, and fluent English speakers are a small minority of the population.

Not only was the cultural difference evident to me through teaching English, but also when I began to learn Japanese. When I thought about directly translating my feelings from English to Japanese, I often stumbled. For example, saying “no” in Japanese is rarely straightforward. There is a word for “no,” but it’s not often used. You’re more likely to hear the word for “it’s difficult.” My first experience of this was when I proposed a new idea to a former boss. The conversation went something like this:

Me: “Hey, do you think we could try X?”

My boss: “Ah that sounds a bit difficult, right?” (This means we can’t do it.)

Me: “Yeah, it’s probably not easy.”

(long pause)

Me: “So, can we do it or not?”

My boss: “Uh…”

 Picture shows a bird’s eye view of Central Tokyo at dusk.
Figure 1. Bird’s eye view of central Tokyo.

Friends of mine advised me that I needed to change the way I thought. They suggested it would be impossible to learn Japanese if I thought about the language only through the eyes of an English-speaking Westerner. I shrugged them off initially, but ultimately, they were right. In order to fully understand, I needed to unlearn what I had learned.

Conducting Research in Foreign Countries

So why am I telling you about being an English teacher during my first years in Japan? I think teaching in a foreign country has a lot in common with what international researchers face when conducting user testing in Japan. For outsiders looking in, it’s nearly impossible to understand the complexities and nuances.

The world may be globalizing at a rapid rate, but people are still very much separated by language and behavior. No matter how much we progress as a global society, it’s still possible to err by viewing the rest of the world only through the prism of our own experiences.

When we design global studies, we have to be sensitive to the ways other people live, how they communicate, and what’s important to them. For our studies and designs to be truly user-centric, they need to be fully localized to the markets in which we test.

At Mitsue-links, we often get international clients (usually UX professionals from agencies or large companies) who are looking to collaborate with us for local UX research. Occasionally, however, clients have created the study beforehand and ask us to perform a translation and carry out the research. We suggest that overseas clients involve us, the local partner, in the earlier planning stages of research. This helps ensure that we are aligned with the full scope of the research, as many things can potentially go amiss in UX testing.

As a case in point, I’ll examine two key aspects of Japanese culture that can affect UX research in Japan. I’ll also provide a list of helpful tips for doing research in Japan.

Expression and Logic

You may have heard the famous Japanese phrase, “The nail that sticks out gets hammered down.” While this idea may be an overly simplified characterization of Japan, it’s not without its merits.

When I was a teacher, I had the opportunity to interact with a lot of Japanese people, from kids to adults, from casual learners to business professionals. And of course while no two people acted exactly the same, there was a very obvious sense of adherence to established social protocols and ways of thinking. These cultural norms manifested in a variety of ways, but one significant aspect was a lack of public personal expression.

Take Japanese schools as an example. One of the first things you might notice when observing a classroom is how the teacher lectures the students. It’s not an open, discussion-based environment. Even if the teacher does ask the students if they have any questions, nine out of ten times no one will raise a hand.

The reason for this is the assumption that things are quite straightforward. There isn’t much of a reason to ask questions, because if there was anything else to know, the teacher would have said so. And if you didn’t understand, it might be improper to trouble anyone else with your problems.

In user testing, social norms about expression can mean users are less willing to give their opinions or think aloud. When they do give opinions, the responses may be only neutral ones, which is far more culturally appropriate than being overly negative or overly positive.

Participants are also less likely to think abstractly or to respond when they are unsure of a clear answer. Most likely, the moderator will need to probe for meaning, which consumes valuable interview time. The Japanese language often requires a lot of reading between the lines, so interview questions need to be carefully planned to head off potential misunderstandings.

To understand this logic, we can compare these two basic equations:

2 + 2 = __

vs.

__ + __ = 4

The first question has only one answer, but the second could have multiple answers. In Japanese schools, you would not commonly come across the second question. The logic with the first question is that the answer is always clear and there is no need for you to interpret or think abstractly. All you need to do is memorize the information that you are given. The answer to the second is more ambiguous.

Differences in thinking can lead to dissonance in the expectations of researchers. One key aspect to consider is the importance of localizing research materials—not just translating them directly. Understanding upfront that changes will likely be necessary can save everyone from frantic last-minute changes. When local researchers recommend a change, it is not because they want to devalue the global study, it is because they see issues about how the material will be received by the local culture.

Uncertainty Aversion

Another important thing to consider about Japan is its high level of uncertainty aversion. This can be understood as behaviors and decisions that seek to minimize the potential for risk.

When learning English, students often adhere to very strict expectations about grammar and spelling. In the classroom or on a test, students’ answers are either perfect or they are wrong. This contributes to students’ perception that they cannot speak English.

This thought can linger into adulthood. In Japan, you might notice that asking people for help may lead to the immediate response of, “I don’t speak English.” In actuality, many people understand English to varying degrees, but the avoidance of what they deem as failure often inhibits them from trying.

Aversion can be a big hindrance for international UX researchers in Japan because it can affect the full spectrum of circumstances surrounding user testing, from the selection of participants, to the local partner, to any of the agencies your local partner may contract as a third party.

It can also affect the ability to recruit for certain types of tests, or influence the structure of the test. Because of strict privacy laws, it is unusual for participants to test with their own smartphones, credit cards, or accounts. This is primarily to avoid any risk of user privacy being leaked—something Japanese people take seriously.

There are many other ways that aversion can affect testing, including affecting your relationship with your local partner. Let’s use an example of two companies working together. One company doesn’t concern itself with uncertainty—in fact it expects it to be an ingrained part of the process. Their partner, however, does not accept uncertainty. As you may guess, the potential for frustration and misunderstanding is high.

Picture shows two Japanese business people exchanging business cards, a common start to conducting business in Japan.
Figure 2. Japanese business introductions almost always begin with the exchange of business cards.

Consider also the way Japanese domestic user tests would likely be conducted. The specifications of the study would be diligently planned and finalized before the recruitment even starts. From there, testing would proceed with little to no changes.

So, when you think about repeatedly asking your local partner to make changes to the study in the middle of the research, consider that you may be overstepping a cultural boundary. This is not to say that Japanese people are unwilling to be flexible, but flexibility is something that should be made explicit at the beginning of the study. Avoiding the surprise of unexpected change requests can go a long way towards relieving possible tension.

By working closely with local partners during the planning stages of a project, you can ensure a much smoother process for the research as a whole. It is imperative that both partners reach an equal level of understanding from the outset.

Tips for User Research in Japan

Figure 3. Creating a good atmosphere in user testing sessions requires a balance of politeness and sensitivity.

It’s good to remember that the points above are by no means an exhaustive list of the cultural differences or issues that can arise during user testing, so here are some practical tips for UX research in Japan.

  • In global studies that need to be consistent across countries, do some background research first, or talk to your local partner to understand any cultural differences that may affect how the study is run. For example, if you’re doing a banking study to understand how users log in to their accounts to transfer money to pay their utility bills, make sure that paying utilities via bank transfer is actually an available option in that country.
  • Always plan for extra time for local partners to give feedback on the study methodology, research schedule, and participants you want to recruit. For example, in a website usability test with software engineers, your local partner might advise the study be conducted on weekends due to the difficulty of recruiting full-time engineers to participate during weekdays.
  • Try to send your interview discussion guide (even if it is just a rough draft) at the same time you send over your recruitment screener and other materials. This will let the Japanese team check content and give feedback about the appropriateness of the proposed target participants you want to recruit. For example, if you want to do a benchmark study between PlayStation and Xbox and want to recruit gamers for one-to-one interviews, your Japanese partner might tell you that only two percent of Japanese gamers own or play Xbox, so your study might not even be feasible.
  • Recruitment in Japan is done via online surveys, so screeners need to be converted into a multiple-choice answer format. The number of questions should also ideally be limited to 15–20 to avoid large dropout rates that reduce the chances of getting a good pool of potential candidates to choose from.
  • Testing sessions are strictly kept to schedule. This is to avoid inconveniencing participants. In a lot of cases, if a session starts late, it still must end on time.
  • Be open-minded about cultural limitations on what is and is not possible when doing research. For example, when doing home visit interviews in Japan, it is extremely challenging to take any client observers or extra researchers with us. Many Japanese homes are extremely small (think 13–20 square meters total) and not designed to accommodate more than one or two visitors. Furthermore, having foreign observers at a home visit can be distracting and make the Japanese participant feel overly self-conscious and awkward.
  • Moderating in English via an English-to-Japanese interpreter is not recommended. In the past, due to the lack of opportunities to speak English with native speakers, some Japanese participants have seen the interview session as a chance to practice their language skills despite a moderator’s attempts to remind them to please speak in Japanese.
  • In Japan, English may be “cool,” but it is not well spoken. Testing with sites or apps that have been half translated or are still in English is not recommended. Users would likely waste a lot of valuable time commenting on or complaining about how the site is not in Japanese.

Conclusion

In user research, everything boils down to research partners being able to put their faith in each other, a faith developed through open-minded cooperation and trust. This is particularly relevant when working across cultures, where we need to be aware of and responsive to potential misunderstandings, limitations, or compromises. By including partners in the early stages of research and clearly communicating expectations on both sides, we can alleviate issues and build more effective relationships. In the end, the level of collaboration between partners can prove to be the difference between a successful study and a bust.

More than Just Eye Candy: Top Ten Misconceptions about Eye Tracking

image of heatmap over a webpage
Figure 1. Heatmap representing the aggregate number of fixations across eighteen participants looking for movies they would be interested in renting (red = 10+ fixations).

Eye tracking is no longer a novel addition to the user experience (UX) research toolbox, used by only a few specialists. As more UX professionals incorporate eye tracking into their studies, many misconceptions are being created and perpetuated. These false beliefs and questionable practices have given eye tracking an undeserved bad reputation. It is time to start the process of change. This article describes common misconceptions about eye tracking in the UX field and attempts to clarify its proper application.

MISCONCEPTION #1

All usability research can benefit from eye tracking.

One common belief is that any usability study will provide better insight if accompanied by eye tracking. However, a simple cost-benefit analysis of the insight gained versus the amount of additional time and resources that eye tracking requires will show that eye tracking is not always appropriate.

In formative usability testing, the ratio of insight to cost for eye tracking is very small; most usability issues can be uncovered with traditional usability research. For example, if a study goal is to improve the overall user experience on a new website or to make instructions for a medical device easier to understand, eye tracking may be of little value.

However, eye tracking will benefit studies that aim to answer specific questions that arose from previous testing (for example, Why are users struggling to find drug storage instructions on a package insert?). Eye tracking can also be useful in summative testing as it provides additional measures to help quantify the user experience.

MISCONCEPTION #2

Eye tracking is all about heatmaps.

Some UX researchers may not be aware that visual attention can be quantified in a variety of ways, and seem to believe that the goal of eye tracking is simply to produce heatmaps, such as the one in Figure 1. This is why many stakeholders do not consider eye tracking to be a valuable tool in the field. It is hard to blame them—all they have seen as output are pictures that provide no definitive answers.

Even practitioners who try to quantify attention often use just one or two measures, not realizing that there are a multitude of measures to choose from, each providing insight into different cognitive processes.

It is, therefore, important to analyze the measures that address the study questions. For example, to understand the interest that a particular object creates, the number of fixations on the object should be used instead of the average fixation duration (which indicates information processing difficulty, among other things) or the time to first fixation on the object (which indicates its discoverability).
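To make these measures concrete, the minimal sketch below derives all three for a single area of interest from a list of fixations. The data format, field order, and values are illustrative assumptions, not the output of any particular eye tracking package.

```python
# A minimal sketch, assuming fixations exported as
# (start time in ms from task onset, duration in ms, area of interest hit).
from statistics import mean

fixations = [
    (120, 180, "promo"),   # hypothetical values for illustration
    (340, 220, "nav"),
    (610, 260, "promo"),
    (920, 190, "promo"),
]

promo = [f for f in fixations if f[2] == "promo"]

fixation_count = len(promo)                        # interest in the object
avg_fixation_duration = mean(f[1] for f in promo)  # processing difficulty
time_to_first_fixation = min(f[0] for f in promo)  # discoverability

print(fixation_count, avg_fixation_duration, time_to_first_fixation)  # 3 210 120
```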

MISCONCEPTION #3

Eye tracking results are easy to interpret.

“We just want to know where they are looking,” is often mentioned as a research objective. The people saying that seem to believe that if they knew where users were looking, they would know how the interface should be improved. Many stakeholders will be quite happy to receive heatmaps and/or a description of where the study participants looked. However, some will eventually realize that there is no easy translation between attention distribution and design recommendations, which will lead to the infamous “so what?” question. It is often overlooked that eye tracking results reveal the “what,” but not the “why.”

For example, the DVD New Release promo for the movie 21 in the center of the Blockbuster homepage shown in Figure 1 attracted significantly more fixations than the other four promos in the carousel. Did the promo for 21 get more attention because the movie received more hype than the other three movies and was thus of more interest to the participants? Did it get more attention because the promo had human faces in it while the others did not? Were participants who came to the website to look for movies to rent simply more interested in seeing DVD releases than what was in theaters or on TV? In studies with no systematic variable manipulation, the “why” is very difficult to tease out and the researcher can only speculate.

MISCONCEPTION #4

There is only one way to look at something.

Not all practitioners realize that context (task, background information, time allotted, etc.) can significantly change the way a stimulus is viewed. People do not look at anything in only one way. Asking someone to find a specific product on a web page will produce completely different results than asking them to look for the company contact information. If participants are not given specific tasks, it will be difficult or impossible to interpret the results in a meaningful way.

Also, results (both quantitative data and their visualizations) must be presented with the proper context or they will be meaningless. Heatmaps accompanied by statements such as, “This is how participants looked at the page,” should always prompt the question, “While doing what?”

MISCONCEPTION #5

Let’s track and see what comes out!

Some studies are conducted without a clear understanding of how eye tracking data will address the research objectives. The researcher collects data, and then goes on a “fishing expedition” trying to retrofit results to objectives. Let’s say that the goal of a study is to determine which of two designs is “better.” Unless “better” is defined prior to data collection, the researcher may look to see if there are any differences between the designs, which introduces bias.

A lack of preparation for eye tracking studies can also result in “could have/should have” moments during analysis. Not thinking ahead may cause the need for a follow-up study. In addition, a poorly prepared study makes analysis more difficult and time consuming, especially if the recorded eye movements are not divided by task. Prior to the study, the researcher must select a clear start and end point for every task, and set up the study in a way that makes tasks easy to identify and compare later on.

MISCONCEPTION #6

There is a magic sample size for all eye tracking UX studies.

We often hear claims that you need thirty participants to conduct an eye tracking study. However, this is an oversimplification. There is no one sample size appropriate for all eye tracking studies. As in any other type of study, the sample size depends on multiple factors including research objectives and study design. Before the sample size can be decided, the researchers should determine if they are comparing different conditions or trying to generalize a particular score to the population. If two or more conditions are being compared, will each participant experience all of them (within-subject design) or just one (between-subjects design)? Also, what effect size should the study be able to detect?

Thirty participants are more than sufficient for a qualitative study in which the eye movement data are used to illustrate certain usability findings. Thirty should also be enough for a within-subjects study with a large expected effect size. However, thirty is not sufficient if five different designs are tested in a between-subjects study and definitive (that is, statistically significant) rather than directional results are required.
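For comparative, quantitative studies, the required sample size can be estimated with a standard power analysis rather than a rule of thumb. The sketch below is one hedged illustration for a two-design, between-subjects comparison using the statsmodels library; the effect size, alpha, and power values are assumptions chosen for the example, not recommendations.

```python
# Estimate participants per group for a between-subjects comparison of a
# continuous eye tracking measure (for example, fixation count) on two designs.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.8,          # assumed large effect (Cohen's d)
    alpha=0.05,               # significance level
    power=0.8,                # desired statistical power
    alternative="two-sided",
)
print(f"Participants needed per design: {n_per_group:.1f}")  # ~25.5, so 26 when rounded up
```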

MISCONCEPTION #7

Eye movement analysis can be done by watching gaze videos in real time.

Some practitioners believe that they do not need to conduct a formal data analysis. Instead, they base their findings on what they saw while watching the participants’ gaze point, typically denoted by a moving crosshair or dot overlay on the stimuli.

However, people make several eye fixations per second. One minute of eye tracking can result in 200-300 data points! Therefore, it is impossible for a human to process and remember the data for one participant, let alone reliably aggregate data across several participants, by watching videos of their eye movements in real time.

Further exacerbating the problem is researcher bias. Since researchers know the objectives of the study, they may overemphasize the amount of attention on areas of interest to the study. They may also prioritize gaze patterns that make sense to them while downplaying gaze patterns that may be more representative of the study’s participants.

Real-time videos are good for illustration of findings but not for analysis. The data needs to be systematically aggregated before the results can be objectively determined.

MISCONCEPTION #8

The dot indicates exactly where a person looked and what they saw.

When viewing gaze replays or fixation plots, some practitioners incorrectly believe that the dot (or crosshair) indicates exactly where the participant was looking.

First of all, the eye trackers typically used in the UX field are less accurate than practitioners may assume. It is not uncommon for there to be a difference of up to a centimeter (on the recording) between the recorded gaze point and what the participant was actually focusing on. This difference tends to increase as the eye tracking session progresses due to changes in the relative position between the participant and the infrared cameras that capture the image of the eye. This can happen, for example, when a participant starts slouching while sitting in front of a remote eye tracker, or when a wearable eye tracker moves on the participant’s head. Figure 2 shows a few frames from a recording made with a wearable eye tracker. The participant seems to be reading the text on the baby monitor packaging even though the crosshair indicating his gaze point is sometimes outside of the box.

Second, the area of foveal vision (where people have the highest acuity) is often larger than the mark indicating a fixation. For example, when looking at a computer screen, the area a person sees with high resolution could be twice the size of their thumbnail.

Both the imperfect tracking accuracy and the properties of human vision should be taken into account during data analysis. One way to do this is by making areas of interest large enough to ensure that all relevant fixations are captured. When areas of interest are very close together, it is important to recognize that fixations captured within one area could belong to the other one.
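One simple way to build in that tolerance is to pad each area of interest by a margin before assigning fixations to it. The sketch below is purely illustrative; the coordinates and the 30-pixel margin are assumptions, not a recommended standard.

```python
# Count a fixation as a hit if it lands inside the area of interest
# expanded by a margin that allows for tracking inaccuracy and foveal size.
def hits_aoi(fix_x, fix_y, aoi, margin_px=30):
    left, top, right, bottom = aoi
    return (left - margin_px <= fix_x <= right + margin_px and
            top - margin_px <= fix_y <= bottom + margin_px)

promo_aoi = (400, 200, 700, 360)      # x1, y1, x2, y2 in screen pixels
print(hits_aoi(715, 210, promo_aoi))  # True: inside the padded area
print(hits_aoi(760, 210, promo_aoi))  # False: beyond the margin
```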

Another reason there is no perfect relationship between the recorded gaze point and what people see is parafoveal and peripheral vision. Though objects outside of the fovea appear blurry, they can be registered if they are large and/or familiar enough. For example, in Figure 3, it can be assumed with a high degree of certainty that the participant saw the phone in the packaging even though she did not look directly at it. Therefore, we should never say that someone did not see something, only that they did not directly fixate on it.

Frames with focal point for gaze of participant
Figure 2. Frames from a recording of a participant examining baby monitor packaging. The crosshair indicates his point of gaze.

Image of a gaze plot
Figure 3. Gaze plot of a participant trying to determine if the prepaid phone was web-enabled. The red circles indicate fixations, points where the eyes were relatively still and the person was focusing on a particular spot on the packaging.

MISCONCEPTION #9

All data collected should be analyzed.

Eye tracking data should be treated no differently from data obtained with other research methods. However, data cleansing procedures do not seem to be commonly applied to eye tracking. Including outliers, and participants who did not calibrate or track sufficiently well, in the analysis may lead to an inaccurate representation of the data.

Just as researchers may exclude usability test data for a participant who took ten minutes to complete a task while everyone else took less than a minute, they should consider excluding participants with unusual (for example, three standard deviations from the mean) eye movement data. Such data may indicate that task instructions were not followed.
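As a purely illustrative sketch, flagging such participants might look like the following; the data are synthetic and the three-standard-deviation cut-off is simply the rule of thumb mentioned above.

```python
# Flag participants whose total fixation count falls more than three
# standard deviations from the group mean, for manual review before analysis.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical totals: 29 typical participants plus one apparent outlier.
fixation_counts = np.append(rng.normal(loc=100, scale=10, size=29), 600)

mean, std = fixation_counts.mean(), fixation_counts.std(ddof=1)
flagged = np.abs(fixation_counts - mean) > 3 * std

print("Participants flagged for review:", np.where(flagged)[0])  # [29]
```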

Including data for participants who tracked only part of the time may also confound the results because a lack of attention can be meaningful. Without carefully examining gaze replays, it is impossible to know if these participants did not have any fixations on a particular area due to lost tracking or because the area did not attract their attention.

MISCONCEPTION #10

Anyone can do eye tracking.

Manufacturers of eye trackers do their best to make their hardware and software easy to use. After all, if anyone can do eye tracking, more systems will be sold. There is nothing wrong with tools that are easy to use and make researchers more efficient. However, just because someone can operate an eye tracker does not mean they can do eye tracking. Quality eye tracking research involves a lot more than clicking on buttons. In addition to common sense, it involves some knowledge of how the eye works, visual perception processes, previous research, research methodology, and statistics, among other things.

By applying more rigorous practices that are often used with other research techniques, UX professionals can ensure that eye tracking does not become a circus sideshow, but continues to generate valuable insights into the cognitive processes of the users, thus helping improve the user experience.

UX Increases Revenue: Two Case Studies

What do you do when you need to show someone in your organization, perhaps a skeptic in upper management, that the user experience of your website directly impacts the bottom line? While an eight to ten person lab-based usability study or focus group can yield a list of key usability defects or areas for improvement, neither truly demonstrates the business impact of user experience improvements. In order to persuade the skeptics, one generally needs a methodology that includes large sample sizes and allows them to see clear differences in the bottom line before and after a redesign. The two case studies presented in this article share such a methodology.

La Quinta

Situation Overview

La Quinta is a limited service hotel chain which had over 370 properties in thirty-three American states at the time of the project. The process of online bookings (at www.LQ.com) became increasingly important to La Quinta as the Internet grew, and website managers realized that customer loyalty to LQ.com had become a key component of La Quinta’s profitability.

The website management team needed to understand the behavior of their site visitors and identify opportunities to increase brand loyalty and online bookings. La Quinta hired Usability Sciences to perform a website assessment and established the following goals:

  • Determine who is visiting the website
  • Establish primary visitor intent
  • Establish site awareness
  • Measure site visitor success and satisfaction
  • Measure brand affinity
  • Measure likelihood to return
  • Capture visitor-suggested changes to the website
  • Improve each site visitor’s experience

Research Solution

Usability Sciences deployed WebIQ™ to capture site visitors’ demographic and attitudinal measures using an online survey with click-stream data. Visitors saw a survey at the beginning of their visit and responded with information about demographics, their purpose for visiting the site that day, and how they had found the site in the first place. Visitors then used the site, uninterrupted, however they wished, and their click-stream data was captured in the background.

At the conclusion of their visit (when they navigated away from LQ.com or closed the browser), visitors answered a series of questions regarding the success of their visit, their brand affinity, and their next intended action. They also suggested changes to the site. The project was implemented over the course of thirty days. A random sampling algorithm invited approximately one out of every three actual site visitors to participate, and 885 responses were collected.

After data collection, a comprehensive examination of the data segmented the responses and click-stream data by user type, satisfaction with the site, visit intent, and visit success. Then we were able to recommend a variety of changes to the website that would increase visitor success. For example, first-time visitors who rated their visit a success were much more likely to return to the site than those who rated their visit unsuccessful. The study helped to identify an opportunity to improve conversion rates by addressing the issues raised by these visitors, such as:

  • Members of La Quinta’s loyalty program, Returns, who initially became members offline at a hotel, did not have an option for obtaining online account access.
  • Each time site visitors wished to view a hotel rate, they had to reselect their date and preferences. (The site did not remember visitors’ previously entered preferences.)
  • Visitors frequently requested more pictures of the hotel in the details of each property.

Results

Over the ensuing eight months, La Quinta implemented site enhancements based on the research. Then the same methodology was repeated, with the objective of measuring the impact of the site enhancements. Using a random sampling algorithm, approximately one in eight site visitors was invited to respond to the same question set, and 933 responses were collected.

Analysis of the data collected during the second round demonstrated substantial improvement. When participants’ responses from round two of the research were compared to those from round one, every metric of success and satisfaction on LQ.com rose considerably. The most significant user experience gains for La Quinta were:

  • Success improved by 48 percent
  • Satisfaction improved by 28 percent
  • Likelihood to return improved 17 percent
  • Brand affinity improved 50 percent

Translating these user experience metrics into bottom-line dollars, however, was most important to La Quinta management. During the same time period, marketing campaigns drove an increase in overall site traffic, which would naturally lead to an increase in revenue. Sufficient sample sizes in each round of online research allowed us to generalize the success rates in the research to the overall population of site visitors.

To determine the revenue growth that could be attributed to visitors being more successful in making reservations on the site, we followed a three-step process:

  1. We assumed that if there had been no user experience improvements to the site, the success rate of reservation-seekers would have been the same during the second round of the research as it had been during the first round.
  2. Using the first-round success rate, we calculated the number of reservation-seekers who would likely have booked during the second round, if there had been no user-experience improvements to the site. Multiplying this number by the average value of an online booking allowed us to estimate the amount of revenue generated due to marketing efforts driving more visitors to the site.
  3. Subtracting the revenue generated by marketing efforts from the total revenue in round two gave us a number that could be compared to the revenue generated for the same period during round one.

We determined that, due to user experience improvements, LQ.com saw a year-over-year revenue growth of 83 percent. Other branded websites within the industry saw a growth of 33 percent for the same time period.
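To make the arithmetic of the three steps concrete, here is a worked sketch. All of the figures are hypothetical placeholders chosen for illustration, not La Quinta’s actual data.

```python
# Hypothetical numbers illustrating the three-step attribution.
avg_booking_value = 100.0        # assumed average value of an online booking

# Round one (before the user experience improvements)
r1_seekers = 10_000              # visitors intending to make a reservation
r1_success_rate = 0.40           # share of those who completed a booking
r1_revenue = r1_seekers * r1_success_rate * avg_booking_value

# Round two (after the improvements, with more marketing-driven traffic)
r2_seekers = 13_000
r2_total_revenue = 700_000.0     # revenue actually observed in round two

# Steps 1-2: revenue round two would have produced had the success rate
# stayed at the round-one level, i.e. the portion explained by marketing
# driving more visitors to the site.
baseline_revenue = r2_seekers * r1_success_rate * avg_booking_value

# Step 3: the remainder above that baseline is compared against round one
# to gauge the growth attributable to the user experience improvements.
ux_attributed_revenue = r2_total_revenue - baseline_revenue

print(f"Round-one revenue:               {r1_revenue:,.0f}")           # 400,000
print(f"Marketing-attributable baseline: {baseline_revenue:,.0f}")     # 520,000
print(f"UX-attributable remainder:       {ux_attributed_revenue:,.0f}")  # 180,000
```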

American Heart Association

Situation Overview

The American Heart Association (AHA) is a leading non-profit institution for education and research on heart disease and stroke. AHA’s online donation site, like those of most large non-profits, has become an increasingly vital part of the organization. AHA needed to understand how well their visitors could use the online donation process, and AHA management was concerned about the percentage of site visitors who entered the online donation section of the site but did not complete the donation process. So they hired us to investigate how the site was being used and to look for ways of improving the design and functionality.

We designed a research project with the following objectives:

  • Determine the type of individuals visiting the donation section of the website
  • Understand the behavior of donation-section visitors and what contributes to successful or unsuccessful completion of the donation process
  • Document problem areas for the donation section
  • Before going into the development and launch phases, validate design and functional changes to gauge whether abandonment and failure rates would be reduced

Research Solution

Using a method similar to the La Quinta case, we deployed our online research solution to capture demographic and attitudinal measures from an online survey, as well as click-stream data. As visitors began their visit to the site, they responded to survey questions about their demographic profile, their purpose for visiting the site that day, and their past donation history. Visitors then used the site uninterrupted, however they wished, with click-stream captured in the background.

At the conclusion of their visit, visitors answered a series of questions about the success of their visit, their satisfaction with the site, their reasons for not making a donation (if applicable), and their suggested changes to the site. During the sixty-day implementation, every site visitor was invited to participate in the project, and 738 responses were collected.

Afterward, we examined the data, segmenting responses and click-stream data by user type, satisfaction with the site, visit intent, and visit success. Based on the data, we recommended changes to the website to increase donations. One such recommendation was that participants needed more flexibility in the donation process, such as an acknowledgement via email, the ability to customize the acknowledgement card, the ability to donate in the name of a company rather than an individual, and the ability to input a non-USA address.

Five specific areas for improvement of the online donation section of the website were used to build a high-fidelity prototype, which was then tested in a lab-based usability study. When compared with the original design, this prototype included a donation process that had half the number of pages, took half the amount of time to complete, and created a situation where people felt better about donating.

Results

AHA implemented the recommendations from the online research and the usability lab test during a period of “donor fatigue,” which followed after several natural disasters occurred in a relatively short period of time (including the Indian Ocean tsunami and Hurricanes Katrina and Rita). During this same period, other charitable organizations saw a decrease in donations. AHA conducted no marketing or promotional campaigns to increase donations or drive more visitors to the site. Even in this climate, they saw the following results appearing as soon as the redesigned site went live:

  • 60 percent year-over-year increase in online donations
  • Increase in the number of monthly donors
  • Increased average gift per donor
  • Improved visitor satisfaction with the online donation process
  • Increased likelihood to donate again
  • Increased likelihood to recommend donating to AHA online to others

AHA management also gained a higher appreciation for user research and user-centered design.

Ongoing Analysis

This research methodology has become an integral part of both La Quinta’s and AHA’s ongoing website enhancement process. The WebIQ solution helps both businesses prioritize web improvement efforts and then measures the impact of those improvements. Both organizations also use the results from online research to build and enhance prototypes that are tested in lab-based studies. The process of measuring, understanding problems, making improvements, and then measuring the impact of those improvements is the foundation of improving site visitor success, and thus the businesses’ bottom lines.

Ethnographic Research: Business Value

Ethnography has become the new marketing buzzword, a must-have in a designer toolkit, and a simple shorthand for qualitative research. Doing “ethnography” usually means using people-studying tools as part of a mixed bag of observation and interviews for data collection. If design is social, and we all seem to agree that it is, then ethnography is the means to access this sociality.

What is often not audible above the industry buzz is just how doing in-field ethnographic research translates into actual design ideas or guidelines, or adds value to the design process. We make two related propositions: first, that ethnographic insight derives not merely from designing methodologies for field research, but from orienting ourselves analytically to the topics of our study in ways that get at cultural dynamics—big and subtle—at all stages of research and analysis. And second, closely following from our first proposition, that ethnography is, fundamentally, a method of cultural analysis, or a method of discovering and explicating cultural processes. Asking the right questions in the right ways means that we move towards grasping these processes, concepts, analogies, and ways of thinking—which can then serve as frameworks for design. The idea is not merely to deploy ethnography as the next most valuable data gathering methodology, but to think ethnographically about the process of design itself.

But what constitutes an ethnographic orientation? How could cultural analysis lead us more effectively towards meaningful, sensitive, user-centered design?

Our first step is to conceptualize, however roughly at first, the microcosms or lifeworlds we are seeking to understand. In the beginning, there is often not a user group or even a practice, but an object or an artifact that centers our investigations: a phone, a washing machine, an e-commerce website, a food product. “What” usually comes before “who,” “where,” or “how,” but if we were to think along the lines suggested by anthropologist Arjun Appadurai in The Social Life of Things, objects have social lives, too. They are owned, loved, used, consumed, cared for, or neglected. They circulate, create value(s), identifications, and bind or break relationships. Our first task, then, is to conceptualize this lifeworld, or what we call an ecosystem, and to allow it to lead us toward communities of users and their social dynamics.

Ever since James F. Moore first proposed the idea of the “business ecosystem” in the Harvard Business Review (1993), the idea of the ecosystem has expanded to refer to other more-or-less self-contained “worlds” in which participants compete, cooperate, and co-evolve. Cultural ecosystems can revolve around products, services, or communities/groups. They contain not only task flow processes, but also the human social interactions that take place, are avoided, or enabled around—or as a result of—task flows. These, in turn, point to symbolic, aspirational, or other cultural themes that conceptually define how a given ecosystem works.

Within this space, a series of questions first map the terrain, and next investigate its logic, its mechanisms, and its cultural underpinnings. What is happening in the different environments of a given ecosystem? Who are the participants? How are they interacting? What are they feeling and experiencing in these various engagements? What are the objects that circulate in this ecosystem and how do these acquire significance? What local realities or cultural elements are at play? (Figure 1)

Different users with speech balloons showing different ways of talking about their mobile devices.
Figure 1. Example of a model of a phone ecosystem that shows how different participants (different user groups, developers, regulators, etc.) layer meanings onto the object, facilitating both brand value creation and complex personal meaning-making, derived from research done in India. (Image credit: Deepa S. Reddy)

Ethnographers answer these questions in two ways:

  1. Close observations and dynamic interview scripts that help to identify what is happening and how it is happening.
  2. Interpretive strategies of cultural analysis that identify why things are happening the way they are.

The anthropologist’s “ecosystem” is thus not just a space in which people live and function, but one in which their values, identifications, and aspirations are writ large, to be uncovered by cultural analysis.

What follows are some of the many strategies that anthropologists use to uncover local particularities and the symbolic dimensions of social interaction.

A Cultivated Naiveté

The classic ethnographic strategy of “defamiliarization” deliberately cultivates naiveté as a research strategy so as not to presume the meaning of any artifact or practice, but to lay bare its meanings. Anthropologists Rita Denny and Patricia Sunderland ask, for example, “What is an office?” and “What is coffee?” to investigate the physical-metaphoric dimensions of workspace organization and consumption practices. The former question has implications for translating physical desktops into digital ones (Figure 2); the latter for redesigning “social places” and for product positioning.

Desk and monitors in a new-model bank branch in South Africa.
Figure 2. What’s a bank? Johannesburg, South Africa. New model of retail banking that seemed to trouble customers. (Image credit: Anjali Bhatia and Diane Devar)

Cultural Practices: Objects as Metaphors

Metaphors offer a second approach to cultural analysis (a little different from the methods sometimes used in UX design). We often experience some kinds of things in terms of other kinds of things: gadgets as personal assistants, bodies as machines, and clothing as identity. As a tool of cultural analysis, the metaphor allows us to map the connections between apparently unconnected things. Cultural views of “snacks” can tell stories of time between meals: of boredom, isolation, health, as well as socialization. Lifelogging (Figure 3) and other technology-inflected self-experimentations can be experienced as practices of “mindfulness.” Metaphors reveal the cultural associations that make objects or practices meaningful in those particular contexts. Mapping ideas and associations at this subtle, experiential level means that products can be situated meaningfully—and imaginatively—in specific cultural landscapes.

Screenshots from a Lookback slide show on Facebook.
Figure 3. Lifelogging: Facebook Lookback (Screenshots from Ananya Roy’s Facebook profile)

The Jeweler’s Eye View

When a jeweler looks through his loupe, what does he see? Taking inspiration from Michael Fischer’s articulation, this “means not only the ability to bring out the different facets of cultural variability, but also a constant back and forth movement between close-up viewings and sitting back for a more global view of the settings.” Ethnography’s attention to rich, specific cultural detail is paired with more distanced “big picture” global views—such that micro- and macro-realities can be meaningfully connected. Preserving certain offerings globally (as McDonald’s once did with its broad commitment to accessible toilets), or understanding how global landscapes produce their own metaphors and dynamics, can be as critical to business success as understanding where to localize products and services. (Figure 4)

Barista preparing coffee at a cafe.
Figure 4. Serving mocha lattes at an upscale eatery in Nelspruit, South Africa. What is the restaurant worker’s relationship to the globally recognizable drink he prepares for patrons—but cannot himself afford? (Image credit: Deepa S. Reddy)

These strategies provide insights into how ethnography as a method of cultural analysis can help us not just document practices, but understand their symbolic and conceptual meanings. Knowing why people act, feel, think, experience, and choose the way they do gives us rich cultural insights to inform design or to build meaningful, empathetic user experiences. But how do such insights emerge?

The following three sections illustrate cultural insights that can be generated and applied using the above strategies, based on our own research experiences, as well as those of other ethnographic studies. Each begins from a position of cultural naiveté, presents cultural insights uncovered via ethnographic study, and proposes ideas for design application.

Romancing the Phone

A row of passengers on a train. Each passenger uses a mobile device.
Figure 5. A digitized public on the Mass Transit Rail in Hong Kong. (Image credit: Deepa S. Reddy)

In the familiar scene shown above in Figure 5, passengers in Hong Kong’s Mass Transit Rail are each absorbed by activities on their mobile devices. What each individual phone user is doing with their device is not discernible from afar, but when people are traveling or walking together, it is often evident that young couples (for example) are romancing their phones. Even while face-to-face, they are sharing photos, sending texts, scrolling chat screens together, and getting to know each other via their respective social networking profiles. What the phone is, in this instance, is an object of intimacy and courtship.

We know now intuitively what social theorist Anthony Giddens once noted: “A person may be on the telephone to someone twelve thousand miles away and for the duration of the conversation be more closely bound up with the responses of that distant individual than with others sitting in the same room.” Phones simulate proximity—to the point of displacing proximal relationships. But where is the need for such simulation when people are face to face? This particular use of the phone offers a critical insight: the simulation of proximity does not equal the negation of distance. A little distance, or a little mediation, can be electric, or a way of proceeding with caution. Both are vital catalysts of modern romance and relationship.

[greybox]

Design insights

  • Personal devices and other contextual products can help expand the sensory experience of love and romance. The experience of psychological nearness through voice, text, and symbols can be broadened to create sensual nearness even when people are apart, through the simulation of physical experiences like a hand squeeze, a sensuous breath, a heartbeat, a kiss, or even a partner’s snoring.
  • Appearance, sound, smell, touch, light, projection, holographics, and even apparently mundane elements like screen guards can be used to extreme-personalize one’s devices, present one’s self in the most desirable way, and engage the senses of an existing or potential beau. After a break-up, apps and life logging tools can track one’s moods and vitals to provide counseling and therapeutic crutches.

Design scenario

Angela has gifted her live-in boyfriend Zhao a Love Phone. It smells of her favorite perfume and wakes him every morning with her voice (while she is in a different place in the same apartment, getting ready or having breakfast). Zhao can play Angela’s live heartbeat on a pre-loaded app to ease himself to sleep after a grueling day while Angela stays up reading. On their joint commute back home they “selfie” themselves against exotic settings and share these images within their friend circle. Friends’ comments on how close they are, in spite of their busy schedules and personal interests, make Angela proud of her Love Phone-enhanced relationship.

[/greybox]

Disruptive Exchanges

Gifts and exchanges are managed with much care and finesse in Japan. Less obvious to casual observers are the precise mechanisms by which these are handled, and the ways in which they can cause acute embarrassment and social disruption.

Take the expression “arigata-meiwaku,” which refers to excessive, unsolicited acts of kindness which place recipients in a position of great obligation, even shame, for gestures which cannot be reciprocated. (An example is that of an American tourist who returns a dropped note to a fellow train passenger, only to have the grateful recipient of this courtesy return the favor by going hours out of his way to escort her to her final destination.) Meiwaku (annoyance, disturbance, or commotion) is to be avoided at all costs, as it creates or imposes obligations on others in ways that disrupt the fine balance of exchange practices.

Japan’s unique adaptation of Valentine’s Day gifting offers another example of a carefully managed disruption. The event is an occasion for women to give, or strategically deny, “giri-choco” or “obligation chocolates” to male family members, teachers, and particularly bosses, as a sign of amae (love, respect, and dependence), values associated with women. (These gifts are then returned on “White Day” a month later via white confections.) The Western origins of Valentine’s Day celebrations allow Japanese women to draw on ideas of women’s empowerment and individualism attributed to the “West,” briefly abandon conventional restraint, and express themselves openly in their workplaces, much to the embarrassment of ungifted superiors.

Clearly, even in a cultural context that so meticulously manages social exchanges, disruptions are not only possible, they are sometimes necessary to maintain a healthy social order. Clues to how disruptions work, where they might be bothersome, where they are useful, and in what quantum, are details to be identified only in systematic ecosystem research.

[greybox]

Design insights

  • In “From Meiwaku to Tokushita!: Lessons for Digital Money Design From Japan,” Bill Maurer, Scott Mainwaring, and Wendy March suggest that digital money design should result in a “net decrease in commotion.” There should be neither any “new burdens [nor a decrease] in friction to a point of individual spending.”
  • Piggyback on other distinctively Western ideas that are already adopted and acceptable in order to capitalize on “Western” cachet and introduce controlled disruptions.
  • Living with social hierarchies is, at times, not easy. Look out for acceptable opportunities to introduce relief, or to allow for rules to be subverted.
  • Gamify the practices of gifts and obligations in fantasy spaces in order to test their limits!

Design scenario

The latest craze in Kyoko’s life is a virtual reality world called Big Meiwaku that allows one’s avatar to break every gift-giving and reciprocation custom ever conceived. The more rude, unconventional, and devoid of etiquette players are on Big Meiwaku, the quicker they move up the levels and are invited into elite clubs specializing in different brands of unsocial behavior. She and her friends spend their tea breaks and lunch hour glued to it, relishing the fantasies they know they won’t be able to satisfy in the real world for a long time to come.

[/greybox]

Doing the Laundry

Figure 6. Ecosystem research on washing machines in India: a mother and son share their views on home appliance brands with a researcher. (Image credit: Ananya Roy, 2011)

Other ethnographic studies have shown that laundry practices the world over express affective or emotional dimensions, attachments, and care—not just for the clothes—but for the people who will wear them (Figure 6). They can also express care for the environment, attitudes toward water and energy management, and associations with health, hygiene, freshness, cleanliness, tidiness, and more. Laundry is also a chore and a daily drudgery, but research with families invariably reveals that reducing a washing machine or a detergent to its efficiency, functionality, or even its convenience is to grossly understate its value.

Our own 2011 ecosystem research study with Indian households in Delhi, Mumbai, and Chennai found that the washing machine was not just another home appliance that reduced workload. One group of women in one of the three cities, typically from affluent but conservative extended families with little say in household decision making, expressed their self-worth by operating the washing machine. Interestingly, they preferred semi-automatic machines because these assured them substantial involvement in, and control of, the process.

A second group of respondents, from lower-income nuclear families in another city, saw the washing machine as a tool of class mobility. The machine liberated them from the chore of cleaning and managing clothes and allowed them to use the extra time for their personal development: higher studies, vocational courses, or a part-time job. This group preferred fully automatic machines.

Somewhat similarly, a third group of mothers of tweens to late teens saw the washing machine as clearing time to help their children get ahead. The machine ensured their kids were well-groomed, and the mother was free to help with her children’s studies, run a well-organized, non-chaotic household, and raise well-balanced, successful, motivated children!

[greybox]

Design insights

  • It’s not sufficient to emphasize machine functionality or efficiency. Target features and appliances at different user groups based on their motivations and aspirations.
  • Design machines that allow users to visualize water or detergent use, or to see the volume of wastewater, in order to appeal to their sense of control (and guide them toward certain behaviors). Rather than assuming the need for pre-set washes and simple controls, return the decision-making power to users.
  • Allow the machine to tell a story about your life. This is both a branding strategy and a more personal record-keeping mechanism. On one level, the machine allows you to live the life you choose. On another, perhaps the machine could record the details of daily usage and produce a record, suggest fixes, educate, and generate encouragement.
  • Smart machines, which communicate with personal devices, could aid in the above strategies.

Design scenario

Maya is a super busy mom of two school kids in Mumbai. Based on the first few days of use, her Mobile-Synced Microwave Washing Machine detected how dirty the kids’ clothes get, customized a washing option (detergent/duration/washing style/rinse combo) for her, and sent her the suggestion through its mobile app. Maya allowed the machine to go ahead and use the suggestion, and the clothes turned out so clean that she immediately messaged the option to her other mom friends. Now she loads the clothes in the machine in the morning and initiates washing remotely through her mobile, at her own convenience, while she’s picking up her kids from school or waiting for them to finish violin lessons. She simply “messages” the machine to use the microwave function to repair clothes damaged during a sports event, or dry clothes on an especially rainy Mumbai day.

[/greybox]

Conclusion

The examples above point to the following conclusions:

  • Ethnography is more than a style of data-gathering. It is an analytical approach to making sense of the data gathered in close, immersive field studies.
  • The many cultural insights gained from ethnographic studies in specific ecosystems are not obtainable through straightforward market research-style questioning.
  • The ultimate value of ecosystem research for business and design lies in reading the ecosystem as a symbolic, conceptual universe, as well as a space of practical human interactions. Our needs and aspirations are most vividly expressed through the web of meanings that we create in our social exchanges, from which new ideas for design and innovation can most meaningfully emerge.

While there are no cut-and-dried formulae for extracting insights from ethnographic research, the strategies outlined above are useful starting points in the process of making sense of culture in ways that render it readable and open up the possibilities of further cultural analysis. Through these, our attention begins to be directed at uncovering the details of how interactions and relationships in the ecosystem are structured, how exchanges happen, how aspirations or dissatisfactions are expressed, and how native points of view are encapsulated in cultural forms and practices. Whether we are planning field studies or are in the thick of field research, the objective is to identify the mechanisms by which human interactions generate culturally specific meanings, which are the crucial hooks for design and innovation.

The goal of all user research is to understand how and why people might use our products. Ethnographic research strategies, such as cultivating naiveté, shifting your point of view, and attending to cultural practices, can provide design ideas grounded in cultural insights. Ethnography is more than a style of data gathering; it is an analytical approach to making sense of the data collected in close, immersive field studies.


Bridging Differences Across Cultures (Book Review)

[greybox]

A review of

Global Social Media Design: Bridging Differences Across Cultures

by Huatong Sun

Book Website

Publisher: Oxford Series on Human-Technology Interaction, Oxford University Press

272 pages, 7 chapters

About this book

A good reference for Methods/How-To, UX Theory, and Case Studies

Primary audience: Researchers and designers who are new to the topic or have some or significant experience with the topic.

Writing style: Academic

Text density: Mostly text

Learn more about our book review guidelines

[/greybox]

Global Social Media Design: Bridging Differences Across Cultures is the second ground-breaking publication by Dr Huatong Sun, Associate Professor of Digital Media & Global Design at the University of Washington, on cross-cultural technology design. In this new work, she approaches social-media design from a Global South perspective and teaches us how to apply her approach to global design as a whole. This book is thought provoking, timely, and relevant for students, researchers, and design practitioners at any level of experience. It has taken on new urgency in light of The Wall Street Journal’s revealing four-part series, The Facebook Files.

Dr Sun combines theory and practice through extensive field research and case studies. She grounds us in concepts and challenges that we, as a user experience research and design community, must understand and face in a globally distributed world. She discusses Human-Computer Interaction for Development (HCI4D) (i.e., HCI for under-represented populations) and how social media and our design practice can impact the developing world in a meaningful way if we understand the ways in which the products we design simultaneously are increasingly automated (i.e., divorced from human interaction) and increasingly social (i.e., supporting our need for social connection). We must meet the challenges of this dichotomy head on and strive for what she calls “cultural sustainability.” For Dr Sun, cultural sustainability seeks to design technologies “that [are] usable, meaningful, and empowering for culturally diverse users in this increasingly globalized world” (5).

The solution to this challenge is her CLUE2 approach. Her Culturally Localized User Engagement and Empowerment approach teaches us how to take our useful, usable solutions and make them meaningful in globally diverse and local cultures.

In an academic style, the first three chapters provide the theory behind the practice, grounding us in the research and history upon which her approach is based, with deep-dive discussions of the relevance of cross-cultural studies and media studies and a close examination of power, agency, and structure in an ever more technocratic world. The next three chapters take on a more matter-of-fact tone and focus on her transnational fieldwork and the advanced design concepts essential to implementing Dr Sun’s approach (more on the case studies below). The final chapter sums up her thinking and pushes us to turn design crossroads into globally interconnected virtual town squares where people from all backgrounds and cultures have a place to meet and speak openly and safely. It’s a story of balance. In Dr Sun’s words, we should “place social practice at the center of the stage, [where] neither human actors nor computing technologies occupy [a] privileged position” (192).

More About the Case Studies

The first of three case studies appears in Chapter 4. Dr Sun looks at Facebook in Japan and how its American model of networked individualism failed to capture Japanese audiences, given their preference for online anonymity. She discusses how a culturally sensitive and localized approach to political economy and discursive affordances could have produced a different outcome for Facebook in Japan, and she describes how we can use these affordances in our own work to create meaningful and successful culturally localized user experiences.

Chapter 5 looks at Weibo in China as an example of how copycat solutions replicated in different geographies have differing success rates. Weibo started as the Chinese Twitter but was heavily censored because of the rising levels of political viewpoints posted in opposition to the Chinese government. It later morphed into a celebrity fan and ecommerce space, becoming so successful that investors started referring to Twitter as the “Weibo of the West.” Dr Sun uses Weibo as an example of local uptake through culturally sensitive, localized reinvention.

Chapter 6 examines the competition among four mobile messaging apps (WhatsApp, WeChat, LINE, and KakaoTalk) and how they were adopted globally and within Dr Sun’s home region of the Pacific Northwest, which has a large East Asian immigrant population. She discusses the uses of the four platforms and how they contribute to technological, physical, political, economic, and social mobility via culturally sustainable value propositions.

Overall, Dr Sun’s book is jammed with information and insights that will help the user experience community design and build successful and meaningful social platforms, and a culturally sustainable Internet of Things.

 

 

Table 3.1 Design Matrix for Productive Engagement With Differences

The table’s design continuum spans three positions:

Design With Differences
  • Applies to: Design in general
  • Goals: Penetrate obscurity of everyday life; introduce innovative changes

Design Across Differences
  • Applies to: Cross-cultural, multicultural, and global design
  • Goals: Decode cultural differences; empathize for differences; cross the cultural divide
  • Approach: Cultural competency

Design for Differences
  • Applies to: Cross-cultural, multicultural, and global design
  • Goals: Champion for different values; explode stereotypes; shake the status quo and undermine normalization
  • Approach: Cultural humility

Aggregating Insights: Survey Templates Build Cumulative Knowledge in Iterative Product Development

Surveying to Evaluate Feature Launches

As UX researchers, our input helps product teams decide which features to build, how to design those features to meet user needs, and when new features are ready to launch. The final stages of the development cycle typically involve running surveys to evaluate how a new product or feature is landing with users. In our experience, researchers usually write bespoke surveys to evaluate each new feature individually. While this approach allows teams to tailor survey questions to the feature in question, it introduces several problems:

  • Repeat work as researchers draft and test similar but new questions for each feature
  • Slower launch cycles due to the time required for survey development
  • Inability to compare user reactions across features that were evaluated using different questions
  • Limited benchmarks for user sentiment (for instance, uncertainty about what constitutes a “good” versus “bad” rating), which makes it difficult to reach decisions based on survey data

UX researchers can overcome these problems by creating feature-agnostic survey templates to evaluate all new features in a product space and conducting meta-analyses that aggregate responses from previous launches. In combination, these tools enable product teams to build a cumulative body of knowledge with deeper insights than any single study could provide.

This article shares best practices we developed while using survey templates and meta-analysis on several teams at Google™. We outline the benefits of this approach and offer practical suggestions for researchers interested in aggregating user insights across multiple launches.

Replication and Aggregation

Replication is a cornerstone of scientific research. Insights build upon one another, with researchers developing more accurate theories by replicating and extending prior work. As Alison Ledgerwood and colleagues argue (see Further Reading), the most precise, reliable, and trustworthy conclusions come from aggregating insights across multiple studies.

Replication and aggregation are critical because every study has limitations. For example, most UX research studies collect responses from a sample of participants in order to reach conclusions about a larger population. Because the sample captures only a subset of that population, results necessarily include some amount of sampling error, that is, discrepancies between the sample and the true population. Even well-designed studies carry some error due to random differences between the sample and the population, which limits the conclusions researchers can draw from a single dataset.

Researchers can overcome these limitations using meta-analysis, which combines data from previous studies to test whether the findings replicate across multiple samples. By aggregating numerous studies in a single analysis, this approach increases the precision of research conclusions.
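To make the precision gain concrete, here is a minimal Python sketch of fixed-effect, inverse-variance pooling, one common way meta-analysis combines per-study estimates. The ratings, standard deviations, and sample sizes are invented for illustration and are not from any real launch.

```python
import numpy as np

# Hypothetical per-study summary statistics: mean satisfaction rating (1-5 scale),
# standard deviation, and sample size for three separate launch surveys.
studies = [
    {"mean": 3.9, "sd": 0.8, "n": 250},
    {"mean": 4.1, "sd": 0.7, "n": 400},
    {"mean": 3.8, "sd": 0.9, "n": 150},
]

# Fixed-effect (inverse-variance) pooling: each study is weighted by the
# precision of its estimate, i.e. 1 / (sd^2 / n).
weights = np.array([s["n"] / s["sd"] ** 2 for s in studies])
means = np.array([s["mean"] for s in studies])

pooled_mean = np.sum(weights * means) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

print(f"Pooled mean rating: {pooled_mean:.2f} (SE = {pooled_se:.3f})")

# Each single-study standard error (sd / sqrt(n)) is larger than the pooled SE,
# which is exactly the precision gain that aggregation buys.
for s in studies:
    print(f"  single-study SE: {s['sd'] / np.sqrt(s['n']):.3f}")
```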

Although standard in academic research, meta-analysis is not widely adopted among UX researchers in industry because product teams often use bespoke survey questions to evaluate each feature. For example, a team testing a grammar-correction feature might ask users whether they felt the feature improved the accuracy of their writing, whereas a team testing a collaboration feature might ask whether the feature made it easier to work with colleagues. Measuring different constructs for different launches stands in the way of aggregating insights over time using meta-analysis. It also makes it difficult to answer questions like: “What’s a good score?” Without a comparative baseline from previous launches, researchers must make decisions based on data from a single sample.

Developing and deploying standard survey questions to measure user sentiment would allow UX research to become a more cumulative science. It would enable researchers to aggregate findings across multiple studies via meta-analysis and develop success benchmarks based on previous launches. In the remainder of this article, we walk through an example of how our teams used survey templates and meta-analysis to develop products for Google Workspace™.

Developing a Survey Template

In 2019, Google Workspace was developing a variety of features that used artificial intelligence and machine learning to help users be productive. Our teams needed a framework to evaluate user sentiment toward these assistive features, but we encountered several challenges: 

  • There were multiple product teams working on assistive features simultaneously.
  • Although the teams worked on similar user problems, they had distinct stakeholders.
  • Assistive intelligence was a new space, so we didn’t have metric benchmarks from previous launches. Instead, teams were relying on surveys to make launch decisions.

Figure 1: Example of assistive technology. Smart Canvas™ automatically completes phrases while users type.

We saw an opportunity to develop a standard set of survey questions to guide launch decisions. After getting stakeholder buy-in for a unified approach to sentiment analysis, we constructed the template in four steps following best practices from quantitative assessment.

Figure 2: Steps in a unified approach to building surveys for sentiment analysis.

Working in a fast-paced environment, we couldn’t fully develop the template before implementation. Instead, we crafted the template alongside ongoing feature development. We used open-ended questions when we first began, and as we gathered more data and confidence, we converted them into closed-ended questions. We then deployed potential survey questions in small experiments and iterated based on the results. Iteratively testing and refining the template ensured we captured the needs of various teams.

We eventually settled on several core metrics to determine the launch-readiness of new features, including frequency of exposure, satisfaction, distraction, and usefulness (see the Appendix for the complete template). Over time, the template became widely adopted because it streamlined research and provided a shared language for making decisions about assistive features. In fact, teams used the template to evaluate more than 40 launches over the course of two years, resulting in over 75,000 participant responses. Such a large dataset provided new learning opportunities for our teams.

Meta-Analysis for Product Teams

By assessing the same constructs for various assistive features, the template enabled us to aggregate survey data using meta-analysis. There is robust literature explaining the methods and best practices of meta-analysis. Many of the tools were developed to enable researchers to conduct meta-analyses without access to underlying data by using only the summary statistics provided in published papers (see Further Reading).

In our case, we had access to the original data, so we simply combined participant responses from each study that used the survey template into one large dataset. While creating the combined dataset, we captured information about each survey that we could use in the analysis, for example, which type of user the survey targeted (internal users versus external users).

Other variables included the following (a consolidation sketch appears after the list):

  • User group—Which users were sampled? (internal, consumer, enterprise, or education)
  • Geography—Which country were users sampled from?
  • Survey delivery mechanism—How was the survey presented to participants? (within-product or email)
  • Launch stage—When in the launch process was the survey administered? (dogfood, alpha, beta, or public launch)
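As a rough illustration of this consolidation step, the following Python sketch tags each launch’s responses with study-level metadata and stacks them into one long dataset. The feature names, column names, and values are hypothetical, not the actual variables used in our analysis.

```python
import pandas as pd

# Hypothetical per-launch exports (in practice these might be CSV files or
# query results); one row per participant response.
smart_compose = pd.DataFrame({"satisfaction": [4, 5, 3]})
grammar_check = pd.DataFrame({"satisfaction": [5, 4, 4]})

# Study-level metadata captured for each survey, mirroring the variables above.
launches = [
    (smart_compose, {"feature": "smart_compose", "user_group": "enterprise",
                     "geo": "US", "delivery": "in-product", "stage": "beta"}),
    (grammar_check, {"feature": "grammar_check", "user_group": "internal",
                     "geo": "US", "delivery": "email", "stage": "dogfood"}),
]

frames = []
for df, meta in launches:
    tagged = df.copy()
    for key, value in meta.items():  # tag every response with study-level metadata
        tagged[key] = value
    frames.append(tagged)

# One long dataset: each row is a participant response annotated with the
# launch it came from, ready for benchmarking and subgroup comparisons.
responses = pd.concat(frames, ignore_index=True)
print(responses)
```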

Once we had created a single dataset containing responses from previous surveys, we dug into the analysis. We developed two types of insights for the team. First, we provided benchmarks for each question in the survey template to show the average score and distribution of ratings from prior launches. These insights have been useful in answering questions such as: “What is a good satisfaction score for an assistive feature before launch?” Rather than setting arbitrary success criteria, our teams could now set goals and make launch decisions based on data-driven benchmarks.

Figure 3: A graph of simulated satisfaction ratings across multiple launches.
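One way such benchmarks might be computed from the aggregated dataset is sketched below. The ratings and feature names are invented; a real analysis would use the full multi-launch dataset and whatever success metric the team has standardized on.

```python
import pandas as pd

# Tiny illustrative dataset standing in for the combined multi-launch responses.
responses = pd.DataFrame({
    "feature":      ["smart_compose"] * 3 + ["grammar_check"] * 3 + ["summaries"] * 3,
    "satisfaction": [4, 5, 4, 3, 4, 3, 5, 4, 4],  # 1-5 scale
})

# Mean satisfaction per launch, then the distribution of those means,
# which serves as the benchmark a new feature is compared against.
per_launch = responses.groupby("feature")["satisfaction"].mean()
print(per_launch.describe(percentiles=[0.25, 0.5, 0.75]))

new_feature_score = 4.2  # hypothetical pre-launch survey result
pct = (per_launch < new_feature_score).mean() * 100
print(f"A score of {new_feature_score} beats {pct:.0f}% of prior launches")
```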

Second, we compared survey responses across key subgroups (for example, enterprise versus consumer users). These insights have been useful in contextualizing findings and setting nuanced expectations for survey results based on characteristics of the sample. They also helped dispel some common myths about survey responses. For example, prior to the meta-analysis, there was a widespread assumption that internal (dogfood) survey participants were more likely than external users to be critical of new features. However, our meta-analysis suggested this assumption was wrong. In fact, there was a small but significant effect in the opposite direction! If anything, internal users provided more positive feedback than external users. By correcting this assumption across product teams, meta-analysis unlocked more accurate data-driven decisions.
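A subgroup comparison of this kind can be sketched as follows. The data are invented, and Welch’s t-test simply stands in for whatever statistical test a team prefers for comparing group means.

```python
import pandas as pd
from scipy import stats

# Illustrative data: satisfaction ratings tagged by user group
# (a stand-in for the aggregated multi-launch dataset).
responses = pd.DataFrame({
    "user_group":   ["internal"] * 5 + ["external"] * 5,
    "satisfaction": [4, 5, 4, 5, 4,    3, 4, 4, 3, 4],
})

internal = responses.loc[responses["user_group"] == "internal", "satisfaction"]
external = responses.loc[responses["user_group"] == "external", "satisfaction"]

# Descriptive comparison of the two groups.
print(responses.groupby("user_group")["satisfaction"].agg(["mean", "std", "count"]))

# Welch's t-test: are the two groups' mean ratings reliably different?
t_stat, p_value = stats.ttest_ind(internal, external, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```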

Conclusion

Academic researchers regularly aggregate insights through replication and meta-analysis to develop more precise estimates of the phenomena they study. UX researchers working in industry can benefit from the same process by developing survey templates that standardize the questions product development teams use to assess the launch-readiness of new features. Survey templates save time and create consistency across teams, which enables researchers to aggregate insights across multiple studies. Such cumulative knowledge allows product development teams to build a more robust understanding of user attitudes, set reasonable benchmarks for launch-readiness based on prior data, and make fine-grained comparisons about whether and how attitudes differ across contexts. We hope this guide makes it easier for other teams to enjoy the benefits of survey templates and meta-analysis.

Appendix: Survey Template

  • Self-Reported Exposure. Question: “In the last week, how often did you notice [feature] in [product]?” Response options: 1 = I haven’t noticed it; 2 = 1-10 times; 3 = 11-50 times; 4 = More than 50 times.
  • Satisfaction. Question: “Overall, how satisfied are you with [feature] in [product]?” Response options: 1 = Very dissatisfied; 2 = Somewhat dissatisfied; 3 = Neither satisfied nor dissatisfied; 4 = Somewhat satisfied; 5 = Very satisfied.
  • Satisfaction (Why). Question: “Why did you choose that satisfaction rating?” Response: free response.
  • Perceived Frequency. Question: “In the last week, how frequent were [feature suggestions] in [product]?” Response options: 1 = Far too infrequent; 2 = Too infrequent; 3 = Just right; 4 = Too frequent; 5 = Far too frequent.
  • Perceived Recall (Corrective Features). Question: “In the last week, how many of [error type] has [feature] detected in your work?” Response options: 1 = None of them; 2 = A few of them; 3 = Some of them; 4 = Most of them; 5 = All of them.
  • Perceived Recall (Non-Corrective Features). Question: “In the last week, how many suggestions has [feature] made for your work?” Response options: 1 = None; 2 = A few; 3 = Some; 4 = Many; 5 = A lot.
  • Perceived Precision. Question: “In the last week, how many of the suggestions [feature] made in [product] were correct?” Response options: 1 = None of them; 2 = A few of them; 3 = Some of them; 4 = Most of them; 5 = All of them.
  • Usefulness. Question: “In the last week, how useful were [feature suggestions] in [product]?” Response options: 1 = Not at all useful; 2 = Slightly useful; 3 = Moderately useful; 4 = Very useful; 5 = Extremely useful.
  • Distraction. Question: “In the last week, how distracting were [feature suggestions] in [product]?” Response options: 1 = Not at all distracting; 2 = Slightly distracting; 3 = Moderately distracting; 4 = Very distracting; 5 = Extremely distracting.
  • Product Experience. Question: “Overall, has [feature] made [product] better or worse?” Response options: 1 = Much worse; 2 = Slightly worse; 3 = Neither better nor worse; 4 = Slightly better; 5 = Much better.
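Teams that want to reuse a template like this programmatically could encode it as a small data structure and generate per-feature surveys from it. The sketch below is a hypothetical encoding of two of the constructs above; the structure and helper function are illustrative, not part of the original template.

```python
# A minimal, hypothetical encoding of part of the survey template so that the
# same questions can be generated for any feature/product pair.
SURVEY_TEMPLATE = [
    {
        "construct": "Satisfaction",
        "question": "Overall, how satisfied are you with {feature} in {product}?",
        "options": ["Very dissatisfied", "Somewhat dissatisfied",
                    "Neither satisfied nor dissatisfied",
                    "Somewhat satisfied", "Very satisfied"],
    },
    {
        "construct": "Usefulness",
        "question": "In the last week, how useful were {feature} suggestions in {product}?",
        "options": ["Not at all useful", "Slightly useful", "Moderately useful",
                    "Very useful", "Extremely useful"],
    },
]

def render_survey(feature, product):
    """Fill in the feature and product names for one launch."""
    return [
        {**item, "question": item["question"].format(feature=feature, product=product)}
        for item in SURVEY_TEMPLATE
    ]

# Example: generate the questions for a hypothetical launch.
for q in render_survey("Smart Compose", "Google Docs"):
    print(q["construct"], "->", q["question"])
```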

Further Reading

“Meta-Analysis” by Larry V. Hedges in the Journal of Educational and Behavioral Statistics 17(4)

“Introduction to the Special Section on Moving toward a Cumulative Science: Maximizing What Our Research Can Tell Us” by Alison Ledgerwood et al. in Perspectives on Psychological Science 9(6)

Practical Meta-Analysis by M. W. Lipsey and D. B. Wilson

“Meta-Analysis: Recent Developments in Quantitative Methods for Literature Reviews” by Robert Rosenthal and M. Robin DiMatteo in the Annual Review of Psychology 52(1)