

Book Review: Of Testing and Techniques

A review of Beyond the Usability Lab: Conducting Large-Scale Online User Experience Studies
By Bill Albert, Tom Tullis, and Donna Tedesco
Morgan Kaufmann, 2010

In 2008, UX Magazine devoted an issue to remote usability testing, guest-edited by Tomer Sharon. As managing editor at the time, I confess to having had a rather skeptical view of the value of remote testing.

One of the articles that made me re-think my attitude was a case study comparing the relative usability of two sets of information about the Apollo program, one provided by NASA, and one by Wikipedia. That case study was written by Tom Tullis, one of the authors of Beyond the Usability Lab, which takes the themes introduced in that issue (Vol 7, Issue 3, 2008) and explores them in detail.

Traditional usability testing has been focused on small sets of users, and the book does not take issue with this approach. Indeed Bill Albert said to me in an interview,  “…if the goal is really just to identify usability issues, I think I’d fall in line with a lot of other people, saying six to eight users is plenty to identify the significant usability issues with a particular design.” (User Experience podcast, episode 55, www.uxpod.com).

The 2010 book’s concern, rather, is that web technology “enables us to move beyond the lab and efficiently conduct user experience studies with a much larger sample of users.”

The book begins with an introduction describing what the authors mean by “online usability studies,” including a description of when such studies are appropriate, what one can expect to achieve, and the strengths and limitations. Remote studies are good for comparing designs, for collecting detailed and scalable usability metrics, and for exploring design issues in the users’ own environment, with all its attendant complexities. On the other hand, there are many instances (such as identifying the major usability issues with early prototypes) when other methods are preferred.

The book is well-structured for the practitioner. After the introduction, the following three chapters explore planning, designing, piloting, and launching the study. The hands-on approach is reminiscent of Rubin’s (and now Chisnell’s) classic Handbook of Usability Testing, in that it contains sufficient detail to enable a practitioner to apply the method with a degree of confidence.

Chapters 5 and 6 discuss data preparation, analysis, and presentation.  Chapter 7 provides good in-depth analysis of specific tools (Loop11, RelevantView, UserZoom, and WebEffective) that can be used to conduct remote studies, as well as advice on choosing the appropriate tool for your own study. Chapter 8 discusses discount methods, including “totally homegrown techniques for the adventurous.”

Chapter 9 presents seven case studies of remote research conducted with between 24 and 300 users, using a range of tools.

Throughout the book, specific examples illustrate concepts and methods. The authors provide detailed instructions for using Microsoft Excel to calculate appropriate averages and confidence intervals. There is also advice on dealing with data gathered from open-ended questions (when simple numerical analysis is not adequate).
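The book’s worked examples use Excel; as a rough programmatic equivalent (not the authors’ own worksheet), here is a minimal Python sketch of the same arithmetic: a mean with a t-based 95% confidence interval over a handful of made-up task times.

```python
import math
from scipy import stats

# Hypothetical task times (seconds) from an online study.
task_times = [34.2, 41.0, 27.5, 52.3, 38.8, 45.1, 30.9, 36.4]

n = len(task_times)
mean = sum(task_times) / n
# Sample standard deviation and standard error of the mean.
sd = math.sqrt(sum((x - mean) ** 2 for x in task_times) / (n - 1))
sem = sd / math.sqrt(n)

# 95% confidence interval using the t distribution (the same idea as Excel's CONFIDENCE.T).
t_crit = stats.t.ppf(0.975, df=n - 1)
lower, upper = mean - t_crit * sem, mean + t_crit * sem

print(f"mean = {mean:.1f}s, 95% CI = ({lower:.1f}s, {upper:.1f}s)")
```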

The authors describe how to identify and deal with data from “flat-liners”—participants who complete studies as quickly as possible to obtain the associated incentive.
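The book details its own procedure for this; purely as an illustration of the general idea (and not the authors’ exact criteria), one common heuristic is to flag respondents whose completion time falls far below the group median:

```python
import statistics

# Hypothetical completion times in seconds, keyed by participant ID.
completion_times = {"p01": 612, "p02": 540, "p03": 95, "p04": 588, "p05": 110, "p06": 655}

median_time = statistics.median(completion_times.values())
threshold = 0.3 * median_time  # flag anyone under 30% of the median (arbitrary cutoff)

flat_liners = [pid for pid, secs in completion_times.items() if secs < threshold]
print(f"median = {median_time}s, flagged as possible flat-liners: {flat_liners}")
```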

It’s a real pleasure to encounter a book that not only takes the reader on a journey through the rich possibilities of technique, but does so in a manner that is clear, readable, and accessible. I was particularly pleased with the simple explanations of statistical techniques, which are so often presented as incomprehensible.

If you’re interested in any of the following questions, you can look to this book for practical and effective answers:

  • Should you conduct a between-subjects or within-subjects study?
  • What variables do you need to consider?
  • How can you deal with outliers?
  • How can you calculate and display confidence intervals?

The book does not shy away from the difficulties involved in conducting remote research. For example, if you want click-stream data, it may be necessary to have participants install or allow a plug-in, which may mean you can’t test with so-called novice users.

If I have one complaint, it is the lack of a chapter specifically on conducting studies on mobile devices, an area that is ripe for a similarly detailed “how-to” guide.

Whether you’ve conducted remote studies in the past and want to extend your capability and knowledge, or you are a complete newcomer, this excellent book is a necessary companion on your journey from the lab into the world outside. You will refer to it often, and it will alert you to opportunities and dangers. What more could you ask of a book?

Mechanical Turk: Quickly Scale Research Without Breaking the Bank

In 2017, design cycles move faster, and target audiences are more diverse and widespread than ever before. At the same time, we have an ever-increasing need for research. While this is great for job security, it also presents some challenges. For instance, budget and recruiting speeds don’t always improve at the same pace as our design cycles or our need for research.

We’ve heard the following problem discussed several times: A design team faces a brand-new research question and could really use some insights from niche target users before the end of their design sprint. However, the recruiting process alone will not complete until close to, or after, the end of their sprint.

While there are several solutions to this problem, many practitioners long for a solution that enables them to recruit for, and complete, a spontaneous ad-hoc research project, all while fitting budget, deadline, and recruitment needs.

It wasn’t until we attended Marc Schwarz’s “Mechanical Turk Under the Hood” talk at the UXPA 2016 International Conference that we seriously considered Mechanical Turk as a solution to this problem.

After a year of constant experimentation and of integrating Mechanical Turk into our process, we went back to the UXPA International Conference in 2017 to discuss our experience with a tool that we now find to be a powerful, low-cost, fast, and scalable solution with international reach.

What Is Mechanical Turk? 

For those who are unfamiliar, Mechanical Turk is an Amazon offering that is marketed as access to an on-demand, scalable workforce. Those hoping to leverage this workforce can simply post a Human Intelligence Task (HIT), and people around the world (known as “Workers” or “Turkers”) are able to complete the task for some amount of compensation determined by those requesting the work.

The types of HITs include, but are not limited to, transcription, data collection, image tagging/flagging, writing, and answering surveys. Amazon uses the term “Artificial Artificial Intelligence” to describe Mechanical Turk, as its primary goal was to facilitate tasks that were difficult for artificial intelligence (AI) to perform by offloading the tasks to real people, or Workers.

Researching with Mechanical Turk

One thing that should be made clear up front is that while Mechanical Turk is in no way a full-fledged user research platform, it can be linked to one. As can be seen in Figure 1, a variety of HIT types are available, but our team has relied on only one to accomplish a wide range of research activities, and that is the Survey Link option.

Because you aren’t limited to linking Workers to surveys alone, this option essentially acts as a gateway to any research or testing platform you may already be using. For example, while our team has primarily linked Mechanical Turk to surveys, we have also successfully sent Turkers to Optimal Workshop to complete a card sorting exercise to test the information architecture of one of our offerings.

Image of Mechanical Turk project types.
Figure 1. Mechanical Turk offers a variety of project types to choose from.

Benefits of Researching with Mechanical Turk

It didn’t take us long to realize the potential upside of this tool after some initial pilot studies. We got up and running with some surveys and were immediately impressed with the results.

International reach

Our company designs products that are to be used in markets internationally, but we had previously struggled with finding many research participants outside of North America who met our target demographic. Although a slight majority of Mechanical Turk Workers are located in the US, the platform is a great way to reach an international audience and find a lot of variation across several demographics. Soon after launching our first study, we were getting responses from across the globe.

Scalability

After a couple of smaller pilot studies, we wanted to see just how many target users we could find on this platform. The verdict: It’s really easy to scale up studies with this platform. Thousands of Workers can be reached for a given study, which helped us screen large enough groups of participants for entire qualitative studies, and also enabled us to achieve statistical power when desired.

Most of our early studies required tens or hundreds of participants, yet we usually finished data collection in a day or two. Since our initial use, Mechanical Turk has continued to be a go-to platform when we need immediate results for research questions.

Low cost

Budget wasn’t one of our top recruiting concerns at the time, but we felt pretty good knowing that we could conduct research on a low budget. Recruiting fees on the platform are fair (20% of the incentive for N ≤ 10; 40% of the incentive for N > 10) and the average incentive cost is low. Some activities pay only cents per completion. Despite this, we try to follow local incentive norms rather than taking advantage of the low incentives on the market.
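As a back-of-the-envelope illustration of the arithmetic, using the fee figures quoted above (which Amazon can change at any time), a study’s rough cost works out as follows:

```python
def mturk_study_cost(incentive_per_worker: float, n_workers: int) -> float:
    """Estimate total cost: incentives plus the platform fee quoted above
    (20% of the incentive for N <= 10; 40% of the incentive for N > 10)."""
    fee_rate = 0.20 if n_workers <= 10 else 0.40
    return n_workers * incentive_per_worker * (1 + fee_rate)

# e.g., a 200-person screener at $0.50 each
print(f"${mturk_study_cost(0.50, 200):.2f}")  # -> $140.00
```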

Biggest Challenges Faced and How We Overcame Them

Although the platform has numerous unique strengths, our experiences with Mechanical Turk haven’t exactly allowed us to sit back and relax. We discovered a lot of challenges that required careful consideration and clever techniques to mitigate.

Low-quality responses

Thoughtless, brief, and dishonest responses were a constant problem when we first started out. We began requiring minimum word counts and publishing activities only to Workers with approval ratings of 95% or higher. It wasn’t a perfect solution, as we still had some issues, but they were usually a lot easier to spot and much less frequent.
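The article doesn’t describe how this requirement was configured. If you publish HITs programmatically (for example, with Python and boto3), one plausible way to restrict a HIT to Workers with a 95%-or-higher approval rating is sketched below; the survey URL, reward, and other values are placeholders.

```python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# Built-in "percent assignments approved" qualification; require >= 95%.
qualification_requirements = [{
    "QualificationTypeId": "000000000000000000L0",
    "Comparator": "GreaterThanOrEqualTo",
    "IntegerValues": [95],
}]

external_question = """<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/screener</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="Short screening survey",
    Description="Answer a few questions about your work.",
    Keywords="survey, screener",
    Reward="0.50",
    MaxAssignments=100,
    LifetimeInSeconds=3 * 24 * 3600,
    AssignmentDurationInSeconds=30 * 60,
    Question=external_question,
    QualificationRequirements=qualification_requirements,
)
print(hit["HIT"]["HITId"])
```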

“Friends? I work in IT. The only friends I have are here and we try to talk about anything but it.” (Worker, faking a response after being asked to describe their job as if speaking to a friend)

The main reason for a lot of the problems we faced is that many Workers are on Mechanical Turk for extra cash and are likely to speed through multiple HITs quickly to maximize their profits. To alleviate that, we started offering bonuses for thoughtful responses.

“Hello Dear Sir or Miss, could you pay me please? I need the money to buy something. Thank you.” (Worker, awaiting payment)

Shared answers on public forums

Workers frequently communicate with one another on several forums and on Reddit. After doing some searching, we found that we were part of some discussions, which gave us unwanted attention. For example, at least one Worker was actively trying to get through our screener and share results with the community. To our benefit, they couldn’t figure it out and assumed that our study was broken. Publicity is generally not a good thing when you are trying to hide what you are truly screening for. We’ve since learned to avoid posting publicly when possible, to avoid publicly offering overly large incentives, and to avoid offering too many HITs at once, as these can attract unwanted attention.

Finding Our Target Users Through Screening

As we had noticed that participants were able to reverse-engineer the “right” answers to get through our screeners more easily than we would have liked, we went to great lengths to give zero indication of what we were looking for. In a perfect world, all respondents would answer screener questions honestly, but this often is not the case when the monetary incentive is the primary motivator. With this in mind, we wanted to make it harder for them to guess their way into qualifying for our activities. While this is a common goal when screening, accomplishing it with this particularly persistent participant pool required us to get a little more creative.

Open-ended questions

We originally relied on multiple-choice questions, which are much easier to guess or luck your way through. Once it became obvious that this wouldn’t suffice, we began leveraging more open-ended questions. One particularly successful way to screen participants by job title was simply to ask them to provide their job title in an open-field question.

While this would be easy enough to review manually with a smaller sample size, we needed a way to automate the screening as we scaled up, so we built some clever logic on the back end. SurveyGizmo allowed us to qualify or disqualify based on whether or not an open-ended answer contained any of a set of word strings; that way, we simply disqualified any answer that didn’t include one of the job titles we were interested in. As shown in Figure 2, participants see no indication of what we are looking for.

Image of what survey respondents see.
Figure 2. Participants are given no indication as to what we are looking for when we ask open-ended questions like “What is your job title?”

Our solution was to build in a good deal of logic, which can be seen in Figure 3. While this sounds simple enough, we discovered it took quite a bit of iteration to ensure we weren’t screening out anyone we actually wanted to take part in our research. There are likely titles you will overlook at first, but you can always add to and iterate on your logic.

Image of screener logic.
Figure 3. Researchers can screen without giving away any clues as to what they are looking for by attaching logic to open-ended questions.
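The same idea can be expressed outside the survey tool in a few lines of code. Here is a minimal sketch, with a hypothetical keyword list standing in for the real qualifying job titles; as noted above, the list needs iteration.

```python
# Hypothetical qualifying keywords; iterate on this list as you discover
# legitimate titles you initially overlooked.
QUALIFYING_KEYWORDS = ["system administrator", "sysadmin", "devops",
                       "site reliability", "it operations"]

def qualifies(job_title: str) -> bool:
    """Return True if the open-ended job title contains any qualifying keyword."""
    title = job_title.strip().lower()
    return any(keyword in title for keyword in QUALIFYING_KEYWORDS)

print(qualifies("Senior DevOps Engineer"))    # True
print(qualifies("Freelance graphic artist"))  # False
```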

Knowledge probes

We also relied on open-ended questions to act as knowledge probes to ensure participants weren’t just lucking their way into our study. This turned out to be an effective way of determining who was actually a good fit for our study, or more importantly, who was not. While it became fairly easy to quickly screen out the duds, we did find that this measure didn’t lock out all the fakers.

Although we were relying on some of the automated screening logic, we still did manual reviews to ensure that no one got through who shouldn’t have. One day while doing this we discovered two identical answers to one of our knowledge probes. The question asked participants to explain a particular topic in their own words, and it turned out two participants had used the same exact words. After a quick Google search, we discovered both of these participants had Googled the topic and pasted in one of the first explanations they had found.

After this discovery we were able to adapt an approach similar to the one we took for screening based on job titles. Whenever we used one of these types of knowledge probe questions, we would do a couple of Google searches around the topic to collect snippets of explanations and include them in the screening logic for the open-ended knowledge probe. Surprisingly, this method caught a number of people.
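In code, the same check might look roughly like the sketch below. The snippets are hypothetical stand-ins for explanations collected from a few quick searches on the probe topic, and the matching is deliberately crude (a shared word-for-word run), not proper plagiarism detection.

```python
import re

# Stand-ins for explanation snippets found via quick web searches on the probe topic.
KNOWN_SNIPPETS = [
    "a container is a lightweight standalone executable package of software",
    "containers virtualize the operating system instead of the hardware",
]

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so superficial edits don't hide a paste."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).strip()

def looks_copied(answer: str, min_overlap_words: int = 8) -> bool:
    """Flag an answer if it shares a long word-for-word run with any known snippet."""
    words = normalize(answer).split()
    for snippet in KNOWN_SNIPPETS:
        snippet_norm = normalize(snippet)
        for i in range(len(words) - min_overlap_words + 1):
            window = " ".join(words[i:i + min_overlap_words])
            if window in snippet_norm:
                return True
    return False

print(looks_copied("A container is a lightweight, standalone, executable package of software!"))  # True
print(looks_copied("I explain it to teammates as a way to ship the same environment everywhere."))  # False
```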

Creative use of multiple choice

While we had shifted toward using more open-ended screener questions, we didn’t drop multiple choice completely. This too has been an effective tool for screening out the fakes for us. One particular implementation of a multiple-choice question that has become a staple in our screeners revolves around the tooling participants use. We ask participants to select from a list of tools to indicate the ones they are using, but we include a few red herring answers that aren’t actual tools. You would be surprised how many fakers this tooling question disqualifies from your activity. We attribute this to the number of oddly named tech tools out there these days.

Motivating Participants Through Qualification Surveys

Using these screening techniques, we’ve been able to screen out thousands of participants to get to a subset of a few hundred target participants. One useful feature of Mechanical Turk is its qualification surveys, in which you pay a small amount to everyone who participates, regardless of whether or not they qualify, with the promise of higher-paying activities in the future for those who do qualify. This is an alternative to offering a large incentive only to those who qualify and nothing to the many participants screened out along the way. We’ve found that the promise of access to higher-paying surveys has been an effective incentive to get participants to take these qualification surveys.

Qualified user panels for quick launches

Through the use of these qualification surveys you are able to start creating groups of Workers that have already been deemed fit for your research studies. Once you have these established panels, it becomes much easier to quickly launch to a group you are confident matches your qualifications. One thing to note with the use of panels is the question of retention. We have experienced some issues where we launched to one of these panels and did not get the response rate we were hoping for. There appears to be some level of churn that has to be accounted for, so you will want to continually grow and curate your panels to avoid this low response rate.

Policies We Created Along the Way

Over time, we’ve formed policies to help improve our research methods, ethics, and data quality when using Mechanical Turk. Here’s a list of our top takeaways:

  • When in doubt, approve Worker submissions. Rejections hurt a Worker’s status. It’s usually easier to simply remove the data and/or the person from your panel.
  • Never ask for personally identifiable information from a Worker, as it violates Mechanical Turk policy.
  • Publish to Workers with high approval ratings (95% and higher) rather than requiring Master Workers, to increase data quality without substantially sacrificing scale.
  • Keep screeners as vague as possible so people can’t possibly guess what you’re looking for. Otherwise, clever cheaters will reverse-engineer your screener and complete studies they are unqualified for.
  • Don’t pay too little, but be cautious about publishing public HITs with high incentives as they draw unwanted attention from cheaters.
  • Where possible, incentivize thoughtful and honest responses by granting bonuses.
  • Keep a database outside of Mechanical Turk for panels and Worker data. Protect the data as though it includes identifiable information.
  • Modify your survey link using custom code to mask the link until the HIT is accepted and to send Worker IDs to your survey tool (https://research-tricks.blogspot.com/2014/08/how-to-transfer-mturk-workers-ids-to.html). This increases security by preventing Workers from attempting your screener before accepting the HIT. Also, when sent to your survey tool, Worker IDs can be used for payment processing and to prevent duplicate attempts.
  • Use automation to screen out masked IP addresses and duplicate or blank Worker IDs.
  • Closely monitor activities and provide Worker feedback when they reach out. We use an integration so that every Worker email is automatically responded to and updates our researchers in a Slack channel.
  • Since you cannot ask for contact information, use bonuses to follow up with a Worker (for example, a $0.01 bonus with a message about a research activity, followed by another bonus for their incentive); see the sketch after this list.
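For teams that script their Mechanical Turk workflow, the bonus-based follow-up in the last bullet can be sent through the API. The sketch below uses Python and boto3 (an assumption; the article doesn’t say what tooling the team used), with placeholder IDs and amounts.

```python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

def invite_via_bonus(worker_id: str, assignment_id: str, message: str) -> None:
    """Send a token $0.01 bonus whose Reason text carries the follow-up message
    (the Reason accompanies the bonus notification the Worker receives)."""
    mturk.send_bonus(
        WorkerId=worker_id,
        AssignmentId=assignment_id,  # must reference an assignment the Worker completed
        BonusAmount="0.01",
        Reason=message,
    )

invite_via_bonus(
    worker_id="A1EXAMPLEWORKERID",
    assignment_id="3EXAMPLEASSIGNMENTID",
    message="We have a new study you may qualify for; watch for a HIT titled 'Follow-up interview' this week.",
)
# Later, a second send_bonus call with the agreed amount serves as the incentive payment.
```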

The Evolution of Our Usage

When we first started out, we recruited a lot of IT Workers and subject matter experts for various studies and panels. Overall, we were pretty successful at that. If we came across a research question and needed fast insights or a large sample size, we could usually rely on Mechanical Turk to get the job done. We’d continuously increase the size of our panels such that we’d have participants on-demand at any point in the design cycle and could feasibly get lightning fast results for most studies.

Since then, however, our usage of Mechanical Turk has shifted a little. As a result of our learning and the continuous improvement of our tool chain, we now rely on it less for subject matter expertise than we once did.

Perhaps even more interesting is our newest use case, which takes advantage of our ability to reach such a diverse and broad audience. We’ve surveyed thousands of Workers about their physical and cognitive abilities. Since collecting this data, we have had the ability to publish studies to pre-qualified participant panels with varying abilities, enabling us to ensure that we are designing more accessible products.

Despite our changing needs, the scale, diversity, speed, and cost of Mechanical Turk offer so many potential uses that we have found innovative ways to integrate it into our process regardless of what else exists in our tool chain. Mechanical Turk has brought us closer to our enterprise software users by enabling us to fit unmoderated research into our quick development cycles, reach new and international audiences, and leverage large panels of prequalified target users.

Hold the Phone: A Primer on Remote Mobile Usability Testing

In recent years, remote usability testing of user interactions has flourished. The ability to run tests from a distance has undoubtedly broadened the horizons of many a UXer and strengthened the design of many interfaces. Even though mobile devices continue to proliferate, testing mobile interactions remotely has only recently become technologically possible. We took a closer look at several of the tools and methods currently available for remote mobile testing and put them to the test in a real world usability study. This article discusses our findings and recommendations for practitioners conducting similar tests.

History of Remote Usability Testing

Moderated remote usability testing is a usability evaluation in which researchers and participants are located in different geographical areas. The first remote usability evaluations of computer applications and websites were conducted in the late 1990s. These studies were almost exclusively automated and were neither moderated nor observed in real time. Qualitative remote user testing was also conducted, but the research was asynchronous: users were prompted with pre-formulated questionnaires, and researchers reviewed their responses afterward.

Remote user research has come a long way since this time. Researchers can use today’s internet to communicate with participants in a richer and more flexible way than ever before. Web conferencing software and screen sharing tools have made initiating a moderated remote test on a PC as simple as sharing a link.

Pros and Cons of Remote Testing

In deciding whether a remote usability test is right for a particular project, researchers must consider the benefits the methodology affords as well as the drawbacks. Table 1 details this comparison.

Table 1. Benefits and drawbacks of remote usability testing

Benefits of Remote Testing

Enhanced Research Validity
  + Improved ecological validity (e.g. user’s own device)
  + More naturalistic environment; real-world use case

Lower Cost & Increased Efficiency
  + Less travel and fewer travel-related expenses
  + Decreased need for lab and/or equipment rental

Greater Convenience
  + Ability to conduct global research from one location
  + No participant travel to and from the lab

Expanded Recruitment Capability
  + Increased access to diverse participant sample
  + Decreased costs may allow for more participants

Drawbacks of Remote Testing

Reduction in Quality of Data
  – Inherent latency in participant/moderator interactions
  – Difficult to control testing environment (distractions)

Expanded Spectrum of Technical Issues
  – Increased reliance on quality of Internet connection
  – Greater exposure to hardware variability

Diminished Participant-Researcher Interaction
  – Restricted view of participant body language
  – Sometimes difficult to establish rapport

Reduced Scope of Research
  – Typically limited to software testing
  – Shorter recommended session duration

Remote Usability Testing with Mobile Devices

With mobile experiences increasingly dominating the UX field, it seems natural that UX researchers would want to expand their remote usability testing capabilities to mobile devices. However, many of the mobile versions of the tools commonly used in desktop remote testing (for example, GoToMeeting and WebEx) don’t support screen sharing on mobile devices. Similar tools designed specifically for mobile platforms just haven’t been available until fairly recently.

As a result, researchers have traditionally been forced to shoehorn remote functionality into their mobile test protocols. Options were limited to impromptu methods such as resizing a desktop browser to mobile dimensions, or implementing the “laptop hug” technique where users are asked to turn their laptop around and use the built-in web cam to capture their interactions with a mobile device for the researcher to observe.

Unique Challenges of Testing on Mobile Devices

In addition to the limitations of common remote usability testing tools, other unique challenges are inherent in tests with mobile devices. First, operating systems vary widely—and change rapidly—among the mobile devices on the market. Second, the tactile interaction with mobile devices cannot be tracked and captured as readily as long-established mouse and keyboard interactions. Third, mobile devices are, by their nature, wireless, meaning reduced speed and reliability when transferring data. Due to the unique challenges of testing mobile devices, the tools currently available on the market still struggle to meet all the needs of remote mobile usability tests.

Overview of the Tools

In many moderated remote testing scenarios focusing on desktop and laptop PCs, researchers can easily view a live video stream of the participant’s computer screen or conversely, the remotely located participant can control the researcher’s PC from afar. Until recently, neither scenario was possible for testing focused on mobile devices.

In the last decade, improvements in both portable processing architectures and wireless networking protocols have paved the way for consumer-grade mobile screen streaming. As a result, researchers are beginning to gain a means of conducting remote mobile user testing accompanied by the same rich visuals they’ve grown used to on PCs.

Tool configurations

At present, moderated remote testing on mobile devices can be accomplished in a number of ways. These methods represent a variety of software and hardware configurations and are characterized by varying degrees of complexity. Figure 1 depicts four of the most common remote mobile software configurations that exist today.

A diagram depicting four researcher-participant software configurations for accomplishing remote mobile user testing. See text for detailed explanation.
Figure 1. Four configurations for remote mobile testing
  • Configuration A: First, the participant installs one tool on both their mobile device and computer. This enables them to mirror their mobile screen onto their PC. Then, both the participant and the researcher install one web conferencing tool on each of their PCs. This enables the researcher to see the participant’s mirrored mobile screen shared from the participant’s PC. Example: Mirroring 360.
  • Configuration B: First, the participant installs one tool on their PC. The native screen mirroring technology on their mobile device (for example, AirPlay or Google Cast) works with the tool on their PC so they do not need to install an app on their phone. Then, both the participant and the researcher install one web conferencing tool on each of their PCs. This enables the researcher to see the participant’s mirrored mobile screen shared from the participant’s PC. Examples include: Reflector 2, Air Server, X-Mirage
  • Configuration C: Both the participant and the researcher install one web conferencing tool on each of their PCs. The native screen mirroring technology on the participant’s mobile device (for example, AirPlay or Google Cast) works with the tool on their PC so they do not need to install an app on their phone. In addition, this tool enables the researcher to see the participant’s mirrored mobile screen shared from the participant’s PC. Example: Zoom
  • Configuration D: The participant installs one tool on their mobile device. The researcher installs the same tool on their PC. The tool enables the participant’s mobile screen to be shared directly from their mobile device to the researcher’s PC via the Internet. Examples include: Join.me, Mobizen, TeamViewer, GoToAssist

As researchers, we typically want to make life easy for our test participants. Here that means minimizing the number of downloads and installations for the participant. As a result, a single installation on the remotely located participant’s end (configurations C and D) is clearly preferable to multiple installations (configurations A and B). While configuration C does not require the participant to download an app on their mobile device, it does require them to have a computer handy during the session. Configuration D, on the other hand, does not require the participant to use a computer, but requires them to download an app on their mobile device.

Characteristics of the ideal tool

There are many tools that claim to support features that might aid in remote mobile testing. Unfortunately, when evaluated, most of these applications either did not function as described, or functioned in a way that was not helpful for our remote testing purposes.

As we began to sift through the array of software applications available in app stores and on the web, we quickly realized that we needed a set of criteria to assess the options. The table below summarizes our take on the ideal characteristics of remote usability testing tools for mobile devices.

Table 2. Descriptions of the ideal characteristics

Low cost
  • The cheaper the better if all else is comparable

Easy to use
  • Simple to install on participant’s device(s), easy to remove
  • Painless for participants to set up and use; not intimidating
  • Quick and simple to initiate so as to minimize time spent on non-research activities
  • Allows for remote mobile mirroring without a local computer as an intermediary

High performing
  • Minimal lag time between participant action and moderator perception
  • Can run alongside other applications without impacting experience/performance
  • Precise, one-to-one mobile screen mirroring, streaming, and capture
  • Accurate representation of participants’ actions and gestures

Feature-rich
  • Ability to carry out other vital aspects of research in addition to screen sharing (for example, web conferencing, in-app communication, recording)
  • Platform agnostic: fully functional on all major mobile platforms, particularly iOS and Android
  • Allows participants to make phone calls while mirroring screen
  • Protects participant privacy:
    o Allows participants to remotely control researcher’s mobile device via their own
    OR, if participant must share their own screen:
    o Considers participant privacy by clearly warning when mirroring begins
    o Provides participant with complete control over start and stop of screen sharing
    o Snoozes device notifications while sharing screen
    o Shares only one application rather than mirroring the whole screen

How we evaluated the tools

Of the numerous tools we uncovered during our market survey, we identified six which represented all four of the configuration types and also embodied at least some of the aforementioned ideal characteristics. Based on in-house trial runs, we subjectively rated these six tools across five categories to more easily compare and contrast their strengths and weaknesses. The five categories that encompassed our ideal characteristics included:

  1. Affordability
  2. Participant ease-of-use
  3. Moderator convenience
  4. Performance and reliability
  5. Range of features

The rating scale was from 1 to 10, where 1 was the least favorable and 10 was the most favorable. The spider charts below display the results of our evaluation. As the colored fill (the “web”) expands outward toward each category name, it indicates a more favorable rating for that characteristic. In other words, a tool rated 10 in all categories would have a web that fills the entire graph.
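For readers who want to produce this kind of chart for their own comparison, here is a minimal matplotlib sketch; the scores are placeholders rather than our actual ratings.

```python
import math
import matplotlib.pyplot as plt

categories = ["Affordability", "Participant ease-of-use", "Moderator convenience",
              "Performance and reliability", "Range of features"]
scores = [7, 8, 6, 9, 5]  # placeholder ratings on the 1-10 scale, one per category

# Evenly spaced angles around the circle; repeat the first point to close the polygon.
angles = [i * 2 * math.pi / len(categories) for i in range(len(categories))]
angles.append(angles[0])
values = scores + scores[:1]

ax = plt.subplot(polar=True)
ax.plot(angles, values, linewidth=2)
ax.fill(angles, values, alpha=0.25)  # the "web" fill described above
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.set_ylim(0, 10)
ax.set_title("Hypothetical tool ratings")
plt.show()
```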

We are not affiliated with any of these tools or their developers, nor are we endorsing any of them. The summaries of each tool were accurate when this research was done in early 2016. 

Mobizen and TeamViewer had strengths in affordability and ease of use, respectively. However, Join.me and Zoom fared the best across the five dimensions overall.

Having done this analysis, when it came time to conduct an actual study, we had the information we needed to select the right tool.

Case Study with a Federal Government Client

Ultimately, we can test tools until we run out of tools to test (trust us, we have). However, we also wanted to determine how they actually work with real participants, real project requirements, and real prototypes to produce real data. We had the opportunity to run a remote mobile usability test with a federal government client to further validate our findings.

Our client was interested in testing an early prototype of their newly redesigned responsive website. As a federal government site, it was important to include a mix of participants from geographies across the U.S. The participants also had a specialized skill set, meaning recruiting would be a challenge. As a result, we proposed using these newly researched tools to remotely capture feedback and observe natural interactions on the mobile version of the prototype.

We chose two tools to conduct the study: Zoom (for participants with iOS devices) and Join.me (for participants with Android devices). We chose these tools because, as demonstrated by our tool analysis, they met our needs and were the most reliable and robust of the tools we tested for each platform.

To minimize the possibility of technical difficulties during the study itself, we walked participants through the installation of the tools and demonstrated the process to them in the days prior to the session. This time allowed us to address any issues with firewalls and network permissions that are bound to come up when working with web conferencing tools.

Using this method, we successfully recruited seven participants to test the mobile version of the prototype (as well as eight participants to test the desktop version). We pre-tested the setup with three participants whom we ultimately had to transfer to the desktop testing group due to technical issues with their mobile devices. Dealing with these issues and changes during the week prior to the study ensured that the actual data collection went smoothly.

Lessons Learned About Remote Mobile Testing

Not surprisingly, we learned a lot from this first real world usability study using these methods and tools.

  • Planning ahead is key. Testing the software setup with the participants prior to their scheduled session alleviated a great deal of stress during an already stressful few days of data collection. For example, our experience was that AirPlay does not work on enterprise networks. We were able to address this issue well in advance of the study.
  • Practice makes perfect. Becoming intimately familiar with the tools to be used during the session allows you to more easily troubleshoot any issues that may arise. In particular, becoming familiar with what the participant sees on their end can be useful.
  • Always have a backup. When technical issues arise, it’s always good to have a backup. We knew that if phone screen sharing didn’t work during the session, we could quickly fall back to one of the less optimal, but still valid, mobile testing methods, such as resizing the browser to a mobile device-sized screen. If Zoom or Join.me didn’t work at all, we knew we could revert to our more reliable and commonly used tool for sharing desktops remotely, GoToMeeting. Fortunately, we didn’t need to use either of these options in our study.
  • Put participants at ease. Give participants a verbal overview of the process and walk them through it on the phone, rather than sending them a list of complex steps for them to complete on their own.
  • Tailor recruiting. By limiting recruiting to either iOS or Android (not both), you will only need to support one screen sharing tool. In addition, recruit participants who already possess basic mobile device interaction skills, such as being able to switch from one app to another. These tech savvy participants may be more representative of the types of users who would be using the product you are testing.

The Future of Remote Testing of Mobile Devices

While we’re optimistic about the future of remote mobile usability testing, there is certainly room for improvement in the tools currently available. Many of the tools mentioned in our analysis are relatively new, and most were not developed specifically for use in user testing. As such, these technologies have a long way to go before they meet the specifications of our “ideal tool.”

To our knowledge, certain characteristics have yet to be fulfilled by any tool on the market. In particular, we have yet to find an adequate means of allowing a participant to control the researcher’s mobile device from their own mobile device, nor a tool that screen shares only a single app on the user’s phone or tablet, rather than the whole screen. Finally, and perhaps most importantly, we have yet to find a tool that works reliably with both Android and iOS, not to mention other platforms.

Nevertheless, mobile devices certainly aren’t going anywhere and demand for better mobile experiences will only increase. As technology improves and the need for more robust tools is recognized, it’s our belief that testing mobile devices will only get easier.

 

Author’s Note: The information contained in this article reflects the state of the tools as reviewed when this article was written. Since that time, the technologies presented have evolved, and will likely continue to do so.

Of particular note, one of the tools discussed, Zoom, has added new mirroring capabilities for Android devices. Although a Zoom app for Android was available when we reviewed the tools in early 2016, screen mirroring from Android devices was not supported. Therefore, this functionality is not reflected in its ratings.

We urge readers to be conscious of the rapidly changing state of modern technologies, and to be aware of the potential for new developments in all of the tools discussed.

 

 


More Reading

A Brief History of Usability by Jeff Sauro, MeasuringU, February 11, 2013

An empirical comparison of lab and remote usability testing of web sites, Tom Tullis, Stan Fleischman, Michelle McNulty, Carrie Cianchette, and Marguerite Bergel. Usability Professionals Association Conference, July 2002

Laptop Hugging, Ethnio Blog, October 29, 2011

Remote Evaluation: The Network as an Extension of the Usability Laboratory, H. Rex Hartson, Jose C. Castillo, John Kelso, Wayne C. Neale, CHI 1996, 228–235

Internet Communication and Qualitative Research: A Handbook for Researching Online by Chris Mann and Fiona Stewart, Sage, 2000


 

 

 

 

 

 

UX in Southeast Asia: Examples Across Current UX Maturity Levels

In Asian countries, especially in Southeast Asia, user experience (UX) was until relatively recently thought to be focused solely on the look and feel of an interface. Not only did the majority of senior management and influential decision-makers wrongly believe this, but so did a significant number of design experts. Recently, however, the perception of UX has evolved in Southeast Asia. Many now see it as being about creating a strong functional foundation within a designed interface and creating simple, delightful product experiences for customers.

The practice of UX design and research has been emerging and developing in South Asia, especially in Bangladesh, India, Pakistan, and Myanmar. Inspired by measurable outcomes in different industries and rigorous case studies, UX is now widely welcomed and practiced across Asian markets; at least, I have observed these transformations in a couple of prominent countries in the region. Through my experiences and collaborations, I have seen people begin to embrace collaboration and co-creation not only with stakeholders and designers but also with actual users and customers.

UX in Emerging Asia

In contrast to Europe and North America, Asia, particularly South Asia, has not had many professionals in the UX and Service Design fields. However, given the global reach of the culture of digital innovation, small digital startups, buyers’ positive online behavior, and local prioritization of digital transformation, UX design principles have started to take hold in South Asia, particularly in India, Bangladesh, Pakistan, and Myanmar.

Although India already has a growing and mature culture around UX, South Asian countries like Bangladesh and Myanmar have substantial areas of opportunity. These countries already have an incredible community of programmers, developers, and graphic designers who have started moving into the UX and Service Design fields. Scattered individual practitioners have worked independently because there is no strong practitioner community in place. Professional communities such as UXPA, IxDA, and SIGCHI are currently well established in India, Bangladesh, and Pakistan, whereas those communities do not yet exist in Nepal, Sri Lanka, and Bhutan. Though a community’s strength derives from a larger number of experts and practitioners, collaboration can significantly enhance the growth potential of UX across these countries.

A growing number of managers in less-established communities understand the value of UX. As a practitioner, I have been privileged to work, learn, and collaborate with user-driven organizations and cultures across Southeast Asia. For instance, Asif Saleh, the Executive Director of BRAC, where I currently work, strongly advocates for human-centered design (HCD) and user experience (UX). He believes humans are at the center of any design solution, and that we must put rigorous effort into understanding their problems before defining solutions.

“It’s imperative to have the capabilities to bring enough rigor to define and understand the problem, analyze, develop and test properly before delivering a well-defined solution.” – Asif Saleh, Executive Director, BRAC, Bangladesh

Similarly, Yasir Azman, the CEO of Grameenphone, the largest telecom service provider in Bangladesh, is thinking about how to leverage existing and potential services to solve user problems. Speaking about emerging 5G technology in Bangladesh and customers’ need for eSIM, for instance, he believes it will be possible to reduce waiting time and travel hassle.

“Users will be able to activate and change subscriptions without having to wait for a new sim-card to arrive by mail or go to a store to pick it up.” – Yasir Azman, CEO, Grameenphone, Bangladesh

So, the advent of UX will contribute to future technology- and human-driven products and services, as well as transform the mindsets of key business executives in a wide variety of industries across Southeast Asia.

UX in India: Advent of Mature UX Practices

If someone asks where the most mature and sophisticated UX culture in Southeast Asia is, the answer is undoubtedly India. Because of the large number of IT giants with development operations there, India has the region’s most mature UX community, at least in its major IT hubs.

Photo of a group of people sitting in a circle in active discussion. 

Figure 1. Collaborative ideation among UX practitioners to design solution prototypes. (Credit: HFI, India)

One of the prominent UX advocates in India is Human Factors International (HFI). During my CUA™ and CXA™ certification at HFI, I was privileged to meet UX practitioners and industry leaders from HFI and from other tech giants such as Infosys, IBM, Microsoft, and Capgemini. According to them, UX is highly prioritized inside those organizations, which foster UX capacity building for their staff. One UX designer from Infosys said that her company requires UX domain expertise on each project. She involves actual users alongside key stakeholders to develop sustainable and robust products and services.

Photo of a group of people smiling at the camera.

Figure 2. Teaming up with UX practitioners from different organizations with diverse backgrounds during HFI CUA™ certification training in Bangalore. (Credit: HFI, India)

Photo of a paper mock-up design taped to a wall; there are people discussing what is on the design.

Figure 3. Portraying the full user-centered design (UCD) workout, from ideation and user research to low-fidelity prototyping, at HFI CUA™ certification training. (Credit: HFI, India)

Another issue that is widespread in Southeast Asia is defining a solution rather than defining the problem first. Jay Dutta, SVP of UX Design at MakeMyTrip India and my peer at IxDA, said in one of his published articles that “Falling in love with the problem, not a solution, is a powerful paradigm-changing notion.” He was pointing to UX as a relationship with users’ problems rather than a direct solution mindset. The practice of defining the solution before understanding the problem is so common in Southeast Asia that a paradigm shift seems necessary. Proper education in UX is a tool that can be used to aid this shift.

However, as is true around the world, the major challenges UX experts and practitioners face in Southeast Asia include advocacy from top management, budgetary approval for UX activities, time and financial resources to conduct UX research, and, most importantly, collaboration with programmers. For example, one of my close peers currently working at Capgemini says that it has been really challenging to collaborate with different technical stakeholders, such as IT, developers, and engineers, to embed UX processes because of time and budget requirements. In her opinion, UX designers are mostly expected to deliver within limited timeframes that are often inadequate.

Even with many challenges and hindrances, UX practice is growing across India. Professional communities like UXPA, IxDA, SIGCHI, and others are active in major cities like Mumbai, Pune, Bangalore, Chennai, Delhi, and Kolkata. The annual DesignUp conference also offers the opportunity to learn from the practical knowledge of industry practitioners and experts with a wide variety of UX experience.

Key Learnings and Takeaways

To sum up, experience from UX practice in India yields some fruitful learnings and takeaways:

  • UX Advocates: Companies are now creating UX advocates to represent users and customers to foster user-centered product or service development.
  • Top management buy-in: It is necessary to educate the top management of organizations, not only to obtain budget and adjust timelines but also to keep user-centered design in their decision-making process.
  • Collaborative acumen: UX practitioners must collaborate with technical stakeholders to deliver the products or services at the right time with the right goals.
  • Widespread communities: Professional communities of UX practitioners develop and support more passionate learners and practitioners. Emerging workshops and conferences present opportunities to collaborate.

UX in Bangladesh: An Emerging and Growing Industry Trend

With the growth of startups, technological interventions, self-service technologies, and the simplification of customer experience, UX has seen growing demand in different industries in Bangladesh. Specifically, in the top technology, banking, and service industries, we have seen user-centered, self-service app development; customer journey design; prototyping; and testing of products or services before final launch.

Photo of a group of people meeting.

Figure 4. A focus group discussion as part of user research to conceptualize a service. (Credit: Grameenphone, Bangladesh)

For instance, the dedicated UX team at the largest telecom service provider, Grameenphone, has contributed to the simplification of user journeys, the design of a self-service app, and user research that details user requirements. The company has developed contextually appropriate UX methods and usability metrics to expedite mature UX practice across Grameenphone. For example, considering users’ pain points and complex journeys, the team recognized the need to redesign basic services such as the wider portfolio of value-added services, the core direct SMS-based communications to users, and the digital platform for basic service. The team also has experience beyond basic services: they have redesigned ideal user journeys for core services, developed and published UX guidelines, and performed rigorous iterations of prototyping, resulting in the delivery of full-stack, self-service apps like MyGP (customer app) and OneGP (employee app). We have seen similar activities in other telecom and technology-based services companies in Bangladesh.

Two people working together while being video recorded.

Figure 5. Our UX expert Shadman Rahman conducting a usability test session to validate the prototype with a target user while designing a web-based service at Grameenphone. (Credit: Grameenphone, Bangladesh)

UX plays a vital role in the success of some prominent and growing startups in Bangladesh. Pathao, a ride-sharing service, and Chaldal.com, a grocery delivery service, have both leveraged UX practices in developing their multi-channel experiences. Sheba.xyz, a demand-based utility service for consumers, developed a multichannel user experience for customers (call center, app, and website) to enhance the ease of use of their services. The largest mobile financial services company, bKash, has a full-stack design and development team that follows an Agile project management method to deliver a user-centered product experience. Instead of a myopic product-centric focus, major tech firms now tend to follow a user-centric process to design and develop their services, which clearly shows a rapid scale-up of UX.

Photo of two researchers asking locals questions.

Figure 6. A user research and ethnographic field visit in a remote rural area of Bangladesh to create empathy with user needs and motivation. (Credit: BRAC, Bangladesh)

The impact of UX work can be seen in some sectors that one might not expect, such as NGOs, training, and capacity-building organizations. For example, BRAC, the largest NGO in the world, which originated in Bangladesh, now uses a human-centered design (HCD) methodology to design and develop its different programs and interventions.

Photo of a person showing and discussing a picture with two other people.

Figure 7. To build empathy, the researcher tests a first-level prototype with a target user in a rural, water-confined area of Bangladesh as part of an HCD project at BRAC. (Credit: BRAC, Bangladesh)

Photo of a group of people in a discussion. There is a written plan with lots of notes attached to the wall behind them.

Figure 8. Co-creation workshop during the synthesis step of HCD to consolidate the insights and design of the final prototype. (Credit: BRAC, Bangladesh)

Some of the recent initiatives, such as rapid prototyping solutions for rural water-confined inhabitants, designing digital financial services for rural women, and prototyping for a climate change program, used an HCD methodology driven by BRAC’s dedicated design team at the Social Innovation Lab. BRAC often collaborates with global design partners on combined initiatives; in one example, we partnered with the Gates Foundation and IDEO.org to work on digital financial services for rural women.

Photo of multiple groups of people working in their groups.

Figure 9. Ideation workshop between BRAC and IDEO.org to conceptualize the prototype. (Credit: BRAC and IDEO.org)

Apart from that, some tech-based training and capacity development organizations are now providing UX courses and workshops. Examples include large-scale training organizations, such as BITM (BASIS Institute of Technology & Management), as well as small-scale startups, such as Upskill.

 Photo of a group of people working together.

Figure 10. Participants create a design strategy during a UX Design course at Upskill, Bangladesh. (Credit: Upskill, Bangladesh)

Finally, global user experience and interaction design communities, such as UXPA, IxDA, IDF, and SIGCHI, now have local chapters in Bangladesh, where a substantial number of designers, researchers, programmers, and enthusiasts are connected to share their experiences, thoughts, and learnings. As a founder of local chapters of UXPA and IxDA, I have been personally privileged to collaborate with individuals with different experiences, industry leaders, CXOs, and passionate UX professionals. A recent celebration of World Interaction Design Day 2019 in Bangladesh hosted by IxDA and Adobe was a perfect reflection of this kind of meet-up and collaboration.

Photo of a large group of attendees.

Figure 11. World Interaction Design Day 2019 meet-up organized by IxDA Bangladesh Chapter powered by IxDA global and Adobe. (Credit: IxDA, Bangladesh Chapter)

Key Learnings and Takeaways

To summarize, UX is widely being practiced and promoted in Bangladesh. The key learnings and takeaways are the following:

  • Startup proliferation: Startups are growing, especially those focused on tech-based solutions. UX work appears to be growing geometrically based on demand and UX community development.
  • Mature practices: Some organizations foster mature UX practices and consider them part of their regular product design process. However, similar to India, budget and time constraints are a limitation.
  • Certified professionals: Although there are fewer certified UX professionals and practitioners, people are learning and getting domain expertise at a growing pace.
  • Extended UX communities: Due to the large number of designers, practitioners, and developers, UX communities are developing and therefore able to collaborate on learning and experience opportunities together.
  • UX as critical competency: Due to the small number of certified UX professionals and domain experts, there is a great need and demand for people with solid UX acumen.

Growing Potential of UX in Other Southeast Asian Countries

The influence of mature UX practices and institutionalization has spread over the last couple of years. Other countries such as Myanmar, Pakistan, Sri Lanka, and Nepal show significant evidence of a growing UX culture and practice. The technology, communication, and banking services industries reflect substantial user centricity and collaboration.

In Myanmar, Telenor, the country’s second largest telecom operator and a subsidiary of the European telecom company Telenor Group, applies UX practices to foster a seamless experience for its subscriber base. The initiative has largely been driven by a well-established UX team made up of designers, researchers, and managers. In addition, regular learning opportunities and workshops promote a user-centered mindset across the organization. Some of their key products, such as the MyTelenor app and a Facebook chatbot, reflect the use of rapid prototyping, testing with actual users, and UX-based collaboration.

Photo of researchers asking questions.

Figure 12. Moment of capturing user insights during a user research field visit at a local shopping mall in Yangon, Myanmar. (Credit: Telenor, Myanmar)

Photo of groups of people in discussion.

Figure 13. A synthesis co-creation workshop after gathering the user insights through user research. (Credit: Telenor, Myanmar)

Photo of a person working on an ideation board.

Figure 14. A portrayal of ideation while an empathy map dashboard is being refreshed by a UX researcher to depict the final user insights from user research. (Credit: Telenor, Myanmar)

Telenor aspires to institutionalize mature UX practice not only to create empathy-driven products and services, but also to foster an innovation culture across the organization. Niklas Lind, former SVP of Digital Products, thinks that proper institutionalization of mature UX practice alongside product managers will enable collaborative teams to develop products and services that customers truly love. In another example, Thi Han Tin Aye, former head of Customer Experience, believes that UX supports the creation of an innovation culture in the telecom industry. Telenor thus sets an example of distinctive UX practice among telecom operators.

Photo of two people working together.

Figure 15. One of the usability test experts at Telenor, Hnin Lwin, conducts a usability test session to validate a prototype. (Credit: Telenor, Myanmar)

Photo of a hand-written prototype.

Figure 16. Sample of a low fidelity prototype sketch during the redesign of user journey of a particular feature in the MyTelenor app. (Credit: Telenor, Myanmar)

Yoma Bank, a growing bank in Myanmar that is highly focused on digitizing its services, is well known for its service development and delivery. It has a dedicated design team whose scope of work largely includes designing user-centered products and services, evaluating existing services to improve customer experience, promoting UX across the organization, and educating stakeholders about UX. Phyu Mon, the Senior UX Designer at Yoma Bank, promotes UX with a variety of relevant activities such as creating UX guidelines, establishing core flows for services, educating in-house stakeholders, and mentoring interested learners across the organization. Those initiatives have had a significant impact on Yoma Bank’s overall products and services in the last couple of quarters.

Photo of two people smiling for the camera while sharing a meal.

Figure 17. An in-depth interview session with a banking professional at Yoma Bank Myanmar to understand digital banking user behavior. (Credit: Yoma Bank, Myanmar)

Proximity Designs, a farming-technology NGO in Myanmar, uses sophisticated design-driven processes to develop products and services that are primarily focused on local farmers. Partnering with the Stanford Design School, it has developed several innovative farm-tech solutions, improving financial feasibility and affordability for rural farmers. In addition, some local initiatives and learning opportunities have been created by UX promoters. Super Campus, a local startup, offers different design and UX-based courses along with some free and voluntary learning opportunities.

In Pakistan, UX is widely promoted and, in some organizations, this has been driven by top management. Design communities such as IxDA and UXPA are well connected and feature significant numbers of design professionals.

In Nepal and Sri Lanka, a growing culture of UX is best exemplified by startups and social innovation initiatives. For example, ishopping.pk is a startup that provides an online shopping service. Their focus is on creating a clean and friendly user interface and ensuring proper information architecture on their website to facilitate a hassle-free shopping experience. Edopedia provides online education services through a website that has free tips, tutorials, how-to guides, and a variety of content related to computer learning and technology. Because the whole service platform is based on website content, they regularly conduct user research to get ideas on how to present that content.

Key Learnings and Takeaways

Southeast Asian countries like Myanmar, Pakistan, Nepal, and Sri Lanka would benefit from more UX domain expertise as they are not as mature from a UX perspective as India. With a growing need to understand users, especially in the context of tech-based products and services, UX appears to be a critical skillset for the next decade. The key takeaways and learnings are the following:

  • UX communities: While Pakistan has mature, well-established communities, the other countries mentioned do not yet have established UX communities. This is a significant area of opportunity.
  • Lack of local talent: Due to the lack of local expertise, roles in user-centered product development are largely filled by repatriates from nearby Asian countries, mostly from Singapore.
  • UX promotion: The many ways UX can improve products and services should be promoted. The misconception that UX is only visual or graphic design remains widespread, especially in Myanmar, so there is an opportunity to educate the community on proper UX methods and practices.

Conclusion

In user experience and user-centered design, everything boils down to teams and partners fostering mature collaboration and co-creation practices in a relationship that encourages open-minded acceptance of user needs, motivations, and desires. The growth of that mindset across Southeast Asia reflects the receptivity of organizations to the idea that UX practice pays substantial dividends. In India, the growth of UX over the coming years will likely resemble that of other countries in Asia, Europe, and North America. On the other hand, UX is still maturing in countries like Bangladesh and Myanmar. Considering the learning, practice, and mindset of user-centered product or service development, UX seems to be growing at a rapid pace in Southeast Asia across a variety of different industries. And, who knows, UX practice in Southeast Asia in 10 years may develop to the same level of maturity that we see globally!

ICT and People with Cognitive Disabilities: Variations in Assistive Technology

When Jane (not her real name) was in her fifties, the supply of oxygen to the area of her brain responsible for language processing was interrupted. The stroke left Jane with aphasia, impairment in using language. Jane can understand spoken language well, but speaking and reading are very difficult. Like so many people, Jane uses the Web to access information, but the process is slow and painful. As Jane expresses it, when confronted with a cluttered page of information, “I have to read everything.” That is, Jane has lost the ability to skim text visually. Watching her work her way, word by word, through irrelevant text, looking for the information she needs, makes one sharply aware of how heavily computer use relies on rapid skimming, an ability most of us use without ever being aware of it.

In a consultation on ways that improved technology could help people with aphasia use computers more effectively, Jane had her first experience with text-to-speech software. The technology allowed her to select text on the screen and have it read aloud. While the software didn’t restore her ability to skim, it greatly enhanced her ability to understand the text she needed to read. Her excitement was visible in expression and body language as she struggled to say, “I haven’t seen these words since my stroke.”

Over 22 million people in the United States have difficulty with some cognitive functions, including memory, language, and learning issues. World demographic numbers for cognitive disabilities are uncertain, but 500 million is a plausible estimate. Information and communication technology (ICT) promises substantial benefits for these people, many of whom are computer users; a study by Microsoft indicates that 16 percent of computer users in the U.S. have a cognitive disability. Happily, many of the techniques that improve access to ICT for other groups also deliver value for people with cognitive disabilities. Here are some of the approaches that usability professionals can take to improve the usefulness of their products for these users.

Text-to-speech reading aids are valuable.

Screen readers for the blind play an obvious role in accessibility for those users. Less obviously, as Jane’s case shows, related tools can help people with cognitive disabilities process text. Many people, like Jane, have trouble reading, not because of visual problems, but because of trouble decoding the text. Hearing text read is a big help. A corollary is that attention to those aspects of application design that support screen reader access—not rendering text as a graphic for example—is important for users with cognitive disabilities as well as for those with visual impairments.

Clear, simple language pays off.

Writing that is clear, to the point, and expressed without unnecessarily fancy vocabulary makes applications easier to use for people with cognitive disabilities (and for other people, too). Unfortunately, this basic design attribute is tricky to measure or express quantitatively. Reading level formulae may seem like good tools here, but as Ginny Redish has pointed out, they can backfire seriously. There are two different reasons for this:

  • First, the tools don’t distinguish the situations in which big words are actually needed from those in which they are not. Trying to replace standard technical terms with words that seem simpler, but are unfamiliar, will make text harder to understand, not easier.
  • Second, reading level formulae rely on correlations between attributes of text, such as sentence length, and readability. Those correlations are based on naturally occurring text, not text that has been modified artificially to change the attributes. Among naturally occurring texts, those with shorter sentences do tend to be easier to understand, but if you take a text with long sentences and simply chop the sentences up, the resulting text is often harder, not easier, to understand (as the sketch below illustrates).
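
Here is that sketch: a minimal illustration (not from the article) that computes the widely used Flesch Reading Ease score with a crude syllable heuristic. Chopping a long sentence into fragments raises the score even though the rewritten text may be no easier to understand.

import re

def crude_syllables(word: str) -> int:
    # Very rough heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(crude_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

original = ("Applicants who have lived at their current address for less than two years "
            "must also provide the full address of their previous residence.")
chopped = ("Applicants may have lived at their address for less than two years. "
           "Then they must provide another address. It is the address of their previous residence.")

print(round(flesch_reading_ease(original), 1))  # one long sentence: lower (worse) score
print(round(flesch_reading_ease(chopped), 1))   # chopped version: higher score, arguably harder to follow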

Usability professionals are well placed to respond to these difficulties. They are used to the reality that design guidelines are only a good start, and that user testing and attention to user feedback are essential to high-quality results. The same principles work here, too: see if people can actually understand the text that is included in your applications.

Another design feature that broadens the range of users who can use an application is glossary support. An easy way to see an explanation of a jargon term or acronym will be appreciated by many users.

Text structure and layout are also important. People who read with difficulty benefit from headings and layout that make it easier to identify what text they really need to process. This is especially important for people who can’t skim, like Jane.

Simple, clear navigation is crucial.

While data from users with cognitive disabilities is in short supply, studies that have been done suggest that the usability problems such users encounter are similar to those typical users find, but with greater impact. For example, any user may be confused by a back button that doesn’t work on a web page, but users differ in how well they are able to recover from, and cope with, the problem. A person with a cognitive disability may not be able to get back on track. Here again, usability professionals already make it their business to identify and eliminate these and similar difficulties in navigation. And, as for comprehension problems, they recognize the need for user testing to achieve good quality.

The same considerations apply to other aspects of interaction besides navigation. The canons of good design for usability, including eliminating unnecessary choice points and providing clear feedback on progress, positive or negative, apply with greater force to users for whom problem solving is difficult.

Allow users to tailor the presentation.

This is another point of overlap with other aspects of inclusive design. Control of font sizes, colors, and contrast is helpful for the many people who have visual impairments as well as cognitive limitations, and usability professionals are already aware of the importance of supporting this. Going further, allowing users to control the amount of detail that is presented can also be helpful. A New York Times story in 2008 reported that some users are choosing to use the mobile versions of some websites because these versions present just the most important information and controls. A design that lets users request this kind of presentation directly would be a step forward.

Some historical perspective may be useful here. Old timers will recall early design guidelines that called for very limited information to be presented per screen or menu, often following a mistaken interpretation of George Miller’s classic “Magical Number Seven Plus or Minus Two” paper. Now we understand that for typical users, placing as many options on a page as will fit is actually helpful because it avoids the problem of interpreting abstract descriptors that stand for subsidiary choices (does “clothing” include “shoes?”) and because it requires fewer user actions to complete a given task. Mainstream sites like Amazon.com demonstrate this design direction dramatically by placing literally hundreds of options on a single page, when dropdowns are included.

For some users, however, this design direction works poorly. If you have trouble skimming, as many users like Jane do, you will have trouble finding the option you need among hundreds of alternatives. A presentation option that lets you suppress the rarely needed options will be helpful, but configuring your system with the presentation options that work for you is a challenge. Some user interface technology projects, like Fluid (www.fluidproject.org), include efforts to create information presentation preference profiles that can be created once, perhaps with some help, and then automatically applied to many different web pages or applications.
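
As a purely hypothetical illustration of the idea (this is not Fluid’s actual API or data format), a presentation preference profile could be captured once as structured data and then translated into styling rules wherever the user goes:

# Hypothetical example: a reusable presentation-preference profile rendered as CSS.
# The profile keys, values, and class names are invented for illustration only.
profile = {
    "font_size_percent": 150,   # enlarge text
    "high_contrast": True,      # dark text on a plain light background
    "hide_rarely_used": True,   # suppress secondary options to reduce clutter
}

def profile_to_css(p: dict) -> str:
    rules = [f"body {{ font-size: {p['font_size_percent']}%; }}"]
    if p.get("high_contrast"):
        rules.append("body { color: #000; background: #fff; }")
    if p.get("hide_rarely_used"):
        rules.append(".secondary-option { display: none; }")
    return "\n".join(rules)

print(profile_to_css(profile))

The point is not the specific rules, but that the preferences travel with the user instead of having to be reconfigured site by site.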

Rely sparingly on mental models.

This is another area of historical evolution in UI design. A good deal of early work was done on the value of letting users think their way through problems in using a system by reasoning about a mental representation of how the system works. Today this approach is rare, at least as an explicit feature of design, reflecting the unwillingness of most users to invest the work needed to develop a workable mental model.

But too-optimistic assumptions about user understanding still do sometimes arise, especially where the world of the user intersects technical topics, like security, or wireless configuration. Here it is easy for designers to forget that users have little understanding of what is really happening, and offer choices that will mean little to most users. Do you want to accept this security certificate? What is a security certificate, anyway? What does it mean to accept it? What is a DNS?

A well-designed system should not require users to possess special knowledge to successfully use it. Good design, then, means shielding users from things that they don’t understand, or, if that isn’t possible, providing adequate explanations of their choices. We’ve all seen progress on this front where security is concerned; many systems make a serious effort to explain the choices users must make and what they mean in non-technical terms. And wireless configuration is increasingly automated, requiring few if any user choices in many situations.

These considerations are even more important for users with cognitive disabilities. As for other usability problems, an issue that is an inconvenience or annoyance for a typical user can be a showstopper for someone who finds it hard to understand complex situations or to quickly pick up unfamiliar concepts.

Attention to inclusive design is increasing around the world. The recently adopted UN Convention on the Rights of Persons with Disabilities (www.un.org/disabilities/convention/conventionfull.shtml) marks a new global resolve to provide more support for full participation by people with disabilities, including in the realm of ICT. This resolve includes attention to the needs of people with cognitive disabilities. A few years ago, it was rare for these needs to be considered when accessibility was discussed, but now organizations like WebAIM (www.webaim.org), the Rehabilitation Engineering Research Center on Advancing Cognitive Technologies (www.rerc-act.org), and the Raising the Floor Initiative (raisingthefloor.net) are organizing work aimed directly at the needs of people with cognitive disabilities.

In responding to this challenge of inclusive design, there is a pressing need for more inclusion of people with disabilities, including cognitive disabilities, in the design and development process. Usability professionals should make a point of recruiting people with disabilities for focus groups and for user test panels. Shawn Henry, in her 2007 book Just Ask: Integrating Accessibility throughout Design (online at www.uiaccess.com/accessucd/index.html) has provided an excellent guide to doing this.

Usability professionals can do more than optimize the accessibility of their projects. This article draws on the work of an informal committee, convened in connection with the recent Web Content Accessibility Guidelines revision project of the W3C’s Web Accessibility Initiative. Usability professionals can help push this work forward.

In particular, the committee needs examples of websites that demonstrate the full spectrum of inclusive interaction design, from sites that fully support people with disabilities, to sites that show little or no consideration for disabled people. If you want to help further this work, please contact me.

Acknowledgements

Preparation of this article was supported by the Rehabilitation Engineering Research Center for Advancing Cognitive Technologies and the Coleman Institute for Cognitive Disabilities.

Mobile Technology: Design for Social Change

Mobile phone penetration around the world continues to increase at phenomenal rates that are not comparable with any other technology to date. According to a 2010 Information Communication Technology (ICT) report, mobile penetration rates in developed markets have exceeded 116 percent, meaning that, on average, there are more mobile subscriptions than people. In developing markets, mobile penetration has exceeded 67 percent, up from around just 4 percent in the year 2000.

In developed countries, the mobile phone has become an essential component of our everyday lives. We have become so dependent on mobile phones that many of us would find it very difficult to function without one. To fully understand the ubiquity of the device, next time you are standing on a subway platform or walking down a city street, stop and look around. You will likely observe myriad diverse interactions between people and their mobile phones. To further the point, the next time you go out, leave your mobile phone at home. Observe if at any point throughout the day or evening you feel lost without it. Or rather, count the number of times.

The UX Challenge

As designers of mobile products and services, we spend a great deal of time and effort trying to better understand our users, their expectations, behaviors, and experiences—who are the users, where are they using the product, how are they using it, and what are they using it to do? In most cases, we design products that will be used by people who are, in many ways, like us.

However, if we consider standard mobile UI design in a broader and more global context outside of what we find familiar, blurring socio-economic, cultural, and geographic boundaries, is there anyone we have neglected to consider? As designers, are we making assumptions that will inevitably exclude users who are not like us?

Most mobile phone UIs assume a basic level of literacy, an assumption that neglects to take into account a significant portion of the global population. Until recently, little research has been conducted to understand and design for the user experience of individuals who are unable to read or write. According to the United Nations, in 2008 there were 774 million illiterate adults worldwide, most of whom reside in rural regions and urban slums of the world’s poorest and most populous countries.

According to a 2009 U.N. Development Program Report, India is the world’s second most populous country and it is approximately 66 percent literate; approximately 77 percent of men in India are literate, compared with 55 percent of women. Literacy rates are lowest in rural areas, where the discrepancy in literacy rates between men and women is even more pronounced.

Despite its relatively low rate of literacy—the world average is approximately 84 percent—India’s mobile phone subscription rates are staggering. Kevin Sullivan of the Washington Post reported that in the year 2000 there were 1.6 million mobile phone subscribers in India. Ten years later that number increased to 125 million, with 6 million new subscribers each month. Further, analysts predict that half of India’s 1.1 billion population will be connected via mobile phones within the next four years.

Due to the high rate of mobile subscription in developing countries, and even higher rate of mobile sharing (representing access to a mobile phone, as opposed to ownership), many poor and illiterate people across the developing world are gaining access to information. These are the people historically excluded from the benefit of ICT, due to issues such as prohibitive cost, lack of fixed line and broadband infrastructure, availability of consistent power supply, and inability to read; they have been the digitally disenfranchised or information impoverished. According to C. K. Prahalad, inability to access information has been a key contributor to perpetuation of global poverty throughout history. Proliferation of mobile phones is changing this pattern by making information available, indiscriminately, to people representing all walks of life.

Migrant workers in urban Indian slums, and people living in rural regions of the country who do not have bank accounts (the “unbanked”), are using mobile phones to pay bills and send remittances, allowing them to bypass traditional, costly transfer services such as Western Union. Indian farmers are using mobile phones to access crop, weather, and pest reports, as well as real-time commodity prices—information that has been shown to help the farmers increase overall profits and decrease loss. Teachers on the subcontinent are using mobile phones to facilitate and support learning activities in poor, rural communities where the prospect of using the mobile phone (to communicate via SMS) motivates children to learn to read and write.

Phones for Healthcare

Mobile phones are also being used to improve health services to the poor. Mobile Health, or mHealth, represents the intersection of mobile technology and public health. Over the past several years, thousands of mHealth initiatives have been launched around the world to assist existing health services that are often failing to meet local need. In some cases, mobile apps are used by health workers to collect, store, and monitor data. In other cases, mobile phones are being used to disseminate health-related information directly to those in need.

In India, one of the most critical gaps in health services for the poor is the lack of adequate medical facilities and the absence of qualified doctors. In the public sector there are approximately two doctors for every 10,000 people, which critically fails to meet the need, especially for the rural poor, who are last in line when it comes to receiving medical care. While mobile phones cannot produce more doctors, they can be used to provide communication links between qualified doctors in urban hospitals and intermediary health workers, or community health workers (CHWs), in rural villages, presumably increasing overall access to health services and improving the quality and availability of care.

Training, education, and overall effectiveness of CHWs varies greatly from community to community. Most CHWs in India are women, and many are inadequately trained and represent varying levels of literacy. With this in mind, applications and interfaces to be used by CHWs for training, data collection, monitoring, and information retrieval should be designed for the lowest common denominator: low-to-non-literacy. Additionally, mHealth initiatives that involve dissemination of health information directly to those in need should also assume and design for low-to-non-literate users to minimize their exclusion.

Designing Solutions

There are a number of variables to consider when designing mobile UIs and solutions for use in developing markets. To account for low-to-non-literacy, researchers in India have been exploring multi-modal user interface options, such as multi-level, voice-based solutions that can be used in place of traditional text-based interfaces by low-to-non-literate users. As an example, voice-based mobile applications have been developed for use by CHWs in India to assist in diagnosing and treating ailments such as malaria. To use the system, the CHW dials into a voice menu system that prompts her to select, initially, from a broad range of menu items. Through multiple voice input selections, she is able to narrow down to the desired topic until she is given the specific information she is seeking, such as “symptoms of malaria.” This information can then be used to assist the CHW in making a diagnosis.
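
The multi-level voice menu described above is essentially a tree that the caller walks one selection at a time until she reaches the information she needs. The sketch below is hypothetical; the menu content, structure, and keypad-style selections are invented for illustration and are not taken from the deployed systems:

# Hypothetical sketch of a multi-level voice-menu tree for a CHW helpline.
# Leaves hold the information to be read aloud; branches hold further prompts.
MENU = {
    "prompt": "Select a topic",
    "options": {
        "1": {
            "prompt": "Malaria",
            "options": {
                "1": "Symptoms of malaria: fever, chills, headache, sweating.",
                "2": "When to refer: seizures, confusion, or inability to drink.",
            },
        },
        "2": {
            "prompt": "Maternal health",
            "options": {
                "1": "Danger signs in pregnancy: bleeding, severe headache, swelling.",
            },
        },
    },
}

def navigate(menu: dict, selections: list[str]) -> str:
    """Walk the menu tree with a sequence of selections and return the leaf text."""
    node = menu
    for choice in selections:
        node = node["options"][choice]
        if isinstance(node, str):    # reached an information leaf
            return node
    return node["prompt"]            # still at a branch; a real system would re-prompt here

# A CHW who selects 1 and then 1 hears the malaria symptom list.
print(navigate(MENU, ["1", "1"]))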

Research has also been conducted to explore the use of graphic and icon-based UIs, as well as combinations of modalities, such as use of voice and graphics. Pilot studies conducted with low-to-non-literate users have shown that users prefer (and have higher performance results using) voice-based interfaces as opposed to text or graphic interfaces. More research is needed, however, to continue to explore and refine alternative UI solutions for this population, taking into account the many and diverse requirements that make each group of users unique.

Aside from literacy, there are a number of other variables to take into account when designing mobile solutions for use in developing markets. In light of recent market projections, and in recognition of the vast growth potential for mobile and telecommunication companies within these markets, much research is being conducted to understand existing communicative ecologies, specific regional, socio-economic, infrastructure, and environmental challenges and requirements, as well as methods for motivating members of communities within these regions to adopt and utilize ICT.

In order to propose viable, scalable, and sustainable solutions as user experience designers, we must take steps to form an in-depth and multi-dimensional understanding of our target users in the context of their world, culture, and day-to-day experiences. If we focus carefully on the research problems we are trying to solve and adhere to the first rule of user experience design, “Know your users,” and then prepare and conduct our research accordingly, we should be able to achieve scalable and sustainable solutions for our users, whoever they are, wherever they are.

Editor’s Note: User Experience Development: What? So What? Now What?

This special issue focusing on user experience strategies, tactics, models, and their relation to rapid software development techniques, owes its existence to several catalytic events. One is the direct influence of Rich Gunther’s workshop about user experience (UX) development methods and metrics which he ran at UPA 2006. He, Randy Sieffert, and I proposed a follow-on workshop about UX strategies, tactics, and methods for UPA 2009, but unfortunately, the workshop was not held. Another catalyst for this issue was a session on the subject of user experience strategies, tactics, and models that I chaired at HCI International 2009. This issue was also prompted by a survey I conducted among top-level managers at enterprise UX groups worldwide and published in the UPA 2009 Proceedings. Finally, Cindy Lu’s workshop on Agile methods at UPA 2009 became another generator of content and a driver for this special issue.

There are many fast-paced changes taking place in our industry today, as user experience professionals are called upon to assist and work with new product and service development processes. Our issue aims to explore some of these interactions and the themes of establishing and maintaining UX development groups.

UX development has grown in the last twenty to thirty years to encompass everything from heuristic evaluations, ethnographic field studies, and rapid prototyping, to numerous variations of user testing. Factors that affect which techniques might be used include the availability of appropriate technology, professionals, users, budgets, and time. The effects of a user-centered design approach are being evaluated by ever-more sophisticated metrics regarding customer satisfaction, end-user engagement, efficiency of development, relation to industry or company benchmarks, and return on investment.

Growing even faster are the number of universities and other organizations offering courses in UX-related topics; the number of professional organizations (for example, UPA, CHI, STC, HFES, AIGA) offering conferences, publications, and events centered around professional training;  and the number of disciplines and people that are involved in the UX of products and services worldwide. Many enterprises have made significant shifts in their allocation of budgets and personnel to support product/service development. Large-scale shifts have occurred: outsized human-factors groups are being re-assigned or re-directed; UX groups are incorporating anthropologists and ethnographers into projects as core team members, not subsidiary contributors; user interface design and interaction design teams are merging and morphing into new UX groups to serve software development groups.

Many enterprises have made significant advances in shifting their attention to UX development. For example, IBM gained significant fame over the last five or ten years for its ease-of-use website. The internal intranet available to IBM staff provides an archive of documents about specifications, processes, resources, and terminology that is formidable in size and scope. Several high-level IBM managers have been appointed to routinely visit centers of UXD in order to assess the quality and consistency of attention to best practices. One of them, Karel Vredenburg, has even published a book about their approach.

Some enterprises experiment with very centralized groups. Others use a more decentralized approach, while a third approach combines the two: a central hub with “spokes” of people sent out from headquarters and paid by headquarters to be “embedded with the troops” for a number of years before the satellite groups become separately funded centers of excellence supported by the local business units. What emerged as useful statistics in the survey are trends like the typical percentage of UX professionals working with software development groups, and the breakdown of time assignments for various phases of UX development.

In light of all this change, it is no surprise that a single model becomes desirable: one that can encompass a description of what is possible, what is desirable, and where any individual group sits in an evolving process that might span five years, if not a decade.

Equally challenging is how to align the emerging UX and UCD methods with the rapidly changing scene of software development. Over the last ten years, new approaches to rapid, unified, coherent software development have emerged. Five to ten years ago, everyone was talking about the Rational Unified Process (from IBM) and other similar methods. Now it is specifically Agile methods, which have themselves spawned sects or subgroups of devotees. How UX professionals can best apply their carefully developed professional methods to a new set of tasks, people, processes, and terminology remains to be determined.

In this issue, we enter the maelstrom of these two rapidly developing, forcefully evolving whirlwinds and look for ways in which the energy and passion of both worlds—UX and software development—can be usefully synchronized, for the eventual beneficial result: better products and services, and better user experiences.

Redesigning Centrelink Forms: A Case Study of Government Forms

From time to time, organizations need to review all their forms, a task often performed by inexperienced staff with limited resources. This article is based on the author’s recent experience in assisting with such a review for Centrelink, an agency of the Australian Department of Human Services that handles social security, veterans’, and similar types of payments.

Numerous studies of both government and non-government forms have shown typical rates of error by form fillers at 80 to 100 percent. Almost all of this organization’s completed forms had one or more errors in the data collected. Some errors may have been trivial, but they were still errors and were costing millions of dollars per year to correct.

Many of the forms were difficult for people to understand. The problem became so bad that Joe Hockey, Minister for Human Services, demanded that something be done. In 2006, his letter to the head of the agency said:

“Centrelink should strengthen its focus on improving communication with customers…I will continue to take a high level of interest in improvements you make to letters and forms to make them easier to understand and use for customers… In 2004-05, the percentage of customers rating the ease of completing Centrelink’s forms as ‘good’ or ‘very good’ was 58%. I expect you to significantly increase customers’ ease of providing necessary information to Centrelink…

“Centrelink’s culture must be responsive to the needs of citizens and stakeholders, not a culture that is unduly defensive and process orientated. The organization should be willing to question itself and its performance on an ongoing basis. As part of this you should fully explore whether any criticisms and customer concerns are valid and, if so, take appropriate corrective action….

“The challenge for executive management is to recognize potential weaknesses and ensure that arrangements for monitoring, assessment, reporting, and review are sensitive to the actual operating environment. In particular, the arrangements should provide for adequate and early feedback to enable corrective action by management, and there should be clear triggers for oversight and involvement at executive level. This requires the identification of problems and acknowledgement of those problems by senior management.”

Realizing that the normal government approach to improving forms takes a long time to come to fruition, the minister commissioned a parliamentary committee headed by Senator Colbeck to oversee the project and to ensure that progress was not hindered by red tape. Oversight by a parliamentary committee at this level ensures that decisions are made at the highest management level.

Professional forms analysts spoke to the committee at its first meeting in July 2006. Many issues familiar to forms analysts were discussed, and I was impressed with committee members’ understanding and their desire to improve the lot of the public in filling out forms.

However, the initial reaction on the part of departmental staff was essentially one of fear and disbelief. They were to be given approximately twelve months to redesign all of the department’s major public-use paper forms and make them available for the public to use. With past lead times extending over vastly longer periods, the general consensus was that it was impossible for the project to be completed on time. Some forms could not be completed due to technical difficulties, and the size of the task means that the project is still ongoing. The initial work was so successful, however, that the new Labor government is continuing the project using the same approach, and not allowing politics to interfere with public usability.

Common Questions

The work is based on a great deal of scientific research into the way people use forms (see sidebar). Work in 2006 began with a series of workshops on the most problematic forms, during which a great deal of repetition across forms became apparent. As a result, a series of common question sets was developed to be applied across the board. For example, most forms include questions asking for details such as name (and whether respondents had used other names), address, date of birth, sex, and contact details. Once the questions were worked out, they could be applied to any form asking for the same information.

The question sets are designed for easy pasting into any form. Colors change automatically to match those of the receiving form, and all that needs to be done is to renumber the questions.

As an example, the set of name questions must allow for various cultural issues in today’s Australian society. Figure 1 is an example of how the questions appear in the common question set.

After further review, some sets of questions were amended, but they have remained substantially the same. With the common question sets in place, redesign of many of the forms is faster than originally estimated.

question set
Figure 1. Example of a common question set.

Layout Approach

The designs are based on the half-page column approach introduced in the UK in the early 1980s and subsequently used in many Australian forms. This layout results in a significant reduction in user error and, at the same time, saves a lot of previously wasted space. Figure 2 shows part of a page using the new half-page column design.

But just copying this layout style without understanding the reasons for the various components will invariably lead to trouble. When graphic designers latch onto appearance and copy styles without understanding the reasons for the design elements, the result is forms that often don’t work as required.

online form
Figure 2. Example of a half page column layout.

Signature block

A huge problem for many forms is respondents who don’t sign in the spaces provided. The tested solution was to include a declaration and signature block together as a numbered question, as shown in Figure 3.

screencap of web form
Figure 3. Example of statement and signature block as a numbered question.

Check list

Most forms had a check list at the end, and people ignored it. Again, the solution was to include it as a numbered question, as shown in Figure 4.

web form
Figure 4. Example of check list as a numbered question.

Additional pages

Questions requiring attachments had additional graphics to highlight them, as shown in Figure 5. The use of the paperclip icon was found to be effective in enabling form fillers to go back and locate the question.

web form
Figure 5. Example of use of paperclip icon to link to check list.

Reading path

The old designs sometimes didn’t allow for the way people read a form. Figure 6 shows a typical example. In it, the reader typically goes from the bold question text (ending in “private trust”) to the No/Yes ballot boxes, and misses the instructional text below.

Figure 7 shows a better approach. The addition of the bold header about reading the note achieves close to 100 percent success.

web form
Figure 6. Example of old design that ignores typical form filler reading path.

web form
Figure 7. An improved layout for instructions.

Project Success

Aside from using proven design and language techniques, the greatest factor in the success of the project has been the use of error analysis and usability testing. Both methods help to reduce the political problems associated with changing someone else’s designs—a common forms management problem.

Error analysis

As an important prerequisite for data gathering, error analysis helps to identify potential problems to deal with during redesign and usability testing.

We conducted a number of error analysis studies in which we examined many completed forms to identify where people typically went wrong in filling them out. Then we were able to quantify the real cost of the forms. Most organizations consider only printing and processing costs, but when close to 100 percent of forms have errors, the cost can run into hundreds of thousands of dollars per form and, in at least one case, over a million dollars in one year—that’s the cost just to repair the errors.

These figures are typical of most organizations. We don’t yet have data on the revised government forms, but another example will illustrate the potential savings. We redesigned the application forms for one of our major life insurance companies. Prior to redesign, 100 percent of their forms came in with one or more errors. After usability testing and redesign in the same style as Centrelink, the company reported that the error rate had dropped to five percent, and most errors were minor. Besides reduced errors, form fillers in the testing took only twenty to thirty minutes to complete a 24-page application accurately and without assistance.
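
To see how quickly correction costs add up, here is a back-of-envelope calculation; the volumes and unit costs below are invented for illustration and are not figures from the Centrelink or insurance projects:

def annual_error_cost(forms_per_year: int, error_rate: float, cost_per_correction: float) -> float:
    """Rough cost of repairing form-filling errors over a year (illustrative only)."""
    return forms_per_year * error_rate * cost_per_correction

# Hypothetical volumes: 50,000 forms a year, $15 to chase up and repair each erroneous form.
before = annual_error_cost(50_000, 1.00, 15.0)   # roughly 100% of forms contain errors
after = annual_error_cost(50_000, 0.05, 15.0)    # error rate after redesign and testing
print(f"Before redesign: ${before:,.0f} per year")   # $750,000
print(f"After redesign:  ${after:,.0f} per year")    # $37,500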

Usability testing

The agency previously used focus groups to test forms, but our experience over the past twenty years shows that focus groups are useless for testing usability. They provide some information about people’s opinions, but next to nothing about how a person actually uses a form. Time constraints and form complexity occasionally make usability testing infeasible, but such cases are rare, and we prefer to test all forms prior to release.

During the Centrelink project, all testing was carried out in a motel room with the observer(s) sitting across the table from the form filler. Even with two, or occasionally three, observers, there was no indication that the form filler was hindered. The agency has recently built a new testing facility with a specialized observation room and video capability. Recent testing still allows the main observer to sit with the respondent and for both to be recorded. The form itself is also recorded, which is very important when testing form filling.

All in all, the redesigned forms are expected to save the Australian government millions of dollars per year by reducing the need to contact customers, correct errors, and deal with bad data.

A summary of Lessons from Forms Research accompanied this article.

[greybox]

Lessons from Forms Research

By Robert Barnett

This summary accompanied the article Redesigning Centrelink Forms: A Case Study of Government Forms.

The story of scientific forms research began in the mid-1970s and continued through the 1980s and 1990s with UK researchers such as Robert Miller, Philip Barnard, Patricia Wright, and Robert Waller; Janice (Ginny) Redish and her team from the Document Design Center in Washington D.C.; David Frohlich; and Robyn Penman, David Sless, and others from the Communication Research Institute of Australia (CRIA).

We can never rest on our laurels and stop learning about forms. People who claim to know all there is to know about form design are kidding themselves. Methods for examining forms don’t have to stay in universities and research institutes. Any forms analyst can learn them and begin to apply the principles in day-to-day work. The following are lessons we’ve learned from research over the past thirty-plus years.

Patricia Wright and Philip Barnard came to the following conclusions:

  • Questions should deal with one thing at a time.
  • Forms should use familiar words.
  • Designers should consider alternatives to prose, for example: a string of conditions separated by conjunctions might be more easily understood if written as a list.
  • Provide adequate answer space.
  • Type size should be at least 8-point and preferably 10-point.
  • There should be good contrast between printing and background.

In the late 1970s, studies by Wright, Barnard, and Wilcox dealt with the constraints placed on legibility by the use of character separators, often referred to as delimiters. The computer world had introduced the idea of little boxes for each character, and later changed to small tick mark combs on the bottom of boxes to separate the characters. The research showed how the use of such marks slowed reading of the forms during data entry. (This research went hand-in-hand with my own observations that such marks even cause significant errors in reading.)

Patricia Wright’s early research taught us about our poor understanding of basic forms issues. A lengthy article in Visible Language reviewed research investigations into form design and usage up to 1980. One of the most significant conclusions was that, “those who seek simple recipes for designing adequate forms have failed to understand the complexities of the problem.”

In 1982, Grant, Exley, Lonsdale, and Goddard produced a major British government report, “Forms Under Control.” While it didn’t include much in the way of new design knowledge, the study showed the extent to which forms need to be controlled and the problems people faced with government forms. The report came up with sixty-two recommendations on the management of government forms.

In 1984, Robert Waller (now professor of information design in the Department of Typography & Graphic Communication at the University of Reading) reported on the design of a British government form, “Claim for Supplementary Benefit.” The information from the project provided an excellent starting point for later researchers. Some conclusions would be modified by subsequent studies, but most of the general conclusions still apply to public-use forms:

  • Short documents are not necessarily easier than long ones.
  • All the textual variables in a form interact.
  • Test results require careful interpretation.
  • Where a form is drawn with two-column arrangements, the columns need to be visibly separate with space or a strong rule separating them. If these layouts are used and it’s necessary to include full-width items, these are best placed at the top so as not to interfere with the reading of the other columns.
  • With open-ended questions, the size of an answer space can indicate to the form filler how long the answer should be.
  • When a form contains hierarchical information, this is best indicated by the graphics rather than the text.
  • If color is needed, use mid-tones that are both legible and conspicuous.
  • Small page formats offer less design flexibility.

In 1985, Ginny Redish of the Document Design Center at the American Institutes for Research and Jack Selzer of Pennsylvania State University, published an important article on “The Place of Readability Formulas in Technical Communication.” The report highlighted how inappropriate the use of formulas such as the Flesch Reading Ease Scale and Gunning’s Fog Index are for technical communication.

In 1986, David Frohlich (now director of Digital World Research Centre at the University of Surrey and professor of interaction design) undertook one of the most important pioneering studies in forms research. His conclusions, recorded in the paper “On the Organisation of Form-filling Behaviour,” form the basis of the observational study approach we still use today. He summarizes his findings on the way people use forms as seven question principles:

  1. Linear progression: work through the questions in the order they appear on the form.
  2. Least reading effort: only read what seems to be necessary to maintain form-filling progress.
  3. Question routing: jump directly to a new question if the form tells you to.
  4. Question omission: miss out questions which don’t seem to apply to you.
  5. Question preview: if in doubt about the meaning of a current question, read the subsequent question.
  6. Question review: if in doubt about your interpretation of the previous question, review that question and the answer provided.
  7. Topic scan: if in doubt about the relevance of the current question topic, scan the local topic context.

We consistently find during usability testing that if any of these principles are violated, people tend to make errors in their form filling.

In 1990, the Communication Research Institute of Australia (CRIA) conducted the world’s first study on the way people use life insurance application forms. One hundred percent of the traditional forms produced one or more errors. After redesign and usability testing, error rates were reduced to fifteen percent, and most errors were trivial. The savings in processing time provided the funding to maintain the whole Forms Management Department. Robyn Penman and David Sless from CRIA reported on studies of insurance documents showing that just designing them by following rules of plain English was not good enough.

In 1999, Michael Tyler from Robert Barnett and Associates reported on a series of usability studies on forms produced by different government departments for aged people. The studies showed that many of the lessons learned from research with younger people didn’t apply. For example, simple form-filling processes, such as sequential reading of questions, were replaced by random scanning of pages, and answer examples were often misinterpreted as being the only possible alternatives. The studies also highlighted the special needs of aged people.

[/greybox]

Automated Usability Testing: A Case Study

In our user experience team at Fidelity Investments, we’ve conducted over forty unmoderated remote usability tests over the past five years. We use them as an adjunct to traditional lab tests and remote, moderated usability tests. We’ve found that unmoderated remote tests reveal usability variations between different design solutions that typical lab tests generally don’t detect. The advantage of the unmoderated remote tests lies in the sheer number of participants. We usually have at least 500 participants in just a few days when we can use our own employees as participants in these tests, and it’s not uncommon to have over 1,000 participants. When performing evaluations with panels of our customers, we commonly have at least 200 participants in a week. These numbers provide tremendous data. We routinely get statistically significant differences in task completion rates, task times, and subjective ratings when comparing alternative designs. Even what appears to be a minor design difference (e.g., a different phrase to describe a single link on a website) can yield significant differences in usability measures.

A Sample Unmoderated Remote Usability Study

The best way to describe unmoderated remote usability tests is with an example, so I devised a test comparing two Apollo space program websites: the official NASA site (Figure 1) and the Wikipedia site (Figure 2).

NASA homepage
Figure 1. Apollo program home page on NASA.
Wikipedia page on Apollo program
Figure 2. Apollo program home page on Wikipedia.

Participants in the study were randomly assigned to use only one of these sites. Most of the unmoderated remote studies I’ve conducted are this “between-subjects” design, where each participant uses only one of the alternatives being tested.

The next step was to develop tasks for the participants. I developed a set of candidate tasks before studying either site based on my own knowledge of the Apollo program. I then eliminated any tasks that I couldn’t find the answer to on both sites. That left nine tasks:

  1. How many legs did the Lunar Module (lander) have?
  2. Which Apollo mission brought back pieces of the Surveyor 3 spacecraft that had landed on the moon two years earlier?
  3. The famous photo called Earthrise, showing the Earth rising over the Moon, was taken on which Apollo mission?
  4. Which manned Apollo mission was struck by lightning shortly after launch?
  5. Who was the Command Module pilot for Apollo 14?
  6. Who were the last two men to walk on the moon?
  7. Which Apollo mission brought back the so-called Genesis Rock?
  8. What was the name of the Apollo 12 Lunar Module?
  9. Which area of the moon did Apollo 14 explore?

The best tasks have clearly defined correct answers. In this study, the participants chose the answer to each question from a dropdown list. We’ve also used free-form text entry for answers, but the results are more challenging to analyze.

We design most of our unmoderated remote usability studies so that most participants can complete them in under thirty minutes. One way to keep the time down is to randomly select a smaller number of tasks from the full set. Across many participants, this gives us good task coverage while minimizing each participant’s time. We gave each participant four randomly selected tasks out of the full set of nine, presented in a random order to minimize order effects.
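
A minimal sketch of that task-sampling step; the task labels below are abbreviations of the nine tasks listed above, and the code is only an illustration of the approach, not the actual test harness:

import random

# The nine Apollo tasks, abbreviated; each participant sees four in random order.
TASKS = [
    "Lunar Module legs", "Surveyor 3 pieces", "Earthrise photo",
    "Struck by lightning", "Apollo 14 CM pilot", "Last two moonwalkers",
    "Genesis Rock", "Apollo 12 LM name", "Apollo 14 landing area",
]

def tasks_for_participant(k: int = 4) -> list[str]:
    # random.sample draws k distinct tasks, already in random order,
    # which also randomizes presentation order across participants.
    return random.sample(TASKS, k)

print(tasks_for_participant())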

When a potential participant went to the starting page (http://www.webusabilitystudy.com/Apollo/), an overview of the study was displayed. When the user clicked “Next,” a set of instructions was shown. As explained in those instructions, when the user clicked “Begin Study,” two windows opened, filling the screen (Figure 3).

Apollo program homepage with usability study instructions at top of screen.
Figure 3. Screen and window configuration for an unmoderated remote usability study.

The small window at the top presents the tasks to perform; the larger window presents one of the two sites being evaluated. The users were free to use any of the features of the site; however, they were instructed not to use any other sites to find the answers (e.g., Google).

Each task included a dropdown list of possible answers, including “None of the above” and “Give Up.” Three to six other options were listed, one of which was the correct answer to the question. We required the user to select an answer (which could be “Give Up”) to continue to the next task. The participant was also asked to rate the task on a 5-point scale ranging from “Very Difficult” to “Very Easy.” We automatically recorded the time required to select an answer for each task, as well as the answer given.
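
A sketch of the kind of record captured for each task; the field names and timing mechanism are hypothetical, since the article does not describe the tool’s internal data format:

import time
from dataclasses import dataclass

@dataclass
class TaskResult:
    task_id: int
    answer: str        # option chosen from the dropdown, possibly "Give Up"
    ease_rating: int   # 1 = Very Difficult ... 5 = Very Easy
    seconds: float     # time from task presentation to answer selection

# Timing brackets whatever the participant does between seeing the task and answering.
start = time.monotonic()
# ... participant works in the site window and selects an answer ...
elapsed = time.monotonic() - start

result = TaskResult(task_id=3, answer="Apollo 8", ease_rating=4, seconds=elapsed)
print(result)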

After attempting all four tasks, we asked the participant to rate the site on two seven-point scales, each of which had an associated comment field:

  1. Overall, how easy or difficult was it to find the information you were looking for?
  2. Overall, how visually appealing do you think this website is?

We vary these rating scales from one study to another depending on the sites being tested and the study goals. We followed with two open-ended questions about any aspects of the website they found particularly challenging or frustrating, and any they thought were particularly effective or intuitive. We use these questions in most of our usability studies.

We also modified the System Usability Scale (SUS) to help evaluate websites. The original version of SUS was developed by John Brooke while working at Digital Equipment Corporation in 1986. We instructed participants to select the response that best describes their overall reactions to the website using each of ten rating scales (e.g., “I found this website unnecessarily complex,” or “I felt very confident using this website.”) Each statement was presented along with a 5-point scale of “Strongly Disagree” to “Strongly Agree”; half of the statements were positive and half negative.

Results of the Sample Study

The main purpose of the study was to illustrate the testing technique, not to seriously evaluate these particular sites. We posted a link to the unmoderated remote study on several usability-related email lists, and collected data from March 11 – 20, 2008. Many of the participants in the study work in the usability field or a related field, so they can’t be considered a random sample.

A total of 192 people began the study and 130 (68 percent) completed the tasks in some manner. Undoubtedly, some people simply wanted to see what the online study looked like and were not really interested in taking it.

One of the challenges with unmoderated remote studies is identifying participants who are not actually performing the tasks but simply clicking through them, answering randomly or choosing “Give Up.” They may not be interested in the tasks at all and simply want to enter the drawing for the incentive. In studies like this, about 10 percent of the participants usually fall into this category.

To identify these participants, I first completed all nine of the tasks myself several times using both sites, having first studied the sites to find exactly where the answers were. The best time I was able to achieve was an average of thirty seconds per task. I then eliminated thirteen participants (10 percent) whose average time per task was less than thirty seconds, bringing the total number of participants to 117. Of those, fifty-six used the NASA site and sixty-one used the Wikipedia site.
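
In code, this screen is a simple filter on average task time. The sketch below applies the thirty-second threshold described above to illustrative records; the field names and numbers are hypothetical, not the study's data.

```python
THRESHOLD_SECONDS = 30  # fastest plausible average, established by a practice run

def is_flat_liner(task_times):
    """Flag a participant whose average time per task falls below the threshold."""
    return sum(task_times) / len(task_times) < THRESHOLD_SECONDS

# Illustrative records only; field names are hypothetical.
participants = [
    {"id": "p01", "task_times": [12, 9, 15, 11]},     # likely clicking through
    {"id": "p02", "task_times": [95, 160, 75, 210]},  # plausible effort
]

kept = [p for p in participants if not is_flat_liner(p["task_times"])]
print([p["id"] for p in kept])  # -> ['p02']
```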

The basic findings of the study were that users of the Wikipedia site:

  • Got significantly more tasks correct than did users of the NASA site (71 percent vs. 58 percent, p=.03 by t-test; see the sketch after this list).
  • Were marginally faster than users of the NASA site in doing their tasks (1.8 vs. 2.2 minutes per task, or about 23 seconds shorter, p=.07).
  • Rated the tasks as significantly easier on a 5-point scale than did users of the NASA site (3.1 vs. 2.6, p<.01).
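
As a minimal sketch of the kind of significance test involved, the comparison of task success between the two groups could be run as an independent-samples t-test on per-participant success rates. The numbers below are made up for illustration; they are not the study's data.

```python
from scipy import stats

# Illustrative per-participant success rates (proportion of their four tasks
# answered correctly); these are invented values, not the study's.
wikipedia_success = [0.75, 1.00, 0.50, 0.75, 1.00, 0.75]
nasa_success      = [0.50, 0.75, 0.25, 0.75, 0.50, 0.50]

t_stat, p_value = stats.ttest_ind(wikipedia_success, nasa_success)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```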

One way to see an overall view of the task data for each site is to convert the accuracy, time, and rating data to percentages and then average those together. This provides an “overall usability score” for each task that gives equal weight to speed, accuracy, and task ease rating (Figure 4). With this score, if a given task had perfect accuracy, the fastest time, and a perfect rating of task ease, it would get an overall score of 100 percent. These results clearly show that Tasks 3 and 7 were the easiest, especially for the Wikipedia site, and Tasks 4 and 8 were among the most difficult.
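
The arithmetic behind the overall score is straightforward. The sketch below shows one way it might be computed, under our assumption (for illustration only) that time is converted to a percentage by dividing the fastest observed task time by the observed mean time.

```python
def overall_score(pct_correct, mean_time, fastest_time, mean_ease):
    """Equal-weight average of accuracy, speed, and ease, each as a percentage."""
    time_pct = fastest_time / mean_time * 100   # 100% when done at the fastest observed time
    ease_pct = mean_ease / 5 * 100              # 5-point ease rating as a percentage
    return (pct_correct + time_pct + ease_pct) / 3

# Illustrative numbers only (not the study's data)
print(round(overall_score(pct_correct=71, mean_time=108, fastest_time=60, mean_ease=3.1), 1))  # -> 62.9
```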

Figure 4. Average usability scores for each task and site, with equal weighting for accuracy, speed, and task ease.

After attempting their four tasks, the participants were asked to rate the site they had just used on two scales: Ease of Finding Information and Visual Appeal. The Wikipedia site received a significantly better rating for Ease of Finding Information (p<.01), while the NASA site received a marginally better rating for Visual Appeal (p=.06).

The final part of the study was the System Usability Scale (SUS), which consists of ten rating scales. A single SUS score was calculated for each participant by combining the ratings on the ten scales such that the best possible score is 100 and the worst is 0. Think of the SUS score as a percentage of the maximum possible score. The Wikipedia site received a significantly better SUS rating than the NASA site (64 vs. 40, p<.00001).
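
For reference, Brooke's standard SUS scoring (which the website-modified version presumably follows) maps the ten 1-5 ratings onto a 0-100 score: each positively worded item contributes its rating minus one, each negatively worded item contributes five minus its rating, and the sum is multiplied by 2.5. A minimal sketch:

```python
def sus_score(ratings):
    """Standard SUS scoring: 'ratings' is a list of ten values from 1-5,
    with odd-numbered items positively worded and even-numbered items
    negatively worded (per Brooke's original questionnaire)."""
    total = 0
    for i, r in enumerate(ratings, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5  # best possible = 100, worst = 0

# Illustrative example (not a participant from this study)
print(sus_score([4, 2, 4, 2, 5, 1, 4, 2, 4, 2]))  # -> 80.0
```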

What about Usability Issues?

The study yielded a rich set of verbatim comments. The NASA site received 132 individual comments from the various open-ended questions while the Wikipedia site received 135. Some of these comments were distinctly negative (e.g., for the NASA site: “Search on this site is next to useless”) while others were quite positive (e.g., for the Wikipedia site: “The outlines for the pages were helpful in locating a specific section of the site to find the desired information.”)

The performance data, subjective ratings, and verbatim comments can be used to help identify usability issues within the test site. Verbatim comments often provide clues indicating why tasks for a given site yield particularly low success rates, long task times, or poor task ratings.

Strengths and Weaknesses

The primary strength of an unmoderated remote usability study is the potential for collecting data from a large number of participants in a short period of time. Since they all participate “in parallel” on the web, the number of participants is mainly limited by your resourcefulness in recruiting. Larger numbers provide additional advantages:

  • Unlike in traditional moderated usability testing, there is no significant increase in costs or resources with each additional participant.
  • The larger number of participants allows you to test a better cross-section of representative users, especially when the user base is large and diverse.
  • Since users participate from their own locations using their own systems, you potentially have more diverse environments (e.g., screen resolutions, monitor sizes, browsers, etc.)
  • Because of the larger sample sizes, you can potentially detect differences in usability metrics (task success rates, times, ratings, etc.) that you normally can’t detect in moderated tests.

Unmoderated remote usability studies are especially good at enabling comparisons between alternative designs. We’ve performed these studies where we simultaneously compared up to ten different designs. In just a few days we were able to test these designs with a large number of users and quickly identify the most promising designs.

Unmoderated remote usability studies aren’t always appropriate; some limitations of the technique follow:

  • The prototypes or designs to be tested must support the tasks at some level. Users need to be able to reasonably decide whether they have completed the task.
  • The prototypes need to be reasonably stable. Since the users are working on their own without a moderator to intervene if things go wrong, you don’t want any major surprises.

  • You need to be able to develop tasks that have relatively well-defined end-states. Tasks like “find explicit information about this” work well.

Early exploratory studies, where you want to have an ongoing dialog with the participants about what they’re doing, are obviously not well suited to an unmoderated remote approach.

Unmoderated remote usability tests will never completely replace traditional moderated usability tests. A moderated test, with direct observation and the potential for interaction with the participant if needed, provides a much richer set of qualitative data from each session. But an unmoderated remote test can provide a surprisingly powerful set of data from a large number of users that often compensates for the lack of direct observation and interaction.

Designing Credible Studies: A Research Framework

Generally speaking, nobody wants to be an enabler of bad behavior.

To enable bad behavior means we are consciously aware of the reality that surrounds us and the impact of the negative, yet unconsciously accepting of that reality as well. But why are we talking about bad behaviors? Ironically, we as researchers allow bad behaviors to dictate our work all the time. As industry-embedded researchers, we are often compelled to make compromises between decisions that affect the credibility of a study and those that feel best for the business. A common example is when researchers are tasked with designing and executing studies dictated by other processes, such as product development. This often leads to pinched timelines and constrained budgets that leave no room for the due diligence credible research requires. The two considerations come to be treated as at odds, almost mutually exclusive. This false division can force us to perpetuate bad research behaviors, whether they are workarounds, compensations, or worst practices.

Researchers often lack the words and constructs to describe how powerful credible research can be for business goals. We have nothing that addresses the mutually exclusive perspective on business and research. There is no obvious, seamless way to bridge the gap between them. Through many experiences in which these gaps were prevalent, twig+fish crafted a framework that not only serves as an effective research-mapping tool but also clarifies the value of research and what it means to a business.

The False Division: A Contributor to Bad Behaviors

Before we get into the framework and how it works, what are some of these observed bad behaviors we speak of? As on-the-ground practitioners and strategists, we are able to identify the source of various bad behaviors and what they look like in practice.

Our field is refreshingly multi-disciplinary, and we pride ourselves in being rooted first in our empathic abilities. Researchers in our field may understand the theoretical foundation of practices and have the ability to plan and execute a study within their business, but may lack the strategic knowledge to envision the long-term potential of research for a business. We have many practitioners in our field who are in essence doing research on auto-pilot—which can leave our industry vulnerable.

In studying people, there are many nuanced variables that influence data; most striking is the researcher’s ability to address business goals while remaining true to the constructs of qualitative research. Human-centered researchers look for emotions, attitudes, aptitudes, and behaviors.

We are usually looking at these data points for the purpose of some particular end. As an interpretive science, human-centered research relies heavily on the researcher’s abilities to reveal and describe social and psychological phenomena we all take for granted. While many other domains like finance, design, and engineering have clear indicators of skill and credibility, interpretive work has fewer accountable measures of success.

The framework isolates the researcher as a variable impacting the study and makes a case for how the false division can be addressed. Without this framework we can only rely on business constructs (such as time and money) that so often dictate how we do research and determine our metrics of success.

The Bad Research Behaviors

We repeatedly see evidence of bad research behaviors resulting from an imbalanced emphasis on business goals at the expense of research practice. We are guilty of these behaviors, too!

  • We lead with method. Familiar methods garner stakeholder support because they offer a predictable process; sometimes we select a method before considering other study variables.
  • We believe we can ask people anything. Opportunities to meet with people can be rare, and that scarcity often opens the floodgates to a disorganized, biased, or haphazard set of questions.
  • We fail to align stakeholders on the research objective. Without a clear set of research goals that everyone is aligned on, we fall into the familiar trap of addressing research goals that serve no one.
  • We expect research to always have a direct return on investment (ROI). A direct ROI is sometimes not a viable output of research; sometimes its greater benefit is not quantifiable and affects the business in an intangible way.
  • We understand people based on their relationship with our offering. Existing or potential customers are the obvious subjects of research, which means we lack exposure to a deeper understanding of all people, or human behavior, generally.
  • We allow external variables to dictate our study designs. Whether it is repeated removal of research processes due to lack of time or end-of-year budgets, we relinquish good decision-making in favor of other processes (like design or development).
  • We expect research to answer everything. We rely heavily on research to answer questions that might have other or better answer sources.
  • We do not advocate for good research. Even when we become aware of bad behaviors, we allow them to continue in our businesses, sometimes creating workarounds that get the work done but at the expense of credibility.

These are just a few examples of realities that might exist within research and how it is juxtaposed in the business world. Though some businesses avoid these pitfalls and contribute to an industry-wide good, the reality is that many do not. We have seen both the good and bad behaviors, but no infrastructure to avoid succumbing to the pitfalls. The need for a framework focused on designing credible research studies, and also one that reveals the total landscape of research’s power potential, quickly became evident.

The Framework

The framework is built around two questions that expose tendencies of cross-functional teams which can amplify the divide between research and business. The first: What will the research do for the business? The second: What considerations feed into the study?

Research in industry rarely operates in a vacuum; it is always part of an agenda. As such, research output must be serviceable or actionable to the business. We consider the outcomes of research in terms of helping others do work: Will it be used to inspire creative, solution-oriented teams, or will it be used to inform tactical decisions?

Example:

INSPIRE – An e-commerce website design team needs to understand why people buy things (anything).

INFORM – Determining if the “Add to Cart” button on the redesigned website is well positioned.

Research in industry also has a number of disparate inputs and realities. The level of wiggle room of these inputs is another consideration in aligning business and research goals. If the inputs are more flexible, then fewer assumptions and constraints are introduced into the study. When more assumptions and constraints must remain true, then inputs are more fixed.

Example:

FLEXIBLE – We want to extend our offerings beyond our core product to something else.

FIXED – We know that we are developing a new website.

These four end-points form the framework in a simple two-by-two (see Figure 1).

Horizontal endpoints: Inspire (learn from people) to Inform (learn about offering). Vertical endpoints: Fixed (focused end) to Flexible (broad start).
Figure 1. The NCredible Framework: known inputs and study outcomes. (Credit: twig+fish research practice)

Once stakeholders are aligned on the definitions of each spectrum, we then ask them to write down any and all research questions. We keep this request open so that they can capture exactly how they might ask the question without any influence. Each question is written on a single Post-It and then placed on the map. We work with the stakeholders to place their questions so we can begin a conversation about where they fit. This initial conversation is another key step in alignment.

What The Framework Reveals

Each quadrant has a research process associated with it. Different patterns emerge in this initial placement of questions. A typical result has questions all over the place, sometimes signaling misalignment. Another common result is that all questions fall on the Inform side, which is typical for businesses where human-centered research is embedded in a constant state of tactically answering questions within a tight product scope. Rarely do all of the questions land in a single quadrant; when they do, it signals either strong alignment or a very closed perspective on the potential of research. Even more rarely do we see all questions on the Inspire side.

Taking a moment to reflect on question placement helps begin a discussion around the business tendencies, priorities, and expectations of research. The discussion is incredibly important in alignment. From there, we discuss each quadrant’s purpose.

Quadrant Descriptions

Each quadrant, its position on the map (representation), the research process it stands for, and an example question:

  • Discovery Research (Inspire/Flexible): Gathering stories for deeper understanding to identify promising focus areas or opportunity spaces. Example: What is it like to live with diabetes?
  • Exploratory Research (Inspire/Fixed): Honing in on a focus area to bring further description or clarity to the experience or lived reality. Example: What are the meaningful characteristics of an ideal diabetes management tool?
  • Definition Research (Inform/Flexible): Expanding on experiences and lived realities to reveal solution themes, concepts, or ideas. Example: What are different ways to create a better, more meaningful diabetes management tool?
  • Validation Research (Inform/Fixed): Vetting the solution themes, concepts, or ideas to make improvements and measure changes. Example: What works and what does not work in our diabetes management smartphone app?

Each quadrant represents a scope of research; read from the bottom left quadrant to the top left, then to the bottom right and then the upper right, the sequence is Discovery, Exploratory, Definition, and Validation research, tracing an uppercase letter “N.”
Figure 2. The NCredible Framework with the four scopes of research. (Credit: twig+fish research practice)

What you might notice in Figure 2 is that there is a natural progression between quadrants beginning with Discovery and ending with Validation. Our claim is that following the path (the letter “N”) ensures that the full potential of research can be intentionally leveraged. We do all of this by staying true to the scope of each quadrant and its purpose. In doing so, we bolster our credibility as researchers while at the same time demonstrating alignment with business goals. Even more important is the team involvement in sharing all their questions and reflecting on what types of questions they tend to ask as a team and as an organization. The framework serves to create a research roadmap as well as a reflection tool on business research realities. Capturing any group reflections and assumptions helps reveal tendencies the internal team might have that can influence research practices.

How This Framework Can Address Bad Behaviors

As researchers, our goal is to focus on the human element and how it is positioned in any solution. We raised a number of bad behaviors earlier that might resonate; using this framework, we can address the process gaps that lead to them.

With the framework applied:

  • Method is not the driver anymore. Specific methods flourish in specific quadrants. After all, what is a focus group? What is an interview? The methods themselves cause confusion since there is no one “clear” interpretation for what we call a method.
  • Questions are now organized into the quadrants. This leads to a more strategic approach to study design, one that maps to a larger story arc of research’s power potential. Research studies that cover more than one quadrant of questions are done so with intent and by design. The scope of the project and its outputs are clearly defined.
  • Hidden agendas are exposed by copious documentation of all questions; interpretations and misalignments cease to exist in someone’s head because they must be written down. Decisions are public and everyone has to align on semantics.
  • There are times to ask people for nuanced realities and there are times to ask them if something just works or not. Those lines are drawn clearly and with intention, avoiding confusion.
  • The lack of a certain type of research (and therefore, set of questions) is revealed, exposing weaknesses in business processes and gaps that can be addressed.
  • There is a greater understanding and appreciation for when customer insight is needed and when there is a need to be inspired by greater human behavior.
  • The output of each quadrant is no longer a surprise and can be tempered appropriately to expose the value of the research and the researcher.
  • Advocating for research and staying organized becomes attainable because there are implementation rules around each quadrant that are visibly understood by all team members.

Each study is an opportunity to further support and communicate the power and potential of human-centered research. An added benefit is the recognition of how questions must be asked in order to yield the required output. Open discussion of this results in more empathy toward the team actually conducting the research. Contributors can appropriately home in on the intent of their questions without misinterpretation of what the question yields.

This framework is one of many tools that can be used to demonstrate the strategic contribution of our domain, while enabling researchers to design credible studies.