Listen Up: Do Voice Recognition Systems Help Drivers Focus on the Road?

Posted on: 12:10 PM by Rachel Goddard

Over the past few years, auto manufacturers have created infotainment systems that integrate control of multiple devices (including cellular phone, climate control, audio, navigation, and other media) into a single user interface. However, most of these infotainment systems (BMW iDrive, Audi MMI, Mercedes-Benz COMAND) still require manual input and visual attention. Microsoft and Ford have collaborated on a new infotainment device, Sync, which allows drivers to interact with mobile devices using voice commands. This feature allows drivers to keep their eyes on the road and hands on the wheel.

Sync makes infotainment technology available to a broad market base, with Sync as a low-cost option on a wide range of models (including the entry-level Ford Focus). Ford estimates that by 2009 over one million units will have been sold. If successful, Sync may soon be the most influential infotainment technology on the market. The expected widespread distribution brings up some questions about the usability and impact on workload that infotainment devices and voice activation will have on drivers.

Sync’s voice recognition technology is its primary strength over previous infotainment systems because it virtually eliminates scrolling and searching through a visual interface. This strength, however, may actually be a weakness if it does not give the driver an easy and efficient way to complete tasks. For example, speaking voice commands instead of pressing buttons may distract drivers less, but if making a cell phone call using voice commands takes twice as long as pressing buttons, then the driver will be distracted for a longer period of time, resulting in greater risk to driver safety.

Methods

We performed a timeline analysis to see whether two common in-vehicle tasks (calling a contact using a cell phone and playing a track using an iPod) were completed faster using Sync than using the native device interfaces. We also compared the time it took to complete different portions of each task, because Sync’s voice recognition technology may facilitate some portions of the task but be inefficient during others. For example, to play a song on an iPod using Sync, a driver can simply say the title of the song. Sync’s voice commands eliminate searching and scrolling through a long list of songs, but after each command, users have to wait for Sync to respond and prompt for the next command, which may negate any time savings. Decomposing total task time allowed us to identify both the strengths and weaknesses of Sync’s voice recognition technology compared to manual interactions.

In addition to assessing the performance of Sync’s voice recognition technology, we were interested in determining how this technology may impact usability for a broad driver population. To evaluate Sync’s usability we decided to explore potential errors and outcomes that drivers may encounter using Sync.

We performed a Failure Modes and Effects Analysis (FMEA) to identify all possible errors for each task step, along with the source, likelihood, probable outcome, and severity of each error. From this information we were able to identify the types of errors that are most likely to occur and the mechanism that would produce each error. Identifying the underlying source of the errors allowed us to evaluate whether proper countermeasures are in place to minimize the severity and aid in recovery. Error recovery features not only reduce the negative impact on the overall user experience, but also help to minimize the error’s impact on driver distraction and safety.
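
As a rough illustration of how such an analysis can be tabulated, the sketch below stores one record per potential error and ranks the records by a simple priority score. The field names, the 1-to-5 scales, the likelihood-times-severity score, and every example value are assumptions made for illustration; they are not the scoring scheme or the data used in the analysis described here.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    task_step: str    # step of the task where the error can occur
    description: str  # what goes wrong
    source: str       # underlying mechanism that produces the error
    likelihood: int   # hypothetical scale: 1 (rare) to 5 (frequent)
    outcome: str      # probable outcome for the user
    severity: int     # hypothetical scale: 1 (minor) to 5 (task failure)

    @property
    def priority(self) -> int:
        # Assumed scoring rule: likelihood multiplied by severity.
        return self.likelihood * self.severity


def rank_failure_modes(modes):
    """Return the failure modes ordered from highest to lowest priority."""
    return sorted(modes, key=lambda m: m.priority, reverse=True)


# Illustrative entries only; the scores are invented for this sketch.
modes = [
    FailureMode("menu access", "command not recognized",
                "voice recognition", 4, "prompt repeated", 2),
    FailureMode("goal execution", "wrong artist recognized",
                "voice recognition", 3, "task restart", 5),
    FailureMode("goal execution", "paired device does not respond",
                "device incompatibility", 2, "task restart", 5),
]

ranked = rank_failure_modes(modes)
for m in ranked:
    print(f"{m.priority:>2}  {m.task_step}: {m.description} ({m.source})")
```

Sorting by a single score is only one way to surface the errors that most need a countermeasure; a fuller FMEA could also record whether a recovery path exists for each entry.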

Analysis and Findings

Like other infotainment systems, Sync has a broad range of capabilities and supports a number of plausible in-vehicle tasks. Since we were interested in general usability rather than in all of these functions, we decided to evaluate Sync on two common tasks. We observed and recorded two drivers familiar with Sync as they called a contact on a cell phone and played a song from an iPod in a stationary Ford automobile. Then we compared task performance using Sync with the performance of two other testers performing the same tasks directly on a cell phone and an iPod. The video recordings were coded and time-stamped and used for the timeline analysis and FMEA.

Timeline Analysis

The timeline analysis allowed us to evaluate the relative efficiency of Sync interactions compared to manual interactions. Surprisingly, we found that, on average, tasks were completed faster using the iPod and cell phone than when using Sync. Users took one second longer to play a music track and fifteen seconds longer to call a contact using Sync. Even though users spent no time scrolling or searching through device menus when using Sync’s voice commands, they still completed the tasks more slowly.

Based on results from a hierarchical task analysis, we decomposed total task time into four sub-task goals:

  • Initialization – Any time users spent turning on or activating a device
  • Menu access – Time users spent navigating to or scrolling within device menus
  • Goal execution – Selecting desired function (pressing Send to place a call, locating a song) or issuing a voice command
  • System response – Time users had to wait for the device before continuing the task

After breaking total task time into these four categories, the source of Sync’s slow performance was clear: the majority of user interaction with Sync was spent waiting to proceed (i.e. system response; see Figure 1). In fact, if users had not had to wait for Sync, average task time with Sync would decrease by sixteen seconds, a 60 percent time saving.
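
The decomposition can be sketched as a small tally over coded, time-stamped video segments. The segment format, the function names, and all of the timings below are assumptions made for illustration; they are not the measurements from this study.

```python
CATEGORIES = ("initialization", "menu access",
              "goal execution", "system response")


def decompose(segments):
    """Sum the duration of coded segments within each sub-task category.

    Each segment is a (category, start_seconds, end_seconds) tuple.
    """
    totals = {category: 0.0 for category in CATEGORIES}
    for category, start, end in segments:
        if category not in totals:
            raise ValueError(f"unknown category: {category}")
        totals[category] += end - start
    return totals


def share(totals, category):
    """Fraction of total task time spent in one category."""
    total = sum(totals.values())
    return totals[category] / total if total else 0.0


# Hypothetical coding of one voice-command task (times in seconds).
coded = [
    ("initialization", 0.0, 2.0),
    ("goal execution", 2.0, 4.0),
    ("system response", 4.0, 8.0),
    ("goal execution", 8.0, 10.0),
    ("system response", 10.0, 14.0),
]

totals = decompose(coded)
for category in CATEGORIES:
    print(f"{category:<16} {totals[category]:5.1f} s")
print(f"system response share: {share(totals, 'system response'):.0%}")
```

In this invented example no time is coded as menu access, while more than half of the task is spent waiting, which is the same pattern the breakdown above revealed for Sync.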

The excessive waiting resulted from the turn-taking structure for completing tasks with Sync. After each voice command, Sync repeats the command to confirm it and then gives the user a prompt for the next command. This process lasts only three or four seconds, but as users navigate through multiple menus to complete the task, this delay accumulates and task efficiency diminishes.

The timeline analysis also highlights how the task structure differs between Sync and direct interaction. When users worked directly with the iPod or cell phone, they performed each task in a single continuous interaction. Completing tasks with Sync, however, required alternating chunks of user command and Sync response. This turn-taking is a natural byproduct of user interaction via voice commands; in essence, users engage in a conversation with Sync.

Though Sync’s voice recognition system increased overall task time, Sync is not necessarily more distracting or harmful to driver safety. In fact, the system response delays may actually help drivers focus on the driving task. Consider a driver who is selecting a new song using an iPod while driving. Because the iPod allows continuous interaction, the driver must divert attention away from the road for an extended period of time. Sync does not require continuous interaction and attention; during each system response, drivers have an opportunity to return their full attention to the road. Interacting with mobile devices using Sync may therefore be less distracting and may reduce the impact on driver safety.
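
The distinction between total task time and uninterrupted interaction time can be made concrete with a short sketch. It models a task as a list of alternating user and system segments and assumes, as the argument above does, that system response segments are moments when the driver can attend to the road. The timelines and durations are hypothetical, not recorded data.

```python
def total_time(timeline):
    """Total task time in seconds for a list of (actor, duration) segments."""
    return sum(duration for _, duration in timeline)


def longest_continuous_interaction(timeline):
    """Longest unbroken stretch of user interaction, in seconds.

    Any "system" segment ends the current stretch, on the assumption
    that the driver can look back at the road while waiting.
    """
    longest = current = 0.0
    for actor, duration in timeline:
        if actor == "user":
            current += duration
            longest = max(longest, current)
        else:
            current = 0.0
    return longest


# Hypothetical timelines for the same task.
manual = [("user", 12.0)]
voice = [
    ("user", 2.0), ("system", 3.5),
    ("user", 2.0), ("system", 3.5),
    ("user", 2.0), ("system", 3.5),
]

for name, timeline in (("manual", manual), ("voice", voice)):
    print(f"{name}: total {total_time(timeline):.1f} s, "
          f"longest continuous interaction "
          f"{longest_continuous_interaction(timeline):.1f} s")
```

With these invented numbers the voice task takes longer overall, yet its longest stretch of continuous interaction is much shorter than the manual one.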

Failure Modes and Effects Analysis

FMEA gave us a chance to identify Sync’s limitations by looking at the source, outcome, and probability of errors that users may encounter. All possible errors in both tasks were evaluated in the FMEA.

As an example, we discuss error scenarios during the goal execution stage (for example, “Call home”). On average, nine distinct errors could occur during the goal execution stage in our tasks. Nearly 75 percent of these errors resulted in task failure, requiring the user to restart. Most of the errors leading to task failure resulted from inaccurate processing of voice commands, incompatibilities between Sync and the paired mobile device, and general errors related to Sync’s voice recognition technology.

Errors caused by Sync’s voice recognition technology do not necessarily condemn a task to fail, but we found that Sync seemed to lack quick and easy methods for error recovery. As a result, voice recognition technology errors were more likely to result in task failure. For example, if drivers want to play a song by Bon Jovi, then during the goal execution stage they would say, “Play artist Bon Jovi.” Sync would then process the command and respond, “Playing artist Bon Jovi.” But what does the user do if Sync responds incorrectly with “Playing artist Billy Joel”? Currently, there is no way to interrupt and correct Sync or revise an incorrect command to fix the error. The only option is to repeat the task, prolonging the overall time the driver is distracted. The FMEA confirmed that overall user experience and driver safety are highly dependent upon the performance of Sync’s voice recognition technology.

Recommendations and Conclusions

Our evaluation of Sync supports voice recognition technology as an improvement over manual interactions with devices. Though tasks took longer to complete using Sync, the turn-taking conversation with Sync is more compatible with driving and provides more opportunities for the user to interleave driving with secondary tasks. Sync’s voice recognition technology is not without its flaws and would benefit from a broader menu structure that allows users to bypass menu levels or move directly to the desired function, especially at the higher levels of the menu hierarchy. On the other hand, this new feature could make Sync more complex because it expands the number of available voice commands at each menu level. This cost could be outweighed, though, by the benefit of reducing system response time and overall task time.

Based upon the FMEA findings, one feature missing from Sync was a quick and easy method to correct errors. Incorporating easy steps for error recovery is essential since Sync’s voice recognition technology will never perform flawlessly. A simple voice command—Back—might meet this need. If users make a mistake or if a voice recognition error occurs, users could say “Back” at any time to move to the prior task step. The “Back” command may not be the best or only option, but the spirit of this recommendation—an easy and readily available method for repeating a previous or current step in order to correct an error—would be a welcome improvement over the current requirement to restart the entire task.
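
One way to picture the recommended behavior is a dialog that remembers which step it is on and lets “Back” undo the most recent step instead of restarting the task. The class below is a hypothetical sketch of that idea; the class name, method names, and step labels are invented for illustration and do not describe how Sync is implemented.

```python
class VoiceDialog:
    """Minimal step-by-step dialog with a "Back" command (hypothetical)."""

    def __init__(self, steps):
        self.steps = list(steps)  # ordered prompts, e.g. device, artist
        self.position = 0         # index of the step awaiting input
        self.answers = []         # accepted input for each completed step

    @property
    def current_step(self):
        """Name of the step awaiting input, or None when the task is done."""
        if self.position < len(self.steps):
            return self.steps[self.position]
        return None

    def say(self, utterance):
        """Handle one utterance and return the step that is now awaited."""
        if utterance.strip().lower() == "back":
            if self.position > 0:
                # Undo only the most recent step; earlier answers are kept.
                self.position -= 1
                self.answers.pop()
            return self.current_step
        if self.current_step is None:
            raise RuntimeError("task already complete")
        self.answers.append(utterance)
        self.position += 1
        return self.current_step


dialog = VoiceDialog(["device", "artist"])
dialog.say("USB")
dialog.say("Billy Joel")  # stands in for a misrecognized "Bon Jovi"
dialog.say("Back")        # returns to the artist step only
dialog.say("Bon Jovi")
print(dialog.answers)
```

In this sketch the correction costs one extra command and one repeated step, and the earlier device selection does not have to be repeated.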

In this article we set out to examine whether Sync’s voice recognition technology improves upon previous manually based interactions. We also wanted to explore Sync’s potential impact on drivers by considering its usability. In general, Sync and voice recognition technology improve upon manual interactions by allowing subtasks to be interleaved with the driving task, which may help to reduce driver distraction. Though current voice recognition technology is not perfect, simple design features can be incorporated to mitigate the severity of errors and enhance the user experience.

Drivers everywhere should be excited about the widespread availability of Sync and voice recognition technology in their vehicles. Not only will they be able to show off new infotainment technology, but most importantly, they will be able to interact with all of their mobile devices in a safer, less distracting manner.

Timeline Analysis

To accomplish tasks using Sync, drivers navigate through a series of menus until they reach the desired function. Sync’s menus are arranged from general media devices (for example, audio system, phone, or radio) to more specific device functions (for example, “USB” or “Dial”) as users move deeper into the menu structure. The advantage of a hierarchical menu structure is that the number of choices at each level is reduced. The disadvantage is that the menu structure is deeper and users must navigate through more menu levels. Navigating through additional menu levels can be slow and cumbersome, and in Sync’s case, results in longer task time due to system response time after each voice command.

Instead of a hierarchical menu structure, Sync could adopt a broad menu structure at the highest levels that would allow users to bypass menu levels and go directly to the chosen device or function. A broad menu structure reduces the total number of voice commands and increases system efficiency. Here is an example of Sync voice commands that demonstrates how a broad menu structure can increase task efficiency over the hierarchical menu structure.

[Audio example: Sync voice commands with the hierarchical and the broad menu structure]

Compared to Sync’s current hierarchical menu structure, the broad menu structure results in two fewer voice commands. Considering the three- or four-second system response time after each voice command, the broad menu structure would result in a time saving of up to eight seconds. Incorporating a broader menu structure with Sync would reduce system response time, overall task time, and the potential impact of Sync on driver safety.
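
The arithmetic behind this estimate is simple enough to sketch. The function name and parameters below are illustrative, and the three-to-four-second range is the approximate system response time reported above rather than a precise measurement.

```python
def response_time_saved(commands_removed, response_low=3.0, response_high=4.0):
    """Range of waiting time avoided when voice commands are removed.

    Assumes every removed command would have been followed by one
    system response lasting between response_low and response_high seconds.
    """
    return commands_removed * response_low, commands_removed * response_high


low, high = response_time_saved(2)
print(f"Estimated saving: {low:.0f} to {high:.0f} seconds")
```

Two fewer commands therefore avoid roughly six to eight seconds of waiting, which matches the upper bound quoted above.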