A review of
Practical Speech User Interface Design
By James R. Lewis
CRC Press, 2010
James R. Lewis’ Practical Speech User Interface Design is comprehensive, accessible, practical, and fascinating. As an IBM human factors engineer for some thirty years, Lewis brings a depth of practical experience to bear in this book. He complements that experience with a broad survey of the current state of research in the field.
As he states in the closing pages, some of the research he reviewed “confirmed my current design practices, but more importantly, other research has led me to make some changes in my design strategies.” It is this combination of openness and expertise that makes the book such an asset for anyone interested in speech user interfaces—and what UX practitioner isn’t? I’ve had the pleasure of working on a small number of speech systems, and only regret that I didn’t have this book prior to embarking on them.
Chapter 2 introduces speech technologies. Lewis describes two types of language models in current use—finite state grammars (in which all legal words and phrases are fully specified) and statistical language models (in which users may speak the words in any order). This chapter also describes methods of speech production and discusses formant text-to-speech (think Stephen Hawking) and concatenative text-to-speech. While intelligibility of synthetic voices was once problematic, current systems are generally easy to understand. However, the production of “convincingly natural and appealing” synthetic voices has been a challenge, and Lewis points out that businesses are reluctant to risk their brand images. Accordingly, most designs use recorded speech.
Lewis also discusses the use of speech biometrics (voiceprinting). However, he suggests that current accuracy means that it can only be used in low-security applications or when combined with other verification methods.
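To make the finite state grammar idea from Chapter 2 concrete, here is a minimal sketch of my own, not taken from the book. It assumes a toy banking vocabulary; the `GRAMMAR` table and the `accepts` function are hypothetical names chosen purely for illustration. The point is only that every legal phrase is spelled out in advance, so anything outside the specified paths is rejected.

```python
# Hypothetical toy grammar (not from the book): each legal phrase is a
# fully specified path through a small state machine.
GRAMMAR = {
    "start": {"check": "verb", "transfer": "verb"},
    "verb": {"my": "det"},
    "det": {"balance": "end", "funds": "end"},
}


def accepts(utterance: str) -> bool:
    """Return True only if the words follow a complete path to 'end'."""
    state = "start"
    for word in utterance.lower().split():
        transitions = GRAMMAR.get(state, {})
        if word not in transitions:
            return False  # word is not legal at this point in the grammar
        state = transitions[word]
    return state == "end"


if __name__ == "__main__":
    for phrase in ["check my balance", "balance my check", "check my"]:
        print(phrase, "->", accepts(phrase))
```

A statistical language model would not reject out-of-grammar wording this rigidly, which is the contrast the chapter draws.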
Chapter 3 is a fascinating discussion of human speech and its implications for design. It includes material on phonology (the study of the basic sounds of language), coarticulation (the phenomenon whereby we run words into each other), and prosody (intonational patterns). A section on discourse considers the patterns involved in our everyday conversations, and includes material on grammaticality, discourse markers (which signal conversational intents), timing and turn taking, and social considerations in conversation. Here, as throughout the book, the material is clearly presented, supported by critically considered research, and focused on being of practical use to the user interface designer.
This focus on the practical also means that the author largely avoids prescriptive statements; rather, he recognizes that all decisions must be subject to the constraints faced by the business and designer. Where prescriptiveness is possible, however, Lewis provides unambiguous advice, for example by clearly identifying appropriate durations for wait times, silences, and pauses.
Chapter 4 considers self-service technologies: their advantages and disadvantages, people’s willingness to use them, and the factors that affect that willingness.
In the meaty Chapter 6, “Speech User Interface Development Methodology,” the steps listed are largely familiar (requirements, design, development, testing, deployment, tuning). User needs analysis is dear to the heart of any UX practitioner, and Lewis describes the specifics of doing that analysis for speech. For example, he suggests that, in a project lacking data to the contrary, “it is reasonable to design in accordance with the capabilities of older adults.”
Lewis covers creating detailed dialog specifications, prototyping, development, and testing. This chapter will serve a UX practitioner transitioning from GUI to SUI design particularly well, as it enables the application of existing knowledge to the new domain.
There is an interesting discussion on the use of personas. Lewis states that while personas can be useful, they should not consume a large part of the design effort, which will be better rewarded if spent elsewhere.
Chapter 8 gets into the gritty detail of how to script introductions, whether to tell people to “listen carefully” because of changed menus (don’t), how to provide help, and what menu lengths are appropriate. Lewis states that the common advice to limit menu length is a mistake (a “misguideline,” to use his term), and that broad menus are more effective than shorter menus requiring greater depth. This might be seen as controversial (my eyebrows certainly went up), but, as always, the author quotes extensively and critically from the available research and his own work.
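The breadth-versus-depth trade-off can be illustrated with some simple arithmetic. The sketch below is mine rather than Lewis’s, and the function name `levels_needed` and the figure of 24 destinations are assumptions made up for the example. It shows only how many menu levels a caller must pass through for a given menu length; it says nothing about usability, which is where the research Lewis cites comes in.

```python
def levels_needed(total_options: int, options_per_menu: int) -> int:
    """Smallest number of menu levels that can reach every option.

    Illustrative arithmetic only: assumes a balanced menu tree in which
    every menu offers the same number of choices.
    """
    if total_options < 1 or options_per_menu < 2:
        raise ValueError("need at least 1 option and 2 choices per menu")
    levels, reachable = 0, 1
    while reachable < total_options:
        reachable *= options_per_menu  # one more level multiplies the reach
        levels += 1
    return levels


if __name__ == "__main__":
    # Hypothetical system with 24 destinations.
    for breadth in (3, 8, 24):
        print(breadth, "choices per menu ->", levels_needed(24, breadth), "levels")
```

With these assumed numbers, short three-item menus force callers through three levels, while a single broad menu reaches everything in one.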
Each chapter in the book concludes with a summary, and a reader who does not need the details can simply read the summary to gain a fair grasp of the content of the earlier pages.
If you have even the vaguest interest in the design of speech user interfaces, whether you’re a student or a seasoned practitioner, read this book.