A review of
Voice Content and Usability
Jeffrey Zeldman
139 pages, 6 chapters
About this book
by Katie Swindler
A good reference for Methods/How-To and Case Studies
Primary audience: Researchers, designers, and those in technical roles who are new or have some experience with the topic
Writing style: Academic, Matter-of-fact
Text density: Mostly text
Voice Content and Usability, by Preston So, is a guidebook that explores the innovative realm of voice content design, celebrating its potential to revolutionize user interactions in the digital age. So writes, “Though voice interfaces have long been integral to the imagined future of humanity in science fiction, only recently have those lofty visions become fully realized in genuine voice interfaces” (2021, 8). So guides the reader through the history of the domain, provides a practical step-by-step process using a real-world example, and concludes with what the future holds for this development space. So equips readers to enter this new frontier confidently.
The book’s six chapters are well-structured and accessible to readers of all levels. Each chapter offers clear definitions, real-world examples, and practical illustrations. The information is easy to understand and apply.
The first chapter, “Conversations with Computers,” introduces the reader to voice interface design and the natural differences between spoken and written interfaces. So explains that speech, our uniquely curious and bewildering gift, is primordial. Spoken language, especially in face-to-face interaction, is nuanced and complex because it communicates through verbal and nonverbal cues and behaviors that written language cannot convey. As So writes, “…our spoken language conveys much more than the written word could ever muster” (2021, 4).
The history of voice interface systems is the story of outsourcing our distinctly human gift of spoken communication to computers, which are naturally aligned with written communication. This tension gives UX designers an opportunity to rethink best practices for designing voice interactions.
The journey of voice interface systems began with the emergence of text-to-speech (TTS) dictation programs and speech-enabled in-car systems in the early 1990s. These innovations led to interactive voice response (IVR) systems, the first true voice interfaces capable of engaging in conversational exchanges. IVR systems were designed to lighten the burden on customer service representatives, and they quickly became widespread despite their clunky and often frustrating user experience.
Parallel to IVR development, screen readers began transcribing visual content into speech. Demand for more accessible tools surged with the growth of the web, leading to advancements such as semantic HTML and ARIA roles that improve screen reader usability. Although screen readers offer useful lessons for voice interface designers, they are difficult to use, and their output is verbose.
The introduction of modern voice assistants, starting with Apple® Siri® in 2011, marked a significant evolution in voice interfaces. The technology offered more streamlined, assistive, and customizable user experiences, and voice assistants have since become a normal part of everyday tasks. Examples include Microsoft™ Cortana™, Apple Siri, Amazon™ Alexa™, and Google™ Home™. Foreshadowing the next chapters, So concludes this history with, “As corporations like Amazon, Apple, Microsoft, and Google continue to stake their territory, they’re also selling and open-sourcing an unprecedented array of tools and frameworks for designers and developers that aim to make building voice interfaces as easy as possible, even without code” (2021, 12).
A voice interaction exists to help a user reach a desired goal. To get there, voice interfaces follow three paths: transactional (a service provided or a product supplied), informational (quests for information), or prosocial (our human need to connect). So explains that although there are three types of communication needs, voice interfaces can readily accomplish only two genres: transactional voice interaction (“I need to order a large pizza.”) and informational voice interaction (“Which pizzeria in town has the best reviews for their pizza?”).
Chapters 2–5 detail So’s involvement in building an experimental, informational voice interface to answer Georgia residents’ questions about state government. These chapters offer helpful information about the project’s processes, setbacks, and successes. For example, So explains that when you begin a voice interface project, you must first decide whether to create new content or reuse existing content. Some web content has a natural, conversational cadence and thus lends itself easily to voice interfaces. Examples include Frequently Asked Questions sections (FAQs), instructional content, and multiple-step forms. So’s project used the FAQs that already existed on the Georgia.gov site because they were concise, free, and conversational in tone. FAQs have become increasingly popular due to their informational, conversational format. So shares this wisdom: “You may need to have an honest chat with stakeholders eager to author new content because introducing another version of channel-specific content, this time for voice, can lead to content silos and maintenance nightmares further down the road” (2021, 23).
So concludes the book with a look to the future. His final thoughts will be familiar to user experience designers. As technology evolves, individuals level up their skills. This disruption causes job descriptions to shift, and barriers to entry fall as new professionals adopt the technology. UX designers will need to rethink their methods for voice. For example, a voice interaction flow is linear and does not follow the traditional hub-and-spoke structure of a site map. However, UX designers will continue to relentlessly pursue user-centric designs that serve users’ needs by upholding standards of accessibility, inclusive design, and awareness of bias. As the technology advances and new questions arise, So writes, “But debate continues about whether voice users of today want a conversational partner who is indistinguishable from a real human being” (2021, 109–110).
In conclusion, So’s book is content-rich yet concise. It provides clear direction for creating and designing voice content from inception to launch and addresses the questions that arise along the way. The book offers a historical foundation, definitions of key terms, and illustrations for clarity. So challenges readers to design inclusive, accessible, and helpful voice interfaces, and he equips them with the information needed to begin the journey.
Meg leverages 20-plus years in visual arts education and design to create UX that resonates. Meg is a passionate and insatiably curious freelance user experience researcher, user interface designer, and web developer. She specializes in creating user-centric solutions for clients in health, education, and nonprofits. Meg holds multiple master's degrees in art and certifications in UX, AI, and data analytics.