Touchless Interaction: Communication with Speech and Gesture

Humans have many ways to communicate—speech, gestures, direct touch—and as a result, communication between humans happens seamlessly…most of the time. But communication between humans and machines requires more formality. Most current technology systems rely on touch: we press buttons, type on keyboards, and swipe the screens of mobile devices. However, new systems are adding touchless control options to our vocabulary, enabling us to communicate with machines through speech and gesture.

Although gesture and sound-based input methods might seem more like a high-tech game than work, they are being taken seriously in industrial settings. These systems include touchless sensors that are useful both in simple forms, such as water faucets that turn on when we wave our hands under them, and in complex forms, such as data storage applications that use face recognition, iris detection, and voice-tracking for security and safety. An example from our daily lives is Google’s voice-based search application; if our hands are full, we can ask our phone a question and the application replies with a result. Although touchless controls appear in many parts of our daily lives, two areas that show the importance of this growing market are industrial automation and healthcare.

Industrial Automation

An industrial plant can be controlled two ways. One is by HMI (Human Machine Interface) input devices that are placed near individual machines. A second is a Supervisory Control and Data Acquisition (SCADA) system in which the complete plant can be configured and controlled from a control room. Experts using these systems can give a single command to multiple devices or multiple commands to a single piece of equipment. Centralizing the control reduces the cost of production and improves unit quality and employee safety, especially when the shop floor is hazardous and time-to-market is important. The idea of the fully automated factory is not new; this concept has been around for more than thirty years. A New York Times article in 1981, for example, celebrated the “Manless Factory” as a new trend in Japan.

However, if interaction technologies are overly obtrusive or constraining, then the user’s experience with this synthetic automation world will be severely degraded. If the interaction draws attention to the technology, rather than the task, or imposes a high cognitive load on the user, it becomes a burden and obstacle to the process. Sometimes traditional mouse/keyboard-oriented graphical interfaces are not well suited to large, complex tasks. As a result, speech and gesture-based inputs are making industrial plant and process automation faster and more efficient. Human gestures and language are natural and flexible, and they are often more efficient and powerful than alternative interaction modes.

Robotic Systems and Simulation Environments

At a Volkswagen factory in Wolfsburg, Germany, many robotic hands, conveyor belts, and controls move simultaneously, each placing modules and completing its assigned tasks. Though the work is monotonous, automated processes do these jobs accurately and on time. Although engineers maintain machinery on the floor and manage hardware malfunctions, the close proximity required for touch-based communication is not always possible, and it can be difficult, costly, and time-consuming to rely on having an engineer go to a particular place and provide instructions. Sometimes this challenge is met by using batch processing techniques, but those solutions are specific to each application and tend to increase plant configuration costs. Imagine how much more efficient this factory could be if it were possible to manage it through touchless interactions.

One example of touchless automation can be seen at the Forth Valley Royal Hospital in Larbert, Scotland, where three robots handle a complete pharmaceutical process with touchless features. Drugs delivered to the hospital are tipped into a giant hopper and a conveyor belt moves them to a machine that reads the barcodes by using image processing techniques. The robots then stack the drugs on shelves, not in alphabetical order but using a system that makes the best use of space, placing the most frequently used packs within easy reach.

The system continuously checks stock availability, and requests for medicines are instantly sent to robots that select and dispatch the drugs. A tablet computer has replaced the pharmacists’ prescription pads, and a color-coded screen on every ward tells medical staff exactly what stage each prescription has reached. Forth Valley Royal’s associate director of nursing, Helen Paterson, confirms that the paperless system has freed up nursing time, and hospital managers say that the £400,000 automated pharmacy has saved £700,000 from the hospital’s drug bill. The hospital is also implementing a robotic porter system where fleets of robot workers carry clinical waste, deliver food, clean the operating theatres, and dispense drugs by recognizing speech commands. Staff use handheld PDAs to call the robots, which also respond to voice commands, to move meal trays, linen, or other supplies. The robot comes up in a service lift by itself, picks up the item, and returns to the lift by following a system of pre-programmed routes that use laser beams to tell the robot where it is and where it needs to go. Computers onboard the robots tell doors to open and sensors instruct the robots to stop if anything, or anyone, is in the way.

BMW is testing new robotic systems to work with human factory workers. According to a report in the MIT Technology Review, “BMW is testing even more sophisticated final assembly robots that are mobile and capable of collaborating directly with human colleagues. These robots, which should be introduced in the next few years, could conceivably hand their human colleague a wrench when he or she needs it.” This interaction could be prompted through a combination of speech recognition, image processing, and pattern recognition.

Ford Europe is also working on a fully automated plant that they call a Virtual Factory. It will be managed by a gesture-based system and augmented reality. According to Ford, “Virtual factories will enable Ford to preview and optimize the assembly of future models at any of our plants, anywhere in the world. With the advanced simulations and virtual environments we already have at our disposal, we believe this is something Ford can achieve in the very near future.”

Healthcare Automation

Healthcare offers another example of human-machine collaboration. Touch-based interaction methods are designed for people who can move physically to give input to a system, but patients may not have the mobility to interact with a touch-based system. Imagine a hospital with an automated patient-monitoring system that allows patients to communicate through speech or gesture to get the immediate attention of the nursing staff, or even robotic assistance. A CCTV camera and a wireless microphone could capture the input for these automated monitoring systems. Even if patients can’t move from bed or chair, they can communicate their needs by voice or gesture.

Speech and gesture can also be part of expert healthcare systems, such as diagnostic processes or medical instruments. Surgeons, for example, don’t like to touch non-sterile keyboards in the middle of surgery for sanitary and efficiency reasons. Enter the researcher willing to try something different.

Using Gestures

GestSure Technology, a Seattle-based start-up firm, uses Microsoft Kinect (best known as an accessory for the Xbox 360 gaming console) as its backend to allow surgeons to access MRI (Magnetic Resonance Imaging) and CT (Computed Tomography) scans during surgery without touching a keyboard or mouse. When activated, it follows hand movements by using three sensors to do depth analysis, and by doing so, can understand a human’s position in a room and a particular body part’s movement. As stated in an article in the MIT Technology Review, “Kinect hardware’s ability to perceive depth makes it easier to recognize and track objects, something very challenging using an ordinary camera.” The article claims that the gesture accuracy is better than voice recognition, which can only reach 95-98 percent accuracy (meaning it fails somewhere between one time in twenty and one time in fifty).

Another startup company, Jintronix, has created a Kinect-based application that guides someone recovering from a stroke through physical rehabilitation exercises. The system monitors their progress and supplies real-time guidance.

Interaction Issues

Although these systems offer a lot of promise, they can also be a challenge for people to interact with smoothly. In our work, we have observed usability problems with touchless interaction styles.

  • In the Forth Valley Royal Hospital pharmaceutical department example, staff were concerned that the constant whirring of the robots and conveyor belts would make the pharmacy too noisy. In fact, it’s quieter than most hospital pharmacies where phones are constantly ringing. There is one problem, though: the robots can only handle small square or rectangular boxes. Still, pharmaceutical companies are already altering their packaging so that it’s suitable for a future where robotic pharmacies are the norm.
  • Touchless system users need to provide instructions in a specific pattern in order for the system to understand their commands. Most users need time-consuming training and practice to master this process.
  • The user’s commands can be ambiguous or misunderstood when multiple machines are waiting for human input. These systems need to provide a good way to indicate which particular machine is being addressed.
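One way to address the last point is to require every spoken command to begin with the name of the machine it targets, and to reject anything else rather than guess. The following is only a minimal sketch of that idea; the class name, the machine names, and the "name first" convention are illustrative assumptions, not part of any system described in this article.

```python
from dataclasses import dataclass, field


@dataclass
class CommandRouter:
    """Routes an utterance to exactly one machine, or rejects it (hypothetical design)."""

    machines: set[str] = field(default_factory=set)

    def route(self, utterance: str) -> tuple[str, str]:
        """Return (machine, command); raise ValueError if the target is unclear."""
        words = utterance.lower().split()
        # Assumed convention: the first word must name a known machine.
        if not words or words[0] not in self.machines:
            raise ValueError("command must start with a known machine name")
        if len(words) < 2:
            raise ValueError("no command given after the machine name")
        return words[0], " ".join(words[1:])


# Illustrative machine names only.
router = CommandRouter(machines={"press", "conveyor", "welder"})
print(router.route("Conveyor stop"))
```

Rejecting an unaddressed command such as "stop" outright is a deliberate choice in this sketch: on a shop floor, a command applied to the wrong machine is likely to be worse than a command that has to be repeated.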

Building Intelligence and Standards

Gesture recognition systems depend on image processing algorithms and pattern recognition. In artificial intelligence, machines need to be programmed so that they understand a wide range of different gestures given by users. Researchers are trying to create algorithms that can learn gestures from humans at runtime (using visual sensors or cameras, video streaming, etc.) and react accordingly, making the systems more accurate in less time. This is called “the building of intelligence.”
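To make the idea of learning while the system is running more concrete, here is a minimal sketch of an incremental nearest-centroid classifier. It is not the method used by any system named in this article. It assumes that an upstream image-processing step has already reduced each gesture to a short list of numbers; the two-value feature vectors and the gesture labels below are invented for illustration.

```python
import math


class OnlineGestureClassifier:
    """Nearest-centroid classifier whose centroids are updated one sample at a time."""

    def __init__(self) -> None:
        self.centroids: dict[str, list[float]] = {}
        self.counts: dict[str, int] = {}

    def learn(self, label: str, features: list[float]) -> None:
        """Fold one labelled sample into the running mean for its gesture."""
        if label not in self.centroids:
            self.centroids[label] = list(features)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        centroid = self.centroids[label]
        for i, value in enumerate(features):
            # Incremental mean: shift the centroid toward the new sample by 1/n.
            centroid[i] += (value - centroid[i]) / n

    def predict(self, features: list[float]) -> str:
        """Return the label whose centroid is closest in Euclidean distance."""
        if not self.centroids:
            raise ValueError("no gestures learned yet")
        return min(
            self.centroids,
            key=lambda label: math.dist(self.centroids[label], features),
        )


# Hypothetical feature vectors standing in for processed camera frames.
clf = OnlineGestureClassifier()
clf.learn("open_palm", [1.0, 0.0])
clf.learn("open_palm", [0.8, 0.2])
clf.learn("fist", [0.0, 1.0])
print(clf.predict([0.9, 0.1]))
```

Because each new sample only adjusts a running average, the classifier can keep learning from users during operation without storing or reprocessing earlier samples.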

The main problem with speech recognition is regionalism. For example, in different countries the English language has different tones and pronunciation styles. In a normal sound recognition system, a voice or sound signal is compared with existing similar signals. Technically, this set of reference sound signals is known as a signal dataset. To understand different types of accents, the application needs a separate dataset for each accent so that it can react properly.
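The comparison against stored signals can be sketched as a nearest-template search across several accent datasets. This is a simplified illustration and does not describe how production speech recognizers work. The accent codes, words, and three-value feature vectors are made-up placeholders; a real system would extract much richer features from the audio.

```python
import math

# Hypothetical reference data: accent -> word -> feature vector.
DATASETS = {
    "en-US": {"stop": [0.9, 0.1, 0.3], "start": [0.2, 0.8, 0.5]},
    "en-IN": {"stop": [0.7, 0.3, 0.4], "start": [0.1, 0.6, 0.9]},
}


def recognize(features, datasets=DATASETS):
    """Return (word, accent, distance) for the closest stored template."""
    best = None
    for accent, templates in datasets.items():
        for word, template in templates.items():
            distance = math.dist(features, template)
            if best is None or distance < best[2]:
                best = (word, accent, distance)
    if best is None:
        raise ValueError("no reference templates available")
    return best


# An incoming signal that lies close to the "en-IN" template for "stop".
word, accent, distance = recognize([0.72, 0.28, 0.41])
print(word, accent)
```

If the "en-IN" dataset is removed, the same input still matches "stop", but at a noticeably larger distance. That illustrates why a recognizer with reference data for only one accent becomes less reliable for speakers of another.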

The scenario is different for gesture recognition, in which a common pattern is essential. Suppose a team is developing software where an application can be stopped by showing one hand with one open palm. On the other side of the world another group of developers is creating an application where two hands with open palms need to be shown for the same purpose. This inconsistency confuses users, but it can be resolved by agreeing on a single standard pattern.
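The two-team scenario can be expressed as a check against a shared gesture vocabulary. No such standard exists in the form shown here; the enum names and the choice of the one-palm gesture as "the standard" are assumptions made only to illustrate the idea.

```python
from enum import Enum


class Gesture(Enum):
    ONE_OPEN_PALM = "one_open_palm"
    TWO_OPEN_PALMS = "two_open_palms"


class Action(Enum):
    STOP = "stop"


# Hypothetical shared standard: one open palm means "stop".
STANDARD_VOCABULARY = {Gesture.ONE_OPEN_PALM: Action.STOP}


def conflicts(app_vocabulary, standard=STANDARD_VOCABULARY):
    """List (gesture, action) pairs in an application that deviate from the standard."""
    return [
        (gesture, action)
        for gesture, action in app_vocabulary.items()
        if standard.get(gesture) is not action
    ]


team_a = {Gesture.ONE_OPEN_PALM: Action.STOP}
team_b = {Gesture.TWO_OPEN_PALMS: Action.STOP}
print(conflicts(team_a))  # no deviations from the shared vocabulary
print(conflicts(team_b))  # the two-palm stop gesture is flagged
```

A check like this could run when an application is built, so that a deviation from the agreed vocabulary is caught by developers before users ever encounter it.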

Future Challenges

Touchless communication is growing in maturity and spreading beyond entertainment into critical engineering fields; this trend will only continue in the future. But to be used in demanding fields such as factory automation or healthcare, human-machine communication must be more accurate and easier to use than current standards. Not only do the systems themselves need to be smarter, but a common standard for gestures will be critical to the real people who must work with these systems. If this happens, it is possible to imagine a future in which touchless communication is a widely used medium of communication between human and machine.

Chakraborty, I., Paul, T. (2014). Touchless Interaction: Communication with Speech and Gesture. User Experience Magazine, 14(1).
Retrieved from
