About this Article
Written by: Seth Capistron
Written on: November 4th, 2005
Tags: electrical engineering, lifestyle, computer science
Thumbnail by: Alpha/Moto Q9h
About the Author
Seth Capistron was a junior Computer Science major at the time of publication. He enjoys golf, going to concerts, and spending time with friends.
The Inner Workings of Speech Recognition (Volume VIII, Issue I)
With research focusing on new ways for people to interact with computers, speech recognition is emerging as a very important technology. Whether it is a voice-controlled navigation system in a car or a voice-controlled system over the phone, speech recognition is bound to play a larger role in society. Various theories of human speech, along with complementary computational approaches to processing language, are being applied to this emerging technology.


When most people think of interacting with a computer, they think of a mouse and a keyboard. However, researchers are constantly thinking up new ways for users to interact with computers. For example, imagine being able to think about what you want your computer to do and then having your computer actually do it. Some of these incredible user interfaces are already in use today. Fifteen years ago, if you had told someone that you would be able to get in your car, talk to a computer, and have it give you detailed directions, few people would have believed you. This type of interaction with computers was made possible by the engineering behind speech recognition.
Speech recognition allows users to speak to a computer (see Fig. 1) and have their words interpreted as instructions or text. Although speech recognition has been around since virtually the beginning of the computing era, it has not been until recently that the quality has been reliable enough to justify wide-scale use. It has only been with the developments made by engineers that this technology has gone from science fiction to a tool of the present.
Figure 1: Speech recognition software permits users to interact with technology, including computers. (Image: Alpha/Moto Q9h)

Inherent Difficulties

Since speech is so central to human interaction, it may at first be hard to see why a computer would have difficulty interpreting it. The skill of interpreting what someone says with remarkable accuracy is often taken for granted. Another skill that humans undervalue is the ability to understand sloppy speech.
Often, when we casually talk with our friends or coworkers, we lose some precision in our pronunciation. Words or phrases get strung together, or syllables get dropped [1]. This does not present a problem for humans because we are used to hearing it on a daily basis. If someone runs the two words "did you" together, we don't think twice about what that person was trying to say. The same, however, cannot be said for a computer trying to interpret someone's speech. When a phrase such as "did you" gets strung together, the computer may interpret it as a single, unrecognized word. This problem can be mitigated rather easily, though, by requiring users to enunciate their words more clearly.
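To see why run-together words trip up a machine, consider a deliberately naive sketch of a recognizer that matches pause-delimited "words" against a fixed dictionary. (The vocabulary and function names here are invented for illustration; real recognizers work on acoustic features, not text tokens.)

```python
# A toy "recognizer": each pause-delimited token from the audio front end is
# looked up in a fixed vocabulary. Sloppy speech that merges two words into
# one token produces something the dictionary has never seen.
VOCABULARY = {"did", "you", "go", "home"}

def recognize(tokens):
    """Map each token to a known word, or flag it as unrecognized."""
    return [tok if tok in VOCABULARY else f"<unknown:{tok}>" for tok in tokens]

# Careful speech: every word matches.
print(recognize(["did", "you", "go", "home"]))
# Sloppy speech: "did you" arrives as one merged token and fails the lookup.
print(recognize(["didyou", "go", "home"]))
```

A human listener resolves "didyou" effortlessly from context; the toy system above has no mechanism for splitting a merged token, which is exactly the gap better enunciation papers over.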
Another issue in speech recognition is the existence of multiple users. Every day humans interact with dozens of different speakers. Whether it is the anchorperson on the morning news, the DJ on the radio, or the boss at the office, people find it relatively easy to understand speech from multiple sources. Yet sometimes it is not so easy. People often have trouble understanding a foreign speaker: even though the person may be speaking English, it can be difficult to understand them because they pronounce words quite differently from what we are used to hearing. Computers experience similar problems, but they are far more sensitive than humans. Researchers have shown that speaker-dependent speech recognition systems have three to five times fewer errors than speaker-independent systems [2]. This shows just how sensitive most speech recognition systems are to who is speaking (see Fig. 2).
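Error comparisons like the three-to-five-times figure above are typically made using word error rate (WER), the edit distance between what was said and what the system heard, divided by the length of the reference transcript. A minimal sketch (the transcripts below are invented examples, not data from any real system):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference word count,
    computed via the standard Levenshtein dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution/match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution in a three-word utterance: WER = 1/3
print(word_error_rate("turn left ahead", "turn west ahead"))
```

Under this metric, a speaker-dependent system with a 2% WER would correspond to a comparable speaker-independent system with roughly a 6-10% WER.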
Figure 2: Some phones have speech recognition software built in. (Image: Alpha/Moto Q9h)
Speech recognition systems are also very sensitive to background noise. As an example, imagine you are in your car talking to your voice-controlled navigation system. A passenger sitting next to you can easily pick out what you are saying from the many other noises they hear: they are able to filter out the kids in the back seat, the honking of horns, and the various noises of the road. This, however, is not a given for the speech recognition feature in your car's navigation system. Early in the development of speech recognition systems, environmental noise had to be kept to a bare minimum to avoid confusion. While engineering has vastly reduced the sensitivity of these systems, background noise still prevents speech recognition systems from being deployed in certain situations. For example, it is hard to imagine every single person in a crowded office space using speech recognition to interact with his or her computer.
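One of the simplest noise-handling tricks a system can apply is an energy threshold, often called voice activity detection: frames of audio quieter than the threshold are assumed to be background noise and discarded before recognition. A minimal sketch, assuming audio arrives as lists of samples (real systems use far more sophisticated statistical detectors):

```python
def voiced_frames(frames, threshold):
    """Keep only frames whose mean energy exceeds the threshold;
    quieter frames are treated as background noise and dropped."""
    def energy(frame):
        return sum(s * s for s in frame) / len(frame)
    return [f for f in frames if energy(f) > threshold]

hum    = [0.01, -0.02, 0.01]   # simulated low-level background hum
speech = [0.5, -0.6, 0.4]      # simulated speech burst
kept = voiced_frames([hum, speech, hum], threshold=0.01)
print(len(kept))  # only the speech frame survives
```

This crude filter fails precisely in the situations the article describes: a loud back seat or a honking horn has plenty of energy and sails straight through, which is why noise robustness remains an open engineering problem.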
The final set of difficulties with speech recognition systems involves vocabulary and grammar. The vocabulary of a speech recognition system is the range of words the system recognizes, while the grammar is the set of word orderings it recognizes [2]. While many people say they do not have very large vocabularies, they are usually underestimating themselves. The sheer number of words that humans can hear and instantly recognize is remarkable, as is our ability to understand everything someone says even when they speak quickly or use complex sentence structures.
This ability is exactly what engineers have been working hard to replicate. Even the most advanced speech recognition systems in the world have limited vocabulary and grammar. In practice, the trick to creating a speech recognition system that can both recognize a user's input and be fast enough to keep up with the user's speech is to limit the range of possible inputs.
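This trade-off is easiest to see in a command-and-control grammar. The sketch below (a hypothetical two-word verb-destination grammar invented for illustration) accepts only a handful of utterances, which is exactly what keeps such systems fast and accurate:

```python
# A tiny hand-written grammar: every valid utterance is "<verb> <destination>".
# Constraining the search space this aggressively is what lets small embedded
# systems recognize commands quickly and reliably.
VERBS = {"navigate", "call", "dial"}
DESTINATIONS = {"home", "work", "office"}

def parse_command(words):
    """Return (verb, destination) if the utterance fits the grammar, else None."""
    if len(words) == 2 and words[0] in VERBS and words[1] in DESTINATIONS:
        return (words[0], words[1])
    return None

print(parse_command(["navigate", "home"]))            # accepted
print(parse_command(["please", "navigate", "home"]))  # rejected: off-grammar
```

With only a few dozen possible sentences, the recognizer never has to choose among thousands of similar-sounding alternatives; the cost is that anything off-grammar, however natural, is simply rejected.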