Voice/Speech Recognition/Synthesis

from: Human Performance Center (Navy)

Voice/speech recognition involves computers recognizing people's voices for the purpose of carrying out commands or directions to complete tasks. Voice recognition is the identification of a person's voice, and speech recognition is the recognition of what someone is saying. Voice synthesis involves computers generating human-like speech for the purpose of communicating with people. Voice recognition can be used when the trainee's hands are already busy with tasks and another method for inputting information is needed.

For over five decades, researchers have worked toward computer speech recognition. While real-time recognition of unconstrained human language has not yet been achieved, the technology has made significant progress. As microprocessor technology continues to advance, speech recognition products are becoming faster, better, and cheaper.

“Advances in hardware speed and algorithm capability have made improved automatic speech recognition technology possible,” said Dr. John C. Kelly, interim chairperson of [North Carolina Agricultural and Technical State University] NC A&T’s Department of Electrical Engineering. “Unfortunately, though the technology components are in place, unconstrained speech recognition systems have advanced very little, due to the complexity and redundancy of language.” (April 2001, The Use of Multi-Model Technology to Advance Automatic Speech Recognition, a project sponsored by the Defense Advanced Research Projects Agency (DARPA).)

Speech recognition systems are divided into two main categories:

speaker-dependent systems, which require a user to train the system to recognize his/her voice, and
speaker-independent systems, which can recognize any speaker in the language for which the system was designed.

In general, speaker-dependent systems are more accurate, but the industry trend is toward speaker-independent systems that require little or no training time.

There are a number of speech synthesis systems on the market. In artificial speech generation there is a tradeoff between intelligibility and naturalness. Currently, the industry has placed emphasis on naturalness, with the unfortunate consequence that even the high-end systems are sometimes hard to understand. Speech synthesis is generally regarded as a secondary and much less complex issue when compared to speech recognition and understanding.

The current trends in the industry are for speaker-independent systems, software-based systems, and extensive use of post-processing. Spoken language systems that interpret spontaneous speech are also emerging. Speech recognition systems are being widely used by telephone companies and banks, and as dictation systems in many offices. These applications are highly constrained and sometimes require the user to pause between each word (isolated speech recognition). Other, much more challenging applications have been fielded. Over the years, the Navy has purchased simulators that must operate under the conditions most adverse to speech recognition: high noise, high stress, high personnel turnover, high need for accuracy, real-time response within about 300 msec, and out-of-phraseology speech.

Hardware and Software Issues:

Speed vs. Accuracy - As affordable computers increase in processing speed, more knowledge can be processed to increase speech recognition accuracy. Further, constraining what can be said significantly reduces complexity. Engineering of applications becomes a balance of speed and accuracy.
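To make the effect of constraining the input concrete, the following sketch counts how many distinct word sequences a recognizer would have to tell apart. It rests on a deliberately crude assumption: any vocabulary word may follow any other, with no grammar. The function name, vocabulary sizes, and utterance length are illustrative values chosen for this example, not figures from any fielded system.

```python
def sequence_count(vocabulary_size: int, utterance_length: int) -> int:
    """Number of distinct word sequences of a fixed length when any
    vocabulary word may appear in any position (simplifying assumption)."""
    return vocabulary_size ** utterance_length


# Illustrative values only (assumptions, not measurements).
constrained = sequence_count(vocabulary_size=50, utterance_length=5)
unconstrained = sequence_count(vocabulary_size=20_000, utterance_length=5)

print(f"constrained vocabulary:   {constrained:,} candidate sequences")
print(f"unconstrained vocabulary: {unconstrained:,} candidate sequences")
print(f"growth factor:            {unconstrained // constrained:,}")
```

Real recognizers prune this space with grammars and statistical language models, so the raw count overstates the work actually done, but the exponential growth is why a constrained phraseology leaves more of a fixed time budget available for accuracy.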

Noise - Speech recognizers are susceptible to coherent and random noise within the bandwidth of human speech. Since speaker-independent systems use speaker models that operate with a variety of speakers, the models are more susceptible to noisy environments than the speaker-dependent systems.

Out-of-Phraseology Speech - Speech recognizers are not yet capable of understanding unconstrained human speech. Accordingly, applications are developed based on constrained vocabularies. But users often say words that are not in the legal vocabulary. Since the speech recognizer will try to make the best match, undetected out-of-phraseology speech could be processed with chaotic results. The challenge is to detect out-of-phraseology speech and reject it before it is post-processed.
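The rejection step described above can be sketched as a threshold on the best-match score. This is a toy illustration only: it compares text strings with Python's standard `difflib`, whereas a real recognizer would score acoustic and language-model evidence. The phrase list, the function names, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical constrained phraseology, for illustration only.
LEGAL_PHRASES = [
    "cleared to land",
    "go around",
    "turn left heading two seven zero",
    "wave off",
]


def best_match(utterance: str, phrases: List[str]) -> Tuple[str, float]:
    """Return the closest legal phrase and its similarity score in [0, 1]."""
    text = utterance.lower()
    scored = [(p, SequenceMatcher(None, text, p).ratio()) for p in phrases]
    return max(scored, key=lambda pair: pair[1])


def recognize(utterance: str, threshold: float = 0.8) -> Optional[str]:
    """Return the matched legal phrase, or None to reject the utterance
    as out-of-phraseology before it reaches any post-processing."""
    phrase, score = best_match(utterance, LEGAL_PHRASES)
    return phrase if score >= threshold else None


print(recognize("cleared too land"))    # close to a legal phrase
print(recognize("what time is lunch"))  # not in the phraseology
```

Without the threshold, `best_match` would always return some legal phrase, which is the "best match" failure mode the paragraph warns about. Choosing the threshold is itself a trade-off: set it too high and legitimate commands are rejected, too low and stray speech slips through.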

Benefits and Risks:

The benefit of speech is that it is the most familiar way for humans to communicate. In the past, we have been required to interact with machines in the language of those machines. With the advent of speech recognition and synthesis technology, humans can communicate with machines using constrained natural language.

Speech recognition systems are not capable of perfect recognition. In general, performance is a trade-off between speed and accuracy. The major risk associated with these systems is that the speech input will be misrecognized and that erroneous input will be processed by the system. What would be considered a minor annoyance with a dictation system could be disastrous to a pilot using speech recognition in the cockpit.
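One common way to quantify recognition accuracy is word error rate: the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the length of the reference. The source text does not name this metric, so the sketch below is offered only as an illustration of why an average accuracy figure does not capture the risk described above. The function name and example phrases are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between hypothesis and reference,
    divided by the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


spoken = "turn left heading two seven zero"
heard = "turn right heading two seven zero"
print(f"word error rate: {word_error_rate(spoken, heard):.3f}")
```

In this example only one word in six is wrong, yet the misrecognized word reverses the command. A dictation user would correct the typo; in a cockpit or simulator the same error is acted upon.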

Developmental Issues:

Much research is being conducted in the area of spoken language understanding. Spoken language systems try to take the best possible result of a speech recognition system and further process the result. A spoken language system is defined as a system that understands spontaneous spoken input. Spontaneous speech is both acoustically and grammatically challenging to understand. The Center for Spoken Language Research, University of Colorado, Boulder, conducts many research projects, some funded by DARPA, on natural language understanding, dialog modeling, and so on.

The Cambridge University Speech Research page contains information on speech recognition, coding, synthesis, related conferences and web sites, and more.

Speech at Carnegie Mellon University is dedicated to speech technology research, development, and deployment, and provides a vehicle to make their work available online. CMU has a historic position in computational speech research, and continues to test the limits of the art.

The Center for Spoken Language Research, University of Colorado, Boulder, is focused on research and education in areas of human communication technology. This center conducts many research projects, some funded by the Defense Advanced Research Projects Agency (DARPA), on natural language understanding, dialog modeling, multi-modal speech recognition technology, and so on.

Military Applications of the Technology

The U.S. Navy has used speaker-dependent, continuous speech recognizers in Air Traffic Controller and Landing Signal Officer simulators. Both of these applications are well suited to the use of speech recognition because the training objective is for students to learn to control aircraft via oral communication with a pilot. Before speech recognition, a person would have to 'play' the role of a pseudo pilot. Now, the speech recognizer/synthesizer assumes the role of the pilot.

Emerging trends/R&D Initiatives

The mission of the speech technologies group (STG) at NAWCTSD is to explore emerging technologies for possible use in trainer applications.

The Army Research Office is sponsoring natural language speech and text research.

The Naval Research Lab conducts natural language research.

Engineers and scientists at the Space and Naval Warfare Systems Command (SPAWAR) (formerly Naval Command, Control and Ocean Surveillance Center, Research, Development, Test and Evaluation Division (NCCOSC RDTE DIV or NRaD)) conduct research and development tasks in speech recognition, speech signal processing, speech synthesis and related voice technologies.

The Army Research Institute developed a language sustainment tutor for Special Operations Forces (SOF) that the student can talk to. Speech recognition technology will be integrated into ARI's existing Military Language Tutor (MILT) and applied to scenarios relevant to SOF missions. This R&D program was supported by DARPA and by the Special Operations Command (SOCOM). It will apply to SOF units across the services.

The Defense Advanced Research Projects Agency (DARPA) is the central research and development organization for the U.S. Department of Defense (DoD). It manages and directs selected basic and applied research and development projects for DoD, and pursues research and technology where risk and payoff are both very high but success may provide dramatic advances for traditional military roles and missions and dual-use applications.

DARPA speech recognition/synthesis projects include: The Human Language Systems (HLS) Program will create usable computer systems that can read and hear; moreover, the systems will be able to understand what they read or hear in the context of a specific task. The overall objective of the program is to improve the readiness of military forces and improve the affordability of systems by providing dramatic new technology for systems interaction and use. The primary task domain for Human Language Systems will be military Command, Control, Communications, Computers, and Intelligence (C4I) with special emphasis on JTF crisis management planning and execution. By 1998, the goal was to provide easy-to-use dialog interaction capability for crisis decision support. This capability will enable effective operation of a geographically dispersed JTF staff and designated worldwide functional experts in domains like logistics, meteorological forecasting, and medicine.

The AEGIS Program Office has an ongoing hands-free computer project with the objective of enabling sailors to access their computer programs and displays anywhere on the ship via a two-pound hands-free computer embedded into a vest. The interface includes voice recognition, a mouse, a small wrist-mounted keyboard, and a miniature head-mounted display. These hands-free computers have wireless local area network cards and up to 1 gigabyte of disk storage in PCMCIA cards, which are the size of credit cards. An option to the system is inclusion of the Global Positioning System.

The first hands-free LAN access points were installed aboard the USS Rentz in August 1996. They allow sailors to access the ship's LAN from their hands-free computers at their job sites. Repair technicians can request remote supply, parts, or technical data while staying at their repair site.

The USS Princeton and USS John Paul Jones are in the process of installing hands-free LAN access points aboard their ships to become hands-free beta test sites along with the USS Rentz and the Aegis Training Center. The hands-free computers will house Interactive Electronic Technical Manuals (IETMs), like the Aegis Fire Control Transmitter IETM, which help sailors do their jobs more efficiently and quickly.
