Speech recognition is the process by which a speaker's words are automatically converted into text, or into a predefined instruction or code, based on the information contained in the speech waveform. A robust speech-recognition system combines accurate identification of speech with the ability to filter out noise and adapt to varying acoustic conditions, such as the speaker’s speaking rate and accent. Speech-recognition technology is now embedded in many everyday applications: voice-activated routing at customer call centers, voice dialing on mobile phones, transcription (voice to text), voice commands, web search, GPS navigation, vending machines, smart homes and more.
An ASR system can be speaker dependent or speaker independent, can recognize isolated words or continuous speech, and can support a limited or an unlimited vocabulary.
Products Used include:
■ Data Acquisition Toolbox™
■ Signal Processing Toolbox™
■ Statistics Toolbox™
ASR System Overview:
The basic workflow is demonstrated using an isolated-word, speaker-dependent digit recognition system. It comprises three steps:
■ Speech acquisition
For training, speech is recorded from a microphone and brought into the development environment for offline analysis. For testing, speech is streamed continuously into the environment for online processing. Data Acquisition Toolbox™ is used to set up continuous acquisition of the speech signal and to extract frames of data for processing at the same time. Speech preprocessing includes pre-emphasis (to flatten the magnitude spectrum), frame blocking (speech is predictable only over short intervals, so it is analyzed in short frames) and windowing (to remove the discontinuities at the beginning and end of each frame).
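The three preprocessing steps can be sketched in Python with NumPy. This is not the MATLAB toolbox code. The pre-emphasis coefficient (alpha = 0.97), the frame length (256 samples) and the hop (128 samples) are common illustrative choices and do not come from the original system. The Hamming window stands in for "windowing" in general.

```python
import numpy as np

def preprocess(signal, frame_len=256, hop=128, alpha=0.97):
    """Pre-emphasis, frame blocking and Hamming windowing.

    frame_len, hop and alpha are illustrative assumptions, not values from the source.
    """
    x = np.asarray(signal, dtype=float)
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1] boosts high frequencies,
    # flattening the downward-sloping magnitude spectrum of voiced speech.
    emphasized = np.append(x[0], x[1:] - alpha * x[:-1])
    # Zero-pad very short inputs so at least one full frame exists.
    if len(emphasized) < frame_len:
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    # Frame blocking: overlapping frames of frame_len samples, hop samples apart.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # Windowing: taper each frame toward its edges to reduce discontinuities.
    return frames * np.hamming(frame_len)

# Usage: one second of a 440 Hz tone sampled at 8 kHz.
fs = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
frames = preprocess(tone)  # shape (61, 256)
```

Overlapping frames (hop smaller than frame length) are a typical design choice. They keep the signal content near each frame's tapered edges from being lost.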
■ Speech analysis
Developing a Speech-Detection Algorithm: The speech-detection algorithm is developed by processing the prerecorded speech frame by frame within a simple loop.
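The source does not say which detection rule is used. A common and simple choice is a short-time energy threshold. The Python sketch below assumes that the first few frames contain only background noise and flags any frame whose energy exceeds a multiple of that noise level. `threshold_ratio` and `noise_frames` are hypothetical parameters. Consecutive flagged frames are then grouped into word segments.

```python
import numpy as np

def detect_speech(frames, threshold_ratio=4.0, noise_frames=5):
    """Flag frames as speech when their short-time energy exceeds a threshold.

    Assumes the first `noise_frames` frames are background noise only; the
    threshold is `threshold_ratio` times their mean energy (both hypothetical).
    """
    energies = np.sum(np.asarray(frames, dtype=float) ** 2, axis=1)
    noise_level = np.mean(energies[:noise_frames]) + 1e-12  # avoid a zero threshold
    is_speech = np.zeros(len(energies), dtype=bool)
    for i, e in enumerate(energies):  # simple frame-by-frame loop
        is_speech[i] = e > threshold_ratio * noise_level
    return is_speech

def speech_segments(is_speech):
    """Group consecutive speech frames into (start, end) index pairs, end exclusive."""
    segments, start = [], None
    for i, flag in enumerate(is_speech):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(is_speech)))
    return segments
```

In practice, such detectors usually add a minimum segment length or a hangover period so that brief noise bursts and short pauses inside a word do not split or create segments.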
Developing the Acoustic Model: A good acoustic model should be derived from speech characteristics that enable the system to distinguish between the different words in the dictionary.
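The source does not specify the model. As one hedged illustration of what an acoustic model does, the sketch below fits a single diagonal-covariance Gaussian per word to frame-level feature vectors, which amounts to a one-component Gaussian mixture model. It then classifies an utterance by choosing the word with the highest total log-likelihood. The features are assumed to be computed elsewhere (for example, MFCCs), and the class name is hypothetical.

```python
import numpy as np

class DiagonalGaussianWordModel:
    """One diagonal-covariance Gaussian per word over frame-level feature vectors.

    A simplified, hypothetical stand-in for a richer model such as a multi-component GMM.
    """

    def fit(self, features_by_word):
        # features_by_word: dict mapping word label -> (n_frames, n_dims) array
        self.params = {}
        for word, feats in features_by_word.items():
            feats = np.asarray(feats, dtype=float)
            mean = feats.mean(axis=0)
            var = feats.var(axis=0) + 1e-6  # variance floor avoids division by zero
            self.params[word] = (mean, var)
        return self

    def log_likelihood(self, feats, word):
        mean, var = self.params[word]
        feats = np.asarray(feats, dtype=float)
        # Per-frame, per-dimension Gaussian log densities, summed over the utterance.
        ll = -0.5 * (np.log(2 * np.pi * var) + (feats - mean) ** 2 / var)
        return ll.sum()

    def classify(self, feats):
        # Pick the word whose model best explains all frames of the utterance.
        return max(self.params, key=lambda w: self.log_likelihood(feats, w))
```

Summing log-likelihoods treats frames as independent, which ignores their time order. Models such as HMMs capture that order, at the cost of more complexity.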
■ User interface development
After developing the isolated digit recognition system in an offline environment with prerecorded speech, we migrate the system to operate on streaming speech from a microphone input. We use MATLAB GUIDE tools to create an interface that displays the time-domain plot of each detected word as well as the classified digit (Figure 1).
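Moving from prerecorded to streaming speech mainly changes how frames arrive. Audio comes in chunks of arbitrary size, and the frame-by-frame processing must still see the same overlapping frames it saw offline. The Python sketch below shows one way to do this with a small buffer. It does not use the Data Acquisition Toolbox API. `StreamingFramer` is a hypothetical name, and the frame length and hop are illustrative values.

```python
import numpy as np

class StreamingFramer:
    """Turn arbitrarily sized audio chunks into fixed-length overlapping frames.

    Leftover samples are kept between chunks, so the frames match offline
    frame blocking with the same frame_len and hop (illustrative values).
    """

    def __init__(self, frame_len=256, hop=128):
        self.frame_len = frame_len
        self.hop = hop
        self.buffer = np.zeros(0)

    def push(self, chunk):
        """Append a chunk and return every complete frame now available."""
        self.buffer = np.concatenate([self.buffer, np.asarray(chunk, dtype=float)])
        frames = []
        while len(self.buffer) >= self.frame_len:
            frames.append(self.buffer[:self.frame_len].copy())
            self.buffer = self.buffer[self.hop:]  # keep the overlap for the next frame
        return np.array(frames).reshape(-1, self.frame_len)
```

Each `push` call returns a (possibly empty) array of frames. These can be windowed, passed to the speech detector and, once a word ends, passed to the classifier, while the interface updates its plot.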
Author: Sushant Shama
(Research Associate at Silicon Mentor)