The research proposed here was for an Arabic speech recognition application, concentrating on the Lebanese dialect. The system starts by sampling the speech, which was the process of transforming the sound from analog to digital and then extracts the features by using the Mel-Frequency Cepstral Coefficients (MFCC). The extracted features are then compared with the system's stored model; in this case the stored model chosen was a phoneme-based model. This reference model differs from the direct word template matching, where speech features that are extracted from the input are directly compared to the word templates. Each word template in the direct matching model was stored as a vector of feature parameters. Thus, when the vocabulary size of the ASR system becomes large, the memory size for the word template will become humongous. In contrast, the model used here was phoneme-like template matching. Word templates are stored as phoneme-like template parameters. Thus, the memory size for the word templates will not grow as fast as that of the direct matching model.
All Science Journal Classification (ASJC) codes