Academic Journal

APPEARANCE FEATURE EXTRACTION VERSUS IMAGE TRANSFORM-BASED APPROACH FOR VISUAL SPEECH RECOGNITION.

Bibliographic Details
Title: APPEARANCE FEATURE EXTRACTION VERSUS IMAGE TRANSFORM-BASED APPROACH FOR VISUAL SPEECH RECOGNITION.
Authors: SAGHEER, ALAA; TSURUTA, NAOYUKI; TANIGUCHI, RIN-ICHIRO; MAEDA, SAKASHI
Source: International Journal of Computational Intelligence & Applications; Mar2006, Vol. 6 Issue 1, p101-122, 22p, 1 Black and White Photograph, 4 Diagrams, 5 Charts, 2 Graphs
Subject Terms: SPEECH perception, SPEECH education, VERBAL ability, VERBAL behavior, SPEECH processing systems, LIPREADING
Abstract: In this paper we propose a new appearance based system which consists of two stages: visual speech feature extraction and classification, followed by recognition of the extracted feature, thereby the result is a complete lip-reading system. This lip-reading system employs our Hyper Column Model (HCM) approach to extract and classify the visual features and uses the Hidden Markov Model (HMM) for recognition. This paper addresses mainly the first stage; i.e. feature extraction and classification. We investigate the HCM performance to achieve feature extraction and classification and then compare the performance when replacing HCM with Fast Discrete Cosine Transform (FDCT). Unlike FDCT, HCM could extract the entire features without any loss. Also the experiments have shown that HCM is generally better than FDCT and provides a good distribution of the phonemes in the feature space for recognition purposes. For fair comparison, two databases are exploited with three different sets of resolution for each database. One of these two databases is designed to include shifted and scaled objects. Experiments reveal that HCM is capable of recovering and dealing with such image restrictions whereas the effectiveness of FDCT drops drastically especially for new subjects. [ABSTRACT FROM AUTHOR]
Copyright of International Journal of Computational Intelligence & Applications is the property of World Scientific Publishing Company and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
Description
ISSN: 14690268
DOI:10.1142/S1469026806001800