The paper addresses the problem of isolated vowel recognition in patients
following total laryngectomy. The visual and acoustic speech modalities were
incorporated separately into the machine learning algorithms. The authors used
Mel Frequency Cepstral Coefficients as acoustic descriptors of the speech
signal. The lip contour was extracted from a video signal of the speakers'
faces using the OpenCV software library. In the vowel recognition procedure,
three types of classifiers were compared: Artificial
Neural Networks, Support Vector Machines and Naive Bayes. The highest
recognition rate was obtained with Support Vector Machines. For a group of
laryngectomees with differing speech quality, the authors achieved
recognition rates of 75% for the acoustic and 40% for the visual modality. The authors
obtained a higher recognition rate than in a previous study in which 10
cross-sectional areas of the vocal tract were estimated. With the presented image
processing algorithm, the visual features can be extracted automatically from
a video signal.
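
As an illustration of the acoustic descriptors mentioned above, the following is a minimal sketch of how Mel Frequency Cepstral Coefficients can be computed for a single speech frame with NumPy. It is not the authors' implementation: the function names, the 16 kHz sampling rate, the 400-sample frame, the 26 mel filters and the 13 retained coefficients are all assumptions chosen for the example, and the input is a synthetic two-tone signal rather than recorded speech.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common mel-scale formula; other variants exist.
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale
    # between 0 Hz and the Nyquist frequency.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        left, centre, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    # n_filters and n_coeffs are assumed values, not taken from the paper.
    frame = np.asarray(frame, dtype=float)
    n_fft = frame.size
    windowed = frame * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(windowed)) ** 2 / n_fft
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_energies = np.log(energies + 1e-10)  # small offset avoids log(0)
    # Unnormalised DCT-II of the log filterbank energies.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_filters))
    return basis @ log_energies

# Synthetic 25 ms frame at an assumed 16 kHz sampling rate.
sample_rate = 16000
t = np.arange(400) / sample_rate
frame = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2300 * t)
coeffs = mfcc_frame(frame, sample_rate)
print(coeffs.shape)  # (13,)
```

In a recognition setting, one such coefficient vector per frame (or a summary over the frames of a vowel) would serve as the feature vector passed to a classifier such as a Support Vector Machine.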