M. H. Sedaaghi
Volume 5, Issue 1 (3-2009)
Abstract
Accurate gender classification is useful in speech and speaker recognition as
well as speech emotion classification, because better performance has been reported when
separate acoustic models are employed for males and females. Gender classification is also
applied in face recognition, video summarization, human-robot interaction, etc. Although
gender classification is rather mature in applications dealing with images, it is still in its
infancy in speech processing. Age classification, on the other hand, is also regarded as a
useful tool in different applications, such as issuing different permission levels for different
age groups. This paper concentrates on a comparative study of gender and age
classification algorithms applied to speech signals. Experimental results are reported for the
Danish Emotional Speech database (DES) and English Language Speech Database for
Speaker Recognition (ELSDSR). The Bayes classifier using sequential floating forward
selection (SFFS) for feature selection, probabilistic neural networks (PNNs), support
vector machines (SVMs), the K-nearest neighbor (K-NN) classifier, and the Gaussian mixture
model (GMM) are empirically compared in order to determine the best
classifier for gender and age classification of speech signals. The experiments indicate that
gender classification can be performed with an accuracy of approximately 95% using
speech signals either from both genders together or from males and females separately. The
accuracy for age classification is about 88%.
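
For readers unfamiliar with sequential floating forward selection, the following is a minimal sketch of the general technique, not the implementation used in the paper. It assumes a caller-supplied criterion `score(subset)` that returns a number to maximize (in the setting above this would plausibly be the correct classification rate of the Bayes classifier on the selected features). The names `sffs`, `score`, `toy_weights` and `toy_score` are hypothetical and introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple


def sffs(n_features: int, score: Callable[[List[int]], float], k: int) -> List[int]:
    """Pick k of n_features feature indices by sequential floating forward selection.

    `score` is an assumed criterion: higher means a better feature subset.
    """
    if not 0 < k <= n_features:
        raise ValueError("k must be between 1 and n_features")

    selected: List[int] = []
    # Best (score, subset) seen so far for each subset size.
    best: Dict[int, Tuple[float, List[int]]] = {}

    while len(selected) < k:
        # Forward step: add the single feature that helps the criterion most.
        candidates = [f for f in range(n_features) if f not in selected]
        f_add = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [f_add]
        s = score(selected)
        size = len(selected)
        if size not in best or s > best[size][0]:
            best[size] = (s, list(selected))

        # Floating (conditional backward) steps: drop the least useful feature
        # as long as that strictly beats the best subset of the smaller size.
        while len(selected) > 2:
            f_drop = max(selected, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_drop]
            s_reduced = score(reduced)
            if s_reduced > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (s_reduced, list(reduced))
            else:
                break

    return best[k][1]


# Toy criterion (an assumption for illustration): each feature contributes a
# fixed, independent amount, so the best subset is simply the heaviest features.
toy_weights = [0.1, 0.9, 0.5, 0.3]


def toy_score(subset: List[int]) -> float:
    return sum(toy_weights[f] for f in subset)


chosen = sffs(len(toy_weights), toy_score, 2)
```

With an additive toy criterion the floating step never fires. It matters when features interact, so that a feature added early becomes redundant once later features are included.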