Published in IEEE 2019 6th Intl. Conference on Soft Computing & Machine Intelligence (ISCMI 2019)
This paper uses speech to build a gender recognition system based on neural networks. Three types of neural networks are investigated to find the best model for gender recognition in Yorùbá speech: feed-forward artificial neural networks (multilayer perceptrons), recurrent neural networks (long short-term memory), and convolutional neural networks. All three classifier models achieved state-of-the-art performance in speech-based gender recognition, with 99% accuracy and F1 score.
Gender recognition is one of the most challenging tasks in speech processing. Although many studies rely on feature extraction and on designing improved classifiers, classification accuracy is still not satisfactory. The remarkable performance gains that neural networks have brought to automatic speech recognition have encouraged the use of deep neural networks in other voice-related tasks such as speech, emotion, language, and gender recognition. An earlier study showed a significant improvement in gender recognition from images and videos. In this paper, speech is used to create a gender recognition scheme based on neural networks. An attention-based BiLSTM architecture is proposed for gender identification in Yorùbá. Acoustic features, including time-domain, frequency-domain, and cepstral features, are extracted to train the model. The model achieved state-of-the-art performance in speech-based gender recognition, with 99% accuracy and F1 score.
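As a rough illustration of what time-domain and frequency-domain acoustic features can look like, the sketch below computes a zero-crossing rate and a spectral centroid for a single frame. The abstract does not say which specific features were used, so these two are assumptions chosen for simplicity; the function names, the synthetic 200 Hz test tone, and the 8 kHz sample rate are likewise hypothetical. Cepstral features (for example MFCCs) are omitted to keep the example dependency-free.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Time-domain feature: fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (adequate for a short demo frame)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Frequency-domain feature: magnitude-weighted mean frequency in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    n = len(frame)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


# Hypothetical input: a synthetic 200 Hz tone standing in for a voiced speech frame.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(400)]

features = (zero_crossing_rate(frame), spectral_centroid(frame, sample_rate))
print(features)
```

For a pure tone, the zero-crossing rate is close to twice the tone frequency divided by the sample rate, and the spectral centroid sits near the tone frequency. In a full pipeline, per-frame values like these would be stacked into a sequence and passed to the classifier.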