Abstract: (1044 Views)
The Nonparametric Speech Kernel (NSK), a nonparametric kernel technique, is presented in this study as a novel way to improve Speech Emotion Recognition (SER). The method aims to effectively reduce the size of speech features to improve recognition accuracy. The proposed approach addresses the need for efficient and compact low-dimensional features for speech emotion recognition. Having acknowledged the intrinsic distinctions between speech and picture data, we have refined the Kernel Nonparametric Weighted Feature Extraction (KNWFE) formulation to suggest NSK, which is especially intended for speech emotion identification. The output of NSK can be used as input features for deep learning models such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), or hybrid architectures. In deep learning, NSK can also be used as a kernel function for kernel-based methods such as kernelized support vector machines (SVM) or kernelized neural networks. Our tests demonstrate that NSK outperforms current techniques, outperforming the best-tested approach by 5.02% and 3.05%, respectively, with an average accuracy of 96.568% for the Persian speech emotion dataset and 82.56% for the Berlin speech emotion dataset.
Type of Study:
Research Paper |
Subject:
Speech Processing Received: 2022/05/30 | Revised: 2024/04/20 | Accepted: 2023/12/13