Dysarthria Severity Detection Using Recurrent and Convolutional Neural Networks
Abstract
The diagnosis and monitoring of dysarthria, a speech disorder caused by neuro-motor problems that affect articulation, depend on a precise evaluation of its severity. Accurate severity classification is therefore essential when building automated systems that identify and categorize dysarthric speech. This paper investigates how neural network models, specifically recurrent neural networks (RNN) and convolutional neural networks (CNN), can detect dysarthric voices among normal voice samples and classify the severity of dysarthria. The features used in the study include voice quality measures, prosodic parameters, formants, Mel frequency cepstral coefficients (MFCC), and spectrograms. Our goal is to compare how well convolutional networks, recurrent networks, and a hybrid model that combines the two (a convolutional recurrent neural network, CRNN) identify abnormal speech among normal data. The performance of these models is assessed on the Nemours corpus database. Notably, the highest classification accuracy attained on this corpus is 99.8%.
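To make the hybrid idea concrete, the following is a minimal, untrained sketch of a CRNN forward pass in plain Python: a 1D convolution over time extracts local patterns from a sequence of feature frames, a simple recurrent layer summarizes the resulting sequence, and a softmax layer produces class probabilities. It is not the architecture evaluated in the paper. The layer sizes, the 13 features per frame, the four output classes, and all function names are assumptions chosen purely for illustration, and the input is random data standing in for MFCC frames.

```python
import math
import random


def conv1d_relu(frames, kernels, biases):
    """Valid 1D convolution over time followed by ReLU.

    frames: T x F list of feature frames; kernels: C x K x F weights.
    Returns a (T - K + 1) x C sequence of activations.
    """
    width = len(kernels[0])
    out = []
    for t in range(len(frames) - width + 1):
        row = []
        for kernel, bias in zip(kernels, biases):
            s = bias
            for k in range(width):
                s += sum(w * x for w, x in zip(kernel[k], frames[t + k]))
            row.append(max(0.0, s))
        out.append(row)
    return out


def rnn_last_state(seq, w_in, w_rec, bias):
    """Simple (Elman) recurrent layer; returns the final hidden state."""
    hidden = [0.0] * len(bias)
    for x in seq:
        hidden = [
            math.tanh(
                b
                + sum(w * v for w, v in zip(wi, x))
                + sum(w * h for w, h in zip(wr, hidden))
            )
            for wi, wr, b in zip(w_in, w_rec, bias)
        ]
    return hidden


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def crnn_forward(frames, params):
    """Convolution over time, then recurrence, then a softmax classifier."""
    feats = conv1d_relu(frames, params["kernels"], params["conv_bias"])
    hidden = rnn_last_state(feats, params["w_in"], params["w_rec"], params["rnn_bias"])
    logits = [
        b + sum(w * h for w, h in zip(row, hidden))
        for row, b in zip(params["w_out"], params["out_bias"])
    ]
    return softmax(logits)


def random_params(n_feats, n_channels, width, n_hidden, n_classes, seed=0):
    """Hypothetical helper: small random weights, no training involved."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "kernels": [mat(width, n_feats) for _ in range(n_channels)],
        "conv_bias": [0.0] * n_channels,
        "w_in": mat(n_hidden, n_channels),
        "w_rec": mat(n_hidden, n_hidden),
        "rnn_bias": [0.0] * n_hidden,
        "w_out": mat(n_classes, n_hidden),
        "out_bias": [0.0] * n_classes,
    }


# Random stand-in for 20 frames of 13 MFCC-like features (assumed sizes).
data_rng = random.Random(1)
frames = [[data_rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
params = random_params(n_feats=13, n_channels=4, width=3, n_hidden=8, n_classes=4)
probs = crnn_forward(frames, params)
print(probs)  # one probability per (assumed) class, summing to 1
```

Because the weights are random, the printed probabilities carry no diagnostic meaning; the sketch only shows how the convolutional and recurrent stages are chained. A practical system would use a deep learning framework, learn the weights from labelled recordings, and typically replace the simple recurrent layer with a gated one.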