Emotion Recognition from Persian Speech Signals Using Spectral-Frequency Feature Analysis

Momeni, Maryam

Article type: Research paper

Author

Maryam Momeni

Department of Electrical Engineering, Faculty of Engineering, Arak University, Arak, Iran

Abstract

Speech emotion recognition has recently attracted attention in applications that involve interaction between humans and machines. Despite considerable effort in this field, a large gap remains between natural human emotions and the computer's perception of them, mainly because computers are still unable to understand the user's emotions. The aim of this paper is to design a speech emotion recognition system for a Persian emotional speech database covering five emotions: happiness, disgust, fear, sadness, and anger. Using a model of the human auditory system, a four-dimensional representation of speech over scale, rate (speed), time, and frequency was first extracted; from it, a two-dimensional scale-frequency representation was obtained, and the maximum values of this representation were used as the feature vector. Finally, the emotions in the database were classified with a support vector machine (SVM) classifier. Experimental results show that the proposed algorithm performs acceptably compared with existing automatic speech emotion recognition systems for Persian.
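The feature-reduction step summarized above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch: the abstract does not say how the rate and time axes are collapsed or along which axis the maximum is taken, so averaging absolute values over rate and time, and taking the maximum over frequency for each scale, are assumptions. The function names `scale_frequency_map` and `max_feature_vector` and the nested-list input layout `[scale][rate][time][freq]` are likewise illustrative. Computing the 4-D auditory-model output itself is out of scope and is taken as given.

```python
"""Hypothetical sketch of the 4-D -> 2-D -> max feature-reduction step.

Assumptions (not stated in the paper): the auditory-model output is given
as nested lists indexed [scale][rate][time][freq]; rate and time are
collapsed by averaging absolute values; the maximum is taken along one axis.
"""
from typing import List

Tensor4 = List[List[List[List[float]]]]
Matrix = List[List[float]]


def scale_frequency_map(cortical: Tensor4) -> Matrix:
    """Collapse rate and time by averaging absolute values -> [scale][freq]."""
    sf_map: Matrix = []
    for scale_slice in cortical:
        n_freq = len(scale_slice[0][0])
        totals = [0.0] * n_freq
        count = 0
        for rate_slice in scale_slice:
            for frame in rate_slice:  # one time frame: values over frequency
                count += 1
                for f, value in enumerate(frame):
                    totals[f] += abs(value)
        sf_map.append([total / count for total in totals])
    return sf_map


def max_feature_vector(sf_map: Matrix, over: str = "frequency") -> List[float]:
    """Take the maximum of the scale-frequency map along one axis.

    over="frequency": one value per scale (the assumed default).
    over="scale": one value per frequency channel.
    """
    if over == "frequency":
        return [max(row) for row in sf_map]
    if over == "scale":
        return [max(column) for column in zip(*sf_map)]
    raise ValueError("over must be 'frequency' or 'scale'")


if __name__ == "__main__":
    # Toy input: 2 scales x 2 rates x 2 time frames x 3 frequency channels.
    toy = [
        [[[1, 2, 3], [3, 2, 1]], [[1, -2, 3], [3, 2, 1]]],
        [[[0, 0, 4], [0, 0, 4]], [[0, 0, 4], [0, 0, 4]]],
    ]
    sf = scale_frequency_map(toy)
    print(sf)                      # [[2.0, 2.0, 2.0], [0.0, 0.0, 4.0]]
    print(max_feature_vector(sf))  # [2.0, 4.0]
```

Passing `over="scale"` yields one value per frequency channel instead; which variant matches the paper cannot be determined from the abstract alone.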

Subjects

Speech processing

Article title [English]

Speech Emotion Recognition in Persian Using Spectro-Temporal Features

Author [English]

Maryam Momeni

Department of Electrical Engineering, Faculty of Engineering, Arak University, Arak, Iran

Abstract [English]

Speech emotion recognition has recently received attention in applications that involve interaction between humans and machines. Despite many efforts in this field, there is still a large gap between natural human emotions and the computer's perception of them, mainly because computers remain unable to understand the user's emotions. The purpose of this paper is to design a speech emotion recognition system for a Persian emotional speech database that includes five emotions: happiness, disgust, fear, anger, and sadness. First, a four-dimensional representation of speech over scale, rate (speed), time, and frequency was extracted with a model of the human auditory system. From it, a two-dimensional scale-frequency representation was obtained, and the maximum values of this representation were used as the feature vector. Finally, the extracted features were classified with a support vector machine (SVM). Experimental results show that the proposed algorithm performs acceptably compared with existing automatic speech emotion recognition systems for Persian.
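The final classification stage can be illustrated with a small, self-contained support vector machine. This is a hypothetical sketch, not the author's setup: the abstract does not state the kernel, the multi-class strategy, or the hyper-parameters, so a linear one-vs-rest SVM trained with the Pegasos stochastic sub-gradient method (with the assumed values `lam=0.01` and `epochs=100`) is used here. The function names are illustrative, and the toy feature vectors are synthetic rather than drawn from the Persian database.

```python
"""Hypothetical sketch: one-vs-rest linear SVM trained with Pegasos.

Assumptions (not from the paper): linear kernel, one-vs-rest scheme,
Pegasos solver, lam=0.01, epochs=100. The bias is folded into the weight
vector (and therefore regularized), a common simplification.
"""
import random
from typing import Dict, List, Sequence

EMOTIONS = ["happiness", "disgust", "fear", "anger", "sadness"]


def _with_bias(x: Sequence[float]) -> List[float]:
    # Append a constant 1.0 so the bias is learned as one extra weight.
    return [float(v) for v in x] + [1.0]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def train_binary(xs, ys, lam: float = 0.01, epochs: int = 100,
                 seed: int = 0) -> List[float]:
    """Pegasos sub-gradient descent on the hinge loss; ys are -1 or +1."""
    rng = random.Random(seed)
    data = [(_with_bias(x), y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * _dot(w, x) < 1.0  # hinge loss active at current w
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def train_ovr(features, labels, **kwargs) -> Dict[str, List[float]]:
    """Train one binary SVM per emotion (that emotion vs. all others)."""
    return {
        emotion: train_binary(
            features, [1 if lab == emotion else -1 for lab in labels], **kwargs
        )
        for emotion in sorted(set(labels))
    }


def predict(models: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Pick the emotion whose binary classifier gives the highest score."""
    xb = _with_bias(x)
    return max(models, key=lambda emotion: _dot(models[emotion], xb))


if __name__ == "__main__":
    # Synthetic, well-separated 5-D "feature vectors": one cluster per emotion.
    feats, labs = [], []
    for k, emotion in enumerate(EMOTIONS):
        for j in range(4):
            v = [0.0] * len(EMOTIONS)
            v[k] = 3.0
            v[(k + 1) % len(EMOTIONS)] = 0.1 * j
            feats.append(v)
            labs.append(emotion)
    models = train_ovr(feats, labs)
    correct = sum(predict(models, f) == lab for f, lab in zip(feats, labs))
    print(correct, "of", len(feats), "training vectors classified correctly")
```

In practice, a library implementation (for example, scikit-learn's `SVC`) with a nonlinear kernel and cross-validated hyper-parameters would be the more typical choice; the pure-Python version is shown only to keep the example dependency-free.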

Keywords [English]

Speech Emotion Recognition
Persian Language
Spectro-Temporal Features
