Detection of voice activity based on the angle of the slope of the approximating line of the eigenvalues

Коваль, О. В.; Koval, O.

dc.contributor.author	Коваль, О. В.	uk
dc.contributor.author	Koval, O.	en
dc.date.accessioned	2024-06-18T08:36:37Z
dc.date.available	2024-06-18T08:36:37Z
dc.date.issued	2023
dc.identifier.citation	Коваль О. В. Виявлення голосової активності на основі кута нахилу апроксимувальної прямої власних значень [Текст] / О. В. Коваль // Вісник Вінницького політехнічного інституту. – 2023. – № 4. – С. 68-77.	uk
dc.identifier.issn	1997-9266
dc.identifier.issn	1997-9274
dc.identifier.uri	https://ir.lib.vntu.edu.ua//handle/123456789/42815
dc.description.abstract	Розглянуто метод виявлення голосової активності (VAD, Voice Activity Detection) з метою підвищення ефективності методів подавлення шуму в умовах низького співвідношення сигнал-шум. Наявність акустичних перешкод обмежує використання VAD та погіршує їхню продуктивність. Особливу увагу в роботі приділено методам VAD, що працюють в інтересах систем подавлення шуму, для оцінки шуму в зашумленому мовному повідомленні. Висока ефективність підпросторових методів подавлення шуму, основаних на перетворенні Карунена–Лоева, спонукала пошук простого та надійного VAD. Запропонований у статті метод виявлення голосової активності не вимагає додаткових перетворень та обчислень зашумленого мовлення та полегшує виявлення голосової активності в підпросторових методах подавлення шуму. Як ознака класифікації мовних кадрів під час детектування голосової активності в запропонованому VAD використовується кут нахилу апроксимувальної прямої власних значень. Особливістю реалізації цього підходу є коригований спектр власних значень. За рахунок віднімання з власних значень коваріаційної матриці вхідних даних дисперсії шуму досягається зменшення енергії шуму в спостереженні. Використання покращеної оцінки дисперсії шуму враховує наявність адитивних компонентів шуму в підпросторі сигналу. Як критерій прийняття рішення в роботі пропонується використання адаптивного порогу на основі вхідного відношення сигнал-шум. Проведено порівняльний аналіз роботи запропонованого VAD в умовах впливу кольорових шумів в порівнянні з VAD кодека G.729. Реалізацію моделей VAD виконано в MATLAB та оцінено з використанням об’єктивних параметрів оцінки помилкових рішень в умовах впливу шуму. Подані результати моделювання вказують на ефективність запропонованого методу за низьких значень відношення сигнал-шум (до 0 дБ). Запропонований метод VAD збільшує точність виявлення мовлення та зменшує кількість помилкових рішень.
Проведене дослідження може бути використане для вдосконалення систем подавлення шуму.	uk
dc.description.abstract	The article considers a voice activity detection (VAD) method intended to improve the effectiveness of noise reduction methods at low signal-to-noise ratios. Acoustic interference limits the use of VADs and degrades their performance. Particular attention is given to VAD methods that serve noise reduction systems by estimating the noise in a noisy speech signal. The high efficiency of subspace noise reduction methods based on the Karhunen–Loève transform motivated the search for a simple and reliable VAD for them. The proposed method requires no additional transformations or computations on the noisy speech and simplifies voice activity detection in subspace noise reduction methods. The proposed VAD uses the slope angle of the line approximating the eigenvalues as the feature for classifying speech frames. A distinctive feature of the implementation is a corrected eigenvalue spectrum: subtracting the noise variance from the eigenvalues of the input data covariance matrix reduces the noise energy in the observation. An improved estimate of the noise variance accounts for additive noise components in the signal subspace. An adaptive threshold based on the input signal-to-noise ratio is proposed as the decision criterion. The proposed VAD was compared with the VAD of the G.729 codec under colored noise. The VAD models were implemented in MATLAB and evaluated using objective measures of erroneous decisions under noise. The simulation results indicate that the proposed method is effective at low signal-to-noise ratios (down to 0 dB).
The proposed VAD method increases speech detection accuracy and reduces the number of erroneous decisions. The research can be used to improve noise reduction systems.	en
dc.language.iso	uk_UA	uk_UA
dc.publisher	ВНТУ	uk
dc.relation.ispartof	Вісник Вінницького політехнічного інституту. № 4 : 68-77.	uk
dc.relation.uri	https://visnyk.vntu.edu.ua/index.php/visnyk/article/view/2911
dc.subject	детектор голосової активності	uk
dc.subject	мовний сигнал	uk
dc.subject	власні значення	uk
dc.subject	подавлення шуму	uk
dc.subject	voice activity detector	en
dc.subject	speech signal	en
dc.subject	eigenvalues	en
dc.subject	noise reduction	en
dc.title	Виявлення голосової активності на основі кута нахилу апроксимувальної прямої власних значень	uk
dc.title.alternative	Detection of voice activity based on the angle of the slope of the approximating line of the eigenvalues	en
dc.type	Article
dc.identifier.udc	621.391
dc.relation.references	L. R. Rabiner, and R. W. Schafer, Theory and Applications of Digital Speech Processing, Pearson Education, 2011, 1060 p.	en
dc.relation.references	Y. Hu, and P. Loizou, “Subjective Comparison of Speech Enhancement Algorithms,” in IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2006, vol. 1, pp. I-I. https://doi.org/10.1109/ICASSP.2006.1659980 .	en
dc.relation.references	N. Golyandina, and A. Zhigljavsky, Singular spectrum analysis for time series. London: Springer, 2013, 120 p.	en
dc.relation.references	V. Vasylyshyn, “Adaptive Complex Singular Spectrum Analysis with Application to Modern Superresolution Methods,” Data-Centric Business and Applications. Cham, 2020. pp. 35-54. https://doi.org/10.1007/978-3-030-43070-2_3.	en
dc.relation.references	R. Wang, “Karhunen-Loève transform and principal component analysis,” In Introduction to Orthogonal Transforms: With Applications in Data Processing and Analysis. Cambridge: Cambridge University Press, 2012, pp. 412-460. https://doi.org/10.1017/cbo9781139015158.011 .	en
dc.relation.references	J. Ramírez, “Efficient voice activity detection algorithms using long-term speech information,” Speech Communication, vol. 42, no. 3-4. pp. 271-287, April. 2004. https://doi.org/10.1016/j.specom.2003.10.002 .	en
dc.relation.references	M. Sankar, and S. Arun, “Speech Sound Classification and Estimation of Optimal Order of LPC Using Neural Network,” in The 2nd International Conference on Vision, Image and Signal Processing. ACM, 2018. https://doi.org/10.1145/3271553.3271611 .	en
dc.relation.references	S. Ozaydin, “Design of a Voice Activity Detection Algorithm based on Logarithmic Signal Energy,” in International Conference on Electrical and Computing Technologies and Applications. Ras Al Khaimah, United Arab Emirates, 2022, pp. 19-22. https://doi.org/10.1109/ICECTA57148.2022.9990492 .	en
dc.relation.references	R. Çolak, and R. Akdeniz, “A Novel Voice Activity Detection for Multi-Channel Noise Reduction,” IEEE Access, vol. 9, pp. 91017-91026, June. 2021. https://doi.org/10.1109/ACCESS.2021.3086364 .	en
dc.relation.references	K. Yang, L. Zhu, and W. Shan, “Design of an ultra-low Power MFCC Feature Extraction Circuit with Embedded Speech Activity Detector,” in International Conference on Integrated Circuits, Technologies and Applications. IEEE, 2021, pp. 82-83. https://doi.org/10.1109/ICTA53157.2021.9661980 .	en
dc.relation.references	A. Samanta, I. Hatai, and A. Mal, “A Reconfigurable Gaussian Base Normalization Deep Neural Network Design for an Energy-Efficient Voice Activity Detector,” in 2nd International Conference on Communication, Computing and Industry 4.0: conference paper. Bangalore, 2021, pp. 1-6. https://doi.org/10.1109/C2I454156.2021.9689307 .	en
dc.relation.references	S. Abdullah, M. Zamani, and A. Demosthenous, “A Discrete wavelet transform-based voice activity detection and noise classification with sub-band selection,” in International Symposium on Circuits and Systems: conference paper. IEEE, 2021, pp. 1-5. https://doi.org/10.1109/iscas51556.2021.9401647 .	en
dc.relation.references	V. Neo, S. Weiss, S. McKnight, A. Hogg, and P. Naylor, “Polynomial Eigenvalue Decomposition-Based Target Speaker Voice Activity Detection in the Presence of Competing Talkers,” in International Workshop on Acoustic Signal Enhancement: conference paper. IEEE, 2022, pp. 1-5. https://doi.org/10.1109/IWAENC53105.2022.9914796 .	en
dc.relation.references	J. Ghasemi, A. Afzalian, and M. Mollaei, “A Combined Voice Activity Detector Based On Singular Value Decomposition and Fourier Transform,” Signal Processing: An International Journal, vol. 4 (1). pp. 54-61, 2010.	en
dc.relation.references	Y. Dongwen, “Robust Voice Activity Detection Based on Noise Eigenspace,” Acoustical Science and Technology, vol. 28, no. 6. pp. 413-423, June. 2007. https://doi.org/10.1250/ast.28.413 .	en
dc.relation.references	H. Song, S. Ban, and H. Kim, “Voice activity detection using singular value decomposition-based filter,” in Interspeech: conference paper. ISCA, 2009, pp. 2223-2226. https://doi.org/10.21437/Interspeech.2009-632 .	en
dc.relation.references	D. Kim, and J. Chang, “A subspace approach based on embedded prewhitening for voice activity detection,” The Journal of the Acoustical Society of America, vol. 130, no. 5, pp. EL304-EL310, Nov. 2011. https://doi.org/10.1121/1.3638927 .	en
dc.relation.references	V. Vasylyshyn, “DOA estimation based on proximity of the roots of several polynomials of superresolution methods,” Advanced Information Systems, vol. 4, no. 3, pp. 80-84, March. 2020. https://doi.org/10.20998/2522-9052.2020.3.10 .	en
dc.relation.references	P. Stoica, and Y. Selen, “Model-order selection: a review of information criterion rules,” IEEE Signal Processing Magazine, vol. 21, no. 4, pp. 36-47, July, 2004. https://doi.org/10.1109/MSP.2004.1311138 .	en
dc.relation.references	H. Akaike, “A new look at the statistical model identification,” IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716-723, December. 1974. https://doi.org/10.1109/TAC.1974.1100705 .	en
dc.relation.references	V. Vasylyshyn, O. Koval, and K. Vasylyshyn, “Speech Enhancement Using Modified SSA,” in IEEE International Conference on Information and Telecommunication Technologies and Radio Electronics: conference paper. IEEE, 2021, pp. 203-206. https://doi.org/10.1109/UkrMiCo52950.2021.9716635 .	en
dc.relation.references	В. И. Василишин, «Предварительная обработка сигналов с использованием метода SSA в задачах спектрального анализа,» Прикладная радиоэлектроника, № 13(1), с. 43-50, 2014.	ru
dc.relation.references	R. Martin, “Noise power spectral density estimation based on optimal smoothing and minimum statistics,” IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 504-512, July, 2001. https://doi.org/10.1109/89.928915 .	en
dc.relation.references	A noisy speech corpus for evaluation of speech enhancement algorithms NOIZEUS. [Electronic resource]. Available: https://ecs.utdallas.edu/loizou/speech/noizeus. Accessed: 06.06.2023.	en
dc.relation.references	G.729 Voice Activity Detection MATHWORKS. [Electronic resource]. Available: https://www.mathworks.com/help/dsp/ug/g729-voice-activity-detection.html . Accessed: 06.06.2023.	en
dc.relation.references	D. Freeman, G. Cosier, C. Southcott, and I. Boyd, “The voice activity detector for the Pan-European digital cellular mobile telephone service,” International Conference on Acoustics, Speech, and Signal Processing, IEEE, 1989, vol. 1, pp. 369-372. https://doi.org/10.1109/ICASSP.1989.266442 .	en
dc.identifier.doi	https://doi.org/10.31649/1997-9266-2023-169-4-68-77
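
The abstract describes the decision feature only in words. As a rough illustration, the following Python sketch computes the slope angle of a least-squares line fitted to noise-corrected eigenvalues and compares it with a threshold. It is not the author's MATLAB implementation: the function names, the clipping of negative values to zero, the use of the eigenvalue index as the horizontal axis, the absence of any eigenvalue normalization, and the fixed `threshold_deg` are all assumptions made for the example (the article uses an adaptive threshold derived from the input signal-to-noise ratio, whose form is not given in this record).

```python
import math


def corrected_eigenvalues(eigenvalues, noise_var):
    """Sort eigenvalues in descending order and subtract the noise variance.

    Negative results are clipped to zero (an assumption of this sketch).
    """
    ordered = sorted(eigenvalues, reverse=True)
    return [max(lam - noise_var, 0.0) for lam in ordered]


def slope_angle_deg(values):
    """Angle, in degrees, of the least-squares line fitted to values vs. index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two eigenvalues")
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    slope = sxy / sxx
    # The fitted line falls from left to right, so use the magnitude of the slope.
    return math.degrees(math.atan(abs(slope)))


def is_speech(eigenvalues, noise_var, threshold_deg):
    """Hypothetical decision rule: speech if the slope angle exceeds the threshold."""
    angle = slope_angle_deg(corrected_eigenvalues(eigenvalues, noise_var))
    return angle > threshold_deg


if __name__ == "__main__":
    # Illustrative eigenvalues only, not data from the article.
    print(is_speech([9.0, 5.0, 2.0, 1.0], noise_var=1.0, threshold_deg=30.0))
    print(is_speech([1.1, 1.0, 0.9, 1.0], noise_var=1.0, threshold_deg=30.0))
```

With these hypothetical inputs, a steeply decaying spectrum such as [9, 5, 2, 1] yields a large angle, while a nearly flat, noise-like spectrum yields an angle close to zero once the noise variance has been subtracted. Because the angle depends on the scale of the eigenvalues, a real detector would need some normalization or a scale-aware threshold.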

Files in this item

Name: ВИЯВЛЕННЯ ГОЛОСОВОЇ.pdf
Size: 519.5Kb
Format: PDF

This item appears in the following collection(s)

Вісник Вінницького політехнічного інституту. 2023. № 4 [12]