Integration of hidden Markov models in the automated speaker recognition system for critical use
Author
Kovtun, V.
Yukhimchuk, M.
Ковтун, В. В.
Юхимчук, М. С.
Date
2019
Abstract
This article theoretically substantiates the possibility of integrating hidden Markov models (IHMM) into the structure of the automated speaker recognition system of critical use (ASRSCU) for the analysis of speech information from a plurality of independent input channels. Within the statistical conception of pattern recognition, this combines the approximation accuracy for input signals inherent in Gaussian mixture models, the effective representation of individual speech features provided by hidden Markov models, and robustness, which is a key characteristic of critical-use recognition and is provided by the author's method of integrating input information received from independent input channels. A mathematical apparatus for the integration of hidden Markov models is proposed, which adequately describes a set of interacting processes in the Markov paradigm while preserving the temporal, asymmetric conditional probabilities between the chains. A method for adapting the Viterbi algorithm to train the IHMM is also proposed. At each iteration of the learning process it re-estimates the output parameters, allows the reliability of the statistical parameters used to vary and the desired variant of the learning process to be chosen, and performs the forward-backward and Viterbi procedures several times faster, since the bulk of the computation is the evaluation of multidimensional Gaussians. The proposed results were tested empirically in speaker recognition experiments with varying levels and types of noise in speech signals received from two narrowly directional microphones located 2 cm and 50 cm from the source of the speech signal.
The use of the IHMM in the ASRSCU made it possible to achieve consistently high recognition quality even with an increase in the signal/noise ratio (ISNR). An interesting result is obtained when information from the first channel alone is used: the recognition quality shows no dependence on the ISNR. This can be explained by the fact that the first narrowly directional microphone is located 2 cm from the source of the speech signal. This placement limits the presence of ambient noise in the phonogram, but it also causes reverberation on the plosive segments of the speech signal, which lowers the speaker recognition quality for the investigated configuration of the feature space. It should be noted that the IHMM showed lower sensitivity to the operating conditions of the recognition system, which allows their use to be recommended as part of the automated speaker recognition system of critical use.
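As a rough illustration of why Gaussian evaluation dominates the cost of Viterbi decoding, here is a minimal Python sketch of the standard log-domain Viterbi algorithm with diagonal-covariance Gaussian emissions over two input channels. It is not the article's IHMM or its adapted training procedure. It assumes a single shared state chain whose two channel observations are conditionally independent given the state, so their log-likelihoods are simply added. All function and parameter names (`log_gaussian_diag`, `two_channel_log_emissions`, `viterbi`) are hypothetical.

```python
import numpy as np


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def two_channel_log_emissions(obs_a, obs_b, means_a, vars_a, means_b, vars_b):
    """Per-frame, per-state log emission scores for two channels.

    Assumption (not from the article): the channels are conditionally
    independent given the state, so their log-likelihoods are summed.
    obs_*: (T, D) frames; means_*, vars_*: (N, D) per-state parameters.
    """
    T, N = obs_a.shape[0], means_a.shape[0]
    log_b = np.empty((T, N))
    for t in range(T):
        for j in range(N):
            log_b[t, j] = (
                log_gaussian_diag(obs_a[t], means_a[j], vars_a[j])
                + log_gaussian_diag(obs_b[t], means_b[j], vars_b[j])
            )
    return log_b


def viterbi(log_pi, log_a, log_b):
    """Standard Viterbi decoding in the log domain.

    log_pi: (N,) initial log probabilities
    log_a:  (N, N) transition log probabilities, log_a[i, j] = log P(j | i)
    log_b:  (T, N) emission log scores
    Returns the best state path and its log score.
    """
    T, N = log_b.shape
    delta = np.empty((T, N))            # best log score of a path ending in each state
    psi = np.zeros((T, N), dtype=int)   # back-pointers to the best predecessor
    delta[0] = log_pi + log_b[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_a   # scores[i, j]: come from i, go to j
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + log_b[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):      # follow back-pointers from the last frame
        path[t] = psi[t + 1, path[t + 1]]
    return path, float(np.max(delta[-1]))
```

In this sketch the emission table costs on the order of T x N x D Gaussian term evaluations per channel, while the recursion itself only adds and compares precomputed scores, which is consistent with the abstract's remark that multidimensional Gaussian calculations make up the bulk of the computation.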
URI:
http://ir.lib.vntu.edu.ua//handle/123456789/37581