Synthetic data generation for Kazakh speech separation and diarization based on the use of neural networks
Authors
Oralbekova, D.
Mamyrbayev, O.
Azarova, L.
Kurmetkan, T.
Gordiichuk, N.
Zhumazhan, N.
Sawicki, D.
Date
2025
Collections
- Research papers of the MZ Department [526]
Abstract
This paper explores the impact of various synthetic data generation methods on the performance of speech separation and
diarization models. Three approaches are considered: simple audio track overlay, synthetic dialogue generation, and
acoustic condition modeling. To evaluate their effectiveness, we used Conv-TasNet for speech separation and EEND-Conformer for diarization, both trained on a 400-hour Kazakh speech corpus. Experiments demonstrated that synthetic data
can significantly enhance model performance when adapting to low-resource languages. The most effective method was
synthetic dialogue generation, yielding results close to those obtained with real data for both speech separation and
diarization. In contrast, acoustic condition modeling showed the highest deviations, indicating the need for further
refinement. The findings confirm the potential of synthetic data for speech processing tasks. The proposed methods can
improve the performance of automatic speech recognition models in scenarios with limited labeled data and challenging
acoustic environments.
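The simplest of the three approaches, audio track overlay, amounts to summing two single-speaker recordings at a controlled signal-to-noise ratio. A minimal sketch of that idea is shown below; the function name `mix_at_snr` and the sinusoidal stand-ins for real speech are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mix_at_snr(source, interferer, snr_db):
    """Overlay two single-channel waveforms at a target SNR in dB.

    Hypothetical helper illustrating the 'simple audio track overlay'
    method: the interferer is scaled so that the source-to-interferer
    power ratio matches snr_db, then the two tracks are summed.
    """
    n = min(len(source), len(interferer))
    s = source[:n].astype(np.float64)
    i = interferer[:n].astype(np.float64)
    p_s = np.mean(s ** 2)
    p_i = np.mean(i ** 2) + 1e-12  # guard against silent interferer
    scale = np.sqrt(p_s / (p_i * 10.0 ** (snr_db / 10.0)))
    mixture = s + scale * i
    # Peak-normalize so the mixture can be written to 16-bit audio
    # without clipping (normalization preserves the mixing SNR).
    peak = np.max(np.abs(mixture))
    return mixture / peak if peak > 0 else mixture

# Demo with synthetic sinusoids standing in for two speakers.
sr = 16000
t = np.arange(sr) / sr
spk1 = np.sin(2 * np.pi * 220.0 * t)
spk2 = np.sin(2 * np.pi * 330.0 * t)
mix = mix_at_snr(spk1, spk2, snr_db=5.0)
```

In practice the inputs would be real single-speaker utterances from the corpus, and the SNR would be drawn from a random range so that the separation model sees varied overlap conditions.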
URI:
https://ir.lib.vntu.edu.ua//handle/123456789/50414

