Comparative Analysis of Machine Learning Models for Predicting Employee Burnout

Гладіголов, С. С.; Мокін, О. Б.; Hladiholov, S. S.; Mokin, O. B.

dc.contributor.author	Гладіголов, С. С.	uk
dc.contributor.author	Мокін, О. Б.	uk
dc.contributor.author	Hladiholov, S. S.	en
dc.contributor.author	Mokin, O. B.	en
dc.date.accessioned	2023-12-13T20:07:49Z
dc.date.available	2023-12-13T20:07:49Z
dc.date.issued	2023
dc.identifier.citation	Гладіголов С. С. Порівняльний аналіз моделей машинного навчання в задачі передбачення вигорання співробітників [Текст] / С. С. Гладіголов, О. Б. Мокін // Вісник Вінницького політехнічного інституту. – 2023. – № 5. – С. 25–31.	uk
dc.identifier.issn	1997-9266
dc.identifier.uri	http://ir.lib.vntu.edu.ua//handle/123456789/38639
dc.description.abstract	The article addresses the problem of predicting employee burnout syndrome, which is relevant given the high level of stress in the modern world. The study uses the publicly available dataset "Are your employees burning out" from a competition on the HackerEarth platform. It compares three classical machine learning models (linear regression, Random Forest, XGBoost) with three Bayesian models (Bayesian linear regression, a varying intercept model, and a varying intercept and slope model). Model quality is examined across training set sizes ranging from 13,000 observations (the full training set, 70% of all data) down to 25 observations, with validation on the full data set. XGBoost is shown to be the best model for large data sets. However, when the training sample shrinks below 5,000 observations, the validation metrics of XGBoost deteriorate significantly and fall below those of the Bayesian models. Optimizing hyperparameters such as tree depth, number of trees, and learning rate improved the quality of XGBoost significantly, but did not make it stable enough to outperform the Bayesian models on samples of fewer than 600 observations. In addition to performing better on small samples, Bayesian models also allow estimating the "confidence" in the predicted values, which is an important feature for a number of tasks. Their significant disadvantage is much greater computational complexity, which increases training time. In conclusion, the results of this study emphasize the importance of carefully selecting a model that takes into account the amount and quality of available data. Bayesian models proved highly effective with small amounts of data, owing to their ability to account for uncertainty and insufficient information.	en
dc.description.abstract	Розглянуто задачу передбачення синдрому емоційного вигорання співробітників, актуальність якої пов'язана з високим рівнем стресу в сучасному світі. У дослідженні використано публічний набір даних “Are your employees burning out” зі змагання на платформі HackerEarth. Проведено порівняльний аналіз трьох традиційних моделей машинного навчання, основаних на класичних підходах машинного навчання (лінійна регресія, Random Forest, XGBoost) та трьох баєсових моделей (баєсова лінійна регресія, модель регресії зі змінним вільним членом, модель регресії зі змінним вільним членом та кутовим коефіцієнтом). Досліджено зміну якості моделей на різних розмірах наборів даних, починаючи від 13000 (тобто від повної тренувальної вибірки, яка склала 70 % від всіх даних) до 25 спостережень включно з перевіркою на повному наборі даних. Продемонстровано, що за великих обсягів даних найкращою моделлю є XGBoost. Однак зі зменшенням розміру тренувальної вибірки до менше ніж 5000 спостережень валідаційні показники XGBoost моделі суттєво погіршилися та стали нижчими, ніж відповідні значення метрик для баєсових моделей. Після оптимізації таких гіперпараметрів, як глибина дерев, кількість дерев, швидкість навчання та інші, якість XGBoost суттєво покращилась, але не зробила її достатньо стійкою, щоб продемонструвати кращі результати, ніж баєсові моделі на вибірках менше 600 спостережень. Баєсові ж моделі, окрім кращої якості на малих вибірках, також дозволяють оцінювати «впевненість» у прогнозованих значеннях, що є важливою особливістю для низки задач. Проте вони мають і значний недолік у вигляді набагато більшої обчислювальної складності, що призводить до збільшення часу навчання. У висновку підкреслено важливість ретельного вибору моделі, яка враховує особливості обсягу та якості наявних даних. Баєсові моделі проявили високу ефективність у разі невеликого обсягу даних, завдяки їхній здатності враховувати невизначеність та недостатність інформації.	uk
dc.language.iso	uk_UA	uk_UA
dc.publisher	ВНТУ	uk
dc.relation.ispartof	Вісник Вінницького політехнічного інституту. № 5 : 25–31.	uk
dc.relation.uri	https://visnyk.vntu.edu.ua/index.php/visnyk/article/view/2928
dc.subject	машинне навчання	uk
dc.subject	баєсові моделі	uk
dc.subject	синдром вигорання	uk
dc.subject	малі набори даних	uk
dc.subject	machine learning	en
dc.subject	bayesian models	en
dc.subject	burnout syndrome	en
dc.subject	small data sets	en
dc.title	Порівняльний аналіз моделей машинного навчання в задачі передбачення вигорання співробітників	uk
dc.title.alternative	Comparative Analysis of Machine Learning Models for Predicting Employee Burnout Problem	en
dc.type	Article
dc.identifier.udc	004.89:159.944
dc.relation.references	D. A. J. Salvagioni, F. N. Melanda, A. E. Mesas, A. D. González, F. L. Gabani, and S. M. de Andrade, “Physical, psychological and occupational consequences of job burnout: A systematic review of prospective studies,” PLOS ONE, vol. 12, no. 10, Art. no. e0185781, October 2017.	en
dc.relation.references	М. С. & І. С., “The Role of the Stress in Development of the Diseases,” Precarpathian Bulletin of the Shevchenko Scientific Society Pulse, pp. 25-32, October 2019.	en
dc.relation.references	М. Гурська, «Я вигорів і боюсь звільнення — що робити? Топові IT-компанії відповіли, як вони реагують на вигоряння у працівників та кандидатів,» DOU.ua, 15.11.2022. [Електронний ресурс]. Режим доступу: https://dou.ua/lenta/articles/emotional-burnout-at-work . Дата звернення: 20.09.2023.	uk
dc.relation.references	“Hacker Earth Machine Learning Challenge: Are your employees burning out?” HackerEarth, 21.10.2021. [Online]. Available: https://www.hackerearth.com/challenges/new/competitive/hackerearth-machine-learning-challenge-predict-burnoutrate. Accessed on: 20.09.2023.	en
dc.relation.references	L. Breiman, “Random Forests,” Machine Learning, vol. 45, pp. 5-32, 2001.	en
dc.relation.references	T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 2016.	en
dc.relation.references	O. Abril-Pla, et al., “PyMC: a modern, and comprehensive probabilistic programming framework in Python,” PeerJ Computer Science, vol. 9, Art. no. e1516, September 2023.	en
dc.relation.references	A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2006.	en
dc.identifier.doi	https://doi.org/10.31649/1997-9266-2023-170-5-25-31

Files in this item

Name: 138578.pdf
Size: 576.2Kb
Format: PDF

This item appears in the following collection(s)

Scientific works of the САІТ Department [429]
articles, conference proceedings
Visnyk of Vinnytsia Polytechnical Institute. 2023. No. 5 [12]
