Development of a Telegram Bot for Detecting Fake News Using NLP and Artificial Intelligence Models

Krylik, L. V.; Romanets, K. V.

Date

2026

Abstract

This article presents the development of a Telegram bot for detecting fake news using NLP and artificial intelligence models. The aim is to extend the bot's functionality by combining machine-learning classification of news text, headline analysis, profile analysis of the result, processing of articles by URL, checking of open sources, and an educational module on media literacy. The bot follows a client-server design: the user interacts through the Telegram interface, while the server side performs text preprocessing, language detection, translation where necessary, classification, and response generation. The implementation uses Python, the aiogram framework, TF-IDF and Bag-of-Words features, and Logistic Regression and Multinomial Naive Bayes models. The models were trained on the open Fake and Real News Dataset from Kaggle, which contains over 40,000 English-language records. The news classification model (Logistic Regression + TF-IDF, n-gram range (1,2)) achieved F1 = 0.9804, and the headline analysis model (Multinomial Naive Bayes + Bag-of-Words, n-gram range (1,1)) achieved F1 = 0.9382. A comparison with similar tools showed that the bot extends existing solutions in seven functional characteristics: separate headline analysis; processing of articles by URL; a result profile with indicators of emotionality, manipulation risk, factual specificity, and presence of sources; a search for similar materials in open sources with a fallback to Google; an educational module; interactive user testing; and bilingual interaction with language adaptation. Classification takes approximately 1.08 ms for news texts and 0.03 to 0.04 ms for headlines.
The development does not replace professional fact-checking. It serves as a tool for rapid preliminary verification of news content within Telegram, helping users navigate the media landscape and drawing their attention to potentially dubious reports.
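
To illustrate the headline model named in the abstract, the following is a minimal sketch of Multinomial Naive Bayes over unigram Bag-of-Words counts (n-gram range (1,1)), together with the F1 computation. It is not the authors' implementation: the abstract does not state which library or tokenizer was used, so the whitespace tokenizer, the Laplace smoothing value `alpha=1.0`, the class and function names, and the four toy headlines are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase + whitespace split, unigrams only (n-gram range (1, 1)).
    return text.lower().split()


class HeadlineNB:
    """Hypothetical minimal Multinomial Naive Bayes over Bag-of-Words counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing (assumed value)

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are ignored
                    num = self.word_counts[c][tok] + self.alpha
                    den = self.total[c] + self.alpha * vocab_size
                    score += math.log(num / den)
            scores[c] = score
        return max(scores, key=scores.get)


def f1_score(y_true, y_pred, positive):
    # F1 = harmonic mean of precision and recall for the chosen positive class.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy headlines invented for this sketch; they are not from the Kaggle dataset.
train_headlines = [
    "SHOCKING secret they do not want you to know",
    "You will not believe this SHOCKING miracle cure",
    "Parliament approves budget for next year",
    "Central bank keeps interest rate unchanged",
]
train_labels = ["fake", "fake", "real", "real"]

model = HeadlineNB().fit(train_headlines, train_labels)
test_headlines = ["SHOCKING miracle secret revealed", "Bank approves budget"]
test_labels = ["fake", "real"]
predictions = [model.predict(h) for h in test_headlines]
print(predictions, f1_score(test_labels, predictions, positive="fake"))
```

The scores reported in the abstract come from a dataset of over 40,000 records; a toy example of this size only shows the mechanics and says nothing about real-world accuracy.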

URI:

https://ir.lib.vntu.edu.ua//handle/123456789/51974
