Автоматичне видобування знань з екологічних звітів з прив’язкою до часу та до просторових координат масивів вод

Бондалєтов, К. О.; Мокін, В. Б.; Штельмах, І. М.; Слободянюк, О. В.; Bondalietov, K. O.; Mokin, V. B.; Shtelmakh, I. M.; Slobodianiuk, O. V.

dc.contributor.author	Бондалєтов, К. О.	uk
dc.contributor.author	Мокін, В. Б.	uk
dc.contributor.author	Штельмах, І. М.	uk
dc.contributor.author	Слободянюк, О. В.	uk
dc.contributor.author	Bondalietov, K. O.	en
dc.contributor.author	Mokin, V. B.	en
dc.contributor.author	Shtelmakh, I. M.	en
dc.contributor.author	Slobodianiuk, O. V.	en
dc.date.accessioned	2025-09-11T08:04:17Z
dc.date.available	2025-09-11T08:04:17Z
dc.date.issued	2025
dc.identifier.citation	Бондалєтов К. О., Мокін В. Б., Штельмах І. М., Слободянюк О. В. Автоматичне видобування знань з екологічних звітів з прив’язкою до часу та до просторових координат масивів вод // Вісник Вінницького політехнічного інституту. 2025. № 3. С. 101–110.	uk
dc.identifier.issn	1997-9266
dc.identifier.uri	https://ir.lib.vntu.edu.ua//handle/123456789/49077
dc.description.abstract	The paper presents a new method for automatically extracting environmental knowledge from reports and news texts that state facts about the condition of river waters or their pollution. Each extracted fact is bound to the spatial coordinates of a specific water body and to a time interval. The work is motivated by the wide availability of such environmental data in news, on institutional websites, and in social media, and by the need to process it quickly and accurately. The proposed method combines detection of facts about the state of waters or their pollution, recognition of geographical names in the text and headings, and determination of temporal features by analyzing the hierarchical structure of the document. The method optimizes a contextual-semantic criterion that maximizes the completeness and probability of detecting all existing links between key phrases in the fact text, time periods, and water bodies while minimizing the number of false-positive links between them. It does so by formalizing the links as “subject–predicate–object” (SPO) triplets and using the Jaccard measure to quantify the similarity between the lists of key phrases that characterize these facts and water bodies. Knowledge extraction relies on identifying and using the hierarchical structure of the document, on large language models, and on updating the knowledge base through Retrieval-Augmented Generation (RAG), which supports regular knowledge updates and their binding to time intervals and spatial coordinates. The result is a structured knowledge base of “fact – water body – time interval” triplets, which can be used to analyze the dynamics of water status, identify trends, and make management decisions to improve the state of surface waters. The method is demonstrated on the 2019 annual report on the activities of the Southern Bug River Basin Water Resources Administration, which illustrates that it is workable.	en
dc.description.abstract	Запропоновано новий метод автоматичного видобування екологічних знань з текстів звітів та новин про факти щодо стану вод річок чи їхнього забруднення. Видобування знань здійснюється з урахуванням прив’язки отриманих фактів до просторових координат конкретних масивів вод і інтервалів часу. Актуальність роботи зумовлена значною доступністю таких екологічних даних у новинах, веб-сайтах установ та соціальних медіа, необхідністю їхнього швидкого та точного оброблення. Запропонований метод поєднує виявлення фактів про стан вод чи про їх забруднення, розпізнавання географічних назв з тексту та заголовків, а також визначення часових ознак за допомогою аналізу ієрархічної структури документа. Метод оптимізує контекстно-семантичний критерій, який максимізує повноту та ймовірність виявлення усіх наявних зв’язків між ключовими словосполученнями у тексті фактів, періодами часу і масивами вод та, одночасно, мінімізує кількість хибнопозитивних зв’язків між ними, за рахунок формалізації зв’язків у вигляді триплетів “subject–predicate–object” (SPO) та використання міри Жаккара для пошуку ступеня подібності між списками ключових словосполучень, що характеризують ці факти і масиви вод. Видобування знань основано на виявленні і використанні ієрархічної структури документа, використанні великих мовних моделей, на актуалізації бази знань інформацією з використанням методу генерації з доповненням через пошук (RAG) для регулярного оновлення знань та їхньої прив’язки до періоду часу і просторових координат. Результатом є структурована база знань у вигляді триплетів «факт–масив вод–інтервал часу», яка може використовуватися для аналізу динаміки стану вод, виявлення тенденцій та ухвалення управлінських рішень щодо поліпшення стану поверхневих вод. Наведено результат застосування запропонованого методу на прикладі річного звіту про діяльність Басейнового управління водних ресурсів річки Південний Буг за 2019 рік, який проілюстрував його працездатність.	uk
dc.language.iso	uk_UA	uk_UA
dc.publisher	ВНТУ	uk
dc.relation.ispartof	Вісник Вінницького політехнічного інституту. № 3 : 101–110.	uk
dc.relation.uri	https://visnyk.vntu.edu.ua/index.php/visnyk/article/view/3265
dc.subject	видобування знань	uk
dc.subject	SPO-триплети	uk
dc.subject	штучний інтелект	uk
dc.subject	геоприв’язка даних	uk
dc.subject	масив вод	uk
dc.subject	великі мовні моделі	uk
dc.subject	генерація з доповненим пошуком	uk
dc.subject	knowledge mining	en
dc.subject	SPO-triplets	en
dc.subject	artificial intelligence	en
dc.subject	data georeferencing	en
dc.subject	water array	en
dc.subject	large language models	en
dc.subject	Retrieval-Augmented Generation	en
dc.title	Автоматичне видобування знань з екологічних звітів з прив’язкою до часу та до просторових координат масивів вод	uk
dc.title.alternative	Automatic Knowledge Extraction from Environmental Reports with Reference to Time and Spatial Coordinates of Water Bodies	en
dc.type	Article, professional native edition
dc.type	Article
dc.identifier.udc	004.9+556
dc.relation.references	Верховна Рада України, «Водний Кодекс України», Постанова ВР № 214/95-ВР від 06.06.95, Відомості Верховної Ради (ВВР), 1995, № 24, ст. 189. [Електронний ресурс]. Режим доступу: http://zakon2.rada.gov.ua/laws/show/213/95-%D0%B2%D1%80 .	uk
dc.relation.references	Кабінет Міністрів України, Водна стратегія України на період до 2050 року. Розпорядження від 9 грудня 2022 р. № 1134-р. [Електронний ресурс]. Режим доступу: https://zakon.rada.gov.ua/laws/show/1134-2022-%D1%80#Te	uk
dc.relation.references	Водна Рамкова Директива ЄС 2000/60/ЄС. Основні терміни та їх визначення. Київ, Україна, 2006, 240 с. [Електронний ресурс]. Режим доступу: http://dbuwr.com.ua/docs/Waterdirect.pdf	uk
dc.relation.references	J. Zhu, “A Temporal Knowledge Graph Generation Dataset Supervised Distantly by Large Language Models,” Scientific Data, no. 12, p. 734, 2025. [Electronic resource]. Available: https://doi.org/10.1038/s41597-025-05062-	en
dc.relation.references	K. Salmas et al., “Extracting Geographic Knowledge from Large Language Models: An Experiment,” Workshop LM-KBC, 2023. [Electronic resource]. Available: https://lm-kbc.github.io/workshop2023/proceedings/13_Salmas.pdf .	en
dc.relation.references	M. Gritta et al., “What’s missing in geographical parsing?” Springer Nature Link. [Electronic resource]. Available: https://link.springer.com/article/10.1007/s10579-017-9385-8 .	en
dc.relation.references	A. Halterman, “Mordecai 3: A Neural Geoparser,” arXiv, 2023. [Electronic resource]. Available: https://arxiv.org/pdf/2303.13675	en
dc.relation.references	Hanwen Zheng, et al., “A Comprehensive Survey on Document-Level Information Extraction,” in Proceedings of the Workshop on the Future of Event Detection (FuturED), 2024, pp. 58-72, USA: Association for Computational Linguistics. [Electronic resource]. Available: https://aclanthology.org/2024.futured-1.6.pd	en
dc.relation.references	Dagdelen, et al., “Structured information extraction from scientific text with large language models,” Nature Commun., no. 15, p. 1418, 2024. [Electronic resource]. Available: https://doi.org/10.1038/s41467-024-45563-x	en
dc.relation.references	В. Б. Мокін, К. О. Бондалєтов, Є. М. Крижановський, і В. О. Караваєв, «Метод аугментації текстів про стан масивів вод на основі інтелектуальної прив’язки до багатозв’язних геоінформаційних систем іменованих сутностей», Вісник Вінницького політехнічного інституту, № 3, с. 55-65, 2023. https://doi.org/10.31649/1997-9266-2023-168-3-55-65	uk
dc.relation.references	D. Dessí, et al., “CS-KG 2.0: A Large-scale Knowledge Graph of Computer Science,” Scientific Data, no. 12, p. 964, 2025. [Electronic resource]. Available: https://doi.org/10.1038/s41597-025-05200-8	en
dc.relation.references	Yunyi Zhang, “Automated Mining of Structured Knowledge from Text in the Era of Large Language Models,” in KDD ’24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. [Electronic resource]. Available: https://doi.org/10.1145/3637528.3671469	en
dc.relation.references	Haoran Luo, et al., “Text2NKG: Fine-Grained N-ary Relation Extraction for N-ary relational Knowledge Graph Construction,” Advances in Neural Information Processing Systems 37 (NeurIPS), 2024. [Electronic resource]. Available: https://proceedings.neurips.cc/paper_files/paper/2024/hash/Abstract-Conference.html (date of access: 06.06.2025).	en
dc.relation.references	R. Bommasani, et al., “On the Opportunities and Risks of Foundation Models,” Computer Science, Machine Learning, 2021. [Electronic resource]. Available: https://arxiv.org/abs/2108.07258 .	en
dc.relation.references	К. Бондалєтов, і В. Мокін, «Інтелектуальна автоматизація геоприв’язки повідомлень з соцмереж до масивів вод за допомогою зваженої Jaccard-міри,» ВНТКП ВНТУ. Факультет інтелектуальних інформаційних технологій та автоматизації ВНТУ, Вінниця, 24-27 березня 2025. [Електронний ресурс]. Режим доступу: https://conferences.vntu.edu.ua/index.php/all-fksa/all-fksa-2025/paper/view/23298/19275	uk
dc.relation.references	Річний звіт про діяльність басейнового управління водних ресурсів річки Південний Буг з питань управління водними ресурсами за 2019 рік, Вінниця. Україна: БУВР, 2019	uk
dc.identifier.doi	https://doi.org/10.31649/1997-9266-2025-180-3-101-110

Files in this item

Name: 185369.pdf
Size: 514.5Kb
Format: PDF