Information technology for keyword search based on parsing of English-language texts

Яхимович, Олександр Вікторович; Яхимович, Александр Викторович; Yahimovich, O. V.

Date

2021

Collections

05.13.06 «Information Technologies»

Abstracts

The work is devoted to the development of an information technology for keyword search based on automating the parsing of English-language texts. The keyword search model has been improved: unlike existing models, it is built on an information evaluation of text parsing results and takes into account the analysis of relationships between the lexical units of the text, which made it possible to formalize the quality criterion of the keyword search process. For the first time, a keyword search method has been developed that, unlike existing methods, is based on finding syntactic relationships between word forms in the sentences of an English text using the parsing capabilities of modern linguistic packages. The method of reducing the impact of verbal noise on keyword search has been improved: unlike existing methods, it is built on the relationships between the lexical units of a sentence, which improved the quality of keyword search results compared with the main method. The information technology of keyword search has been further developed: unlike existing technologies, it takes into account information from sentence parsing, which made it possible to refine numerical estimates of the content parameters of the text and improve the quality of its keyword search.

The scientific novelty of the qualification research paper is as follows:

1. The keyword search model has been improved. Unlike existing models, it is based on an information evaluation of text parsing results and takes into account the analysis of relationships between the lexical units of the text, which made it possible to formalize the quality criterion of the keyword search process.
2. For the first time, a keyword search method has been developed that, unlike existing methods, is based on finding syntactic relationships between word forms in the sentences of an English text using the parsing capabilities of modern linguistic packages. The proposed method improves the numerical characteristics of keyword search quality, namely completeness (according to the Jaccard measure) and accuracy.
3. The method of reducing the impact of verbal noise on keyword search has been improved. Unlike existing methods, it is based on the Stanford classification of relationships between the lexical units of a sentence, which improved the quality of keyword search results compared with the main method.
4. The information technology of keyword search has been further developed. Unlike existing technologies, it takes into account additional information from sentence parsing through the consecutive use of the two proposed methods, which made it possible to refine numerical estimates of the content parameters of the text and improve the quality of keyword search.

The practical value of the results lies in the formal description of the method of searching for keywords in an English text, the creation of an algorithm for its implementation, and the development of software that finds keywords based on significant relationships between word forms in the sentences of an English text, followed by filtering of verbal noise.
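The idea of selecting keywords from relationships between word forms and then filtering verbal noise can be illustrated with a minimal sketch. It is not the author's algorithm: the function name `rank_keywords`, the scoring rule (counting the non-noise relations a word takes part in) and the `NOISE_RELATIONS` set are assumptions made for illustration, and the dependency triples are written by hand rather than produced by a real parser.

```python
from collections import Counter

# Hypothetical set of "noise" relation labels. The labels themselves are
# standard Stanford/Universal Dependencies names, but treating exactly
# these as verbal noise is an assumption, not the author's specification.
NOISE_RELATIONS = {"det", "case", "punct", "aux", "cc", "mark", "cop"}


def rank_keywords(dependencies, top_n=3):
    """Rank word forms by how many non-noise relations they take part in.

    `dependencies` is an iterable of (relation, head, dependent) triples,
    the kind of output a dependency parser gives for each sentence.
    """
    scores = Counter()
    for relation, head, dependent in dependencies:
        if relation in NOISE_RELATIONS:
            continue  # skip links that only attach function words
        scores[head.lower()] += 1
        scores[dependent.lower()] += 1
    # Sort by score (descending), then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


# Hand-written triples for the sentence
# "The parser finds syntactic relationships in the text."
sample = [
    ("det", "parser", "The"),
    ("nsubj", "finds", "parser"),
    ("amod", "relationships", "syntactic"),
    ("obj", "finds", "relationships"),
    ("case", "text", "in"),
    ("det", "text", "the"),
    ("obl", "finds", "text"),
    ("punct", "finds", "."),
]

print(rank_keywords(sample))
```

In practice the triples would come from a linguistic package with a dependency parser, and the choice of which relation classes count as noise would follow the classification used in the work itself.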
The created models, algorithms and software can be used to solve practical problems of computational linguistics that require keyword search, for example to improve the accuracy of site content analysis and raise a site's position in search results. Combining the language-independent tools of the proposed information technology with the linguistic analysis resources required, according to the obtained specification, for other natural languages will expand the scope of the technology, in particular to the Ukrainian language. The results of the qualification research paper were implemented at LLC «SPILNA SPRAVA» (implementation act dated 10.01.2020) and in the educational process of the Automation and Intelligent Information Technologies Department of Vinnytsia National Technical University, which is confirmed by the publication of the textbook “A lexical relationships-based keywords selection in an English text”. The experiments showed that, compared with analogues, the proposed information technology simultaneously increases completeness according to the Jaccard metric by 8.1% to 12.7% and the absolute accuracy of keyword search by 9.1% to 14.3% for English texts of 140-1400 words.
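For readers unfamiliar with the quality measures mentioned here, the following sketch shows how a Jaccard score and an accuracy score could be computed for a set of extracted keywords against a reference set. The Jaccard index is the standard set-overlap measure; reading "accuracy" as the share of extracted keywords that also appear in the reference set is an assumption, since the abstract does not give the formula. The function names and the sample keyword sets are hypothetical.

```python
def jaccard(found, reference):
    """Jaccard index: size of the intersection over size of the union."""
    found, reference = set(found), set(reference)
    union = found | reference
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(found & reference) / len(union)


def accuracy(found, reference):
    """Share of extracted keywords that are in the reference set.

    This reading of "accuracy" is an assumption for illustration.
    """
    found, reference = set(found), set(reference)
    if not found:
        return 0.0
    return len(found & reference) / len(found)


# Hypothetical extracted and expert-assigned keyword sets.
extracted = {"parsing", "keyword", "text", "noise"}
expert = {"parsing", "keyword", "syntax"}

print(jaccard(extracted, expert))   # 2 shared of 5 distinct words
print(accuracy(extracted, expert))  # 2 of 4 extracted words are correct
```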

URI:

http://ir.lib.vntu.edu.ua//handle/123456789/33101
