Optimising fuzzy hash function parameters for ensuring compliance with Open Data Regulations
Authors
Maidanevych, L.
Kondratenko, N.
Kazmirevskyi, V.
Майданевич, Л. О.
Date
2024
Collections
- JetIQ [337]
Abstract
The aim of this study was to investigate the parameters of a fuzzy hash function in order to improve the efficiency and accuracy of detecting similar text fragments across web resources when monitoring official government websites for compliance with the Regulation on Open Data. The research assessed three key parameters of the hash function: block size, prime number base, and modulus. A series of experiments was conducted in which different combinations of these parameters were used to generate hash values for text data. The results identify the parameter combinations that offer the best balance between precision, recall, F-measure, and execution time. Specific parameter configurations were found to improve the algorithm's accuracy significantly while keeping computational costs low, which is particularly important for real-time data analysis.
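The record does not spell out the hashing scheme itself. Block size, prime base, and modulus are, however, exactly the parameters of a polynomial (Rabin-Karp style) rolling hash, so the following Python sketch assumes that construction: every window of `block_size` characters is hashed as the sum of ord(c_i) * base^(k-1-i), taken modulo `modulus`, and two fragments are compared by the Jaccard similarity of their hash sets. The function names `block_hashes` and `similarity`, the character-level windows, and the Jaccard comparison are illustrative assumptions, not the authors' published method.

```python
# Assumption: polynomial rolling hash over overlapping character windows,
# compared with Jaccard similarity. Not confirmed by the source record.

def block_hashes(text: str, block_size: int, base: int, modulus: int) -> set[int]:
    """Return the set of hashes of every window of `block_size` characters."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if len(text) < block_size:
        return set()
    # Weight of the character that leaves the window: base**(block_size-1) mod modulus
    high = pow(base, block_size - 1, modulus)
    # Hash of the first window, computed with Horner's rule
    h = 0
    for ch in text[:block_size]:
        h = (h * base + ord(ch)) % modulus
    hashes = {h}
    # Slide the window one character at a time in O(1) per step
    for i in range(block_size, len(text)):
        h = (h - ord(text[i - block_size]) * high) % modulus  # drop outgoing char
        h = (h * base + ord(text[i])) % modulus               # append incoming char
        hashes.add(h)
    return hashes


def similarity(a: str, b: str, block_size: int, base: int, modulus: int) -> float:
    """Jaccard similarity (0.0 to 1.0) of the two texts' window-hash sets."""
    ha = block_hashes(a, block_size, base, modulus)
    hb = block_hashes(b, block_size, base, modulus)
    if not (ha or hb):
        # Both texts are shorter than one block: fall back to exact equality
        return 1.0 if a == b else 0.0
    return len(ha & hb) / len(ha | hb)
```

A deliberately tiny modulus shows why the parameters matter: with `modulus=2` every window hash collapses to 0 or 1, so `similarity("abcd", "wxyz", 2, 31, 2)` returns 1.0 for two unrelated strings (a false positive), whereas `modulus=1_000_000_007` gives 0.0. The modulus caps the number of distinct hash values, while the block size trades sensitivity to small edits against the chance of coincidental matches.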
The results establish that optimising the hash function parameters reduces false positives and false negatives, which are common problems in similarity detection. In particular, choosing an optimal value for each parameter substantially improves the precision and recall of the analysis, yielding more accurate comparisons of text fragments and shorter execution times. This makes the fuzzy hashing algorithm well suited to automated systems that monitor government websites for compliance with open data regulations. The study also found that parameter optimisation reduces the number of duplicate records, which is especially relevant to ensuring that open data meets legislative requirements. The conclusions can inform the development of software tools that efficiently identify deficiencies and improve transparency and legal compliance, and they can support further optimisation of fuzzy hash algorithms for regulatory data monitoring. Overall, the study advances web resource monitoring technology by showing how careful selection of fuzzy hash function parameters can substantially improve the efficiency and reliability of open data analysis.
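The abstract compares parameter combinations by F-measure (with its precision and recall components) and execution time, and ties the gains to fewer false positives and false negatives. The record does not describe the evaluation procedure, so the following Python sketch assumes the standard set-based definitions over pairs of text fragments flagged as similar, checked against a hand-labelled reference set. `Scores`, `evaluate`, `grid_search`, and the `detect(block_size, base, modulus)` callback are hypothetical names, and ranking by F-measure with execution time as a tie-breaker is an illustrative choice, not the authors' stated criterion.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Iterable

# A pair of fragment identifiers judged similar. Assumption: pairs are stored
# in a canonical order (e.g. smaller id first) so ("a", "b") and ("b", "a")
# never both appear.
Pair = tuple[str, str]


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f_measure: float


def evaluate(predicted: set[Pair], actual: set[Pair]) -> Scores:
    """Compare detected similar pairs against a labelled reference set."""
    tp = len(predicted & actual)  # correctly detected pairs
    fp = len(predicted - actual)  # false positives: flagged but not similar
    fn = len(actual - predicted)  # false negatives: similar but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure (F1): harmonic mean of precision and recall
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f)


def grid_search(
    detect: Callable[[int, int, int], set[Pair]],
    actual: set[Pair],
    block_sizes: Iterable[int],
    bases: Iterable[int],
    moduli: Iterable[int],
) -> list[tuple[tuple[int, int, int], Scores, float]]:
    """Run `detect` for every (block_size, base, modulus) combination.

    Returns (params, scores, seconds) tuples sorted by F-measure (highest
    first), with ties broken by shorter execution time.
    """
    results = []
    for params in itertools.product(block_sizes, bases, moduli):
        start = time.perf_counter()
        predicted = detect(*params)
        elapsed = time.perf_counter() - start
        results.append((params, evaluate(predicted, actual), elapsed))
    results.sort(key=lambda r: (-r[1].f_measure, r[2]))
    return results
```

In practice `detect` would wrap some block-hash similarity function and return the pairs whose score exceeds a chosen threshold; that threshold is a further assumption the record does not specify.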
URI:
https://ir.lib.vntu.edu.ua//handle/123456789/44531