GENERATING KNOWLEDGE STRUCTURES FROM OPEN DATASETS' TAGS - AN APPROACH BASED ON FORMAL CONCEPT ANALYSIS
Under influence of data transparency initiatives, a variety of institutions have published a significant number of datasets. In most cases, data publishers take advantage of open data portals (ODPs) for making their datasets publicly available. To improve the datasets' discoverability, open data portals (ODPs) group open datasets into categories using various criteria like publishers, institutions, formats, and descriptions. For these purposes, portals take advantage of metadata accompanying datasets. However, a part of metadata may be missing, or may be incomplete or redundant. Each of these situations makes it difficult for users to find appropriate datasets and obtain the desired information. As the number of available datasets grows, this problem becomes easy to notice. This paper is focused on the first step towards decreasing this problem by implementing knowledge structures to be used in situations where a part of datasets' metadata is missing. In particular, we focus on developing knowledge structures capable of suggesting the best match for the category where an uncategorized dataset should belong to. Our approach relies on dataset descriptions provided by users within dataset tags. We take advantage of a formal concept analysis to reveal the shared conceptualization originating from the tags' usage by developing a concept lattice per each category of open datasets. Since tags represent free text metadata entered by users, in this paper we will present a method of optimizing their usage through means of semantic similarity measures based on natural language processing mechanisms. Finally, we will demonstrate the advantage of our proposal by comparing concept lattices generated using formal the concept analysis before and after the optimization process. The main experimental research results will show that our approach is capable of reducing the number of nodes within a lattice more than 40%.