Property Clustering in Linked Data

Author(s):  
Saisai Gong ◽  
Wei Hu ◽  
Haoxuan Li ◽  
Yuzhong Qu

Properties are used to describe entities, and some of them are likely to be clustered together to constitute an aspect. For example, first name, middle name and last name are usually gathered to describe a person's name. However, existing automated approaches to property clustering remain far from satisfactory for an open domain like Linked Data. In this paper, the authors first investigated the relatedness between properties using 13 different measures. They then employed seven clustering algorithms and two combination methods for property clustering. Based on a sample set of Linked Data, the authors empirically studied property clustering and found that a proper combination of different measures and clustering algorithms gave the best result. Additionally, they reported how property clustering can improve user experience in an entity browsing system.
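The general idea of measuring relatedness between properties and then clustering them can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not name its 13 measures or seven algorithms, so the Jaccard co-occurrence measure, the single-link merging, the 0.5 threshold, and the sample data below are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_properties(entity_props, threshold=0.5):
    """Group properties whose entity sets overlap at or above `threshold`.

    entity_props maps an entity ID to the set of properties used on it.
    Merging is single-link via union-find (an illustrative choice only).
    """
    # Invert the mapping: property -> set of entities that use it.
    usage = {}
    for entity, props in entity_props.items():
        for prop in props:
            usage.setdefault(prop, set()).add(entity)

    parent = {prop: prop for prop in usage}

    def find(prop):
        while parent[prop] != prop:
            parent[prop] = parent[parent[prop]]  # path halving
            prop = parent[prop]
        return prop

    # Link every pair of properties that is related strongly enough.
    for p, q in combinations(sorted(usage), 2):
        if jaccard(usage[p], usage[q]) >= threshold:
            parent[find(p)] = find(q)

    clusters = {}
    for prop in usage:
        clusters.setdefault(find(prop), set()).add(prop)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical sample data: which properties appear on which entities.
sample = {
    "e1": {"firstName", "lastName", "birthDate"},
    "e2": {"firstName", "lastName"},
    "e3": {"firstName", "lastName", "population"},
    "e4": {"population", "area"},
    "e5": {"population", "area"},
}
print(cluster_properties(sample))
```

With this sample, firstName and lastName always co-occur and end up in one cluster, while birthDate stays on its own because it is used too rarely to pass the threshold.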

Author(s):  
JOSEP MARIA BRUNETTI ◽  
ROSA GIL ◽  
JUAN MANUEL GIMENO ◽  
ROBERTO GARCIA

Thanks to Open Data initiatives, the amount of data available on the Web is rapidly increasing. Unfortunately, most of these initiatives only publish raw tabular data, which makes its analysis and reuse very difficult. Linked Data principles allow for a more sophisticated approach by making explicit both the structure and semantics of the data. However, from the user experience viewpoint, published datasets continue to be monolithic files that are either completely opaque or explorable only through complex semantic queries. Our objective is to help users grasp what kinds of entities are in the dataset, how they are interrelated, what their main properties and values are, etc. Rhizomer is a data publishing tool whose interface provides a set of components borrowed from Information Architecture (IA) that help users gain insight into the dataset at hand. Rhizomer automatically generates navigation menus and facets based on the kinds of things in the dataset and how they are described through metadata properties and values. This tool is currently being evaluated with end users, who discover a whole new perspective on the Web of Data.


2012 ◽  
Vol 04 (02) ◽  
pp. 1250023 ◽  
Author(s):  
YI SHI ◽  
MARYAM HASAN ◽  
ZHIPENG CAI ◽  
GUOHUI LIN ◽  
DALE SCHUURMANS

We propose a new bi-clustering algorithm, LinCoh, for finding linear coherent bi-clusters in gene expression microarray data. Our method exploits a robust technique for identifying conditionally correlated genes, combined with an efficient density-based search for clustering sample sets. Experimental results on both synthetic and real datasets demonstrated that LinCoh consistently finds more accurate and higher quality bi-clusters than existing bi-clustering algorithms.


Author(s):  
Jessica Oliveira De Souza ◽  
Jose Eduardo Santarem Segundo

The Semantic Web was created to improve the current web user experience, and Linked Data is the primary means by which Semantic Web applications can, in theory, be fully realized, provided that appropriate criteria and requirements are respected. Therefore, the quality of the data and information stored in linked datasets is essential to meeting the basic Semantic Web objectives. Hence, this article aims to describe and present specific dimensions and their related quality issues.


2002 ◽  
Vol 28 (3) ◽  
pp. 245-288 ◽  
Author(s):  
Daniel Gildea ◽  
Daniel Jurafsky

We present a system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence within a semantic frame. Given an input sentence and a target word and frame, the system labels constituents with either abstract semantic roles, such as Agent or Patient, or more domain-specific semantic roles, such as Speaker, Message, and Topic. The system is based on statistical classifiers trained on roughly 50,000 sentences that were hand-annotated with semantic roles by the FrameNet semantic labeling project. We then parsed each training sentence into a syntactic tree and extracted various lexical and syntactic features, including the phrase type of each constituent, its grammatical function, and its position in the sentence. These features were combined with knowledge of the predicate verb, noun, or adjective, as well as information such as the prior probabilities of various combinations of semantic roles. We used various lexical clustering algorithms to generalize across possible fillers of roles. Test sentences were parsed, were annotated with these features, and were then passed through the classifiers. Our system achieves 82% accuracy in identifying the semantic role of presegmented constituents. At the more difficult task of simultaneously segmenting constituents and identifying their semantic role, the system achieved 65% precision and 61% recall. Our study also allowed us to compare the usefulness of different features and feature combination methods in the semantic role labeling task. We also explore the integration of role labeling with statistical syntactic parsing and attempt to generalize to predicates unseen in the training data.
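The precision and recall figures reported for the joint segmentation and labeling task can be made concrete with a small sketch. The scoring convention here is an assumption: a predicted constituent counts as correct only if both its span and its role match a gold annotation exactly, and the spans and roles in the sample are invented for illustration.

```python
def precision_recall(gold, predicted):
    """Return (precision, recall) over exact-match labeled constituents.

    Each constituent is a (start, end, role) tuple. Exact matching of
    both span and role is an assumed convention, not taken from the paper.
    """
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations for one sentence.
gold = {(0, 2, "Speaker"), (3, 5, "Message"), (6, 8, "Topic")}
predicted = {(0, 2, "Speaker"), (3, 5, "Topic"), (6, 8, "Topic"), (9, 10, "Agent")}

p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision penalizes spurious or mislabeled constituents, while recall penalizes gold constituents the system failed to recover, which is why the two figures can differ on the same output.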


2011 ◽  
Author(s):  
Christina Harrington ◽  
Sharon Joines

Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels to improve accuracy. For tagging and stemming, a POS tagger and the Porter stemmer are used. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, with the mean as the threshold. This paper incorporates many parameters for finding the similarity between words. To disambiguate words, sense identification is performed for the adjectives and the results are compared. SemCor and machine learning datasets are employed. Compared with previous results for word sense disambiguation (WSD), our work shows a substantial improvement, reaching 91.2%.
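Grouping by highest similarity with a mean threshold can be sketched as follows. This is a simplified illustration rather than the paper's pipeline: it uses only cosine similarity over raw word counts (no WordNet, Jiang-Conrath measure, POS tagging, or stemming), and the average-link merging rule and sample sentences are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_sentences(sentences):
    """Agglomerative clustering; returns clusters as lists of sentence indices.

    The closest pair of clusters is merged repeatedly until the best
    average-link similarity no longer exceeds the mean pairwise similarity.
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(vecs)
    sim = {(i, j): cosine(vecs[i], vecs[j]) for i, j in combinations(range(n), 2)}
    if not sim:
        return [[i] for i in range(n)]
    threshold = sum(sim.values()) / len(sim)  # mean similarity as the cut-off

    def link(c1, c2):
        # Average-link: mean similarity over all cross-cluster pairs.
        pairs = [sim[min(i, j), max(i, j)] for i in c1 for j in c2]
        return sum(pairs) / len(pairs)

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda ab: link(clusters[ab[0]], clusters[ab[1]]),
        )
        if link(clusters[a], clusters[b]) <= threshold:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because a < b
    return sorted(clusters)


# Hypothetical input sentences.
sentences = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stocks fell sharply today",
    "stocks rose sharply today",
]
print(cluster_sentences(sentences))
```

Using the mean pairwise similarity as the stopping point avoids a hand-tuned cut-off, although it also means the number of clusters depends on how similar the input sentences are overall.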

