Genetic-Fuzzy Programming Based Linkage Rule Miner (GFPLR-Miner) for Entity Linking in Semantic Web

2018 ◽  
Vol 14 (3) ◽  
pp. 134-166 ◽  
Author(s):  
Amit Singh ◽  
Aditi Sharan

This article describes how semantic web data sources follow linked data principles to facilitate efficient information retrieval and knowledge sharing. These data sources may provide complementary, overlapping, or contradictory information. To integrate them, the authors perform entity linking: the task of identifying and linking entities across data sources that refer to the same real-world entities. In this work, the authors propose a genetic-fuzzy approach to learning linkage rules for entity linking. The method is domain-independent, automatic, and scalable. It uses fuzzy logic to adapt the mutation and crossover rates of genetic programming to ensure guided convergence. The authors' experimental evaluation demonstrates that the approach is competitive and makes significant improvements over state-of-the-art methods.
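The fuzzy rate-adaptation idea can be pictured with a short sketch. The membership functions, rule base, and the `fuzzy_rates` helper below are illustrative assumptions for exposition, not the authors' GFPLR-Miner implementation: population diversity is fuzzified and used to steer the crossover and mutation probabilities of the genetic search.

```python
# Minimal sketch: fuzzy adaptation of genetic-programming crossover and
# mutation rates from population diversity. The membership functions and
# rule base are illustrative assumptions, not the authors' exact design.

def triangular(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_rates(diversity):
    """Map normalised population diversity (0..1) to (crossover, mutation) rates."""
    low = triangular(diversity, -0.5, 0.0, 0.5)
    med = triangular(diversity, 0.0, 0.5, 1.0)
    high = triangular(diversity, 0.5, 1.0, 1.5)

    # Rule base: low diversity -> more mutation, less crossover;
    # high diversity -> more crossover, less mutation.
    weight = low + med + high
    crossover = (low * 0.60 + med * 0.80 + high * 0.95) / weight
    mutation = (low * 0.30 + med * 0.15 + high * 0.05) / weight
    return crossover, mutation

if __name__ == "__main__":
    for d in (0.1, 0.5, 0.9):
        print(d, fuzzy_rates(d))
```

Low diversity then pushes the search toward exploration (higher mutation), while high diversity favours exploitation through crossover, which is the intuition behind guided convergence.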


1995 ◽  
Vol 04 (03) ◽  
pp. 413-432 ◽  
Author(s):  
NICHOLAS MARCHALLECK ◽  
ABRAHAM KANDEL

The purpose of this paper is to provide a survey of state-of-the-art fuzzy logic applications in the field of transportation, illustrating the usefulness and the promising future of the fuzzy approach. The majority of the discussion covers the area of fuzzy control. A wide range of Fuzzy Logic Controllers (FLCs) is discussed, ranging from traffic controllers to aircraft controllers. Although the majority of applications concern surface transportation, surveys of several aerospace applications are also given.


Author(s):  
Mamadou Tadiou Kone

This chapter proposes a state-of-the-art survey of the emerging field of Semantic Organizational Knowledge. This concept refers to the technologies of the Semantic Web and Linked Data applied to the principles and procedures of organizational knowledge. Organizational knowledge is traditionally described as the ability of employees of an organization to exercise judgment based on the history and collective understanding of a particular context. Researchers have identified several types of knowledge in organizational contexts, including explicit knowledge, tacit knowledge, cultural knowledge, and embedded knowledge. Along these lines, a number of issues must be addressed in order to apply Semantic Web and Linked Data technologies. The main objective of this chapter is to demonstrate that there exists substantial research supporting the use of Semantic Web or Linked Data technologies to effectively support all aspects of knowledge creation, sharing, distribution, and acquisition.


Author(s):  
Lehireche Nesrine ◽  
Malki Mimoun ◽  
Lehireche Ahmed ◽  
Reda Mohamed Hamou

The purpose of the semantic web goes well beyond the simple provision of raw data: it is a matter of linking data together. This data-meshing approach, called linked data (LD), refers to a set of best practices for publishing and interlinking data on the web. From its principles a new context has emerged, called linked enterprise data (LED). LED is the application of linked data to the information system (IS) of the enterprise to address the challenges of an IS and obtain an agile, well-performing system in which internal data sources link to external data and information can be accessed quickly. This article focuses on using LED to address the challenges of database integration and reviews the state of the art in mapping relational databases (RDB) to RDF based on LD. The authors then propose an on-demand extract-transform-load (ETL) approach to RDB-to-RDF mapping, specified as algorithms. Finally, the authors present a conclusion and discuss their perspectives on implementing the solution.
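To make the on-demand RDB-to-RDF idea concrete, here is a minimal sketch in which rows are fetched and converted to triples only when requested. The SQLite source, table, column, and namespace names are illustrative assumptions and do not reproduce the authors' algorithms.

```python
# Minimal sketch of on-demand RDB-to-RDF mapping: rows are extracted and
# transformed into triples at request time. Table, column, and vocabulary
# names are illustrative assumptions, not the article's mapping algorithm.
import sqlite3
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/enterprise/")

def rows_to_rdf(db_path, table, key_column):
    """Extract rows of `table` and transform them into RDF triples."""
    graph = Graph()
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    for row in conn.execute(f"SELECT * FROM {table}"):
        subject = URIRef(EX[f"{table}/{row[key_column]}"])
        graph.add((subject, RDF.type, EX[table.capitalize()]))
        for column in row.keys():
            if column != key_column and row[column] is not None:
                graph.add((subject, EX[column], Literal(row[column])))
    conn.close()
    return graph

# Usage (hypothetical database and table):
#   graph = rows_to_rdf("enterprise.db", "customer", "id")
#   print(graph.serialize(format="turtle"))
```

The "load" step is deferred: the resulting graph can be serialized or pushed to a triple store only for the tables a consumer actually asks for, which is the agility the LED setting aims at.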


Author(s):  
Seán O’Riain ◽  
Andreas Harth ◽  
Edward Curry

With increased dependence on the efficient use and inclusion of diverse corporate and Web-based data sources for business information analysis, financial information providers will increasingly need agile information integration capabilities. Linked Data is a set of technologies and best practices that provides such a level of agility for information integration, access, and use. Current approaches struggle to cope with the inclusion of multiple data sources in near real time, and have looked to Semantic Web technologies for assistance with infrastructure access and with handling multiple data formats and their vocabularies. This chapter discusses the challenges of financial data integration, provides the component architecture of Web-enabled financial data integration, and outlines the emergence of a financial ecosystem based upon existing Web standards. Introductions to Semantic Web technologies are given, and the chapter supports them with insight and discussion gathered from multiple financial services use case implementations. Finally, best practices for integrating Web data based on the Linked Data principles, as well as emergent areas, are described.


Author(s):  
Andra Waagmeester ◽  
Paul Braun ◽  
Manoj Karingamadathil ◽  
Jose Emilio Labra Gayo ◽  
Siobhan Leachman ◽  
...  

Moths form a diverse group of species that are predominantly active at night. They are colourful and have an ecological role, but are less well described than their closest relatives, the butterflies. Much remains to be understood about moths, as shown by the many issues within their taxonomy, including their being a paraphyletic group and the inability to clearly distinguish them from butterflies (Fig. 1). We present the Wikimedia architecture as a hub of knowledge on moths. This ecosystem consists of 312 language editions of Wikipedia and sister projects such as Wikimedia Commons (a multimedia repository) and Wikidata (a public knowledge graph). Through Wikidata, external data repositories can be integrated into this knowledge landscape on moths. Wikidata contains links to (open) data repositories on biodiversity such as iNaturalist, the Global Biodiversity Information Facility (GBIF) and the Biodiversity Heritage Library (BHL), which in turn contain detailed content such as species occurrence data, images or publications on moths. We present a workflow that integrates crowd-sourced information and images from iNaturalist with content from GBIF and BHL into the different language editions of Wikipedia. The Wikipedia articles in turn feed information to other sources. Taxon pages on iNaturalist, for example, have an "About" tab fed by the Wikipedia article describing the respective taxon, where the current language of the (iNaturalist) interface fetches the appropriate language version from Wikipedia. This is a nice example of data reuse, which is one of the pillars of FAIR (Findable, Accessible, Interoperable and Reusable) (Wilkinson et al. 2016). Wikidata provides the linked data hub in this flow of knowledge. Since Wikidata is available in RDF, it aligns well with the data model of the semantic web. This allows rapid integration with other linked data sources and provides an intuitive portal for non-linked data to be integrated as linked data with the semantic web. Wikidata includes information on all sorts of things (e.g., people, species, locations, events), so it can structure data in a multitude of ways, which has led to more than 9,000 properties. Because all those different domains and communities use the same source for different things, it is important to have good structure and documentation for a specific topic so that we and others can interpret the data. We present a schema that describes data about moth taxa on Wikidata. Since 2019, Wikidata has had an EntitySchema namespace that allows contributors to specify applicable linked-data schemas. The schemas are expressed using Shape Expressions (ShEx) (Thornton et al. 2019), a formal modelling language for RDF, one of the data formats used on the Semantic Web. Since Wikidata is also rendered as RDF, it is possible to use ShEx to describe data models and user expectations in Wikidata (Waagmeester et al. 2021). These schemas can then be used to verify whether a subset of Wikidata conforms to an expected or described data model. Starting from a document that describes an expected schema on moths, we have developed an EntitySchema (E321) for moths in Wikidata. This schema provides unambiguous guidance for contributors who have data they are not sure how to model.
For example, a user with data about a particular species of moth may be working from a scientific article stating that the species is found only in New Zealand, and may be unsure how to model that fact as a statement in Wikidata. After consulting Schema E321, the user will find Property P183 ("endemic to") and can use that property to state that the species is endemic to New Zealand. As more contributors follow the data model expressed in Schema E321, there will be structural consistency across items for moths in Wikidata. This reduces the risk of contributors using different combinations of properties and qualifiers to express the same meaning. If a contributor needs to express something that is not yet represented in Schema E321, they can extend the schema itself, as each schema can be edited. The multilingual affordances of the Wikidata platform allow users to edit in over 300 languages. In this way, contributors edit in their preferred language and see the structure of the data, as well as the schemas, in their language of choice. This broadens the range of people who can contribute to these data models and reduces the dominance of English. There are an estimated 160,000+ moth species. This number equals the number of moths described in iNaturalist, while Wikidata contains 220K items on moths. As the biggest language edition, the English Wikipedia contains 65K moth articles; other language editions contain far fewer articles. The higher number of moth items in Wikidata can be partly explained by taxon synonyms being treated as distinct taxa. Wikidata, as a proxy of knowledge on moths, is instrumental in getting them better described in Wikipedia and other (FAIR) sources, while in return curation in Wikidata is carried out by a large community. This approach to data modelling has the advantage of allowing multilingual collaboration and iterative extension and improvement over time.
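As an illustration of how such schema-aligned statements can be consumed, the sketch below queries the Wikidata SPARQL endpoint for taxa stated to be endemic to New Zealand via property P183. The endpoint usage is standard, but the Q-identifiers (Q16521 for "taxon", Q664 for "New Zealand") are assumptions to be verified, and narrowing the results to moths via the taxonomy modelled in EntitySchema E321 is omitted here.

```python
# Minimal sketch: retrieve taxa stated to be endemic to New Zealand via
# property P183 ("endemic to"), as discussed above. Q16521 ("taxon") and
# Q664 ("New Zealand") are assumed identifiers; restricting the result to
# moths would follow the taxonomy modelled in EntitySchema E321.
import requests

QUERY = """
SELECT ?taxon ?taxonLabel WHERE {
  ?taxon wdt:P31 wd:Q16521 ;    # instance of: taxon
         wdt:P183 wd:Q664 .     # endemic to: New Zealand
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "moth-schema-example/0.1 (demo)"},
)
for row in response.json()["results"]["bindings"]:
    print(row["taxon"]["value"], row["taxonLabel"]["value"])
```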


2020 ◽  
Vol 34 (05) ◽  
pp. 8576-8583 ◽  
Author(s):  
Yasumasa Onoe ◽  
Greg Durrett

Neural entity linking models are very powerful, but run the risk of overfitting to the domain they are trained in. For this problem, a “domain” is characterized not just by genre of text but even by factors as specific as the particular distribution of entities, as neural models tend to overfit by memorizing properties of frequent entities in a dataset. We tackle the problem of building robust entity linking models that generalize effectively and do not rely on labeled entity linking data with a specific entity distribution. Rather than predicting entities directly, our approach models fine-grained entity properties, which can help disambiguate between even closely related entities. We derive a large inventory of types (tens of thousands) from Wikipedia categories, and use hyperlinked mentions in Wikipedia to distantly label data and train an entity typing model. At test time, we classify a mention with this typing model and use soft type predictions to link the mention to the most similar candidate entity. We evaluate our entity linking system on the CoNLL-YAGO dataset (Hoffart et al. 2011) and show that our approach outperforms prior domain-independent entity linking systems. We also test our approach in a harder setting derived from the WikilinksNED dataset (Eshel et al. 2017) where all the mention-entity pairs are unseen during test time. Results indicate that our approach generalizes better than a state-of-the-art neural model on the dataset.
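The linking step can be pictured with a minimal sketch: the typing model's soft type predictions for a mention are scored against each candidate entity's known types, and the best-matching candidate is chosen. The dot-product scorer and the toy type inventory below are assumptions for illustration, not the authors' trained model or type set.

```python
# Minimal sketch of type-based candidate ranking: a mention's soft type
# predictions are scored against each candidate entity's binary type
# vector, and the highest-scoring candidate is linked. The dot-product
# scorer and the toy type inventory are illustrative assumptions.
import numpy as np

TYPE_INVENTORY = ["city", "person", "football_club", "river"]  # tens of thousands in practice

def score(mention_type_probs, entity_types):
    """Dot product between predicted type probabilities and the entity's type indicators."""
    entity_vec = np.array([1.0 if t in entity_types else 0.0 for t in TYPE_INVENTORY])
    return float(np.dot(mention_type_probs, entity_vec))

def link(mention_type_probs, candidates):
    """Return the candidate entity whose types best match the predicted types."""
    return max(candidates, key=lambda name: score(mention_type_probs, candidates[name]))

# Toy usage: the typing model (not shown) predicts P(type | mention context).
probs = np.array([0.10, 0.05, 0.80, 0.05])   # mention "Arsenal" in a sports report
candidates = {
    "Arsenal F.C.": {"football_club"},
    "Arsenal, Lviv Oblast": {"city"},
}
print(link(probs, candidates))               # -> "Arsenal F.C."
```

Because the decision depends only on type compatibility rather than on memorized entity frequencies, the same scorer can be applied to entity distributions never seen during training.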


Author(s):  
D. Ulutaş Karakol ◽  
G. Kara ◽  
C. Yılmaz ◽  
Ç. Cömert

Large amounts of spatial data are held in relational databases. Spatial data in relational databases must be converted to RDF for semantic web applications, and spatial data is an important key factor for creating spatial RDF data. Linked Data is the way most preferred by users to publish and share data from relational databases on the Web. In order to define the semantics of the data, links are provided to vocabularies (ontologies or other external web resources) that are common conceptualizations for a domain. Linking resource vocabulary data with globally published concepts of domain resources combines different data sources and datasets, makes data more understandable, discoverable and usable, improves data interoperability and integration, provides automatic reasoning and prevents data duplication. The need to convert relational data to RDF arises from the semantic expressiveness of Semantic Web technologies. One of the key factors of the Semantic Web is ontologies. An ontology is an "explicit specification of a conceptualization", and the semantics of spatial data relies on ontologies. Linking spatial data from relational databases to web data sources is not an easy task for sharing machine-readable interlinked data on the Web. Tim Berners-Lee, the inventor of the World Wide Web and an advocate of the Semantic Web and Linked Data, laid down the Linked Data design principles. Based on these rules, firstly, spatial data in relational databases must be converted to RDF with the use of supporting tools. Secondly, spatial RDF data must be linked to upper-level and domain ontologies and related web data sources. Thirdly, external data sources (ontologies and web data sources) must be determined and the spatial RDF data must be linked to these related data sources. Finally, the spatial linked data must be published on the web. The main contribution of this study is to determine the requirements for finding RDF links and to point out the deficiencies in creating and publishing linked spatial data. To achieve this objective, the study reviews existing approaches, conversion tools and web data sources for converting relational data to spatial RDF. In this paper, we have investigated the current state of spatial RDF data, standards, open source platforms (particularly D2RQ, Geometry2RDF, TripleGeo, GeoTriples, Ontop, etc.) and web data sources. Moreover, the process of converting spatial data to RDF and linking it to web data sources is described. The implementation of linking spatial RDF data to web data sources is demonstrated with an example use case: road data has been linked to one of the related popular web data sources, DBpedia. SILK, a tool for discovering relationships between data items within different Linked Data sources, is used as the link discovery framework. We also evaluated other link discovery tools, e.g. LIMES, and compared the results of the matching/linking task. As a result, the linked road data is shared and represented as an information resource on the web and enriched with definitions from related resources. In this way, road datasets are also linked by the related classes, individuals, spatial relations and properties they cover, such as construction date, road length and coordinates.
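As a minimal sketch of the conversion-and-linking step described above, the snippet below turns one relational road record into spatial RDF with a GeoSPARQL-style WKT geometry and adds an owl:sameAs link of the kind a tool such as SILK would discover. In the study this is done with dedicated tools (e.g. GeoTriples, SILK); the resource names, columns, and the DBpedia target here are illustrative assumptions rather than the study's actual data.

```python
# Minimal sketch: one relational road record converted to spatial RDF with
# a GeoSPARQL-style WKT geometry and an owl:sameAs link to DBpedia. The
# vocabulary, columns, and DBpedia resource are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import OWL, XSD

EX = Namespace("http://example.org/roads/")
GEO = Namespace("http://www.opengis.net/ont/geosparql#")

road = {"id": 42, "name": "E80", "length_km": 173.5,
        "wkt": "LINESTRING(39.72 41.00, 39.75 41.02)"}

g = Graph()
subject = URIRef(EX[f"road/{road['id']}"])
geometry = URIRef(EX[f"road/{road['id']}/geom"])

g.add((subject, RDF.type, EX.Road))
g.add((subject, EX.name, Literal(road["name"])))
g.add((subject, EX.lengthKm, Literal(road["length_km"], datatype=XSD.double)))
g.add((subject, GEO.hasGeometry, geometry))
g.add((geometry, GEO.asWKT, Literal(road["wkt"], datatype=GEO.wktLiteral)))
# Link discovered by a link discovery tool (e.g. SILK) to a DBpedia resource.
g.add((subject, OWL.sameAs, URIRef("http://dbpedia.org/resource/European_route_E80")))

print(g.serialize(format="turtle"))
```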


Semantic Web ◽  
2021 ◽  
pp. 1-16
Author(s):  
Esko Ikkala ◽  
Eero Hyvönen ◽  
Heikki Rantala ◽  
Mikko Koho

This paper presents a new software framework, Sampo-UI, for developing user interfaces for semantic portals. The goal is to provide the end user with multiple application perspectives on Linked Data knowledge graphs and a two-step usage cycle based on faceted search combined with ready-to-use tooling for data analysis. For the software developer, the Sampo-UI framework makes it possible to create highly customizable, user-friendly, and responsive user interfaces using current state-of-the-art JavaScript libraries and data from SPARQL endpoints, while saving substantial coding effort. Sampo-UI is published on GitHub under the open MIT License and has been utilized in several internal and external projects. The framework has so far been used to create six published and five forthcoming portals, mostly related to the Cultural Heritage domain, which have had tens of thousands of end users on the Web.
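Sampo-UI itself is a JavaScript framework; as a language-neutral illustration of the faceted-search pattern it builds on, the sketch below counts the items behind one facet at a public SPARQL endpoint. The DBpedia endpoint, class, and property used are assumptions for illustration and are not part of Sampo-UI.

```python
# Minimal sketch of the faceted-search pattern behind a semantic portal:
# count how many items fall under each value of one facet at a SPARQL
# endpoint. The endpoint, class, and property are illustrative assumptions.
import requests

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?birthPlace (COUNT(?person) AS ?count) WHERE {
  ?person a dbo:Philosopher ;
          dbo:birthPlace ?birthPlace .
}
GROUP BY ?birthPlace
ORDER BY DESC(?count)
LIMIT 10
"""

response = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": QUERY, "format": "application/sparql-results+json"},
)
for row in response.json()["results"]["bindings"]:
    print(row["birthPlace"]["value"], row["count"]["value"])
```

A faceted UI would render these value-count pairs as clickable filters and re-run the query with the selected constraints added, which is the two-step cycle the paper describes.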

