WeDGeM: A Domain-Specific Evaluation Dataset Generator for Multilingual Entity Linking Systems

Author(s):  
Emrah Inan ◽  
Oguz Dikenelli
Author(s):  
Emrah Inan ◽  
Vahab Mostafapour ◽  
Fatih Tekbacak

The Web enables retrieving concise information about specific entities, including people, organizations, and movies, along with their features. However, a large portion of Web resources is unstructured, which makes it difficult to find critical information about specific entities. Text analysis approaches such as Named Entity Recognition and Entity Linking aim to identify entities and link them to the relevant entries in a given knowledge base. A vast number of general-purpose benchmark datasets exist for evaluating these approaches; however, domain-specific approaches are hard to evaluate due to the lack of evaluation datasets for specific domains. This study presents WeDGeM, a multilingual evaluation set generator for specific domains that exploits Wikipedia category pages and the DBpedia hierarchy. Wikipedia disambiguation pages are also used to adjust the ambiguity level of the generated texts. Based on this generated test data, well-known Entity Linking systems supporting Turkish texts are evaluated in a movie-domain use case.
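The abstract's core idea of generating evaluation sets whose ambiguity level can be tuned via disambiguation pages can be sketched as follows. This is a hypothetical illustration, not WeDGeM's actual API: the function name, data shapes, and the toy movie-domain data are all invented for the example.

```python
# Hypothetical sketch of WeDGeM-style test-set generation: sample entities
# from a domain category and raise ambiguity by preferring surface forms
# that a disambiguation index maps to more than one entity. All names and
# data below are illustrative assumptions, not the tool's real interface.
import random

def generate_eval_set(category_members, disambiguation_index, ambiguity=0.5, seed=0):
    """Return (mention, gold_entity) pairs for a domain-specific category.

    ambiguity -- fraction of samples drawn from surface forms that map to
    more than one entity in the disambiguation index.
    """
    rng = random.Random(seed)
    ambiguous = [(m, e) for m, es in disambiguation_index.items()
                 for e in es if len(es) > 1 and e in category_members]
    unambiguous = [(m, es[0]) for m, es in disambiguation_index.items()
                   if len(es) == 1 and es[0] in category_members]
    samples = []
    for _ in range(len(category_members)):
        pool = ambiguous if rng.random() < ambiguity and ambiguous else unambiguous
        samples.append(rng.choice(pool))
    return samples

# Toy movie-domain data (invented for illustration):
members = {"Babam ve Oglum (film)", "Kis Uykusu (film)", "Ayla (film)"}
disamb = {
    "Ayla": ["Ayla (film)", "Ayla (name)"],      # ambiguous surface form
    "Kis Uykusu": ["Kis Uykusu (film)"],
    "Babam ve Oglum": ["Babam ve Oglum (film)"],
}
pairs = generate_eval_set(members, disamb, ambiguity=0.7)
print(len(pairs))  # 3
```

Raising the `ambiguity` parameter skews the sample toward surface forms with several candidate entities, which is the knob the generated texts use to stress-test a linker's disambiguation step.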


Author(s):  
Emrah Inan ◽  
Burak Yonyul ◽  
Fatih Tekbacak

Most of the data on the Web is unstructured, and it must be transformed into a machine-operable structure. Therefore, it is appropriate to convert the unstructured data into a structured form according to the requirements and to store those data in different data models depending on the use cases. As requirements and their types increase, a single approach fails to serve them all; thus, it is not suitable to use a single storage technology to meet every storage requirement. Managing stores with various types of schemas in a joint and integrated manner is termed 'multistore' or 'polystore' in the database literature. In this paper, the Entity Linking task is leveraged to transform texts into well-formed data, and this data is managed in an integrated environment of different data models. Finally, this integrated big data environment is queried and examined to demonstrate the method.
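The polystore idea above, routing entity-linked facts to one store and the raw text to another, then querying them jointly, can be illustrated with a minimal in-memory sketch. This is an assumed design for illustration, not the paper's system: a relational store (sqlite3) holds linked entities, while a plain dict stands in for a document store.

```python
# Minimal polystore sketch (assumed design, not the paper's system):
# entity-linked facts go to a relational store, raw text to a document
# store, and an "integrated" query spans both.
import sqlite3

rel = sqlite3.connect(":memory:")
rel.execute("CREATE TABLE entity (doc_id TEXT, mention TEXT, kb_id TEXT)")
doc_store = {}  # doc_id -> original unstructured text

def ingest(doc_id, text, linked):
    """linked: list of (mention, kb_id) pairs produced by an entity linker."""
    doc_store[doc_id] = text
    rel.executemany("INSERT INTO entity VALUES (?, ?, ?)",
                    [(doc_id, m, k) for m, k in linked])

ingest("d1", "Albert Einstein was born in Ulm.",
       [("Albert Einstein", "dbpedia:Albert_Einstein"),
        ("Ulm", "dbpedia:Ulm")])

# Integrated query: find documents mentioning a given KB entity in the
# relational store, then fetch the unstructured text from the doc store.
rows = rel.execute("SELECT doc_id FROM entity WHERE kb_id = ?",
                   ("dbpedia:Ulm",)).fetchall()
print(doc_store[rows[0][0]])  # Albert Einstein was born in Ulm.
```

Each store keeps the data model it is best at (relations for linked entities, key-value for raw documents), and the application layer joins them, which is the essence of the multistore/polystore pattern.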


Author(s):  
Jiangtao Zhang ◽  
Juanzi Li ◽  
Xiao-Li Li ◽  
Yao Shi ◽  
Junpeng Li ◽  
...  

2020 ◽  
Vol 2020 ◽  
pp. 1-19
Author(s):  
Wei Zhang ◽  
Zhihai Wang ◽  
Jidong Yuan ◽  
Shilei Hao

As a representation of discriminative features, the time series shapelet has recently received considerable research interest. However, most shapelet-based classification models evaluate the discriminative ability of a shapelet on the whole training dataset, neglecting the characteristic information contained in each instance to be classified and the class-wise feature frequency information. Hence, the computational complexity of feature extraction is high, and the interpretability is inadequate. To this end, the efficiency of shapelet discovery is improved through a lazy strategy fusing global and local similarities. In the prediction process, the strategy learns a specific evaluation dataset for each instance, and the captured characteristics are then used directly to progressively reduce the uncertainty of the predicted class label. Moreover, a shapelet coverage score is defined to calculate the discriminability of each time stamp for different classes. The experimental results show that the proposed method is competitive with the benchmark methods and provides insight into the discriminative features of each time series and each class in the data.
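The shared building block behind any shapelet-based model, including the lazy strategy summarized above, is the distance between a candidate shapelet and a time series: the minimum Euclidean distance over all sliding windows of the shapelet's length. The sketch below shows only this primitive with toy data, not the authors' full method.

```python
# Core shapelet primitive: minimum Euclidean distance between a short
# subsequence (the shapelet) and all equal-length windows of a series.
# The toy series and shapelets below are invented for illustration.
import math

def shapelet_distance(series, shapelet):
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, shapelet)))
        best = min(best, d)
    return best

ts = [0.0, 0.1, 1.0, 2.0, 1.0, 0.1, 0.0]   # a bump-shaped toy series
peak = [1.0, 2.0, 1.0]                      # shapelet matching the bump
flat = [0.0, 0.0, 0.0]                      # shapelet matching nothing well
print(shapelet_distance(ts, peak))  # 0.0: exact match inside the series
print(shapelet_distance(ts, flat))  # larger: no flat stretch in the bump
```

A classifier then uses such distances as features: series containing the discriminative shape score near zero, others score higher, which is what makes shapelets interpretable.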


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Andre Lamurias ◽  
Pedro Ruas ◽  
Francisco M. Couto

Abstract
Background: Biomedical literature concerns a wide range of concepts, requiring controlled vocabularies to maintain a consistent terminology across different research groups. However, as new concepts are introduced, biomedical literature is prone to ambiguity, specifically in fields that are advancing more rapidly, for example, drug design and development. Entity linking is a text mining task that aims at linking entities mentioned in the literature to concepts in a knowledge base. For example, entity linking can help find all documents that mention the same concept and improve relation extraction methods. Existing approaches focus on the local similarity of each entity and the global coherence of all entities in a document, but do not take into account the semantics of the domain.
Results: We propose a method, PPR-SSM, to link entities found in documents to concepts from domain-specific ontologies. Our method is based on Personalized PageRank (PPR), using the relations of the ontology to generate a graph of candidate concepts for the mentioned entities. We demonstrate how the knowledge encoded in a domain-specific ontology can be used to calculate the coherence of a set of candidate concepts, improving the accuracy of entity linking. Furthermore, we explore weighting the edges between candidate concepts using semantic similarity measures (SSM). We show how PPR-SSM can be used to effectively link named entities to biomedical ontologies, namely chemical compounds, phenotypes, and gene-product localization and processes.
Conclusions: We demonstrated that PPR-SSM outperforms state-of-the-art entity linking methods on four distinct gold standards by taking advantage of the semantic information contained in ontologies. Moreover, PPR-SSM is a graph-based method that does not require training data. Our method improved the entity linking accuracy of chemical compounds by 0.1385 when compared to a method that does not use SSMs.
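The Personalized PageRank core of an approach like PPR-SSM can be sketched on a toy candidate-concept graph. The graph, edge weights (standing in for semantic similarity), and concept names below are invented for illustration; the real method derives them from a domain ontology.

```python
# Minimal Personalized PageRank over a toy candidate-concept graph.
# Edge weights stand in for semantic similarity between ontology concepts;
# the restart (seed) distribution is placed on unambiguous anchor concepts,
# so candidates well connected to the rest of the document score higher.
# All graph data here is invented for illustration.

def personalized_pagerank(edges, seeds, damping=0.85, iters=50):
    """edges: {node: {neighbor: weight}}; seeds: restart distribution."""
    nodes = set(edges) | {n for nbrs in edges.values() for n in nbrs}
    rank = {n: seeds.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) * seeds.get(n, 0.0) for n in nodes}
        for u, nbrs in edges.items():
            total = sum(nbrs.values())
            for v, w in nbrs.items():
                nxt[v] += damping * rank[u] * w / total
        rank = nxt
    return rank

# Two candidate concepts for an ambiguous mention "asp"; the candidate
# better connected to the document's unambiguous concepts should win.
graph = {
    "aspirin": {"anti-inflammatory": 0.9, "salicylate": 0.8},
    "asp (snake)": {"reptile": 0.7},
    "anti-inflammatory": {"aspirin": 0.9},
    "salicylate": {"aspirin": 0.8},
}
seeds = {"anti-inflammatory": 0.5, "salicylate": 0.5}  # unambiguous anchors
scores = personalized_pagerank(graph, seeds)
print(scores["aspirin"] > scores["asp (snake)"])  # True
```

Because restart mass only flows along weighted ontology edges, a candidate with no semantic connection to the document's other concepts ("asp (snake)") receives no score, which is how the ontology's semantics disambiguate the mention.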


2019 ◽  
Vol 52 (3-4) ◽  
pp. 173-184
Author(s):  
S. Mythrei ◽  
S. Singaravelan

Entity linking is the task of extracting information that links entities mentioned in a collection of texts to their counterparts in a knowledge base, assigning a unique identity to entities such as locations, individuals, and companies. A knowledge base (KB) is used to optimize the collection, organization, and retrieval of information. Heterogeneous information networks (HINs) comprise interlinked objects of multiple types connected by various kinds of relationships; increasingly popular examples include bibliographic networks and social media networks, as well as typical relational database data. In an HIN, various data objects are interconnected through various relations. Entity linkage determines the corresponding entities for mentions in unstructured web text within an existing HIN. This task is important and challenging because of ambiguity and limited existing knowledge. Some HINs can be considered domain-specific KBs. Current Entity Linking (EL) systems are aimed at corpora containing heterogeneous web information and perform sub-optimally on domain-specific corpora. EL systems link against one or more general or specific knowledge bases such as DBpedia, Wikipedia, Freebase, IMDB, YAGO, WordNet, and MKB. This paper presents a survey on domain-specific entity linking with HINs, providing a deep understanding of HINs, including datasets, types, and examples with related concepts.

