Entity Synonym Discovery via Multipiece Bilateral Context Matching

Author(s):  
Chenwei Zhang ◽  
Yaliang Li ◽  
Nan Du ◽  
Wei Fan ◽  
Philip S. Yu

Being able to automatically discover synonymous entities in an open-world setting benefits various tasks, such as entity disambiguation and knowledge graph canonicalization. Existing work either uses only entity features or relies on structured annotations from a single piece of context in which the entity is mentioned. To leverage the diverse contexts in which entities are mentioned, in this paper we generalize the distributional hypothesis to a multi-context setting and propose a synonym discovery framework that detects entity synonyms from free-text corpora, with attention to both effectiveness and robustness. As a key component of synonym discovery, we introduce a neural network model, SynonymNet, that determines whether two given entities are synonyms of each other. Instead of using entity features, SynonymNet makes use of multiple pieces of context in which each entity is mentioned and compares their context-level similarity via a bilateral matching schema. Experimental results demonstrate that the proposed model is able to detect synonym sets that are not observed during training on both generic and domain-specific datasets (Wiki+Freebase, PubMed+UMLS, and MedBook+MKG), with improvements of up to 4.16% in Area Under the Curve and 3.19% in Mean Average Precision over the best baseline method.
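The idea of comparing two entities through several contexts each, rather than through entity features, can be illustrated with a simplified bilateral matching score. This is a sketch under stated assumptions, not SynonymNet itself: it assumes each context sentence has already been encoded as a fixed-length vector, and it uses plain cosine similarity with max-pooling in both directions, whereas the actual model learns its context encoders and matching end-to-end. The function names and the toy embeddings are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def bilateral_match(contexts_a: List[Vector], contexts_b: List[Vector]) -> float:
    """Symmetric context-level similarity between two entities.

    Every context of entity A is matched to its most similar context of B,
    every context of B to its most similar context of A, and the two
    directional averages are averaged.
    """
    if not contexts_a or not contexts_b:
        raise ValueError("each entity needs at least one context")
    sim = [[cosine(a, b) for b in contexts_b] for a in contexts_a]
    # A -> B: best match in B for each context of A (row-wise max).
    a_to_b = sum(max(row) for row in sim) / len(contexts_a)
    # B -> A: best match in A for each context of B (column-wise max).
    b_to_a = sum(max(col) for col in zip(*sim)) / len(contexts_b)
    return (a_to_b + b_to_a) / 2.0


# Hypothetical pre-computed context embeddings (one vector per sentence).
aspirin = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
acetylsalicylic_acid = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
unrelated = [[0.0, 0.0, 1.0]]

print(bilateral_match(aspirin, acetylsalicylic_acid))  # 1.0
print(bilateral_match(aspirin, unrelated))  # 0.0
```

Matching in both directions keeps the score symmetric and lets an entity with many contexts be compared fairly against one with few, since each side only needs its best counterpart on the other side.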

AI Magazine ◽  
2010 ◽  
Vol 31 (3) ◽  
pp. 93 ◽  
Author(s):  
Stephen Soderland ◽  
Brendan Roof ◽  
Bo Qin ◽  
Shi Xu ◽  
Mausam ◽  
...  

Information extraction (IE) can identify a set of relations from free text to support question answering (QA). Until recently, IE systems were domain-specific and needed a combination of manual engineering and supervised learning to adapt to each target domain. A new paradigm, Open IE, operates on large text corpora without any manual tagging of relations, and indeed without any pre-specified relations. Because it is open-domain and open-relation, Open IE is purely textual and cannot relate its surface forms to an ontology, even when one is known in advance. We explore the steps needed to adapt Open IE to a domain-specific ontology and demonstrate our approach of mapping domain-independent tuples to an ontology using domains from DARPA’s Machine Reading Project. Our system achieves precision above 0.90 from as few as 8 training examples for an NFL-scoring domain.
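The mapping step, from domain-independent Open IE tuples to relations of a domain ontology learned from a handful of labeled examples, has roughly the following shape. This is a deliberately minimal sketch, not the authors' system: it assumes rules are keyed only on a normalized relation phrase, while a real mapper would also check argument types and generalize across phrasings. The ontology relation name `TeamDefeatedTeam` and all helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class OpenIETuple:
    """A domain-independent extraction: (argument 1, relation phrase, argument 2)."""
    arg1: str
    rel: str
    arg2: str


def normalize(phrase: str) -> str:
    """Lower-case and collapse whitespace so surface variants share one key."""
    return " ".join(phrase.lower().split())


def learn_rules(examples: List[Tuple[OpenIETuple, str]]) -> Dict[str, str]:
    """Learn relation-phrase -> ontology-relation rules from a few labeled tuples."""
    return {normalize(t.rel): onto_rel for t, onto_rel in examples}


def map_tuple(t: OpenIETuple, rules: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Map an Open IE tuple onto the ontology, or return None if no rule fires."""
    onto_rel = rules.get(normalize(t.rel))
    if onto_rel is None:
        return None
    return (onto_rel, t.arg1, t.arg2)


# Hypothetical training example for an NFL-scoring domain.
rules = learn_rules([(OpenIETuple("the Giants", "defeated", "the Cowboys"), "TeamDefeatedTeam")])
print(map_tuple(OpenIETuple("the Jets", "Defeated", "the Bills"), rules))
# ('TeamDefeatedTeam', 'the Jets', 'the Bills')
print(map_tuple(OpenIETuple("the Jets", "visited", "Buffalo"), rules))  # None
```

Returning `None` for unmatched tuples mirrors a precision-oriented design: tuples the rules cannot confidently place are dropped rather than forced into the ontology.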


2021 ◽  
Vol 11 (5) ◽  
pp. 2083
Author(s):  
Jia Xie ◽  
Zhu Wang ◽  
Zhiwen Yu ◽  
Bin Guo ◽  
Xingshe Zhou

Ischemic stroke is a typical chronic disease caused by degeneration of the neural system; it often causes serious harm and significantly reduces quality of life. It is therefore crucial to extract useful predictors from physiological signals in order to diagnose or predict ischemic stroke even when there are no apparent symptoms. In this study, we propose a novel prediction method that exploits sleep-related features. First, to characterize the pattern of ischemic stroke accurately, we extract a set of effective features from several aspects, including clinical features, fine-grained sleep-structure features and electroencephalogram (EEG) features. Second, we design a two-step prediction model that combines commonly used classifiers with a data filter model to improve the prediction results. We evaluate the framework on a real polysomnogram dataset containing 20 stroke patients and 159 healthy individuals. Experimental results demonstrate that the proposed model predicts stroke events effectively: precision is 63%, recall is 85%, the area under the precision-recall curve is 0.773, and the area under the curve (AUC) is 0.919.
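The abstract reports precision and recall separately. As a worked check, the two can be combined into a single F1 score, their harmonic mean; note that this F1 value is derived here from the reported figures and is not stated by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
print(round(f1_score(0.63, 0.85), 3))  # 0.724
```

That is, 2 × 0.63 × 0.85 / (0.63 + 0.85) = 1.071 / 1.48 ≈ 0.724. The harmonic mean sits closer to the weaker of the two values, so it reflects the comparatively lower precision.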


Author(s):  
Guillermo Infante Hernández ◽  
Aquilino A. Juan Fuente ◽  
Benjamín López Pérez ◽  
Edward Rolando Núñez-Valdéz

Software platforms for e-government transactions may differ in their functionality, languages and technologies, and in the hardware platforms and operating systems that support them. Such differences exist even among public organizations that share common processes, services, and regulations, and they hinder interoperability between these organizations. A technique for integrating these platforms is therefore needed. In this chapter, we propose a rule-based, domain-specific modeling environment for public service and process integration, consisting of commonly identified public service elements and a set of process integration rules. This approach provides the integration and interoperability sought in this domain. Furthermore, a service and process model is proposed to formalize the information needed to integrate services and processes. A set of integration rules is also presented as part of the modeling environment; these rules complete the proposed model so that it meets the business requirements of this domain.
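To make the notion of "service elements plus integration rules" concrete, the sketch below models a rule as a condition paired with an action over two organizations' descriptions of a service. Everything here is an illustrative assumption rather than the chapter's actual modeling environment: the `ServiceElement` fields, the single merge rule, and the example services are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceElement:
    """A public service as modeled by one organization (hypothetical structure)."""
    name: str
    organization: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class IntegrationRule:
    """A condition plus an action that produces an integrated attribute set."""
    description: str
    applies: Callable[[ServiceElement, ServiceElement], bool]
    action: Callable[[ServiceElement, ServiceElement], Dict[str, str]]


def integrate(a: ServiceElement, b: ServiceElement, rules: List[IntegrationRule]) -> List[Dict[str, str]]:
    """Apply every rule whose condition holds for the pair (a, b)."""
    return [rule.action(a, b) for rule in rules if rule.applies(a, b)]


# Hypothetical rule: the same service offered by two organizations is merged,
# with a's values taking precedence and b filling in missing attributes.
same_service = IntegrationRule(
    description="merge attributes of identically named services",
    applies=lambda a, b: a.name.lower() == b.name.lower(),
    action=lambda a, b: {**b.attributes, **a.attributes},
)

city = ServiceElement("Birth certificate", "City", {"fee": "5", "format": "pdf"})
region = ServiceElement("birth certificate", "Region", {"fee": "7", "delivery": "online"})
print(integrate(city, region, [same_service]))
# [{'fee': '5', 'delivery': 'online', 'format': 'pdf'}]
```

Keeping conditions and actions as data rather than hard-coded branches is what lets new integration rules be added to such an environment without changing the integration engine.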


Rheumatology ◽  
2019 ◽  
Vol 59 (5) ◽  
pp. 1059-1065 ◽  
Author(s):  
Sizheng Steven Zhao ◽  
Chuan Hong ◽  
Tianrun Cai ◽  
Chang Xu ◽  
Jie Huang ◽  
...  

Objectives: To develop classification algorithms that accurately identify patients with axial spondyloarthritis (axSpA) in electronic health records, and to compare the performance of algorithms that incorporate free-text data against approaches that use only International Classification of Diseases (ICD) codes.
Methods: An enriched cohort of 7853 eligible patients was created from the electronic health records of two large hospitals using automated searches (⩾1 ICD code combined with simple text searches). Key disease concepts were extracted from free-text data using natural language processing (NLP) and combined with ICD codes to develop algorithms. We created both supervised regression-based algorithms, trained on a set of 127 axSpA cases and 423 non-cases, and unsupervised algorithms to identify patients with a high probability of having axSpA in the enriched cohort. Their performance was compared against classifications using ICD codes only.
Results: NLP extracted four disease concepts of high predictive value: ankylosing spondylitis (AS), sacroiliitis, HLA-B27 and spondylitis. The unsupervised algorithm, incorporating both the NLP concept and the ICD code for AS, identified the greatest number of patients. With the probability threshold set to attain 80% positive predictive value, it identified 1509 axSpA patients (mean age 53 years, 71% male), with sensitivity 0.78, specificity 0.94 and area under the curve 0.93. The two supervised algorithms performed similarly but identified fewer patients. All three outperformed traditional approaches using ICD codes alone (area under the curve 0.80–0.87).
Conclusion: Algorithms incorporating free-text data can accurately identify axSpA patients in electronic health records. Large cohorts identified using these novel methods offer exciting opportunities for future clinical research.
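Choosing "the probability threshold to attain 80% positive predictive value" can be done by scanning candidate thresholds on a labeled sample and keeping the lowest one whose PPV reaches the target, since a lower threshold flags more patients. This is a minimal sketch assuming a labeled validation sample is available; the function name and the data are hypothetical, and the study's exact procedure is not described in the abstract.

```python
from typing import List, Optional


def threshold_for_ppv(scores: List[float], labels: List[int], target_ppv: float) -> Optional[float]:
    """Lowest probability threshold whose positive predictive value reaches target_ppv.

    A patient is flagged when score >= threshold. Lower thresholds flag more
    patients, so the first qualifying threshold in ascending order yields the
    largest cohort. PPV is not monotonic in the threshold, so each candidate
    is checked in turn rather than bisected. Returns None if none qualifies.
    """
    for t in sorted(set(scores)):
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        ppv = sum(flagged) / len(flagged)  # non-empty: t itself is a score
        if ppv >= target_ppv:
            return t
    return None


# Hypothetical validation sample: predicted probabilities and chart-review labels.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30]
labels = [1, 1, 1, 1, 0, 0, 1, 0]
print(threshold_for_ppv(scores, labels, 0.80))  # 0.7
```

In this toy sample PPV is 0.625 at 0.3, 0.714 at 0.4, drops to 0.667 at 0.6, and first reaches 0.8 at 0.7 (4 of 5 flagged patients are true cases), illustrating why a simple binary search over thresholds would not be safe.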


2021 ◽  
Vol 38 (2) ◽  
pp. 481-494
Author(s):  
Yurong Guan ◽  
Muhammad Aamir ◽  
Zhihua Hu ◽  
Waheed Ahmed Abro ◽  
Ziaur Rahman ◽  
...  

Object detection in images is an important task in image processing and computer vision. Many approaches are available for object detection; for example, there are numerous algorithms for locating and classifying objects in images. However, current methods perform poorly and lack experimental verification, so locating and classifying objects in images remains a challenging problem. Drawing on recent advances in image object detection, this paper develops an efficient region-based network for accurate object detection in images. To improve overall detection performance, image object detection was treated as a twofold problem involving object proposal generation and object classification. First, a framework was designed to generate high-quality, class-independent, accurate proposals. Then, these proposals, together with their input images, were fed into our network to learn convolutional features. To improve detection efficiency, a network refinement module reduced the number of proposals, leaving only a few eligible candidates. The refined candidate proposals were then passed to the detection module to classify the objects. The proposed model was tested on the test set of the widely used PASCAL Visual Object Classes Challenge 2007 (VOC2007). The results demonstrate that our model achieved robust overall detection performance compared with existing approaches, with both fewer and more proposals, in terms of recall, mean average best overlap (MABO), and mean average precision (mAP).
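Of the three reported metrics, mean average best overlap (MABO) is the one most specific to proposal quality: for each ground-truth box take the best intersection-over-union (IoU) achieved by any proposal, average these per class (ABO), then average over classes. The sketch below follows that standard definition for a single image; it is an illustration, not the authors' evaluation code, and over a full dataset the best overlaps would be collected across all images before averaging.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mabo(gt_by_class: Dict[str, List[Box]], proposals: List[Box]) -> float:
    """Mean Average Best Overlap for one image's proposals.

    Best overlap: highest IoU between a ground-truth box and any proposal.
    ABO: mean best overlap over a class's ground-truth boxes.
    MABO: mean ABO over classes that have at least one ground-truth box.
    """
    abos = []
    for boxes in gt_by_class.values():
        if not boxes:
            continue
        best = [max((iou(g, p) for p in proposals), default=0.0) for g in boxes]
        abos.append(sum(best) / len(best))
    return sum(abos) / len(abos) if abos else 0.0


gt = {"dog": [(0.0, 0.0, 10.0, 10.0)], "cat": [(20.0, 20.0, 30.0, 30.0)]}
proposals = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 25.0)]
print(mabo(gt, proposals))  # 0.75
```

Here the dog box is matched exactly (IoU 1.0) and the cat box only half covered (IoU 0.5), giving MABO 0.75. Because MABO rewards covering every object with at least one good proposal, it complements recall when judging how aggressively proposals can be pruned.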


Author(s):  
Jivan Y. Patil ◽  
Girish P. Potdar

The ability to process, understand and interact in natural language is highly important for building an intelligent system, as it largely determines how users communicate with the system. Deep neural networks (DNNs) have achieved excellent performance on many machine learning problems and are widely used in computer vision and supervised learning. Although DNNs work well when large labeled training sets are available, they cannot be used to map complex structures such as sentences end-to-end. Existing approaches to conversational modeling are domain-specific and require handcrafted rules. This paper proposes a simple approach based on the recently proposed sequence-to-sequence framework for neural networks. The proposed model generates a reply by predicting a sentence using the chain rule of probability, conditioned on the given sentence(s) in the conversation. The model is trained end-to-end on a large dataset. The approach uses attention to focus text generation on the intent of the conversation, and beam search to generate high-probability output with some diversity. Primary findings show that the model exhibits common-sense reasoning on a movie transcript dataset.
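Decoding with the chain rule and beam search can be shown without a trained network: a reply's score is the sum of log p(token | prefix), and beam search keeps the few best partial replies at each step instead of committing greedily. In this sketch a hypothetical lookup table stands in for the sequence-to-sequence decoder, and attention is omitted entirely; `TOY_MODEL` and `beam_search` are illustrative names, not the paper's code.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, log-probability)


def beam_search(
    next_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int,
    max_len: int,
    eos: str = "</s>",
) -> List[Hypothesis]:
    """Keep the beam_width highest-scoring partial replies at each step.

    A reply's score is the sum of log p(token | prefix), i.e. the log of the
    chain-rule product of conditional probabilities.
    """
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates = [
            (tokens + (tok,), score + math.log(p))
            for tokens, score in beams
            for tok, p in next_probs(tokens).items()
            if p > 0.0
        ]
        candidates.sort(key=lambda h: h[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens, score))  # reply is complete
            else:
                beams.append((tokens, score))  # keep extending next step
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open when max_len is reached
    return sorted(finished, key=lambda h: h[1], reverse=True)


# Hypothetical next-token distributions standing in for a trained decoder.
TOY_MODEL = {
    (): {"i": 0.6, "you": 0.4},
    ("i",): {"agree": 0.4, "think": 0.3, "</s>": 0.3},
    ("you",): {"too": 0.9, "</s>": 0.1},
    ("i", "agree"): {"</s>": 1.0},
    ("i", "think"): {"</s>": 1.0},
    ("you", "too"): {"</s>": 1.0},
}

results = beam_search(lambda prefix: TOY_MODEL.get(prefix, {}), beam_width=2, max_len=5)
best_tokens, best_score = results[0]
print(best_tokens, round(math.exp(best_score), 2))  # ('you', 'too', '</s>') 0.36
```

Greedy decoding (beam width 1) would commit to "i" and end with "i agree" at probability 0.6 × 0.4 = 0.24, while a beam of two recovers "you too" at 0.4 × 0.9 = 0.36, which is why beam search tends to produce higher-probability replies.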


Author(s):  
Wei Ji ◽  
Xi Li ◽  
Yueting Zhuang ◽  
Omar El Farouk Bourahla ◽  
Yixin Ji ◽  
...  

Clothing segmentation is a challenging vision problem typically addressed within a fine-grained semantic segmentation framework. Unlike conventional segmentation, clothing segmentation has domain-specific properties such as rich textures, diverse appearance variations, non-rigid geometric deformations, and small-sample learning. To address these issues, we propose a semantic locality-aware segmentation model that adaptively pairs an input clothing image with a semantically similar (e.g., in appearance or pose) auxiliary exemplar retrieved by search. By considering the interactions between the clothing image and its exemplar, the model discovers more intrinsic knowledge about the locality manifold structure of clothing images, which makes learning in the small-sample setting more stable and tractable. Furthermore, we present a CNN model based on deformable convolutions to extract non-rigid, geometry-aware features for clothing images. Experimental results demonstrate the effectiveness of the proposed model compared with state-of-the-art approaches.
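The exemplar-search step, pairing each clothing image with a semantically similar auxiliary image, can be sketched as nearest-neighbour retrieval over feature vectors. This assumes the features already encode appearance or pose and uses cosine similarity as the criterion; the abstract does not specify the paper's actual search procedure, and the gallery ids and vectors below are hypothetical.

```python
import math
from typing import Dict, List, Optional


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_exemplar(
    query: List[float], gallery: Dict[str, List[float]], exclude: Optional[str] = None
) -> str:
    """Return the id of the gallery image whose features are most similar to the query.

    `exclude` skips the query's own entry so an image is never its own exemplar.
    """
    best_id: Optional[str] = None
    best_sim = -math.inf
    for image_id, features in gallery.items():
        if image_id == exclude:
            continue
        sim = cosine(query, features)
        if sim > best_sim:
            best_id, best_sim = image_id, sim
    if best_id is None:
        raise ValueError("gallery has no candidate exemplars")
    return best_id


# Hypothetical appearance/pose features for three training images.
gallery = {
    "dress_01": [0.9, 0.1, 0.0],
    "coat_07": [0.1, 0.9, 0.2],
    "dress_02": [0.8, 0.2, 0.1],
}
print(retrieve_exemplar(gallery["dress_01"], gallery, exclude="dress_01"))  # dress_02
```

A linear scan is enough to show the idea; with a large gallery an approximate nearest-neighbour index would normally replace it.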


2018 ◽  
Vol 16 (2) ◽  
pp. 121-131
Author(s):  
Mongkud KLUNGPORNKUN ◽  
Peerapon VATEEKUL

In text corpora, it is common to categorize each document into a predefined class hierarchy, which is usually a tree. One of the most widely used approaches is a level-based strategy that induces a multiclass classifier for each class level independently. However, prior attempts did not use information from the parent level, and they employed a bag of words rather than considering the sequence of words. In this paper, we present a novel level-based hierarchical text categorization method with a strategy called “sharing layer information.” For each class level, a neural network is constructed whose input is a sequence of word embedding vectors generated from convolutional neural networks (CNNs). A training strategy called “balanced resampling with mini-batch training” is also proposed to avoid class-imbalance issues. Furthermore, a label correction strategy is proposed to make the predictions of the networks at different class levels consistent with one another. Experiments were conducted on two standard benchmarks, WIPO and Wiki, comparing against a top-down SVM framework with TF-IDF inputs called “HR-SVM.” The results show that the proposed model achieves the highest accuracy in terms of micro F1 and outperforms the baseline at the top levels in terms of macro F1.
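Because each level has its own classifier, independent predictions can disagree with the tree (for example, a child class whose parent was not predicted). The abstract does not spell out the label correction rule, so the sketch below shows one common option as an assumption: pick the root-to-leaf path whose product of per-level probabilities is highest. The class names and probabilities are hypothetical.

```python
from typing import Dict, List, Tuple


def correct_labels(
    level_probs: List[Dict[str, float]], parent: Dict[str, str]
) -> Tuple[List[str], float]:
    """Pick the consistent root-to-leaf label path with the highest joint score.

    level_probs[k] maps each class at level k to the probability predicted by
    that level's classifier; parent maps each class at level k > 0 to its
    parent at level k - 1. Paths are enumerated exhaustively, which is fine
    for small trees. Raises ValueError if no consistent path exists.
    """
    paths: List[Tuple[List[str], float]] = [([c], p) for c, p in level_probs[0].items()]
    for probs in level_probs[1:]:
        paths = [
            (labels + [c], score * p)
            for labels, score in paths
            for c, p in probs.items()
            if parent.get(c) == labels[-1]  # keep only children of the current label
        ]
    return max(paths, key=lambda path: path[1])


level_probs = [
    {"A": 0.55, "B": 0.45},
    {"A1": 0.20, "A2": 0.10, "B1": 0.70},
]
parent = {"A1": "A", "A2": "A", "B1": "B"}
path, score = correct_labels(level_probs, parent)
print(path)  # ['B', 'B1']
```

Taking the argmax at each level separately would yield the inconsistent pair A and B1; scoring whole paths instead (0.55 × 0.20 = 0.11 for A/A1 versus 0.45 × 0.70 = 0.315 for B/B1) resolves the conflict in favour of the more confident child.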

