Extracting Entity Synonymous Relations via Context-Aware Permutation Invariance

Author(s):  
Nan Yan ◽  
Subin Huang ◽  
Chao Kong

Discovering entity synonymous relations is an important task for many entity-based applications. Existing entity synonymous relation extraction approaches are mainly based on lexical patterns or distributional corpus-level statistics and ignore the context semantics between entities. For example, the contexts around "apple" determine whether "apple" refers to a kind of fruit or to Apple Inc. In this paper, an entity synonymous relation extraction approach is proposed using context-aware permutation invariance. Specifically, a triplet network is used to obtain the permutation invariance between entities and to learn whether two given entities are synonymous. To capture more synonymous features, the relational context semantics and entity representations are integrated into the triplet network, which improves the performance of extracting entity synonymous relations. The proposed approach is evaluated on three real-world datasets. Experimental results demonstrate that it performs better than the compared approaches on the entity synonymous relation extraction task.
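
As a rough illustration of the triplet idea mentioned above, the following sketch computes a generic triplet margin loss over a symmetric distance. It is not the authors' network: the abstract does not specify the architecture or the loss, so the function names, the margin value, and the toy embeddings are all assumptions made for illustration. The symmetry of the distance is what makes the pair score independent of the order of the two entities.

```python
import math

def euclidean(u, v):
    """Euclidean distance; symmetric, so swapping u and v gives the same value."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(0.0, gap)

# Hypothetical 2-d entity embeddings, invented for illustration only.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # assumed synonym of the anchor: should be close
negative = [3.0, 4.0]   # assumed non-synonym: should be far

# The synonym is already closer than the non-synonym by more than the margin.
loss = triplet_loss(anchor, positive, negative)

# Swapping the roles of positive and negative yields a positive penalty.
swapped_loss = triplet_loss(anchor, negative, positive)
```

In a trained model the embeddings would come from an encoder that also sees the surrounding context; here they are fixed vectors so that the loss itself is easy to inspect.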

Author(s):  
Xiaocheng Feng ◽  
Jiang Guo ◽  
Bing Qin ◽  
Ting Liu ◽  
Yongjie Liu

Distantly supervised relation extraction (RE) has been an effective way of finding novel relational facts from text without labeled training data. Typically it can be formalized as a multi-instance multi-label problem. In this paper, we introduce a novel neural approach for distantly supervised RE with a specific focus on attention mechanisms. Unlike the feature-based logistic regression model and compositional neural models such as CNN, our approach includes two major attention-based memory components, which are capable of explicitly capturing the importance of each context word for modeling the representation of the entity pair, as well as the intrinsic dependencies between relations. These importance degrees and dependency relationships are calculated with multiple computational layers, each of which is a neural attention model over an external memory. Experiments on real-world datasets show that our approach performs significantly and consistently better than various baselines.


2018 ◽  
Vol 189 ◽  
pp. 03008
Author(s):  
Xiaoshuang Qiao ◽  
Hui Wang ◽  
Gongde Guo ◽  
Yuanyuan Liu

This paper explores a new ensemble approach called Ensemble Probability Distribution Novelty Detection (EPDND) for novelty detection. The proposed ensemble approach provides a metric to characterize different classes. Experimental results on four real-world datasets show that, in many cases, EPDND achieves overall performance competitive with two common novelty detection approaches, Support Vector Domain Description and Gaussian Mixture Models, in terms of accuracy, recall and F1 scores.


Author(s):  
Abbas Keramati ◽  
Niloofar Yousefi ◽  
Amin Omidvar

Credit scoring has become a very important issue due to the recent growth of the credit industry. As the first objective, this chapter provides an academic database of literature between and proposes a classification scheme to classify the articles. The second objective of this chapter is to suggest employing the Optimally Weighted Fuzzy K-Nearest Neighbor (OWFKNN) algorithm for credit scoring. To show the performance of this method, two real-world datasets from the UCI database are used. In the classification task, the empirical results demonstrate that the OWFKNN outperforms the conventional KNN and fuzzy KNN methods as well as other methods. In the predictive accuracy of probability of default, the OWFKNN also shows the best performance among the compared methods. The results in this chapter suggest that the OWFKNN approach is effective in estimating default probabilities and is a promising method for classification.
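
For orientation, the sketch below shows a plain fuzzy k-nearest-neighbour vote, the baseline family the chapter compares against. It is not OWFKNN: the abstract does not describe how the optimal weights are obtained, so this example uses the commonly described inverse-distance weighting with crisp training labels. The function name, the fuzzifier `m`, and the one-dimensional toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict

def fuzzy_knn(train, query, k=3, m=2.0):
    """Return class memberships for `query` from its k nearest neighbours.

    train: list of (feature_vector, label) pairs with crisp labels.
    Each neighbour votes with weight 1 / d^(2 / (m - 1)).
    """
    eps = 1e-12  # avoids division by zero when the query equals a training point
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    weights = defaultdict(float)
    for features, label in neighbours:
        d = math.dist(features, query)
        weights[label] += 1.0 / (d ** (2.0 / (m - 1.0)) + eps)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Hypothetical applicants described by a single made-up risk feature.
train = [([0.0], "good"), ([1.0], "good"), ([4.0], "default"), ([5.0], "default")]
memberships = fuzzy_knn(train, [2.0], k=3)
```

The returned memberships sum to one, so the value for the "default" class can be read as a rough default-probability estimate, which is the use the abstract describes.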


2021 ◽  
Vol 17 (3) ◽  
pp. 30-49
Author(s):  
Sharon Moses J. ◽  
Dhinesh Babu L. D.

The advancement of web services paved the way for the accumulation of a tremendous amount of information on the World Wide Web. This huge pile of information makes it hard for users to get the required information at the right time. Recommender systems are therefore used to surface the right items. Recommender algorithms generally act on user information to render recommendations. When a new user enters the system, however, it fails to render recommendations because no user information is available, resulting in the new user problem. In this paper, a movie recommender algorithm is constructed to address the prevailing new user cold start problem by utilizing only movie genres. Unlike other techniques, the proposed work considers the familiarity of each movie genre to compute a genre significance value. Based on the genre significance value, genre similarity is correlated to render recommendations to a new user. The evaluation of the proposed recommender algorithm on real-world datasets shows that the algorithm performs better than other similar approaches.


2021 ◽  
Vol 25 (6) ◽  
pp. 1349-1368
Author(s):  
Chung-Chian Hsu ◽  
Wei-Cyun Tsao ◽  
Arthur Chang ◽  
Chuan-Yu Chang

Most real-world datasets are of mixed type, including both numeric and categorical attributes. Unlike numbers, operations on categorical values are limited, and the degree of similarity between distinct values cannot be measured directly. In order to properly analyze mixed-type data, dedicated methods to handle categorical values in the datasets are needed. The limitation of most existing methods is the lack of appropriate numeric representations of categorical values. Consequently, some analysis algorithms cannot be applied. In this paper, we address this deficiency by transforming categorical values to numeric representations so as to facilitate various analyses of mixed-type data. In particular, the proposed transformation method preserves the semantics of categorical values with respect to the other values in the dataset, resulting in better performance on data analyses including classification and clustering. The proposed method is verified and compared with other methods on extensive real-world datasets.


2021 ◽  
Vol 10 (5) ◽  
pp. 336
Author(s):  
Jian Yu ◽  
Meng Zhou ◽  
Xin Wang ◽  
Guoliang Pu ◽  
Chengqi Cheng ◽  
...  

Forecasting the motion of surrounding vehicles is necessary for an autonomous driving system operating in complex traffic. Trajectory prediction gives vehicles foresight and helps them make more sensible decisions. However, traditional models treat trajectory prediction as a simple sequence prediction task. Ignoring inter-vehicle interaction and environmental influence degrades these models on real-world datasets. To address this issue, we propose a novel Dynamic and Static Context-aware Attention Network named DSCAN in this paper. The DSCAN utilizes an attention mechanism to dynamically decide which surrounding vehicles are more important at the moment. We also equip the DSCAN with a constraint network to consider the static environment information. We conducted a series of experiments on a real-world dataset, and the experimental results demonstrated the effectiveness of our model. Moreover, the present study suggests that the attention mechanism and static constraints enhance the prediction results.
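
The idea of weighting surrounding vehicles by importance can be illustrated with generic scaled dot-product attention. This is only a sketch of the general mechanism, not the DSCAN architecture, which the abstract does not detail. The function names, the two-dimensional feature vectors, and the choice of dot-product scoring are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention of one query over several neighbours."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Hypothetical features: the ego vehicle and three surrounding vehicles.
ego = [1.0, 0.0]
neighbours = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]

# Neighbour features serve as both keys and values in this toy example.
weights, context = attend(ego, neighbours, neighbours)
```

The weights sum to one and are largest for the neighbour whose features align most with the ego vehicle's query, which is the sense in which attention "decides" which vehicles matter at a given moment.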


Author(s):  
Masoud Hamedani ◽  
Sang-Wook Kim

In this paper, we propose SimAndro-Plus as an improved variant of the state-of-the-art method, SimAndro, to compute the similarity of Android applications (apps) regarding their functionalities. SimAndro-Plus differs from SimAndro in two major ways: 1) it exploits two features beneficial to similarity computation that are totally disregarded by SimAndro; 2) to compute the similarity score of an app-pair based on strings and package name features, SimAndro-Plus considers not only those terms co-appearing in both apps but also those terms appearing in one app while missing in the other. The results of our extensive experiments with three real-world datasets and a dataset constructed by human experts demonstrate that 1) each of the two aforementioned differences is effective in achieving better accuracy and 2) SimAndro-Plus outperforms SimAndro in similarity computation by 14% on average.
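
The difference between scoring only co-appearing terms and also accounting for terms missing from one app can be seen with two simple set measures. This sketch does not reproduce the SimAndro or SimAndro-Plus formulas, which the abstract does not give. It contrasts a plain overlap count with Jaccard similarity as one well-known measure that penalizes unmatched terms; the function names and term sets are invented for illustration.

```python
def overlap_only(terms_a, terms_b):
    """Score that looks only at terms co-appearing in both apps."""
    return len(set(terms_a) & set(terms_b))

def jaccard(terms_a, terms_b):
    """Jaccard similarity: shared terms divided by all distinct terms.

    Terms present in one app but missing in the other enlarge the
    denominator and therefore lower the score.
    """
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical term sets extracted from three apps.
app_x = ["camera", "photo", "filter"]
app_y = ["camera", "photo", "filter"]
app_z = ["camera", "photo", "filter", "bank", "loan", "payment"]

same_overlap = overlap_only(app_x, app_y) == overlap_only(app_x, app_z)
score_xy = jaccard(app_x, app_y)
score_xz = jaccard(app_x, app_z)
```

Both pairs share three terms, so the overlap count cannot tell them apart, while the Jaccard score drops for the pair in which one app carries unrelated terms.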


Author(s):  
Duc-Trong Le ◽  
Hady W. Lauw ◽  
Yuan Fang

Our interactions with an application frequently leave a heterogeneous and contemporaneous trail of actions and adoptions (e.g., clicks, bookmarks, purchases). Given a sequence of a particular type (e.g., purchases), referred to as the target sequence, we seek to predict the next item expected to appear beyond this sequence. This task is known as next-item recommendation. We hypothesize two means for improvement. First, within each time step, a user may interact with multiple items (a basket), with potential latent associations among them. Second, predicting the next item in the target sequence may be helped by also learning from another supporting sequence (e.g., clicks). We develop three twin network structures modeling the generation of both target and support basket sequences. One, based on "Siamese networks", facilitates full sharing of parameters between the two sequence types. The other two, based on "fraternal networks", facilitate partial sharing of parameters. Experiments on real-world datasets show significant improvements upon baselines relying on one sequence type.


2018 ◽  
pp. 1838-1874
Author(s):  
Abbas Keramati ◽  
Niloofar Yousefi ◽  
Amin Omidvar

Credit scoring has become a very important issue due to the recent growth of the credit industry. As the first objective, this chapter provides an academic database of literature between and proposes a classification scheme to classify the articles. The second objective of this chapter is to suggest employing the Optimally Weighted Fuzzy K-Nearest Neighbor (OWFKNN) algorithm for credit scoring. To show the performance of this method, two real-world datasets from the UCI database are used. In the classification task, the empirical results demonstrate that the OWFKNN outperforms the conventional KNN and fuzzy KNN methods as well as other methods. In the predictive accuracy of probability of default, the OWFKNN also shows the best performance among the compared methods. The results in this chapter suggest that the OWFKNN approach is effective in estimating default probabilities and is a promising method for classification.


2021 ◽  
Author(s):  
Mandana Saebi ◽  
Bozhao Nan ◽  
John Herr ◽  
Jessica Wahlers ◽  
Zhichun Guo ◽  
...  

The lack of publicly available, large, and unbiased datasets is a key bottleneck for the application of machine learning (ML) methods in synthetic chemistry. Data from electronic laboratory notebooks (ELNs) could provide less biased, large datasets, but no such datasets have been made publicly available. The first real-world dataset from the ELNs of a large pharmaceutical company is disclosed and its relationship to high-throughput experimentation (HTE) datasets is described. For chemical yield predictions, a key task in chemical synthesis, an attributed graph neural network (AGNN) performs as well as or better than the best previous models on two HTE datasets for the Suzuki and Buchwald-Hartwig reactions. However, training of the AGNN on the ELN dataset does not lead to a predictive model. The implications of using ELN data for training ML-based models are discussed in the context of yield predictions.

