Interpretable entity meta-alignment in knowledge graphs using penalized regression: a case study in the biomedical domain

Author(s):  
Jorge Martinez-Gil ◽  
Riad Mokadem ◽  
Franck Morvan ◽  
Josef Küng ◽  
Abdelkader Hameurlain
Semantic Web ◽  
2021 ◽  
pp. 1-20

Author(s):  
Pierre Monnin ◽  
Chedy Raïssi ◽  
Amedeo Napoli ◽  
Adrien Coulet

Knowledge graphs are freely aggregated, published, and edited in the Web of data, and thus may overlap. Hence, a key task resides in aligning (or matching) their content. This task encompasses the identification, within an aggregated knowledge graph, of nodes that are equivalent, more specific, or weakly related. In this article, we propose to match nodes within a knowledge graph by (i) learning node embeddings with Graph Convolutional Networks (GCNs) such that similar nodes have low distances in the embedding space, and (ii) clustering nodes based on their embeddings, in order to suggest alignment relations between nodes of the same cluster. We conducted experiments with this approach on the real-world application of aligning knowledge in the field of pharmacogenomics, which motivated our study. We particularly investigated the interplay between domain knowledge and GCN models with the following two focuses. First, we applied inference rules associated with domain knowledge, independently or combined, before learning node embeddings, and we measured the improvements in matching results. Second, while our GCN model is agnostic to the exact alignment relations (e.g., equivalence, weak similarity), we observed that distances in the embedding space are coherent with the “strength” of these different relations (e.g., smaller distances for equivalences), letting us consider clustering and distances in the embedding space as a means to suggest alignment relations in our case study.
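The embed-then-cluster pipeline described above can be sketched in a few lines. The sketch below is only illustrative: it replaces a trained GCN with two untrained symmetric-normalized propagation steps (the GCN smoothing operation without learned weights) on a tiny hand-built adjacency matrix, then clusters the resulting embeddings so that nodes landing in the same cluster become alignment candidates. The toy graph and all variable names are assumptions, not the authors' data or model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy aggregated knowledge graph: 6 nodes forming two weakly linked groups,
# bridged by the edge between nodes 2 and 3.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

# One-hot initial features; in the article's setting these would encode
# node attributes from the knowledge graph.
X = np.eye(6)

# GCN-style symmetric normalization: A_hat = D^-1/2 (A + I) D^-1/2.
A_tilde = A + np.eye(6)
d = A_tilde.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

# Two propagation steps stand in for two (untrained) GCN layers: each
# node's embedding becomes a smoothed mix of its 2-hop neighborhood.
Z = A_hat @ (A_hat @ X)

# Cluster the embeddings; nodes sharing a cluster are suggested as
# candidates for an alignment relation.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
print(labels)
```

On this toy graph the two clusters recover the two groups, with the bridge nodes 2 and 3 assigned to their denser side; distances within a cluster could then be inspected to rank candidates by relation "strength", as the abstract suggests.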


2021 ◽  
Vol 17 (3) ◽  
pp. e1008831
Author(s):  
Denis A. Shah ◽  
Erick D. De Wolf ◽  
Pierce A. Paul ◽  
Laurence V. Madden

Ensembling combines the predictions made by individual component base models with the goal of achieving a predictive accuracy that is better than that of any one of the constituent member models. Diversity among the base models in terms of predictions is a crucial criterion in ensembling. However, there are practical instances when the available base models produce highly correlated predictions, because they may have been developed within the same research group or may have been built from the same underlying algorithm. We investigated, via a case study on Fusarium head blight (FHB) on wheat in the U.S., whether ensembles of simple yet highly correlated models for predicting the risk of FHB epidemics, all generated from logistic regression, provided any benefit to predictive performance, despite relatively low levels of base model diversity. Three ensembling methods were explored: soft voting, weighted averaging of smaller subsets of the base models, and penalized regression as a stacking algorithm. Soft voting and weighted model averages were generally better at classification than the base models, though not universally so. The performances of stacked regressions were superior to those of the other two ensembling methods we analyzed in this study. Ensembling simple yet correlated models is computationally feasible and is therefore worth pursuing for models of epidemic risk.
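The three ensembling strategies named in the abstract can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the authors' FHB risk models: the base learners are deliberately similar logistic regressions (differing only in regularization strength) to mimic highly correlated component models, and all names and weights are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Highly correlated base models: same algorithm, different penalty strength.
base = [(f"lr{i}", LogisticRegression(C=c, max_iter=1000))
        for i, c in enumerate([0.01, 0.1, 1.0])]

# (1) Soft voting: average the base models' predicted probabilities.
soft = VotingClassifier(base, voting="soft").fit(X_tr, y_tr)

# (2) Weighted averaging: fixed (here arbitrary) weights on the probabilities.
weighted = VotingClassifier(base, voting="soft",
                            weights=[1, 2, 3]).fit(X_tr, y_tr)

# (3) Stacking: a penalized (L2) logistic regression learns how to combine
# the base models' cross-validated predictions.
stack = StackingClassifier(
    base, final_estimator=LogisticRegression(C=0.5, max_iter=1000),
).fit(X_tr, y_tr)

scores = {name: m.score(X_te, y_te)
          for name, m in [("soft", soft), ("weighted", weighted),
                          ("stack", stack)]}
print(scores)
```

As in the study, the interesting comparison is between these ensemble accuracies and those of the individual base models; with correlated learners the gains are typically modest but can still favor the stacked regression.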


2019 ◽  
Vol 35 (19) ◽  
pp. 3628-3634 ◽  
Author(s):  
Soufiane Ajana ◽  
Niyazi Acar ◽  
Lionel Bretillon ◽  
Boris P Hejblum ◽  
Hélène Jacqmin-Gadda ◽  
...  

Abstract

Motivation: In some prediction analyses, predictors have a natural grouping structure and selecting predictors accounting for this additional information could be more effective for predicting the outcome accurately. Moreover, in a high dimension low sample size framework, obtaining a good predictive model becomes very challenging. The objective of this work was to investigate the benefits of dimension reduction in penalized regression methods, in terms of prediction performance and variable selection consistency, in high dimension low sample size data. Using two real datasets, we compared the performances of lasso, elastic net, group lasso, sparse group lasso, sparse partial least squares (PLS), group PLS and sparse group PLS.

Results: Considering dimension reduction in penalized regression methods improved the prediction accuracy. The sparse group PLS reached the lowest prediction error while consistently selecting a few predictors from a single group.

Availability and implementation: R codes for the prediction methods are freely available at https://github.com/SoufianeAjana/Blisar.

Supplementary information: Supplementary data are available at Bioinformatics online.
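The variable-selection behavior of penalized regression in a high dimension, low sample size setting can be illustrated briefly. The sketch below uses scikit-learn's lasso and elastic net on synthetic data (the group-aware methods compared in the article, such as group lasso and sparse group PLS, are not in scikit-learn and are omitted); the data dimensions and penalty values are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

# High dimension, low sample size: n = 40 samples, p = 200 predictors,
# only 5 of which truly drive the outcome.
X, y = make_regression(n_samples=40, n_features=200, n_informative=5,
                       noise=1.0, random_state=0)

# L1 penalty (lasso) and mixed L1/L2 penalty (elastic net).
lasso = Lasso(alpha=1.0, max_iter=10_000).fit(X, y)
enet = ElasticNet(alpha=1.0, l1_ratio=0.5, max_iter=10_000).fit(X, y)

# The L1 component drives most coefficients exactly to zero, so the
# fitted models double as variable selectors.
n_lasso = int(np.sum(lasso.coef_ != 0))
n_enet = int(np.sum(enet.coef_ != 0))
print(n_lasso, n_enet)
```

With grouped predictors, a group penalty would instead zero out whole blocks of coefficients at once, which is the structural information the article shows to be beneficial.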


Author(s):  
Mariam Alaverdian ◽  
William Gilroy ◽  
Veronica Kirgios ◽  
Xia Li ◽  
Carolina Matuk ◽  
...  

2019 ◽  
Vol 59 ◽  
pp. 100486 ◽  
Author(s):  
W.X. Wilcke ◽  
V. de Boer ◽  
M.T.M. de Kleijn ◽  
F.A.H. van Harmelen ◽  
H.J. Scholten

2021 ◽  
pp. 106-124
Author(s):  
Bernardo Alkmim ◽  
Edward Haeusler ◽  
Daniel Schwabe
