Exploiting Latent Semantic Subspaces to Derive Associations for Specific Pharmaceutical Semantics

Janus Wawrzinek; José María González Pinto; Oliver Wiehr; Wolf-Tilo Balke

doi:10.1007/s41019-020-00140-2

Data Science and Engineering ◽

10.1007/s41019-020-00140-2 ◽

2020 ◽

Vol 5 (4) ◽

pp. 333-345

Author(s):

Janus Wawrzinek ◽

José María González Pinto ◽

Oliver Wiehr ◽

Wolf-Tilo Balke

Keyword(s):

Domain Knowledge ◽

State Of The Art ◽

Drug Repositioning ◽

Hypothesis Generation ◽

Semantic Relations ◽

Automatic Extraction ◽

Biomedical Domain ◽

Important Research ◽

Disease Associations ◽

Active Substances

State-of-the-art approaches in the field of neural embedding models (NEMs) enable progress in the automatic extraction and prediction of semantic relations between important entities like active substances, diseases, and genes. In particular, their predictive capability makes them valuable for important research-related tasks such as hypothesis generation and drug repositioning. A core challenge in the biomedical domain is to obtain interpretable semantics from NEMs that can distinguish, for instance, between the following two situations: (a) drug x induces disease y and (b) drug x treats disease y. However, NEMs alone cannot distinguish between associations such as treats or induces. Is it possible to develop a model to learn a latent representation from the NEMs capable of such disambiguation? To what extent do we need domain knowledge to succeed in the task? In this paper, we answer both questions and show that our proposed approach not only succeeds in the disambiguation task but also advances current growing research efforts to find real predictions using a sophisticated retrospective analysis. Furthermore, we investigate which type of associations is generally better contextualized and therefore probably has a stronger influence in our disambiguation task. In this context, we present an approach to extract an interpretable latent semantic subspace from the original embedding space in which therapeutic drug–disease associations are more likely.

ID:2047 Drug repositioning by integrating known disease-gene and drug-target associations

Biomedical Research and Therapy ◽

10.15419/bmrat.v4is.281 ◽

2017 ◽

Vol 4 (S) ◽

pp. 76

Author(s):

Duc-Hau Le

Keyword(s):

Computational Methods ◽

Drug Target ◽

Disease Gene ◽

State Of The Art ◽

Drug Repositioning ◽

Least Square ◽

Data Sources ◽

Chemical Structures ◽

Drug Compounds ◽

Disease Associations

Computational drug repositioning has proven to be a promising and efficient strategy for discovering new uses for existing drugs. To achieve this goal, a number of computational methods have been proposed, which are based on different data sources of drugs and diseases and on different approaches. Depending on where the discovery of drug-disease relationships comes from, the proposed computational methods can be categorized as either ‘drug-based’ or ‘disease-based’. To identify new indications of drugs, the proposed methods usually assume that similar drugs can be used for similar diseases. Therefore, similarities between drugs and between diseases are usually used as inputs. In addition, known drug-disease associations are also needed for the methods. It should be noted that these associations are still not well established because many marketed drugs have been withdrawn, which could affect the outcome of the methods. In this study, instead of using the known drug-disease associations, we rely on known disease-gene and drug-target associations. In addition, similarity between drugs measured by the chemical structures of drug compounds and similarity between diseases sharing phenotypes are used. Then, a semi-supervised learning model, Regularized Least Square (RLS), which can exploit this information effectively, is used to find new uses of drugs. Experimental results demonstrate that our method, namely RLSDR, outperforms several state-of-the-art existing methods in terms of area under the ROC curve (AUC). Novel indications for a number of drugs are identified and validated by evidence from different resources.
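
The abstract above does not give the RLSDR formulation, so the following is only a minimal sketch of a generic kernel regularized least squares scorer, F = K (K + λI)^-1 Y, assuming a symmetric similarity matrix K and a binary matrix Y of known associations. The function name `rls_scores`, the regularization value, and the toy numbers are all hypothetical and chosen for illustration.

```python
import numpy as np

def rls_scores(similarity, labels, reg=1.0):
    """Generic kernel RLS scores: F = K (K + reg*I)^-1 Y (not the RLSDR model itself)."""
    K = np.asarray(similarity, dtype=float)
    Y = np.asarray(labels, dtype=float)
    n = K.shape[0]
    # Solve (K + reg*I) A = Y instead of forming an explicit inverse.
    A = np.linalg.solve(K + reg * np.eye(n), Y)
    return K @ A

# Made-up toy data: 3 drugs, 2 candidate associations per drug.
drug_sim = np.array([[1.0, 0.8, 0.1],
                     [0.8, 1.0, 0.2],
                     [0.1, 0.2, 1.0]])
known = np.array([[1.0, 0.0],
                  [0.0, 0.0],
                  [0.0, 1.0]])

scores = rls_scores(drug_sim, known, reg=0.5)
print(scores.round(3))
```

In this toy setup the second drug has no known associations, yet it receives a higher score for the first column because it is most similar to the first drug. That is the "similar drugs, similar uses" assumption expressed as a linear solve.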

DDA-SKF: Predicting Drug–Disease Associations Using Similarity Kernel Fusion

Frontiers in Pharmacology ◽

10.3389/fphar.2021.784171 ◽

2022 ◽

Vol 12 ◽

Author(s):

Chu-Qiao Gao ◽

Yuan-Ke Zhou ◽

Xiao-Hong Xin ◽

Hui Min ◽

Pu-Feng Du

Keyword(s):

Computational Model ◽

State Of The Art ◽

Drug Repositioning ◽

Source Code ◽

Orphan Drugs ◽

Kernel Fusion ◽

Disease Associations ◽

Laplacian Regularized Least Squares ◽

Novel Drug ◽

Similarity Information

Drug repositioning provides a promising and efficient strategy to discover potential associations between drugs and diseases. Many systematic computational drug-repositioning methods have been introduced, which are based on various similarities of drugs and diseases. In this work, we proposed a new computational model, DDA-SKF (drug–disease associations prediction using similarity kernels fusion), which can predict novel drug indications by utilizing similarity kernel fusion (SKF) and Laplacian regularized least squares (LapRLS) algorithms. DDA-SKF integrates multiple similarities of drugs and diseases. The prediction performance of DDA-SKF is better than, or at least comparable to, that of all state-of-the-art methods. DDA-SKF can work without sufficient similarity information between drug indications. This allows us to predict new purposes for orphan drugs. The source code and benchmarking datasets are deposited in a GitHub repository (https://github.com/GCQ2119216031/DDA-SKF).

An All-Batch Loss for Constructing Prediction Intervals

Applied Sciences ◽

10.3390/app11041728 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1728

Author(s):

Hua Zhong ◽

Li Xu

Keyword(s):

Gradient Descent ◽

State Of The Art ◽

Prediction Interval ◽

Feedforward Neural Networks ◽

Important Research ◽

Likelihood Principle ◽

High Quality ◽

Construction Methods ◽

Important Research Topic ◽

Benchmark Datasets

The prediction interval (PI) is an important research topic in reliability analyses and decision support systems. Data size and computation costs are two of the issues which may hamper the construction of PIs. This paper proposes an all-batch (AB) loss function for constructing high-quality PIs. Taking full advantage of the likelihood principle, the proposed loss makes it possible to train PI generation models using the gradient descent (GD) method for both small and large batches of samples. With the structure of dual feedforward neural networks (FNNs), a high-quality PI generation framework is introduced, which can be adapted to a variety of problems including regression analysis. Numerical experiments were conducted on benchmark datasets; the results show that higher-quality PIs were achieved using the proposed scheme. Its reliability and stability were also verified in comparison with various state-of-the-art PI construction methods.

Supervised learning for the detection of negation and of its scope in French and Brazilian Portuguese biomedical corpora

Natural Language Engineering ◽

10.1017/s1351324920000352 ◽

2020 ◽

pp. 1-21 ◽

Cited By ~ 2

Author(s):

Clément Dalloux ◽

Vincent Claveau ◽

Natalia Grabar ◽

Lucas Emanuel Silva Oliveira ◽

Claudia Maria Cabral Moro ◽

...

Keyword(s):

Machine Learning ◽

Information Extraction ◽

State Of The Art ◽

Automatic Detection ◽

Brazilian Portuguese ◽

Supervised Machine Learning ◽

Biomedical Domain ◽

Learning Approaches ◽

Cross Domain ◽

Automatic Methods

Automatic detection of negated content is often a prerequisite in information extraction systems in various domains. In the biomedical domain especially, this task is important because negation plays an important role. In this work, two main contributions are proposed. First, we work with languages which have been poorly addressed up to now: Brazilian Portuguese and French. Thus, we developed new corpora for these two languages which have been manually annotated to mark up the negation cues and their scope. Second, we propose automatic methods based on supervised machine learning approaches for the automatic detection of negation marks and of their scopes. The methods prove to be robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical languages) contexts. The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. Besides, the application is accessible and usable online. We assume that, through these contributions (new annotated corpora, application accessible online, and cross-domain robustness), the reproducibility of the results and the robustness of the NLP applications will be improved.

Automatic Extraction of Semantic Relations from Wikipedia

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213015400102 ◽

2015 ◽

Vol 24 (02) ◽

pp. 1540010 ◽

Cited By ~ 8

Author(s):

Patrick Arnold ◽

Erhard Rahm

Keyword(s):

Finite State Machines ◽

Background Knowledge ◽

Semantic Relations ◽

Automatic Extraction ◽

State Machines ◽

High Quality ◽

Novel Approach ◽

Finite State ◽

Semantic Ontology

We introduce a novel approach to extract semantic relations (e.g., is-a and part-of relations) from Wikipedia articles. These relations are used to build up a large and up-to-date thesaurus providing background knowledge for tasks such as determining semantic ontology mappings. Our automatic approach uses a comprehensive set of semantic patterns, finite state machines and NLP techniques to extract millions of relations between concepts. An evaluation for different domains shows the high quality and effectiveness of the proposed approach. We also illustrate the value of the newly found relations for improving existing ontology mappings.

Word Embedding for Semantically Relative Words: an Experimental Study

Modeling and Analysis of Information Systems ◽

10.18255/1818-1015-2018-6-726-733 ◽

2018 ◽

Vol 25 (6) ◽

pp. 726-733

Author(s):

Maria S. Karyaeva ◽

Pavel I. Braslavski ◽

Valery A. Sokolov

Keyword(s):

Experimental Study ◽

Natural Language Processing ◽

Language Processing ◽

Intelligent Systems ◽

Russian Language ◽

Word Embedding ◽

Semantic Relations ◽

Automatic Extraction ◽

Semantic Relationships ◽

The Russian Language

The ability to identify semantic relations between words has made the word2vec model widely used in NLP tasks. The idea of word2vec is based on a simple rule that two words have a higher similarity if they occur in similar contexts. Each word can be represented as a vector, so vectors with the closest coordinates can be interpreted as similar words. This makes it possible to establish semantic relations (synonymy, hypernymy, hyponymy and other semantic relations) by automatic extraction. The extraction of semantic relations by hand is considered a time-consuming and biased task that requires the help of experts. Unfortunately, the word2vec model provides an associative list of words which does not consist of related words only. In this paper, we show some additional criteria that may be applicable to solve this problem. Observations and experiments with well-known characteristics, such as word frequency and position in an associative list, might be useful for improving results for the task of extracting semantic relations for the Russian language by using word embedding. In the experiments, a word2vec model trained on Flibusta is used, and pairs from Wiktionary serve as examples with semantic relationships. Semantically related words are applicable to thesauri, ontologies and intelligent systems for natural language processing.
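
As a rough illustration of how an associative list can be read off word vectors, the sketch below ranks words by cosine similarity to a query word. Cosine similarity is an assumption here (it is the usual choice for word2vec, but the abstract does not name the measure), and the three-dimensional vectors and the helper names `cosine_similarity` and `nearest_words` are made up for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_words(query, vectors, top_n=3):
    """Return the top_n words most similar to `query`, excluding the query itself."""
    ranked = sorted(
        ((cosine_similarity(vectors[query], vec), word)
         for word, vec in vectors.items() if word != query),
        reverse=True,
    )
    return [word for _, word in ranked[:top_n]]

# Hypothetical toy embeddings; real word2vec vectors have hundreds of dimensions.
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

print(nearest_words("cat", toy_vectors, top_n=2))
```

A list produced this way mixes truly related words with merely associated ones, which is the problem the abstract describes. Extra criteria such as word frequency or rank in the list would be applied as filters on top of this ranking.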

Turning Informal Thesauri into Formal Ontologies: A Feasibility Study on Biomedical Knowledge Re-Use

Comparative and Functional Genomics ◽

10.1002/cfg.247 ◽

2003 ◽

Vol 4 (1) ◽

pp. 94-97 ◽

Cited By ~ 6

Author(s):

Udo Hahn

Keyword(s):

Domain Knowledge ◽

Large Scale ◽

Description Logics ◽

Biomedical Domain ◽

Biomedical Knowledge ◽

Knowledge Conversion ◽

Formal Knowledge ◽

Knowledge Repositories ◽

Formal Ontologies ◽

Rigorous Description

This paper reports a large-scale knowledge conversion and curation experiment. Biomedical domain knowledge from a semantically weak and shallow terminological resource, the UMLS, is transformed into a rigorous description logics format. This way, the broad coverage of the UMLS is combined with inference mechanisms for consistency and cycle checking. They are the key to proper cleansing of the knowledge directly imported from the UMLS, as well as subsequent updating, maintenance and refinement of large knowledge repositories. The emerging biomedical knowledge base currently comprises more than 240 000 conceptual entities and hence constitutes one of the largest formal knowledge repositories ever built.

Learning Non-Taxonomic Relations of Ontologies

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2021010105 ◽

2021 ◽

Vol 17 (1) ◽

pp. 97-122

Author(s):

Mohamed Hassan Mohamed Ali ◽

Said Fathalla ◽

Mohamed Kholief ◽

Yasser Fouad Hassan

Keyword(s):

Systematic Review ◽

State Of The Art ◽

Research Work ◽

Semantic Knowledge ◽

Future Research ◽

Automatic Extraction ◽

Ontology Learning ◽

Comprehensive Understanding ◽

Specific Domain ◽

Taxonomic Relations

Ontologies, as semantic knowledge representations, have a crucial role in various information systems. The main pitfall of manually building ontologies is that it is effort- and time-consuming. Ontology learning is a key solution. Learning Non-Taxonomic Relationships of Ontologies (LNTRO) is the process of automatic/semi-automatic extraction of all possible relationships between concepts in a specific domain, except the hierarchical relations. Most research works have focused on the extraction of concepts and taxonomic relations in the ontology learning process. This article presents the results of a systematic review of the state-of-the-art approaches for LNTRO. Sixteen approaches have been described and qualitatively analyzed. The solutions they provide are discussed along with their respective positive and negative aspects. The goal is to provide researchers in this area with a comprehensive understanding of the drawbacks of the existing work, thereby encouraging further improvement of the research work in this area. Furthermore, this article proposes a set of recommendations for future research.

An Explorative Study of Virtual Product Placement

Advances in Multimedia and Interactive Technologies - Online Multimedia Advertising ◽

10.4018/978-1-60960-189-8.ch008 ◽

2011 ◽

pp. 122-147

Author(s):

Chia-Hu Chang ◽

Ja-Ling Wu

Keyword(s):

Domain Knowledge ◽

Design Space ◽

State Of The Art ◽

Product Placement ◽

The State ◽

Explorative Study ◽

Advertising Message ◽

Computational Aesthetics ◽

The Right ◽

Virtual Product

With the aid of content-based multimedia analysis, virtual product placement opens up new opportunities for advertisers to monetize existing videos efficiently. In addition, a number of significant and challenging issues arise accordingly, such as how to insert, less intrusively, the contextually relevant advertising message (what) at the right place (where) and the right time (when) with an attractive representation (how) in the videos. In this chapter, domain knowledge in support of delivering and receiving the advertising message is introduced, such as advertising theory, psychology and computational aesthetics. We briefly review the state-of-the-art techniques for assisting virtual product placement in videos. In addition, we present a framework to serve the virtual spotlighted advertising (ViSA) for virtual product placement and give an explorative study of it. Moreover, observations about the new trend and possible extensions in the design space of virtual product placement are also stated and discussed. We believe that this will inspire researchers to develop more interesting and applicable multimedia advertising systems for virtual product placement.

Computational drug repositioning based on multi-similarities bilinear matrix factorization

Briefings in Bioinformatics ◽

10.1093/bib/bbaa267 ◽

2020 ◽

Author(s):

Mengyun Yang ◽

Gaoyan Wu ◽

Qichang Zhao ◽

Yaohang Li ◽

Jianxin Wang

Keyword(s):

Matrix Factorization ◽

Drug Repositioning ◽

Disease Association ◽

Biological Entity ◽

Biomedical Data ◽

Supplementary Data ◽

Practical Applications ◽

Disease Associations ◽

Association Matrix ◽

Similarity Matrices

With the development of high-throughput technology and the accumulation of biomedical data, prior information about biological entities can be calculated from different aspects. Specifically, drug–drug similarities can be measured from target profiles, drug–drug interactions and side effects. Similarly, different methods and data sources to calculate disease ontology can result in multiple measures of pairwise disease similarities. Therefore, in computational drug repositioning, developing a dynamic method to optimize the fusion process of multiple similarities is a crucial and challenging task. In this study, we propose a multi-similarities bilinear matrix factorization (MSBMF) method to predict promising drug-associated indications for existing and novel drugs. Instead of fusing multiple similarities into a single similarity matrix, we concatenate these similarity matrices of drug and disease, respectively. Applying matrix factorization methods, we decompose the drug–disease association matrix into a drug-feature matrix and a disease-feature matrix. At the same time, using these feature matrices as a basis, we extract effective latent features representing the drug and disease similarity matrices to infer missing drug–disease associations. Moreover, these two factored matrices are constrained by non-negative factorization to ensure that the completed drug–disease association matrix is biologically interpretable. In addition, we numerically solve the MSBMF model by an efficient alternating direction method of multipliers algorithm. The computational experiment results show that MSBMF obtains higher prediction accuracy than the state-of-the-art drug repositioning methods in cross-validation experiments. Case studies also demonstrate the effectiveness of our proposed method in practical applications. Availability: The data and code of MSBMF are freely available at https://github.com/BioinformaticsCSU/MSBMF.
Corresponding author: Jianxin Wang, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P. R. China. Supplementary Data: Supplementary data are available online at https://academic.oup.com/bib.
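
To make the idea of factoring an association matrix into two non-negative feature matrices concrete, here is a minimal sketch. It is not the MSBMF model: it ignores the similarity matrices, treats every zero as an observed value, and swaps the paper's alternating direction method of multipliers for the simpler Lee-Seung multiplicative updates. The function name `nmf_multiplicative`, the rank, and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def nmf_multiplicative(R, rank=2, iters=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix R ~= U @ V.T with U, V >= 0 (Lee-Seung updates)."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    # Strictly positive start so the multiplicative updates never get stuck at zero.
    U = rng.random((n, rank)) + 0.1   # drug-feature matrix
    V = rng.random((m, rank)) + 0.1   # disease-feature matrix
    for _ in range(iters):
        U *= (R @ V) / (U @ (V.T @ V) + eps)
        V *= (R.T @ U) / (V @ (U.T @ U) + eps)
    return U, V

# Made-up 4 drugs x 4 diseases association matrix (1 = known association).
assoc = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 0]], dtype=float)

U, V = nmf_multiplicative(assoc, rank=2)
completed = U @ V.T  # dense scores; zero entries of `assoc` become candidate predictions
print(completed.round(2))
```

Because both factors stay non-negative, every completed score is a sum of non-negative contributions from the latent features, which is the interpretability argument the abstract makes for the non-negativity constraint.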
