Biomedical data and computational models for drug repositioning: a comprehensive review

Author(s):  
Huimin Luo ◽  
Min Li ◽  
Mengyun Yang ◽  
Fang-Xiang Wu ◽  
Yaohang Li ◽  
...  

Abstract Drug repositioning can drastically reduce the cost and duration of traditional drug research and development while avoiding unforeseen adverse events. With the rapid advancement of high-throughput technologies and the explosion of biological and medical data, computational drug repositioning methods have become appealing and powerful techniques for systematically identifying potential drug-target and drug-disease interactions. In this review, we first summarize the available biomedical data and public databases related to drugs, diseases and targets. Then, we discuss existing drug repositioning approaches and group them by their underlying computational models: classical machine learning, network propagation, matrix factorization and completion, and deep learning based models. We also comprehensively analyze the standard data sets and evaluation metrics used in drug repositioning, and give a brief comparison of various prediction methods on the gold standard data sets. Finally, we conclude our review with a brief discussion of open challenges in computational drug repositioning: reducing the noise and incompleteness of biomedical data, ensembling diverse computational drug repositioning methods, designing reliable negative sample selection methods, developing techniques that cope with data sparsity, constructing large-scale and comprehensive benchmark data sets, and analyzing and explaining the underlying mechanisms of predicted interactions.
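To make the matrix factorization and completion family concrete, here is a minimal Python sketch of scoring unobserved drug-disease associations by low-rank factorization of a known-association matrix. The toy matrix, rank, and hyperparameters are all hypothetical; published methods in this family add similarity side information and more careful handling of unobserved entries.

```python
# Minimal sketch: low-rank matrix factorization for drug repositioning.
# The association matrix and hyperparameters below are toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)

# Toy binary association matrix: rows = drugs, columns = diseases, 1 = known link.
A = (rng.random((30, 20)) < 0.15).astype(float)

k, lam, lr = 5, 0.1, 0.02          # latent rank, L2 penalty, learning rate
U = 0.1 * rng.standard_normal((A.shape[0], k))
V = 0.1 * rng.standard_normal((A.shape[1], k))

for _ in range(1000):              # plain gradient descent on the squared error
    R = U @ V.T - A                # reconstruction residual
    U -= lr * (R @ V + lam * U)
    V -= lr * (R.T @ U + lam * V)

scores = U @ V.T                   # high-scoring zero entries = repositioning candidates
cand = np.argwhere(A == 0)
top = cand[np.argsort(-scores[A == 0])[:5]]
print("top predicted (drug, disease) pairs:", top.tolist())
```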

2021 ◽  
Author(s):  
Andrew J Kavran ◽  
Aaron Clauset

Abstract Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors, like cell cycle and life history variation, and from exogenous technical factors, like sample preparation and instrument variation. Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “filtered” to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may first be decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy by up to 43% compared to using unfiltered data. Conclusions: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters for applications in systems biology.
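As a toy illustration of the network-filter idea (a simplified signed neighborhood average of my own, not the authors' exact construction), consider modules of nodes that all measure one latent signal, some with flipped sign; averaging each node with its sign-corrected neighbors recovers the signal from noisy measurements.

```python
# Toy signed network filter: anti-correlated neighbors are sign-flipped so
# they reinforce, rather than cancel, the shared module signal.
import numpy as np

rng = np.random.default_rng(1)

# Toy system: 10 modules of 10 nodes; every node in a module measures the
# same latent signal, half of them with flipped sign (anti-correlation).
m, k = 10, 10
latent = rng.standard_normal(m)
sign = rng.choice([-1.0, 1.0], size=m * k)
signal = np.repeat(latent, k) * sign
x = signal + 0.7 * rng.standard_normal(m * k)    # noisy measurements

# Interaction network: connect nodes within the same module.
module = np.repeat(np.arange(m), k)
W = (module[:, None] == module[None, :]).astype(float)
np.fill_diagonal(W, 0.0)

# Signed filter: average each node with its neighbors, flipping the
# anti-correlated ones.
S = W * np.outer(sign, sign)                     # edge sign = product of node signs
filtered = (S @ x + x) / (np.abs(S).sum(axis=1) + 1)

print("correlation with signal, noisy:   ", np.corrcoef(signal, x)[0, 1])
print("correlation with signal, filtered:", np.corrcoef(signal, filtered)[0, 1])
```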


2018 ◽  
Vol 1 (1) ◽  
pp. 263-274 ◽  
Author(s):  
Marylyn D. Ritchie

Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Along with these molecular data, dense, rich phenotypic data are also available in comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly, as our understanding of the questions and challenges continues to emerge. In this review, the state-of-the-art methodologies for genetic/genomic analysis along with complex phenomics will be discussed. This field is changing and adapting to the novel data types made available, as well as to technological advances in computation and machine learning. Thus, I will also discuss the future challenges in this exciting and innovative space. The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.


PLoS ONE ◽  
2014 ◽  
Vol 9 (4) ◽  
pp. e91315 ◽  
Author(s):  
Minchao Wang ◽  
Wu Zhang ◽  
Wang Ding ◽  
Dongbo Dai ◽  
Huiran Zhang ◽  
...  

2020 ◽  
Author(s):  
Silu Huang ◽  
Charles Blatti ◽  
Saurabh Sinha ◽  
Aditya Parameswaran

Abstract Motivation: A common but critical task in genomic data analysis is finding features that separate and thereby help explain differences between two classes of biological objects, e.g., genes that explain the differences between healthy and diseased patients. As lower-cost, high-throughput experimental methods greatly increase the number of samples that are assayed as objects for analysis, computational methods are needed to quickly provide insights into high-dimensional datasets with tens of thousands of objects and features. Results: We develop an interactive exploration tool called Genvisage that rapidly discovers the most discriminative feature pairs that best separate two classes in a dataset, and displays the corresponding visualizations. Since quickly finding top feature pairs is computationally challenging, especially when the numbers of objects and features are large, we propose a suite of optimizations to make Genvisage more responsive and demonstrate that our optimizations lead to a 400X speedup over competitive baselines for multiple biological data sets. With this speedup, Genvisage enables the exploration of more large-scale datasets and alternate hypotheses in an interactive and interpretable fashion. We apply Genvisage to uncover pairs of genes whose transcriptomic responses significantly discriminate treatments of several chemotherapy drugs. Availability: Free webserver at http://genvisage.knoweng.org:443/ with source code at https://github.com/KnowEnG/Genvisage
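A minimal sketch of the core computation behind a tool like Genvisage, under the assumption of a simple centroid-based separability score (the tool's actual scoring function and its pruning and sampling optimizations are not reproduced here): exhaustively score every feature pair and report the one that best separates the two classes.

```python
# Exhaustive feature-pair search with a nearest-centroid separability score.
# Data, dimensions, and the planted discriminative pair are all synthetic.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 50
X = rng.standard_normal((n, d))                 # toy object-by-feature matrix
y = rng.integers(0, 2, size=n)                  # two classes of objects
X[y == 1, 7] += 1.5                             # plant a discriminative pair
X[y == 1, 23] -= 1.5

def pair_accuracy(X2, y):
    """Classify by nearest class centroid in the 2-D feature subspace."""
    c0, c1 = X2[y == 0].mean(axis=0), X2[y == 1].mean(axis=0)
    pred = np.linalg.norm(X2 - c1, axis=1) < np.linalg.norm(X2 - c0, axis=1)
    return (pred == y).mean()

best = max(combinations(range(d), 2), key=lambda ij: pair_accuracy(X[:, ij], y))
print("best feature pair:", best, "accuracy:", pair_accuracy(X[:, best], y))
```

An exhaustive scan like this is quadratic in the number of features, which is exactly why Genvisage-style optimizations matter at genome scale.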


2016 ◽  
Author(s):  
Fangping Wan ◽  
Jianyang (Michael) Zeng

Abstract Accurately identifying compound-protein interactions in silico can deepen our understanding of the mechanisms of drug action and significantly facilitate the drug discovery and development process. Traditional similarity-based computational models for compound-protein interaction prediction rarely exploit the latent features in currently available large-scale unlabelled compound and protein data, and are often limited to relatively small-scale datasets. We propose a new scheme that combines feature embedding (a technique of representation learning) with deep learning for predicting compound-protein interactions. Our method automatically learns low-dimensional, implicit but expressive features for compounds and proteins from massive amounts of unlabelled data. Combining effective feature embedding with powerful deep learning techniques, our method provides a general computational pipeline for accurate compound-protein interaction prediction, even when the interaction knowledge of compounds and proteins is entirely unknown. Evaluations on current large-scale databases of measured compound-protein affinities, such as ChEMBL and BindingDB, as well as known drug-target interactions from DrugBank, have demonstrated the superior prediction performance of our method and suggest that it can offer a useful tool for drug development and drug repositioning.
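A hedged sketch of the general pipeline the abstract describes, with stand-ins throughout: the random vectors below play the role of embeddings pretrained on unlabelled compound and protein corpora, the toy interaction labels are synthetic, and a tiny two-layer network replaces the full deep model.

```python
# Embeddings + small neural classifier for compound-protein interaction.
# All embeddings, pairs, and labels are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(3)
n_cpd, n_prot, d = 100, 80, 16

cpd_emb = rng.standard_normal((n_cpd, d))     # stand-in compound embeddings
prot_emb = rng.standard_normal((n_prot, d))   # stand-in protein embeddings

# Toy labelled pairs: "interaction" iff the two embeddings point the same way.
pairs = rng.integers(0, [n_cpd, n_prot], size=(2000, 2))
X = np.hstack([cpd_emb[pairs[:, 0]], prot_emb[pairs[:, 1]]])
y = ((cpd_emb[pairs[:, 0]] * prot_emb[pairs[:, 1]]).sum(1) > 0).astype(float)

W1 = 0.1 * rng.standard_normal((2 * d, 32)); b1 = np.zeros(32)
W2 = 0.1 * rng.standard_normal(32); b2 = 0.0

for _ in range(500):                          # full-batch gradient descent
    H = np.maximum(X @ W1 + b1, 0)            # ReLU hidden layer
    p = 1 / (1 + np.exp(-(H @ W2 + b2)))      # interaction probability
    g = (p - y) / len(y)                      # d(cross-entropy)/d(logit)
    gH = np.outer(g, W2) * (H > 0)            # backprop through the ReLU
    W2 -= 1.0 * (H.T @ g); b2 -= 1.0 * g.sum()
    W1 -= 1.0 * (X.T @ gH); b1 -= 1.0 * gH.sum(0)

print("train accuracy:", ((p > 0.5) == y).mean())
```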


PLoS ONE ◽  
2007 ◽  
Vol 2 (8) ◽  
pp. e718 ◽  
Author(s):  
Debbie Winter ◽  
Ben Vinegar ◽  
Hardeep Nahal ◽  
Ron Ammar ◽  
Greg V. Wilson ◽  
...  

2015 ◽  
Vol 26 (22) ◽  
pp. 3932-3939 ◽  
Author(s):  
Hisao Moriya

Overexpression experiments are sometimes treated as qualitative experiments designed to identify novel proteins and study their function. However, in order to draw conclusions about protein overexpression through association analyses using large-scale biological data sets, we need to recognize the quantitative nature of overexpression experiments. Here I discuss the quantitative features of two different types of overexpression experiment: absolute and relative. I also introduce the four primary mechanisms involved in growth defects caused by protein overexpression: resource overload, stoichiometric imbalance, promiscuous interactions, and pathway modulation associated with the degree of overexpression.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Douwe van der Wal ◽  
Iny Jhun ◽  
Israa Laklouk ◽  
Jeff Nirschl ◽  
Lara Richer ◽  
...  

Abstract Biology has become a prime area for the deployment of deep learning and artificial intelligence (AI), enabled largely by the massive data sets that the field can generate. Key to most AI tasks is the availability of a sufficiently large, labeled data set with which to train AI models. In the context of microscopy, it is easy to generate image data sets containing millions of cells and structures. However, it is challenging to obtain large-scale, high-quality annotations for AI models. Here, we present HALS (Human-Augmenting Labeling System), a human-in-the-loop data labeling AI, which begins uninitialized and learns annotations from a human in real time. Using a multi-part AI composed of three deep learning models, HALS learns from just a few examples and immediately decreases the workload of the annotator while increasing the quality of their annotations. Using a highly repetitive use case (annotating cell types) and running experiments with seven pathologists (experts at the microscopic analysis of biological specimens), we demonstrate a manual work reduction of 90.60% and an average data-quality boost of 4.34%, measured across four use cases and two tissue stain types.
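A conceptual sketch of a human-in-the-loop labeling loop in the spirit of HALS (the system's three-model architecture is not reproduced; a single nearest-centroid learner and a simulated annotator stand in for it): the model proposes labels, the human corrects only the proposals the model is unsure about, and the model refits after each label.

```python
# Human-in-the-loop labeling sketch: confident proposals are auto-accepted,
# uncertain ones go to the (simulated) annotator. All data are synthetic.
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1, 1, (300, 8)), rng.normal(1, 1, (300, 8))])
true = np.repeat([0, 1], 300)                  # ground truth = "the human"

order = rng.permutation(len(X))
labels, manual, centroids = {}, 0, None

for i in order:
    proposal = None
    if centroids is not None:
        dist = np.linalg.norm(centroids - X[i], axis=1)
        if abs(dist[0] - dist[1]) > 1.0:       # confident: accept the proposal
            proposal = int(np.argmin(dist))
    if proposal is None:                       # uncertain: ask the annotator
        proposal = int(true[i])
        manual += 1
    labels[i] = proposal
    if len(set(labels.values())) == 2:         # refit once both classes seen
        idx = np.array(list(labels))
        lab = np.array([labels[j] for j in idx])
        centroids = np.array([X[idx[lab == c]].mean(axis=0) for c in (0, 1)])

acc = np.mean([labels[i] == true[i] for i in range(len(X))])
print(f"manual labels: {manual}/{len(X)}, label accuracy: {acc:.3f}")
```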


2014 ◽  
Author(s):  
R Daniel Kortschak ◽  
David L Adelson

bíogo is a framework designed to ease the development and maintenance of computationally intensive bioinformatics applications. The library is written in the Go programming language, a garbage-collected, strictly typed, compiled language with built-in support for concurrent processing and performance comparable to C and Java. It provides a variety of data types and utility functions to facilitate the manipulation and analysis of large-scale genomic and other biological data. bíogo uses a concise and expressive syntax, lowering the barrier to entry for researchers who need to process large data sets with custom analyses while retaining computational safety and ease of code review. We believe bíogo provides an excellent environment for training and research in computational biology because of its combination of strict typing, simple and expressive syntax, and high performance.


Author(s):  
Manisha Sritharan ◽  
Farhat A. Avin

Biological big data encompasses the vast amounts of data generated in bioinformatics, and its availability is transforming research toward large-scale studies. In medical research, enormous volumes of data are produced by tools such as genomic sequencing machines. The availability of advanced instruments and modern technology is the main driver of this expansion of biological data. Such immense data should be used efficiently so that the valuable information it contains can be shared. At the same time, storing and managing these big data has become a great challenge, as data generation continues to increase tremendously year after year. The explosion of data in healthcare systems and biomedical research likewise calls for immediate solutions, since health care requires tight integration of biomedical data. Thus, researchers should analyze the big data already available, rather than continuing to create new data, as existing data can yield meaningful information when examined with current advanced bioinformatics tools.

