Review of the Applications of Deep Learning in Bioinformatics

2021 ◽  
Vol 15 (8) ◽  
pp. 898-911
Author(s):  
Yongqing Zhang ◽  
Jianrong Yan ◽  
Siyu Chen ◽  
Meiqin Gong ◽  
Dongrui Gao ◽  
...  

Rapid advances in biological research over recent years have significantly enriched biological and medical data resources. Deep learning-based techniques have been successfully used to process data in this field, and they have achieved state-of-the-art performance even on high-dimensional, unstructured, and black-box biological data. The aim of the current study is to provide an overview of the deep learning-based techniques used in biology and medicine and their state-of-the-art applications. In particular, we introduce the fundamentals of deep learning and then review the success of applying such methods to bioinformatics, biomedical imaging, biomedicine, and drug discovery. We also discuss the challenges and limitations of this field and outline possible directions for further research.

2018 ◽  
Vol 1 (1) ◽  
pp. 181-205 ◽  
Author(s):  
Pierre Baldi

Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and the application of these methods to biomedical data has led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field and surveys current applications of deep learning to biomedical data, organized around five subareas of roughly increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.


2020 ◽  
Vol 1 (1) ◽  
pp. 015009 ◽  
Author(s):  
Chang Min Hyun ◽  
Kang Cheol Kim ◽  
Hyun Cheol Cho ◽  
Jae Kyu Choi ◽  
Jin Keun Seo

2001 ◽  
Vol 7 (S2) ◽  
pp. 622-623
Author(s):  
Xiaoyou Ying ◽  
Jean Sprinkle Cavallo ◽  
Bruce McCullough

Digital microscopy, the integration of digital and microscopy technologies, was initiated for quantitative microscopic image analysis, but it is now used for almost all microscopy applications. During the past decade, with the advance of digital technologies, digital microscopy imaging has become an indispensable technology in drug discovery. We started establishing state-of-the-art digital microscopy imaging for drug discovery by investigating bioimaging applications at our US research site. Our results showed that all of the top five bioimaging needs require computer-aided microscopy. Based on this investigation and our review of microscopy imaging applications in the pharmaceutical industry, we identified four directions for microscopy in drug discovery: multidimensional/multimodal microscopy, digitalization, automation, and bioimage informatics. Multidimensional/multimodal microscopy imaging is required by the nature of biological research, which is fundamental to drug discovery. From genomic imaging to pathology observation, we require biological details and compound activities at levels ranging from subcellular organelles to organ tissues, and from cellular signaling to the anatomical locations of compounds.


2021 ◽  
pp. 1-11
Author(s):  
Tianshi Mu ◽  
Kequan Lin ◽  
Huabing Zhang ◽  
Jian Wang

Deep learning is gaining significant traction in a wide range of areas. However, recent studies have demonstrated that deep learning is critically vulnerable to adversarial examples. Because of the black-box, opaque nature of deep learning, it is difficult to explain why adversarial examples exist and also hard to defend against them. This study focuses on improving the adversarial robustness of convolutional neural networks. We first explore how adversarial examples behave inside the network through visualization. We find that adversarial examples produce perturbations in hidden activations, and these perturbations are amplified as they propagate, ultimately fooling the network. Motivated by this observation, we propose an approach, termed sanitizing hidden activations, that helps the network correctly recognize adversarial examples by eliminating or reducing the perturbations in hidden activations. To demonstrate the effectiveness of our approach, we conduct experiments on three widely used datasets (MNIST, CIFAR-10, and ImageNet) and compare it with state-of-the-art defense techniques. The experimental results show that our sanitizing approach generalizes better across different kinds of attacks and can effectively improve the adversarial robustness of convolutional neural networks.


2020 ◽  
Author(s):  
Cuong Q. Nguyen ◽  
Constantine Kreatsoulas ◽  
Kim M. Branson

Building in silico models to predict chemical properties and activities is a crucial step in drug discovery. However, drug discovery projects are often characterized by limited labeled data, hindering the application of deep learning in this setting. Meanwhile, advances in meta-learning have enabled state-of-the-art performance on few-shot learning benchmarks, naturally prompting the question: can meta-learning improve deep learning performance in low-resource drug discovery projects? In this work, we assess the efficiency of the Model-Agnostic Meta-Learning (MAML) algorithm, along with its variants FO-MAML and ANIL, at learning to predict chemical properties and activities. Using the ChEMBL20 dataset to emulate low-resource settings, our benchmark shows that meta-initializations perform comparably to or outperform multi-task pre-training baselines on 16 out of 20 in-distribution tasks and on all out-of-distribution tasks, providing an average improvement in AUPRC of 7.2% and 14.9%, respectively. Finally, we observe that meta-initializations consistently result in the best-performing models across fine-tuning sets with k ∈ {16, 32, 64, 128, 256} instances.
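To make the FO-MAML variant named above concrete, here is a minimal sketch on a hypothetical toy problem, not the benchmark from the abstract: each "task" is noiseless 1-D linear regression y = slope * x, standing in for one chemical property. The inner loop takes one gradient step on a support set of k points; the first-order outer update then moves the shared initialization using the query-set gradient evaluated at the adapted weight, ignoring second-order terms. All function names, learning rates, and task definitions are illustrative assumptions.

```python
import random

def mse_grad(w, xs, ys):
    """Gradient of the mean squared error for the 1-D model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def sample_task(slope, n, rng):
    """Hypothetical toy task: n noiseless points on the line y = slope * x."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return xs, [slope * x for x in xs]

def fo_maml(slopes, meta_steps=2000, inner_lr=0.1, outer_lr=0.05, k=16, seed=0):
    """First-order MAML: adapt on a support set, then update the shared
    initialization with the query gradient taken at the adapted weight."""
    rng = random.Random(seed)
    w = 0.0  # meta-initialization shared by all tasks
    for _ in range(meta_steps):
        meta_grad = 0.0
        for slope in slopes:
            sx, sy = sample_task(slope, k, rng)  # support set (k instances)
            qx, qy = sample_task(slope, k, rng)  # query set
            w_adapted = w - inner_lr * mse_grad(w, sx, sy)  # one inner step
            # First-order approximation: treat d(w_adapted)/d(w) as 1.
            meta_grad += mse_grad(w_adapted, qx, qy)
        w -= outer_lr * meta_grad / len(slopes)
    return w

if __name__ == "__main__":
    print(fo_maml([1.0, 2.0, 3.0]))
```

For these symmetric toy tasks the meta-initialization settles near the mean slope (about 2.0), the point from which a single inner step best serves every task; full MAML would additionally backpropagate through the inner step, and ANIL would adapt only the final layer of a deeper network.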


2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. 676-676
Author(s):  
Samuel Beck ◽  
Jun-Yeong Lee ◽  
Jarod Rollins

In this era of Big Data, the volume of biological data is growing exponentially. Systematic profiling and analysis of these data will provide new insights into biology and human health. Among diverse types of biological data, gene expression data closely mirror both the static phenotypes and the dynamic changes in biological systems. Drug-to-drug or drug-to-disease comparison of gene expression signatures allows existing pharmaceuticals to be repurposed/repositioned to treat additional diseases, which in turn provides a rapid and cost-effective approach to drug discovery. Thanks to technological advances, gene expression profiling by mRNA-seq has become a routine tool for addressing all aspects of modern biological research. Here, we present how drug repositioning using published mRNA-seq data can provide unbiased and applicable pharmaco-chemical intervention strategies for human diseases and aging. Specifically, we profiled over half a million gene expression profiles generated in various contexts and used them to screen for conditions that can suppress age-associated gene expression changes. Our analysis identified various previously validated aging intervention strategies as positive hits. Furthermore, it also predicted a novel group of chemicals that had not been studied in an aging context, and these indeed significantly extended the life span of model animals. Taken together, our data demonstrate that our community knowledge-guided in silico drug-discovery pipeline is a useful and effective tool for identifying novel aging intervention strategies.
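The abstract does not state how expression signatures were compared. A common, simple choice, assumed here and not taken from the paper, is to rank candidate conditions by how strongly their gene expression changes anticorrelate with an age-associated signature, so that a high score means the condition tends to reverse aging-related changes. The sketch below scores each candidate by the negative Spearman correlation over shared genes; the function names, gene identifiers, and fold-change values in the usage note are hypothetical.

```python
from math import sqrt

def rank(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / sqrt(va * vb)

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(a), rank(b))

def reversal_scores(aging_sig, candidate_sigs):
    """Score each candidate by -Spearman over shared genes (higher = stronger
    reversal of the aging signature); return candidates best-first."""
    scores = {}
    for name, sig in candidate_sigs.items():
        genes = sorted(set(aging_sig) & set(sig))
        scores[name] = -spearman([aging_sig[g] for g in genes],
                                 [sig[g] for g in genes])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with made-up log fold changes `aging = {"geneA": 2.0, "geneB": 1.0, "geneC": -1.0, "geneD": -2.0}`, a candidate whose changes run in exactly the opposite order scores 1.0 and ranks first, while one that mimics the aging pattern scores -1.0. Rank-based correlation is used so that differences in scale between profiling experiments matter less.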


2019 ◽  
Author(s):  
Leihong Wu ◽  
Xiangwen Liu ◽  
Joshua Xu

Background: Researchers today are generating unprecedented amounts of biological data. One trend in current biological research is integrated analysis of multi-platform data. However, effectively integrating multi-platform data to solve a single-task or multi-task classification problem is both critical and challenging. In this study, we proposed HetEnc, a novel deep learning-based approach for information domain separation. Results: HetEnc includes both an unsupervised feature representation module and a supervised neural network module to handle multi-platform gene expression datasets. It first constructs three different encoding networks to represent the original gene expression data using high-level abstracted features. A six-layer fully connected feed-forward neural network is then trained on these abstracted features for each targeted endpoint. We applied HetEnc to the SEQC neuroblastoma dataset and demonstrated that it outperforms other machine learning approaches. Although we used multi-platform data for feature abstraction and model training, HetEnc does not need multi-platform data for prediction; this enables broader application of the trained model by limiting gene expression profiling of new samples to a single platform, which reduces cost. Thus, HetEnc provides a new solution for integrated gene expression analysis, accelerating modern biological research.


2021 ◽  
pp. 9-31 ◽  
Author(s):  
Mohammad Behdad Jamshidi ◽  
Ali Lalbakhsh ◽  
Jakub Talla ◽  
Zdeněk Peroutka ◽  
Sobhan Roshani ◽  
...  

2020 ◽  
Author(s):  
Aviv Zelig ◽  
Noam Kaplan

The challenges of clustering noisy, high-dimensional biological data have spawned advanced clustering algorithms tailored to specific subtypes of biological data. However, the performance of such methods varies greatly between datasets, they require post hoc tuning of cryptic hyperparameters, and they are often not transferable to other types of data. Here we present a novel generic clustering approach called k minimal distances (KMD) clustering, based on a simple generalization of single and average linkage hierarchical clustering. We show that a generalized silhouette-like function is predictive of clustering accuracy and exploit this property to eliminate the main hyperparameter, k. We evaluated KMD clustering on standard simulated datasets, simulated datasets with added high noise, mass cytometry datasets, and scRNA-seq datasets. Compared with standard generic and state-of-the-art specialized algorithms, KMD clustering performed consistently better than or comparably to the best algorithm on each of the tested datasets.
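The abstract describes KMD only as a generalization of single and average linkage. One reading, assumed here, is that the distance between two clusters is the mean of their k smallest point-to-point distances: k = 1 recovers single linkage, and k equal to the number of point pairs recovers average linkage. The sketch below plugs that linkage into naive agglomerative clustering with a fixed k; it does not implement the silhouette-like criterion the authors use to choose k, and the function names are hypothetical.

```python
from itertools import product
from math import dist

def kmd_linkage(cluster_a, cluster_b, k):
    """Mean of the k smallest pairwise distances between two clusters.
    k = 1 is single linkage; k = len(a) * len(b) is average linkage."""
    d = sorted(dist(p, q) for p, q in product(cluster_a, cluster_b))
    k = min(k, len(d))  # small clusters may have fewer than k pairs
    return sum(d[:k]) / k

def kmd_agglomerative(points, n_clusters, k):
    """Naive O(n^3) agglomerative clustering: repeatedly merge the pair of
    clusters with the smallest KMD linkage until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = kmd_linkage(clusters[i], clusters[j], k)
                if best is None or score < best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

Intermediate values of k average over several close pairs, which makes the merge decision less sensitive to a single noisy point (the chaining weakness of single linkage) while still favoring clusters that touch along part of their boundary. `math.dist` requires Python 3.8 or later.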


2020 ◽  
Author(s):  
Caleb K. Chan ◽  
Amalia Hadjitheodorou ◽  
Tony Y.-C. Tsai ◽  
Julie A. Theriot

Cell motility is a crucial biological function for many cell types, including the immune cells in our body that act as first responders to foreign agents. In this work we consider the amoeboid motility of human neutrophils, which show complex and continuous morphological changes during locomotion. We imaged live neutrophils migrating on a 2D plane and extracted unbiased shape representations using cell contours and binary masks. We were able to decompose these complex shapes into low-dimensional encodings with both principal component analysis (PCA) and an unsupervised deep learning technique using variational autoencoders (VAE) enhanced with generative adversarial networks (GANs). We found that this neural network architecture, the VAE-GAN, was able to encode complex cell shapes into a low-dimensional latent space that captures the same shape variation information as PCA, but much more efficiently. Contrary to the conventional view that the latent space is a "black box", we demonstrated that the information learned and encoded within the latent space is consistent with PCA and is reproducible across independent training runs. Furthermore, by incorporating cell speed into the training of the VAE-GAN, we were able to represent cell shape and speed in the same latent space. Our work provides a quantitative framework that connects biological form, through cell shape, to a biological function, cell movement. We believe that our quantitative approach to computing a compact representation of cell shape using the VAE-GAN provides an important avenue that will support further mechanistic dissection of cell motility.

AUTHOR SUMMARY
Deep convolutional neural networks have recently enjoyed a surge in popularity and have found useful applications in many fields, including biology. Supervised deep learning, which involves training neural networks on existing labeled data, has been especially popular for solving image classification problems.
However, biological data is often highly complex and continuous in nature, making prior labeling impractical, if not impossible. Unsupervised deep learning promises to discover trends in the data by reducing its complexity while retaining the most relevant information. At present, the difficulty of extracting meaningful, human-interpretable information from the neural network's nonlinear discovery process has earned it a reputation as a "black box" that can perform impressively well at prediction but cannot shed meaningful light on the underlying mechanisms of variation in biological data sets. Our goal in this paper is to establish unsupervised deep learning as a practical tool for gaining scientific insight into biological data by first establishing the interpretability of our particular data set (images of the shapes of motile neutrophils) using more traditional techniques. Using the insight gained from this as a guide allows us to shine light into the "black box" of unsupervised deep learning.
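The PCA baseline mentioned in this abstract can be sketched in a few lines, assuming each cell contour has already been resampled to the same number of boundary points and flattened into one vector (x1, y1, x2, y2, ...); that preprocessing and all function names here are hypothetical. The sketch mean-centers the shape vectors, builds their covariance matrix, and finds the first principal component by power iteration, returning the unit direction and each cell's score along it. It covers only the linear PCA step, not the VAE-GAN.

```python
import random

def mean_center(rows):
    """Subtract the per-coordinate mean shape from every shape vector."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mu[j] for j in range(d)] for r in rows], mu

def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_pc(rows, iters=200, seed=0):
    """First principal component by power iteration on the covariance matrix.
    Returns (unit direction, per-row projection scores)."""
    centered, _ = mean_center(rows)
    cov = covariance(centered)
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # strictly positive start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores
```

The sign of a principal component is arbitrary, so only its axis is meaningful. Power iteration keeps the sketch dependency-free; on real contour data an SVD routine from a numerical library would normally be used and would return all components at once.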

