A Survey of Network Embedding for Drug Analysis and Prediction

Author(s):  
Zhixian Liu ◽  
Qingfeng Chen ◽  
Wei Lan ◽  
Jiahai Liang ◽  
Yiping Pheobe Chen ◽  
...  

: Traditional network-based computational methods have shown good results in drug analysis and prediction. However, these methods are time consuming and lack universality, and it is difficult to exploit the auxiliary information of nodes and edges. Network embedding provides a promising way for alleviating the above problems by transforming network into a low-dimensional space while preserving network structure and auxiliary information. This thus facilitates the application of machine learning algorithms for subsequent processing. Network embedding has been introduced into drug analysis and prediction in the last few years, and has shown superior performance over traditional methods. However, there is no systematic review of this issue. This article offers a comprehensive survey of the primary network embedding methods and their applications in drug analysis and prediction. The network embedding technologies applied in homogeneous network and heterogeneous network are investigated and compared, including matrix decomposition, random walk, and deep learning. Especially, the Graph neural network (GNN) methods in deep learning are highlighted. Further, the applications of network embedding in drug similarity estimation, drug-target interaction prediction, adverse drug reactions prediction, protein function and therapeutic peptides prediction are discussed. Several future potential research directions are also discussed.

2021 ◽  
pp. 1-12
Author(s):  
JinFang Sheng ◽  
Huaiyu Zuo ◽  
Bin Wang ◽  
Qiong Li

 In a complex network system, the structure of the network is an extremely important element for the analysis of the system, and the study of community detection algorithms is key to exploring the structure of the complex network. Traditional community detection algorithms would represent the network using an adjacency matrix based on observations, which may contain redundant information or noise that interferes with the detection results. In this paper, we propose a community detection algorithm based on density clustering. In order to improve the performance of density clustering, we consider an algorithmic framework for learning the continuous representation of network nodes in a low-dimensional space. The network structure is effectively preserved through network embedding, and density clustering is applied in the embedded low-dimensional space to compute the similarity of nodes in the network, which in turn reveals the implied structure in a given network. Experiments show that the algorithm has superior performance compared to other advanced community detection algorithms for real-world networks in multiple domains as well as synthetic networks, especially when the network data chaos is high.


2020 ◽  
Vol 12 (11) ◽  
pp. 1838 ◽  
Author(s):  
Zhao Zhang ◽  
Paulo Flores ◽  
C. Igathinathane ◽  
Dayakar L. Naik ◽  
Ravi Kiran ◽  
...  

The current mainstream approach of using manual measurements and visual inspections for crop lodging detection is inefficient, time-consuming, and subjective. An innovative method for wheat lodging detection that can overcome or alleviate these shortcomings would be welcomed. This study proposed a systematic approach for wheat lodging detection in research plots (372 experimental plots), which consisted of using unmanned aerial systems (UAS) for aerial imagery acquisition, manual field evaluation, and machine learning algorithms to detect the occurrence or not of lodging. UAS imagery was collected on three different dates (23 and 30 July 2019, and 8 August 2019) after lodging occurred. Traditional machine learning and deep learning were evaluated and compared in this study in terms of classification accuracy and standard deviation. For traditional machine learning, five types of features (i.e. gray level co-occurrence matrix, local binary pattern, Gabor, intensity, and Hu-moment) were extracted and fed into three traditional machine learning algorithms (i.e., random forest (RF), neural network, and support vector machine) for detecting lodged plots. For the datasets on each imagery collection date, the accuracies of the three algorithms were not significantly different from each other. For any of the three algorithms, accuracies on the first and last date datasets had the lowest and highest values, respectively. Incorporating standard deviation as a measurement of performance robustness, RF was determined as the most satisfactory. Regarding deep learning, three different convolutional neural networks (simple convolutional neural network, VGG-16, and GoogLeNet) were tested. For any of the single date datasets, GoogLeNet consistently had superior performance over the other two methods. Further comparisons between RF and GoogLeNet demonstrated that the detection accuracies of the two methods were not significantly different from each other (p > 0.05); hence, the choice of any of the two would not affect the final detection accuracies. However, considering the fact that the average accuracy of GoogLeNet (93%) was larger than RF (91%), it was recommended to use GoogLeNet for wheat lodging detection. This research demonstrated that UAS RGB imagery, coupled with the GoogLeNet machine learning algorithm, can be a novel, reliable, objective, simple, low-cost, and effective (accuracy > 90%) tool for wheat lodging detection.


2019 ◽  
Vol 27 (1) ◽  
pp. 13-21 ◽  
Author(s):  
Qiang Wei ◽  
Zongcheng Ji ◽  
Zhiheng Li ◽  
Jingcheng Du ◽  
Jingqi Wang ◽  
...  

AbstractObjectiveThis article presents our approaches to extraction of medications and associated adverse drug events (ADEs) from clinical documents, which is the second track of the 2018 National NLP Clinical Challenges (n2c2) shared task.Materials and MethodsThe clinical corpus used in this study was from the MIMIC-III database and the organizers annotated 303 documents for training and 202 for testing. Our system consists of 2 components: a named entity recognition (NER) and a relation classification (RC) component. For each component, we implemented deep learning-based approaches (eg, BI-LSTM-CRF) and compared them with traditional machine learning approaches, namely, conditional random fields for NER and support vector machines for RC, respectively. In addition, we developed a deep learning-based joint model that recognizes ADEs and their relations to medications in 1 step using a sequence labeling approach. To further improve the performance, we also investigated different ensemble approaches to generating optimal performance by combining outputs from multiple approaches.ResultsOur best-performing systems achieved F1 scores of 93.45% for NER, 96.30% for RC, and 89.05% for end-to-end evaluation, which ranked #2, #1, and #1 among all participants, respectively. Additional evaluations show that the deep learning-based approaches did outperform traditional machine learning algorithms in both NER and RC. The joint model that simultaneously recognizes ADEs and their relations to medications also achieved the best performance on RC, indicating its promise for relation extraction.ConclusionIn this study, we developed deep learning approaches for extracting medications and their attributes such as ADEs, and demonstrated its superior performance compared with traditional machine learning algorithms, indicating its uses in broader NER and RC tasks in the medical domain.


2020 ◽  
Author(s):  
Jinghan Yang ◽  
Zhiqiang Gao ◽  
Xiuhan Ren ◽  
Jie Sheng ◽  
Ping Xu ◽  
...  

ABSTRACTIn shotgun proteomics, it is essential to accurately determine the proteolytic products of each protein in the sample for subsequent identification and quantification, because these proteolytic products are usually taken as the surrogates of their parent proteins in the further data analysis. However, systematical studies about the commonly used proteases in proteomics research are insufficient, and there is a lack of easy-to-use tools to predict the digestibilities of these proteolytic products. Here, we propose a novel sequence-based deep learning model – DeepDigest, which integrates convolutional neural networks and long-short term memory networks for digestibility prediction of peptides. DeepDigest can predict the proteolytic cleavage sites for eight popular proteases including trypsin, ArgC, chymotrypsin, GluC, LysC, AspN, LysN and LysargiNase. Compared with traditional machine learning algorithms, DeepDigest showed superior performance for all the eight proteases on a variety of datasets. Besides, some interesting characteristics of different proteases were revealed and discussed.


Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 3153 ◽  
Author(s):  
Fei Deng ◽  
Shengliang Pu ◽  
Xuehong Chen ◽  
Yusheng Shi ◽  
Ting Yuan ◽  
...  

Deep learning techniques have boosted the performance of hyperspectral image (HSI) classification. In particular, convolutional neural networks (CNNs) have shown superior performance to that of the conventional machine learning algorithms. Recently, a novel type of neural networks called capsule networks (CapsNets) was presented to improve the most advanced CNNs. In this paper, we present a modified two-layer CapsNet with limited training samples for HSI classification, which is inspired by the comparability and simplicity of the shallower deep learning models. The presented CapsNet is trained using two real HSI datasets, i.e., the PaviaU (PU) and SalinasA datasets, representing complex and simple datasets, respectively, and which are used to investigate the robustness or representation of every model or classifier. In addition, a comparable paradigm of network architecture design has been proposed for the comparison of CNN and CapsNet. Experiments demonstrate that CapsNet shows better accuracy and convergence behavior for the complex data than the state-of-the-art CNN. For CapsNet using the PU dataset, the Kappa coefficient, overall accuracy, and average accuracy are 0.9456, 95.90%, and 96.27%, respectively, compared to the corresponding values yielded by CNN of 0.9345, 95.11%, and 95.63%. Moreover, we observed that CapsNet has much higher confidence for the predicted probabilities. Subsequently, this finding was analyzed and discussed with probability maps and uncertainty analysis. In terms of the existing literature, CapsNet provides promising results and explicit merits in comparison with CNN and two baseline classifiers, i.e., random forests (RFs) and support vector machines (SVMs).


Algorithms ◽  
2020 ◽  
Vol 13 (1) ◽  
pp. 20 ◽  
Author(s):  
Dehai Zhang ◽  
Linan Liu ◽  
Cheng Xie ◽  
Bing Yang ◽  
Qing Liu

With the arrival of 5G networks, cellular networks are moving in the direction of diversified, broadband, integrated, and intelligent networks. At the same time, the popularity of various smart terminals has led to an explosive growth in cellular traffic. Accurate network traffic prediction has become an important part of cellular network intelligence. In this context, this paper proposes a deep learning method for space-time modeling and prediction of cellular network communication traffic. First, we analyze the temporal and spatial characteristics of cellular network traffic from Telecom Italia. On this basis, we propose a hybrid spatiotemporal network (HSTNet), which is a deep learning method that uses convolutional neural networks to capture the spatiotemporal characteristics of communication traffic. This work adds deformable convolution to the convolution model to improve predictive performance. The time attribute is introduced as auxiliary information. An attention mechanism based on historical data for weight adjustment is proposed to improve the robustness of the module. We use the dataset of Telecom Italia to evaluate the performance of the proposed model. Experimental results show that compared with the existing statistics methods and machine learning algorithms, HSTNet significantly improved the prediction accuracy based on MAE and RMSE.


2019 ◽  
Vol 35 (24) ◽  
pp. 5235-5242 ◽  
Author(s):  
Jun Wang ◽  
Liangjiang Wang

Abstract Motivation Circular RNAs (circRNAs) are a new class of endogenous RNAs in animals and plants. During pre-RNA splicing, the 5′ and 3′ termini of exon(s) can be covalently ligated to form circRNAs through back-splicing (head-to-tail splicing). CircRNAs can be conserved across species, show tissue- and developmental stage-specific expression patterns, and may be associated with human disease. However, the mechanism of circRNA formation is still unclear although some sequence features have been shown to affect back-splicing. Results In this study, by applying the state-of-art machine learning techniques, we have developed the first deep learning model, DeepCirCode, to predict back-splicing for human circRNA formation. DeepCirCode utilizes a convolutional neural network (CNN) with nucleotide sequence as the input, and shows superior performance over conventional machine learning algorithms such as support vector machine and random forest. Relevant features learnt by DeepCirCode are represented as sequence motifs, some of which match human known motifs involved in RNA splicing, transcription or translation. Analysis of these motifs shows that their distribution in RNA sequences can be important for back-splicing. Moreover, some of the human motifs appear to be conserved in mouse and fruit fly. The findings provide new insight into the back-splicing code for circRNA formation. Availability and implementation All the datasets and source code for model construction are available at https://github.com/BioDataLearning/DeepCirCode. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 15 ◽  
Author(s):  
Xuefei Peng ◽  
Lei Chen ◽  
Jian-Peng Zhou

: Cancer is the second cause of human death in the world. To date, many factors have been confirmed to be the cause of cancer. Among them, carcinogenic chemical has been widely accepted as the important one. Traditional methods for detecting carcinogenic chemicals are of low efficiency and high cost. It is urgent to design effective computational methods, which can provide extensive detections. In this study, a new computational model was proposed for detecting carcinogenic chemicals. As a data-driven model, carcinogenic and non-carcinogenic chemicals were obtained from Carcinogenic Potency Database (CPDB). These chemicals were represented by features extracted from five chemical networks, representing five types of chemical associations, via a network embedding method, Mashup. Obtained features were fed into a powerful deep learning method, recurrent neural network, to build the model. The jackknife test on such model provided the F-measure of 0.971 and AUROC of 0.971. Extensive comparisons were performed, suggesting that the proposed model was superior to the models with traditional machine learning algorithms, classic chemical encoding scheme or direct usage of chemical associations.


Database ◽  
2019 ◽  
Vol 2019 ◽  
Author(s):  
Zhihong Liu ◽  
Jiewen Du ◽  
Jiansong Fang ◽  
Yulong Yin ◽  
Guohuan Xu ◽  
...  

Abstract Deep learning contributes significantly to researches in biological sciences and drug discovery. Previous studies suggested that deep learning techniques have shown superior performance to other machine learning algorithms in virtual screening, which is a critical step to accelerate the drug discovery. However, the application of deep learning techniques in drug discovery and chemical biology are hindered due to the data availability, data further processing and lacking of the user-friendly deep learning tools and interface. Therefore, we developed a user-friendly web server with integration of the state of art deep learning algorithm, which utilizes either the public or user-provided dataset to help biologists or chemists perform virtual screening either the chemical probes or drugs for a specific target of interest. With DeepScreening, user could conveniently construct a deep learning model and generate the target-focused de novo libraries. The constructed classification and regression models could be subsequently used for virtual screening against the generated de novo libraries, or diverse chemical libraries in stock. From deep models training to virtual screening, and target focused de novo library generation, all those tasks could be finished with DeepScreening. We believe this deep learning-based web server will benefit to both biologists and chemists for probes or drugs discovery.


Author(s):  
Hongmei Wang ◽  
Lu Wang ◽  
Edward H. Lee ◽  
Jimmy Zheng ◽  
Wei Zhang ◽  
...  

Abstract Purpose High-dimensional image features that underlie COVID-19 pneumonia remain opaque. We aim to compare feature engineering and deep learning methods to gain insights into the image features that drive CT-based for COVID-19 pneumonia prediction, and uncover CT image features significant for COVID-19 pneumonia from deep learning and radiomics framework. Methods A total of 266 patients with COVID-19 and other viral pneumonia with clinical symptoms and CT signs similar to that of COVID-19 during the outbreak were retrospectively collected from three hospitals in China and the USA. All the pneumonia lesions on CT images were manually delineated by four radiologists. One hundred eighty-four patients (n = 93 COVID-19 positive; n = 91 COVID-19 negative; 24,216 pneumonia lesions from 12,001 CT image slices) from two hospitals from China served as discovery cohort for model development. Thirty-two patients (17 COVID-19 positive, 15 COVID-19 negative; 7883 pneumonia lesions from 3799 CT image slices) from a US hospital served as external validation cohort. A bi-directional adversarial network-based framework and PyRadiomics package were used to extract deep learning and radiomics features, respectively. Linear and Lasso classifiers were used to develop models predictive of COVID-19 versus non-COVID-19 viral pneumonia. Results 120-dimensional deep learning image features and 120-dimensional radiomics features were extracted. Linear and Lasso classifiers identified 32 high-dimensional deep learning image features and 4 radiomics features associated with COVID-19 pneumonia diagnosis (P < 0.0001). Both models achieved sensitivity > 73% and specificity > 75% on external validation cohort with slight superior performance for radiomics Lasso classifier. Human expert diagnostic performance improved (increase by 16.5% and 11.6% in sensitivity and specificity, respectively) when using a combined deep learning-radiomics model. Conclusions We uncover specific deep learning and radiomics features to add insight into interpretability of machine learning algorithms and compare deep learning and radiomics models for COVID-19 pneumonia that might serve to augment human diagnostic performance.


Sign in / Sign up

Export Citation Format

Share Document