Generalising Better: Applying Deep Learning to Integrate Deleteriousness Prediction Scores for Whole-Exome SNV Studies

DeepVF: a deep learning-based hybrid framework for identifying virulence factors using the stacking strategy

Briefings in Bioinformatics ◽

10.1093/bib/bbaa125 ◽

2020 ◽

Cited By ~ 2

Author(s):

Ruopeng Xie ◽

Jiahui Li ◽

Jiawei Wang ◽

Wei Dai ◽

André Leier ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Virulence Factors ◽

Bacterial Genome ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Wide Range ◽

Extreme Gradient Boosting ◽

Hybrid Framework

Abstract Virulence factors (VFs) enable pathogens to infect their hosts. A wealth of individual, disease-focused studies has identified a wide variety of VFs, and the growing mass of bacterial genome sequence data provides an opportunity for computational methods aimed at predicting VFs. Despite their attractive advantages and performance improvements, the existing methods have some limitations and drawbacks. Firstly, as the characteristics and mechanisms of VFs are continually evolving with the emergence of antibiotic resistance, it is increasingly difficult to identify novel VFs using existing tools that were developed on outdated data sets; secondly, few systematic feature engineering efforts have been made to examine the utility of different types of features for model performance, as the majority of tools focused on extracting only a few types of features. By addressing the aforementioned issues, the accuracy of VF predictors can likely be significantly improved. This, in turn, would be particularly useful in the context of genome-wide predictions of VFs. In this work, we present a deep learning (DL)-based hybrid framework (termed DeepVF) that uses the stacking strategy to achieve more accurate identification of VFs. Using an enlarged, up-to-date dataset, DeepVF comprehensively explores a wide range of heterogeneous features with popular machine learning algorithms. Specifically, four classical algorithms (random forest, support vector machines, extreme gradient boosting and multilayer perceptron) and three DL algorithms (convolutional neural networks, long short-term memory networks and deep neural networks) are employed to train 62 baseline models using these features. In order to integrate their individual strengths, DeepVF combines these baseline models to construct the final meta model using the stacking strategy. Extensive benchmarking experiments demonstrate the effectiveness of DeepVF: it achieves a more accurate and stable performance compared with baseline models on the benchmark dataset and clearly outperforms state-of-the-art VF predictors on the independent test. Using the proposed hybrid ensemble model, a user-friendly online predictor of DeepVF (http://deepvf.erc.monash.edu/) is implemented. Furthermore, its utility, from the user’s viewpoint, is compared with that of existing toolkits. We believe that DeepVF will be exploited as a useful tool for screening and identifying potential VFs from protein-coding gene sequences in bacterial genomes.
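The stacking strategy named in the abstract above can be illustrated with a minimal sketch. This is not the DeepVF implementation: the single-feature base learner, the logistic-regression meta model, the synthetic data and every function name below are assumptions chosen only to show the general pattern, in which out-of-fold scores from the base models become the inputs of a meta model.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_base(xs, ys, feature):
    """Toy base learner (hypothetical stand-in for RF, SVM, CNN, ...):
    stores the class means of a single feature."""
    pos = [x[feature] for x, y in zip(xs, ys) if y == 1]
    neg = [x[feature] for x, y in zip(xs, ys) if y == 0]
    return feature, sum(pos) / len(pos), sum(neg) / len(neg)


def predict_base(model, x):
    """Score in (0, 1); values above 0.5 lean towards the positive class."""
    feature, mu_pos, mu_neg = model
    midpoint = (mu_pos + mu_neg) / 2.0
    return sigmoid((x[feature] - midpoint) * (mu_pos - mu_neg))


def out_of_fold_scores(xs, ys, n_features, k=5):
    """Each sample is scored by base models that never saw it during fitting."""
    scores = [[0.0] * n_features for _ in xs]
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        for f in range(n_features):
            model = fit_base([xs[i] for i in train], [ys[i] for i in train], f)
            for i in held_out:
                scores[i][f] = predict_base(model, xs[i])
    return scores


def fit_meta(rows, ys, lr=0.1, epochs=200):
    """Logistic-regression meta model trained by stochastic gradient descent."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for row, y in zip(rows, ys):
            err = sigmoid(sum(wi * ri for wi, ri in zip(w, row)) + b) - y
            w = [wi - lr * err * ri for wi, ri in zip(w, row)]
            b -= lr * err
    return w, b


def make_data(n, rng):
    """Synthetic data: two informative features and one pure-noise feature."""
    ys = [i % 2 for i in range(n)]
    xs = [[rng.gauss(2.0 * y, 1.0), rng.gauss(1.5 * y, 1.0), rng.gauss(0.0, 1.0)]
          for y in ys]
    return xs, ys


def run_demo(seed=0):
    rng = random.Random(seed)
    train_x, train_y = make_data(200, rng)
    test_x, test_y = make_data(200, rng)
    n_features = 3
    # Level 1: out-of-fold base scores become the meta model's training inputs.
    meta_rows = out_of_fold_scores(train_x, train_y, n_features)
    w, b = fit_meta(meta_rows, train_y)
    # Refit every base model on all training data before scoring new samples.
    bases = [fit_base(train_x, train_y, f) for f in range(n_features)]
    correct = 0
    for x, y in zip(test_x, test_y):
        row = [predict_base(m, x) for m in bases]
        p = sigmoid(sum(wi * ri for wi, ri in zip(w, row)) + b)
        correct += int((p >= 0.5) == (y == 1))
    return correct / len(test_x)
```

Calling `run_demo()` returns the accuracy of the stacked model on freshly drawn synthetic samples. Training the meta model on out-of-fold scores, rather than on scores from base models fitted to the same samples, is the usual way to keep it from rewarding overfitted base learners.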

Investigating Deep Feedforward Neural Networks for Classification of Transposon-Derived piRNAs

10.1101/2020.04.08.032755 ◽

2020 ◽

Author(s):

Alisson Hayasi da Costa ◽

Renato Augusto C. dos Santos ◽

Ricardo Cerri

Keyword(s):

Neural Networks ◽

Deep Learning ◽

State Of The Art ◽

Feedforward Neural Networks ◽

The State ◽

Machine Learning Algorithms ◽

Support Vector ◽

Advantages And Disadvantages ◽

Large Application

Abstract PIWI-Interacting RNAs (piRNAs) form an important class of non-coding RNAs that play a key role in genome integrity through the silencing of transposable elements. However, despite their importance and the wide application of deep learning to classification tasks in computational biology, there are few studies of deep learning and neural networks for piRNA prediction. Therefore, this paper presents an investigation of deep feedforward network models for the classification of transposon-derived piRNAs. We analyze and compare the results of the neural networks under different hyperparameter choices, such as number of layers, activation functions and optimizers, clarifying the advantages and disadvantages of each configuration. From this analysis, we propose a model for human piRNA classification and compare our method with the state-of-the-art deep neural network for piRNA prediction in the literature and with traditional machine learning algorithms, such as Support Vector Machines and Random Forests, showing that our model achieves an F-measure of 0.872, outperforming the state-of-the-art method in the literature.

Convolutional Neural Networks for Water Body Extraction from Landsat Imagery

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026817500018 ◽

2017 ◽

Vol 16 (01) ◽

pp. 1750001 ◽

Cited By ~ 23

Author(s):

Long Yu ◽

Zhiyin Wang ◽

Shengwei Tian ◽

Feiyue Ye ◽

Jianli Ding ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Networks ◽

Water Body ◽

Spatial Information ◽

Landsat Imagery ◽

Classification Performance ◽

Support Vector ◽

Learning Methods ◽

Wide Range

Traditional machine learning methods for water body extraction need complex spectral analysis and feature selection, which rely on a wealth of prior knowledge. They are time-consuming and struggle to meet requirements for accuracy, automation level and a wide range of applications. We present a novel deep learning framework for water body extraction from Landsat imagery that considers both its spectral and spatial information. The framework is a hybrid of convolutional neural networks (CNN) and a logistic regression (LR) classifier. CNN, one of the deep learning methods, has achieved great success on various visual-related tasks. CNN can hierarchically extract deep features from raw images directly, and distill the spectral–spatial regularities of input data, thus improving the classification performance. Experimental results based on three Landsat imagery datasets show that our proposed model achieves better performance than support vector machine (SVM) and artificial neural network (ANN).

A Very Large-Scale Bioactivity Comparison of Deep Learning and Multiple Machine Learning Algorithms for Drug Discovery

10.26434/chemrxiv.12781241 ◽

2020 ◽

Author(s):

Thomas R. Lane ◽

Daniel H. Foil ◽

Eni Minerali ◽

Fabio Urbina ◽

Kimberley M. Zorn ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Drug Discovery ◽

Deep Neural Networks ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and applications beyond. In recent studies we have applied multiple machine learning algorithms, modeling metrics and in some cases compared molecular descriptors to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL in order to evaluate machine learning methods of interest to them. The largest of these types of studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from ChEMBL for use with the ECFP6 fingerprint and comparison of our proprietary software Assay Central™ with random forest, k-Nearest Neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (3 levels). Model performance was assessed using an array of five-fold cross-validation metrics including area-under-the-curve, F1 score, Cohen’s kappa and Matthews correlation coefficient. Based on ranked normalized scores for the metrics or datasets, all methods appeared comparable, while the distance from the top indicated Assay Central™ and support vector classification were comparable. Unlike prior studies which have placed considerable emphasis on deep neural networks (deep learning), no advantage was seen in this case, where minimal tuning was performed for any of the methods. If anything, Assay Central™ may have been at a slight advantage, as the activity cutoff for each of the over 5000 datasets representing over 570,000 unique compounds was based on Assay Central™ performance, but support vector classification seems to be a strong competitor. We also apply Assay Central™ to prospective predictions for PXR and hERG to further validate these models. This work currently appears to be the largest comparison of machine learning algorithms to date. Future studies will likely evaluate additional databases, descriptors and algorithms, as well as further refining methods for evaluating and comparing models.
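Three of the metrics listed in the abstract above (F1 score, Cohen's kappa and Matthews correlation coefficient) can be computed directly from a binary confusion matrix. The sketch below uses the standard textbook definitions. The function name and the example counts are illustrative assumptions, not values from the study, and area-under-the-curve is omitted because it needs ranked scores rather than counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """F1, Cohen's kappa and MCC from binary confusion-matrix counts.

    Assumes a non-degenerate matrix (at least one positive, and chance
    agreement below 1); MCC falls back to 0.0 when its denominator is zero.
    """
    n = tp + fp + fn + tn
    f1 = 2 * tp / (2 * tp + fp + fn)
    observed = (tp + tn) / n
    # Chance agreement: product of predicted and actual marginals per class.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"f1": f1, "kappa": kappa, "mcc": mcc}


# Hypothetical counts for one cross-validation fold.
example = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

Kappa and MCC both correct for agreement expected by chance, which is one reason such metrics are often reported alongside F1 when class balance varies across datasets.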

Hyperspectral Image Classification with Capsule Network Using Limited Training Samples

Sensors ◽

10.3390/s18093153 ◽

2018 ◽

Vol 18 (9) ◽

pp. 3153 ◽

Cited By ~ 32

Author(s):

Fei Deng ◽

Shengliang Pu ◽

Xuehong Chen ◽

Yusheng Shi ◽

Ting Yuan ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Network Architecture ◽

Hyperspectral Image ◽

Machine Learning Algorithms ◽

Superior Performance ◽

Support Vector ◽

Complex Data ◽

Training Samples ◽

Limited Training Samples

Deep learning techniques have boosted the performance of hyperspectral image (HSI) classification. In particular, convolutional neural networks (CNNs) have shown superior performance to that of the conventional machine learning algorithms. Recently, a novel type of neural network called the capsule network (CapsNet) was presented to improve on the most advanced CNNs. In this paper, we present a modified two-layer CapsNet with limited training samples for HSI classification, which is inspired by the comparability and simplicity of the shallower deep learning models. The presented CapsNet is trained using two real HSI datasets, i.e., the PaviaU (PU) and SalinasA datasets, which represent complex and simple datasets, respectively, and are used to investigate the robustness or representation of every model or classifier. In addition, a comparable paradigm of network architecture design has been proposed for the comparison of CNN and CapsNet. Experiments demonstrate that CapsNet shows better accuracy and convergence behavior for the complex data than the state-of-the-art CNN. For CapsNet using the PU dataset, the Kappa coefficient, overall accuracy, and average accuracy are 0.9456, 95.90%, and 96.27%, respectively, compared to the corresponding values yielded by CNN of 0.9345, 95.11%, and 95.63%. Moreover, we observed that CapsNet has much higher confidence for the predicted probabilities. Subsequently, this finding was analyzed and discussed with probability maps and uncertainty analysis. In terms of the existing literature, CapsNet provides promising results and explicit merits in comparison with CNN and two baseline classifiers, i.e., random forests (RFs) and support vector machines (SVMs).
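The three figures quoted in the abstract above (Kappa coefficient, overall accuracy and average accuracy) are conventionally derived from a multi-class confusion matrix. The following is a small sketch under that assumption. The function name and the 3x3 matrix are hypothetical and unrelated to the PaviaU results.

```python
def multiclass_summary(cm):
    """Overall accuracy, average accuracy and Cohen's kappa.

    cm[i][j] counts samples of true class i predicted as class j.
    Assumes every class has at least one true sample.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    # Overall accuracy: fraction of all samples on the diagonal.
    oa = sum(cm[i][i] for i in range(k)) / n
    # Average accuracy: mean of the per-class recalls.
    aa = sum(cm[i][i] / sum(cm[i]) for i in range(k)) / k
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement from the row (true) and column (predicted) marginals.
    expected = sum(sum(cm[i]) * col_totals[i] for i in range(k)) / n ** 2
    kappa = (oa - expected) / (1 - expected)
    return oa, aa, kappa


# Hypothetical confusion matrix for three land-cover classes.
oa, aa, kappa = multiclass_summary([[50, 2, 3], [4, 40, 6], [1, 4, 40]])
```

Average accuracy weights every class equally, so it can differ noticeably from overall accuracy when class sizes are unbalanced.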

A Research on Deep Learning Advance for Landslide Classification using Convolutional Neural Networks

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f1184.0486s419 ◽

2019 ◽

Vol 8 (6S4) ◽

pp. 903-906

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Feature Extraction ◽

Deep Learning ◽

Convolutional Neural Networks ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Data Set ◽

Proposed Model

Landslides can be devastating to human life and property. The increasing rate of human settlement in the mountains has raised safety concerns. Landslides have caused economic losses of 1-2% of GDP in many developing countries. In this study, we discuss a deep learning approach to detect landslides. Convolutional Neural Networks are used for feature extraction in our proposed model. As no exact and precise data set was available for feature extraction, a new data set was built for testing the model. We have tested our proposed model and compared it with other machine-learning algorithms such as Logistic Regression, Random Forest, AdaBoost, K-Nearest Neighbors and Support Vector Machine. Our proposed deep learning model produces a classification accuracy of 96.90%, outperforming the classical machine-learning algorithms.

Deep Learning Associated with Laser-Induced Breakdown Spectroscopy (LIBS) for the Prediction of Lead in Soil

Applied Spectroscopy ◽

10.1177/0003702819826283 ◽

2019 ◽

Vol 73 (5) ◽

pp. 565-573 ◽

Cited By ~ 9

Author(s):

Yun Zhao ◽

Mahamed Lamine Guindo ◽

Xing Xu ◽

Miao Sun ◽

Jiyu Peng ◽

...

Keyword(s):

Deep Learning ◽

Confusion Matrix ◽

Principal Component ◽

Classification Performance ◽

Soil Samples ◽

Laser Induced Breakdown Spectroscopy ◽

Support Vector ◽

Breakdown Spectroscopy ◽

Laser Induced Breakdown ◽

Different Levels

In this study, a method based on laser-induced breakdown spectroscopy (LIBS) was developed to detect soil contaminated with Pb. Different levels of Pb were added to soil samples in which tobacco was planted over a period of two to four weeks. Principal component analysis and deep learning with a deep belief network (DBN) were implemented to classify the LIBS data. The robustness of the method was verified through a comparison with the results of a support vector machine and partial least squares discriminant analysis. A confusion matrix of the different algorithms shows that the DBN achieved satisfactory classification performance on all samples of contaminated soil. In terms of classification, the proposed method performed better on samples contaminated for four weeks than on those contaminated for two weeks. The results show that LIBS can be used with deep learning for the detection of heavy metals in soil.

Molecular imaging and deep learning analysis of uMUC1 expression in response to chemotherapy in an orthotopic model of ovarian cancer

Scientific Reports ◽

10.1038/s41598-020-71890-2 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Hongwei Zhao ◽

Hasaan Hayat ◽

Xiaohong Ma ◽

Daguang Fan ◽

Ping Wang ◽

...

Keyword(s):

Neural Networks ◽

Ovarian Cancer ◽

Deep Learning ◽

Cancer Progression ◽

In Vivo Imaging ◽

Near Infrared ◽

Response To Therapy ◽

Response To Chemotherapy

Abstract Artificial Intelligence (AI) algorithms including deep learning have recently demonstrated remarkable progress in image-recognition tasks. Here, we utilized AI for monitoring the expression of underglycosylated mucin 1 (uMUC1) tumor antigen, a biomarker for ovarian cancer progression and response to therapy, using contrast-enhanced in vivo imaging. This was done using a dual-modal (magnetic resonance and near infrared optical imaging) uMUC1-specific probe (termed MN-EPPT) consisting of iron-oxide magnetic nanoparticles (MN) conjugated to a uMUC1-specific peptide (EPPT) and labeled with a near-infrared fluorescent dye, Cy5.5. In vitro studies performed in the uMUC1-expressing human ovarian cancer cell line SKOV3/Luc and control uMUC1-low ES-2 cells showed preferential uptake of the probe by the high expressor (n = 3, p < .05). A decrease in MN-EPPT uptake by SKOV3/Luc cells in vitro due to uMUC1 downregulation after docetaxel therapy was paralleled by in vivo imaging studies that showed a reduction in probe accumulation in the docetaxel-treated group (n = 5, p < .05). The imaging data were analyzed using deep learning-enabled segmentation and quantification of the tumor region of interest (ROI) from raw input MRI sequences by applying AI algorithms including a blend of Convolutional Neural Networks (CNN) and Fully Connected Neural Networks. We believe that the algorithms used in this study have the potential to improve the study and monitoring of cancer progression, amongst other diseases.

Implementation of IoT Framework with Data Analysis Using Deep Learning Methods for Occupancy Prediction in a Building

Future Internet ◽

10.3390/fi13030067 ◽

2021 ◽

Vol 13 (3) ◽

pp. 67

Author(s):

Eric Hitimana ◽

Gaurav Bajpai ◽

Richard Musabe ◽

Louis Sibomana ◽

Jayavel Kayalvizhi

Keyword(s):

Machine Learning ◽

Time Series ◽

Deep Learning ◽

Time Series Data ◽

Multivariate Time Series ◽

Machine Learning Algorithms ◽

Series Data ◽

Support Vector ◽

Human Beings ◽

Feed Forward Network

Many countries worldwide face challenges in managing incident-prevention measures for building fires. The most critical issues are the localization, identification, and detection of room occupants. The Internet of Things (IoT), combined with machine learning, has been shown to make buildings smarter by providing real-time data acquisition using sensors and actuators for prediction mechanisms. This paper proposes the implementation of an IoT framework to capture indoor environmental parameters as multivariate time-series data for occupancy prediction. The Long Short-Term Memory (LSTM) deep learning algorithm is used to infer the presence of human beings. An experiment is conducted in an office room using multivariate time series as predictors in a regression forecasting problem. The results obtained demonstrate that with the developed system it is possible to obtain, process, and store environmental information. The information collected was fed to the LSTM algorithm, which was compared with other machine learning algorithms: Support Vector Machine, Naïve Bayes Network, and Multilayer Perceptron Feed-Forward Network. The outcomes based on the parametric calibrations demonstrate that LSTM performs better in the context of the proposed application.
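Using a multivariate time series as predictors in a forecasting problem, as the abstract above describes, usually starts by slicing the series into fixed-length input windows paired with the next target value. The sketch below shows only that data-preparation step. The function name, the lookback length and the sensor columns (temperature, CO2, occupancy) are assumptions for illustration, not details taken from the paper.

```python
def make_windows(series, lookback, target_col):
    """Turn a multivariate series into (window, next-step target) pairs.

    series: list of rows, one row of sensor readings per time step.
    Each input holds `lookback` consecutive rows; its target is the value
    of column `target_col` at the time step right after the window.
    """
    inputs, targets = [], []
    for end in range(lookback, len(series)):
        inputs.append([list(row) for row in series[end - lookback:end]])
        targets.append(series[end][target_col])
    return inputs, targets


# Hypothetical readings: [temperature in C, CO2 in ppm, occupant count].
readings = [
    [21.0, 400, 0],
    [21.5, 420, 1],
    [22.0, 450, 1],
    [22.1, 470, 2],
    [21.8, 430, 0],
]
windows, next_occupancy = make_windows(readings, lookback=2, target_col=2)
```

Each element of `windows` has shape (lookback, number of sensors), which is the layout recurrent models such as an LSTM typically expect for one training sample.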