Deep Semi-Supervised Learning Improves Universal Peptide Identification of Shotgun Proteomics Data

AbstractSemi-supervised machine learning post-processors critically improve peptide identification of shot-gun proteomics data. Such post-processors accept the peptide-spectrum matches (PSMs) and feature vectors resulting from a database search, train a machine learning classifier, and recalibrate PSMs using the trained parameters, often yielding significantly more identified peptides across q-value thresholds. However, current state-of-the-art post-processors rely on shallow machine learning methods, such as support vector machines. In contrast, the powerful training capabilities of deep learning models have displayed superior performance to shallow models in an ever-growing number of other fields. In this work, we show that deep models significantly improve the recalibration of PSMs compared to the most accurate and widely-used post-processors, such as Percolator and PeptideProphet. Furthermore, we show that deep learning is able to adaptively analyze complex datasets and features for more accurate universal post-processing, leading to both improved Prosit analysis and markedly better recalibration of recently developed database-search functions.

Download Full-text

Sentiment Analysis using various Machine Learning and Deep Learning Techniques

Journal of the Nigerian Society of Physical Sciences ◽

10.46481/jnsps.2021.308 ◽

2021 ◽

pp. 385-394

Author(s):

V Umarani ◽

A Julian ◽

J Deepa

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Process ◽

Learning Techniques

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedbacks, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyses text patterns faster. The supervised machine learning technique is the most used mechanism for sentiment analysis. The proposed work discusses the flow of sentiment analysis process and investigates the common supervised machine learning techniques such as multinomial naive bayes, Bernoulli naive bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, decision tree, and deep learning techniques such as Long Short-Term Memory and Convolution Neural Network. The work examines such learning methods using standard data set and the experimental results of sentiment analysis demonstrate the performance of various classifiers taken in terms of the precision, recall, F1-score, RoC-Curve, accuracy, running time and k fold cross validation and helps in appreciating the novelty of the several deep learning techniques and also giving the user an overview of choosing the right technique for their application.

Download Full-text

Short-Term Forecasting of Photovoltaic Solar Power Production Using Variational Auto-Encoder Driven Deep Learning Approach

Applied Sciences ◽

10.3390/app10238400 ◽

2020 ◽

Vol 10 (23) ◽

pp. 8400 ◽

Cited By ~ 1

Author(s):

Abdelkader Dairi ◽

Fouzi Harrou ◽

Ying Sun ◽

Sofiane Khadraoui

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Solar Power ◽

Power Production ◽

Superior Performance ◽

Support Vector ◽

Learning Models ◽

Short Term ◽

Learning Methods ◽

Short Term Forecasting

The accurate modeling and forecasting of the power output of photovoltaic (PV) systems are critical to efficiently managing their integration in smart grids, delivery, and storage. This paper intends to provide efficient short-term forecasting of solar power production using Variational AutoEncoder (VAE) model. Adopting the VAE-driven deep learning model is expected to improve forecasting accuracy because of its suitable performance in time-series modeling and flexible nonlinear approximation. Both single- and multi-step-ahead forecasts are investigated in this work. Data from two grid-connected plants (a 243 kW parking lot canopy array in the US and a 9 MW PV system in Algeria) are employed to show the investigated deep learning models’ performance. Specifically, the forecasting outputs of the proposed VAE-based forecasting method have been compared with seven deep learning methods, namely recurrent neural network, Long short-term memory (LSTM), Bidirectional LSTM, Convolutional LSTM network, Gated recurrent units, stacked autoencoder, and restricted Boltzmann machine, and two commonly used machine learning methods, namely logistic regression and support vector regression. The results of this investigation demonstrate the satisfying performance of deep learning techniques to forecast solar power and point out that the VAE consistently performed better than the other methods. Also, results confirmed the superior performance of deep learning models compared to the two considered baseline machine learning models.

Download Full-text

Deep Learning and Conventional Machine Learning for Image-Based in-Situ Fault Detection During Laser Welding: A Comparative Study

10.20944/preprints202105.0272.v1 ◽

2021 ◽

Author(s):

Christian Knaak ◽

Moritz Kröger ◽

Frederic Schulze ◽

Peter Abels ◽

Arnold Gillner

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Near Infrared ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Detection Rates ◽

Feature Extraction And Selection ◽

Welding Defects

An effective process monitoring strategy is a requirement for meeting the challenges posed by increasingly complex products and manufacturing processes. To address these needs, this study investigates a comprehensive scheme based on classical machine learning methods, deep learning algorithms, and feature extraction and selection techniques. In a first step, a novel deep learning architecture based on convolutional neural networks (CNN) and gated recurrent units (GRU) is introduced to predict the local weld quality based on mid-wave infrared (MWIR) and near-infrared (NIR) image data. The developed technology is used to discover critical welding defects including lack of fusion (false friends), sagging and lack of penetration, and geometric deviations of the weld seam. Additional work is conducted to investigate the significance of various geometrical, statistical, and spatio-temporal features extracted from the keyhole and weld pool regions. Furthermore, the performance of the proposed deep learning architecture is compared to that of classical supervised machine learning algorithms, such as multi-layer perceptron (MLP), logistic regression (LogReg), support vector machines (SVM), decision trees (DT), random forest (RF) and k-Nearest Neighbors (kNN). Optimal hyperparameters for each algorithm are determined by an extensive grid search. Ultimately, the three best classification models are combined into an ensemble classifier that yields the highest detection rates and achieves the most robust estimation of welding defects among all classifiers studied, which is validated on previously unknown welding trials.

Download Full-text

Applied Machine Learning for Spine Surgeons: Predicting Outcome for Patients Undergoing Treatment for Lumbar Disc Herniation Using PRO Data

Global Spine Journal ◽

10.1177/2192568220967643 ◽

2020 ◽

pp. 219256822096764

Author(s):

Casper Friis Pedersen ◽

Mikkel Østerheden Andersen ◽

Leah Yacat Carreon ◽

Søren Eiskjær

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Lumbar Disc Herniation ◽

Disc Herniation ◽

Lumbar Disc ◽

Multivariate Adaptive Regression Splines ◽

Superior Performance ◽

Support Vector ◽

Conventional Models

Study Design: Retrospective/prospective study. Objective: Models based on preoperative factors can predict patients’ outcome at 1-year follow-up. This study measures the performance of several machine learning (ML) models and compares the results with conventional methods. Methods: Inclusion criteria were patients who had lumbar disc herniation (LDH) surgery, identified in the Danish national registry for spine surgery. Initial training of models included 16 independent variables, including demographics and presurgical patient-reported measures. Patients were grouped by reaching minimal clinically important difference or not for EuroQol, Oswestry Disability Index, Visual Analog Scale (VAS) Leg, and VAS Back and by their ability to return to work at 1 year follow-up. Data were randomly split into training, validation, and test sets by 50%/35%/15%. Deep learning, decision trees, random forest, boosted trees, and support vector machines model were trained, and for comparison, multivariate adaptive regression splines (MARS) and logistic regression models were used. Model fit was evaluated by inspecting area under the curve curves and performance during validation. Results: Seven models were arrived at. Classification errors were within ±1% to 4% SD across validation folds. ML did not yield superior performance compared with conventional models. MARS and deep learning performed consistently well. Discrepancy was greatest among VAS Leg models. Conclusions: Five predictive ML and 2 conventional models were developed, predicting improvement for LDH patients at the 1-year follow-up. We demonstrate that it is possible to build an ensemble of models with little effort as a starting point for further model optimization and selection.

Download Full-text

Wheat Lodging Detection from UAS Imagery Using Machine Learning Algorithms

Remote Sensing ◽

10.3390/rs12111838 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1838 ◽

Cited By ~ 8

Author(s):

Zhao Zhang ◽

Paulo Flores ◽

C. Igathinathane ◽

Dayakar L. Naik ◽

Ravi Kiran ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Standard Deviation ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Superior Performance ◽

Support Vector ◽

Unmanned Aerial Systems

The current mainstream approach of using manual measurements and visual inspections for crop lodging detection is inefficient, time-consuming, and subjective. An innovative method for wheat lodging detection that can overcome or alleviate these shortcomings would be welcomed. This study proposed a systematic approach for wheat lodging detection in research plots (372 experimental plots), which consisted of using unmanned aerial systems (UAS) for aerial imagery acquisition, manual field evaluation, and machine learning algorithms to detect the occurrence or not of lodging. UAS imagery was collected on three different dates (23 and 30 July 2019, and 8 August 2019) after lodging occurred. Traditional machine learning and deep learning were evaluated and compared in this study in terms of classification accuracy and standard deviation. For traditional machine learning, five types of features (i.e. gray level co-occurrence matrix, local binary pattern, Gabor, intensity, and Hu-moment) were extracted and fed into three traditional machine learning algorithms (i.e., random forest (RF), neural network, and support vector machine) for detecting lodged plots. For the datasets on each imagery collection date, the accuracies of the three algorithms were not significantly different from each other. For any of the three algorithms, accuracies on the first and last date datasets had the lowest and highest values, respectively. Incorporating standard deviation as a measurement of performance robustness, RF was determined as the most satisfactory. Regarding deep learning, three different convolutional neural networks (simple convolutional neural network, VGG-16, and GoogLeNet) were tested. For any of the single date datasets, GoogLeNet consistently had superior performance over the other two methods. Further comparisons between RF and GoogLeNet demonstrated that the detection accuracies of the two methods were not significantly different from each other (p > 0.05); hence, the choice of any of the two would not affect the final detection accuracies. However, considering the fact that the average accuracy of GoogLeNet (93%) was larger than RF (91%), it was recommended to use GoogLeNet for wheat lodging detection. This research demonstrated that UAS RGB imagery, coupled with the GoogLeNet machine learning algorithm, can be a novel, reliable, objective, simple, low-cost, and effective (accuracy > 90%) tool for wheat lodging detection.

Download Full-text

A study of deep learning approaches for medication and adverse drug event extraction from clinical text

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocz063 ◽

2019 ◽

Vol 27 (1) ◽

pp. 13-21 ◽

Cited By ~ 11

Author(s):

Qiang Wei ◽

Zongcheng Ji ◽

Zhiheng Li ◽

Jingcheng Du ◽

Jingqi Wang ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Learning Algorithms ◽

Joint Model ◽

Machine Learning Algorithms ◽

Entity Recognition ◽

Superior Performance ◽

Drug Event ◽

Support Vector ◽

Learning Approaches

AbstractObjectiveThis article presents our approaches to extraction of medications and associated adverse drug events (ADEs) from clinical documents, which is the second track of the 2018 National NLP Clinical Challenges (n2c2) shared task.Materials and MethodsThe clinical corpus used in this study was from the MIMIC-III database and the organizers annotated 303 documents for training and 202 for testing. Our system consists of 2 components: a named entity recognition (NER) and a relation classification (RC) component. For each component, we implemented deep learning-based approaches (eg, BI-LSTM-CRF) and compared them with traditional machine learning approaches, namely, conditional random fields for NER and support vector machines for RC, respectively. In addition, we developed a deep learning-based joint model that recognizes ADEs and their relations to medications in 1 step using a sequence labeling approach. To further improve the performance, we also investigated different ensemble approaches to generating optimal performance by combining outputs from multiple approaches.ResultsOur best-performing systems achieved F1 scores of 93.45% for NER, 96.30% for RC, and 89.05% for end-to-end evaluation, which ranked #2, #1, and #1 among all participants, respectively. Additional evaluations show that the deep learning-based approaches did outperform traditional machine learning algorithms in both NER and RC. The joint model that simultaneously recognizes ADEs and their relations to medications also achieved the best performance on RC, indicating its promise for relation extraction.ConclusionIn this study, we developed deep learning approaches for extracting medications and their attributes such as ADEs, and demonstrated its superior performance compared with traditional machine learning algorithms, indicating its uses in broader NER and RC tasks in the medical domain.

Download Full-text

Machine Learning Classifiers for Twitter Surveillance of Vaping: Comparative Machine Learning Study

Journal of Medical Internet Research ◽

10.2196/17478 ◽

2020 ◽

Vol 22 (8) ◽

pp. e17478 ◽

Cited By ~ 1

Author(s):

Shyam Visweswaran ◽

Jason B Colditz ◽

Patrick O’Halloran ◽

Na-Rae Han ◽

Sanya B Taneja ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Surveillance System ◽

Short Term Memory ◽

Characteristic Curve ◽

Superior Performance ◽

Support Vector ◽

Data Set ◽

Machine Learning Classifiers ◽

Learning Classifiers

Background Twitter presents a valuable and relevant social media platform to study the prevalence of information and sentiment on vaping that may be useful for public health surveillance. Machine learning classifiers that identify vaping-relevant tweets and characterize sentiments in them can underpin a Twitter-based vaping surveillance system. Compared with traditional machine learning classifiers that are reliant on annotations that are expensive to obtain, deep learning classifiers offer the advantage of requiring fewer annotated tweets by leveraging the large numbers of readily available unannotated tweets. Objective This study aims to derive and evaluate traditional and deep learning classifiers that can identify tweets relevant to vaping, tweets of a commercial nature, and tweets with provape sentiments. Methods We continuously collected tweets that matched vaping-related keywords over 2 months from August 2018 to October 2018. From this data set of tweets, a set of 4000 tweets was selected, and each tweet was manually annotated for relevance (vape relevant or not), commercial nature (commercial or not), and sentiment (provape or not). Using the annotated data, we derived traditional classifiers that included logistic regression, random forest, linear support vector machine, and multinomial naive Bayes. In addition, using the annotated data set and a larger unannotated data set of tweets, we derived deep learning classifiers that included a convolutional neural network (CNN), long short-term memory (LSTM) network, LSTM-CNN network, and bidirectional LSTM (BiLSTM) network. The unannotated tweet data were used to derive word vectors that deep learning classifiers can leverage to improve performance. Results LSTM-CNN performed the best with the highest area under the receiver operating characteristic curve (AUC) of 0.96 (95% CI 0.93-0.98) for relevance, all deep learning classifiers including LSTM-CNN performed better than the traditional classifiers with an AUC of 0.99 (95% CI 0.98-0.99) for distinguishing commercial from noncommercial tweets, and BiLSTM performed the best with an AUC of 0.83 (95% CI 0.78-0.89) for provape sentiment. Overall, LSTM-CNN performed the best across all 3 classification tasks. Conclusions We derived and evaluated traditional machine learning and deep learning classifiers to identify vaping-related relevant, commercial, and provape tweets. Overall, deep learning classifiers such as LSTM-CNN had superior performance and had the added advantage of requiring no preprocessing. The performance of these classifiers supports the development of a vaping surveillance system.

Download Full-text

Deep learning of the back-splicing code for circular RNA formation

Bioinformatics ◽

10.1093/bioinformatics/btz382 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5235-5242 ◽

Cited By ~ 10

Author(s):

Jun Wang ◽

Liangjiang Wang

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Rna Splicing ◽

Expression Patterns ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Superior Performance ◽

Supplementary Information ◽

Circular Rnas ◽

Support Vector

Abstract Motivation Circular RNAs (circRNAs) are a new class of endogenous RNAs in animals and plants. During pre-RNA splicing, the 5′ and 3′ termini of exon(s) can be covalently ligated to form circRNAs through back-splicing (head-to-tail splicing). CircRNAs can be conserved across species, show tissue- and developmental stage-specific expression patterns, and may be associated with human disease. However, the mechanism of circRNA formation is still unclear although some sequence features have been shown to affect back-splicing. Results In this study, by applying the state-of-art machine learning techniques, we have developed the first deep learning model, DeepCirCode, to predict back-splicing for human circRNA formation. DeepCirCode utilizes a convolutional neural network (CNN) with nucleotide sequence as the input, and shows superior performance over conventional machine learning algorithms such as support vector machine and random forest. Relevant features learnt by DeepCirCode are represented as sequence motifs, some of which match human known motifs involved in RNA splicing, transcription or translation. Analysis of these motifs shows that their distribution in RNA sequences can be important for back-splicing. Moreover, some of the human motifs appear to be conserved in mouse and fruit fly. The findings provide new insight into the back-splicing code for circRNA formation. Availability and implementation All the datasets and source code for model construction are available at https://github.com/BioDataLearning/DeepCirCode. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Touch-Based Active Cloud Authentication Using Traditional Machine Learning and LSTM on a Distributed Tensorflow Framework

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026819500226 ◽

2019 ◽

Vol 18 (04) ◽

pp. 1950022

Author(s):

Dylan J. Gunn ◽

Zhipeng Liu ◽

Rushit Dave ◽

Xiaohong Yuan ◽

Kaushik Roy

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Mobile Device ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Modern World ◽

Support Vector ◽

Learning Classifier ◽

Cloud Framework ◽

Computationally Intensive

In this modern world, mobile devices have been paired with the cloud environment to scale the voluminous amount of generated data. The implementation comes at the cost of privacy as proprietary data can be stolen in transit to the cloud, or victims’ phones can be seized along with synced data from cloud. The attacker can gain access to the phone through shoulder surfing, or even spoofing attacks. Our approach is to mitigate this issue by proposing an active cloud authentication framework using touch biometric pattern. To the best of our knowledge, active cloud authentication using touch dynamics for mobile cloud computing has not been explored in the literature. This research creates a proof of concept that will lead into a simulated cloud framework for active authentication. Given the amount of data captured by the mobile device from user activity, it can be a computationally intensive process for the mobile device to handle with such limited resources. To solve this, we simulated a post-transmission process of data to the cloud so that we could implement the authentication process within the cloud. We evaluated the touch data using traditional machine learning algorithms, such as Random Forest (RF), Support Vector Machine (SVM), and also using a deep learning classifier, the Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) algorithms. The novelty of this work is two-fold. First, we develop a distributed tensorflow framework for cloud authentication using touch biometric pattern. This framework helps alleviate the drawback of the computationally intensive recognition of the substantial amount of raw data from the user. Second, we apply the RF, SVM, and a deep learning classifier, the LSTM-RNN, on the touch data to evaluate the performance of the proposed authentication scheme. The proposed approach shows a promising performance with an accuracy of 99.0361% using RF on the distributed tensorflow framework.

Download Full-text

Sketch Based Image Retrieval using Deep Learning Based Machine Learning

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.e2622.0610521 ◽

2021 ◽

Vol 10 (5) ◽

pp. 79-86

Author(s):

Deepika Sivasankaran ◽

Sai Seena P ◽

Rajesh R ◽

Madheswari Kanmani

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Image Retrieval ◽

Supervised Machine Learning ◽

Support Vector ◽

Main Challenge ◽

Vector Machines ◽

Single Input ◽

Express Information ◽

The Given

Sketch based image retrieval (SBIR) is a sub-domain of Content Based Image Retrieval(CBIR) where the user provides a drawing as an input to obtain i.e retrieve images relevant to the drawing given. The main challenge in SBIR is the subjectivity of the drawings drawn by the user as it entirely relies on the user's ability to express information in hand-drawn form. Since many of the SBIR models created aim at using singular input sketch and retrieving photos based on the given single sketch input, our project aims to enable detection and extraction of multiple sketches given together as a single input sketch image. The features are extracted from individual sketches obtained using deep learning architectures such as VGG16 , and classified to its type based on supervised machine learning using Support Vector Machines. Based on the class obtained, photos are retrieved from the database using an opencv library, CVLib , which finds the objects present in a photo image. From the number of components obtained in each photo, a ranking function is performed to rank the retrieved photos, which are then displayed to the user starting from the highest order of ranking up to the least. The system consisting of VGG16 and SVM provides 89% accuracy.

Download Full-text