Supervised Machine Learning and Deep Learning Classification Techniques to Identify Scholarly and Research Content

2021 ◽

pp. 385-394

Author(s):

V Umarani ◽

A Julian ◽

J Deepa

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sentiment Analysis ◽

Naive Bayes ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Process ◽

Learning Techniques

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedback, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyze text patterns faster. Supervised machine learning is the most widely used approach for sentiment analysis. The proposed work discusses the flow of the sentiment analysis process and investigates common supervised machine learning techniques, such as multinomial naive Bayes, Bernoulli naive Bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, and decision tree, as well as deep learning techniques such as Long Short-Term Memory and Convolutional Neural Network. The work examines these learning methods on a standard data set. The experimental results compare the classifiers in terms of precision, recall, F1-score, ROC curve, accuracy, running time, and k-fold cross-validation. They help in appreciating the novelty of the deep learning techniques and give the user an overview for choosing the right technique for their application.
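The evaluation measures named in this abstract can be illustrated with a minimal sketch. It assumes binary labels (1 for positive sentiment, 0 for negative) and uses hypothetical helper names, `binary_metrics` and `k_fold_indices`, that do not come from the paper. It shows only how accuracy, precision, recall, and F1-score are derived from predictions and how k-fold splits partition a data set.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


if __name__ == "__main__":
    # Made-up labels, used only to exercise the functions.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    guess = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(truth, guess))
    for train, test in k_fold_indices(10, 3):
        print(len(train), test)
```

In practice each fold's training indices would be used to fit a classifier and the test indices to score it, with the per-fold metrics averaged afterwards.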

Deep Semi-Supervised Learning Improves Universal Peptide Identification of Shotgun Proteomics Data

10.1101/2020.11.12.380881 ◽

2020 ◽

Author(s):

John T. Halloran ◽

Gregor Urban ◽

David Rocke ◽

Pierre Baldi

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Peptide Identification ◽

Shotgun Proteomics ◽

Database Search ◽

Supervised Machine Learning ◽

Superior Performance ◽

Support Vector ◽

Proteomics Data ◽

Learning Classifier

Semi-supervised machine learning post-processors critically improve peptide identification of shotgun proteomics data. Such post-processors accept the peptide-spectrum matches (PSMs) and feature vectors resulting from a database search, train a machine learning classifier, and recalibrate PSMs using the trained parameters, often yielding significantly more identified peptides across q-value thresholds. However, current state-of-the-art post-processors rely on shallow machine learning methods, such as support vector machines. In contrast, the powerful training capabilities of deep learning models have displayed superior performance to shallow models in an ever-growing number of other fields. In this work, we show that deep models significantly improve the recalibration of PSMs compared to the most accurate and widely-used post-processors, such as Percolator and PeptideProphet. Furthermore, we show that deep learning is able to adaptively analyze complex datasets and features for more accurate universal post-processing, leading to both improved Prosit analysis and markedly better recalibration of recently developed database-search functions.
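The abstract reports results "across q-value thresholds" without stating how q-values are estimated. The sketch below assumes the common target-decoy estimate (decoys divided by targets at each score cutoff, then made monotone). The function name `target_q_values` and the toy scores are hypothetical. It is meant only to show what counting identifications at a q-value threshold involves, not to reproduce the authors' pipeline.

```python
def target_q_values(psms):
    """Estimate q-values for target PSMs by target-decoy counting.

    psms: list of (score, is_decoy) pairs, higher score = better match.
    Returns (score, q_value) pairs for targets, best score first.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR estimate at each rank: decoys seen so far / targets seen so far.
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q-value: smallest FDR at this rank or at any lower-scoring rank.
    q_vals = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_vals.append(running_min)
    q_vals.reverse()

    return [(score, q) for (score, is_decoy), q in zip(ranked, q_vals)
            if not is_decoy]


if __name__ == "__main__":
    # Made-up recalibrated scores; True marks a decoy match.
    toy = [(9.0, False), (8.0, False), (7.0, True),
           (6.0, False), (5.0, False), (4.0, True)]
    results = target_q_values(toy)
    accepted = [s for s, q in results if q <= 0.01]
    print(results, len(accepted))
```

A post-processor that separates targets from decoys better pushes decoys further down the ranking, so more targets clear a given q-value threshold.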

The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing

The Computer Journal ◽

10.1093/comjnl/bxz098 ◽

2019 ◽

Vol 63 (11) ◽

pp. 1658-1667

Author(s):

M J Castro-Bleda ◽

S España-Boquera ◽

J Pastor-Pellicer ◽

F Zamora-Martínez

Keyword(s):

Machine Learning ◽

Image Processing ◽

Deep Learning ◽

Supervised Learning ◽

Image Enhancement ◽

Super Resolution ◽

Supervised Machine Learning ◽

Text Documents ◽

Learning Techniques ◽

Printed Text

This paper presents the ‘NoisyOffice’ database. It consists of images of printed text documents with noise mainly caused by uncleanliness from a generic office, such as coffee stains and footprints on documents or folded and wrinkled sheets with degraded printed text. This corpus is intended to train and evaluate supervised learning methods for cleaning, binarization and enhancement of noisy images of grayscale text documents. As an example, several experiments of image enhancement and binarization using deep learning techniques are presented. Double-resolution images are also provided for testing super-resolution methods. The corpus is freely available at the UCI Machine Learning Repository. Finally, a challenge organized by Kaggle Inc. to denoise images using the database is described in order to show its suitability for benchmarking image processing systems.

Comparison of supervised machine learning classification techniques in prediction of locoregional recurrences in early oral tongue cancer

International Journal of Medical Informatics ◽

10.1016/j.ijmedinf.2019.104068 ◽

2020 ◽

Vol 136 ◽

pp. 104068 ◽

Cited By ~ 4

Author(s):

Rasheed Omobolaji Alabi ◽

Mohammed Elmusrati ◽

Iris Sawazaki‐Calone ◽

Luiz Paulo Kowalski ◽

Caj Haglund ◽

...

Keyword(s):

Machine Learning ◽

Tongue Cancer ◽

Supervised Machine Learning ◽

Oral Tongue ◽

Classification Techniques ◽

Machine Learning Classification ◽

Oral Tongue Cancer

Deep Learning and Conventional Machine Learning for Image-Based in-Situ Fault Detection During Laser Welding: A Comparative Study

10.20944/preprints202105.0272.v1 ◽

2021 ◽

Author(s):

Christian Knaak ◽

Moritz Kröger ◽

Frederic Schulze ◽

Peter Abels ◽

Arnold Gillner

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Near Infrared ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Detection Rates ◽

Feature Extraction And Selection ◽

Welding Defects

An effective process monitoring strategy is a requirement for meeting the challenges posed by increasingly complex products and manufacturing processes. To address these needs, this study investigates a comprehensive scheme based on classical machine learning methods, deep learning algorithms, and feature extraction and selection techniques. In a first step, a novel deep learning architecture based on convolutional neural networks (CNN) and gated recurrent units (GRU) is introduced to predict the local weld quality based on mid-wave infrared (MWIR) and near-infrared (NIR) image data. The developed technology is used to discover critical welding defects including lack of fusion (false friends), sagging and lack of penetration, and geometric deviations of the weld seam. Additional work is conducted to investigate the significance of various geometrical, statistical, and spatio-temporal features extracted from the keyhole and weld pool regions. Furthermore, the performance of the proposed deep learning architecture is compared to that of classical supervised machine learning algorithms, such as multi-layer perceptron (MLP), logistic regression (LogReg), support vector machines (SVM), decision trees (DT), random forest (RF) and k-Nearest Neighbors (kNN). Optimal hyperparameters for each algorithm are determined by an extensive grid search. Ultimately, the three best classification models are combined into an ensemble classifier that yields the highest detection rates and achieves the most robust estimation of welding defects among all classifiers studied, which is validated on previously unknown welding trials.
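The abstract says the three best models are combined into an ensemble classifier but does not say how. As one possible reading, the sketch below assumes hard majority voting over per-sample class labels. The function name `majority_vote`, the tie-breaking rule, and the example labels are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several models by hard voting.

    predictions: list of per-model label lists, all of equal length.
    Ties go to the label predicted by the earliest-listed model among
    the tied labels (an arbitrary rule chosen for this sketch).
    """
    ensemble = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        winner = next(v for v in votes if counts[v] == top)
        ensemble.append(winner)
    return ensemble


if __name__ == "__main__":
    # Made-up per-frame predictions from three hypothetical models.
    model_a = ["ok", "defect", "ok", "defect"]
    model_b = ["ok", "ok", "ok", "defect"]
    model_c = ["defect", "defect", "ok", "ok"]
    print(majority_vote([model_a, model_b, model_c]))
```

Listing the models in order of validation performance would make the tie-breaking rule favour the strongest single model, although ties cannot occur with three models and two classes.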

A deep learning and novelty detection framework for rapid phenotyping in high-content screening

10.1101/134627 ◽

2017 ◽

Cited By ~ 2

Author(s):

Christoph Sommer ◽

Rudolf Hoefler ◽

Matthias Samwer ◽

Daniel W. Gerlich

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Large Scale ◽

Novelty Detection ◽

A Priori ◽

Mitotic Cell ◽

Supervised Machine Learning ◽

High Content Screening ◽

Data Sets ◽

User Training

Supervised machine learning is a powerful and widely used method to analyze high-content screening data. Despite its accuracy, efficiency, and versatility, supervised machine learning has drawbacks, most notably its dependence on a priori knowledge of expected phenotypes and time-consuming classifier training. We provide a solution to these limitations with CellCognition Explorer, a generic novelty detection and deep learning framework. Application to several large-scale screening data sets on nuclear and mitotic cell morphologies demonstrates that CellCognition Explorer enables discovery of rare phenotypes without user training, which has broad implications for improved assay development in high-content screening.
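The general idea of novelty detection, scoring how far a sample lies from a reference population without labeled examples of the rare class, can be sketched as follows. This is not CellCognition Explorer's method, which the abstract does not describe. It is a generic standardized-distance score, and the names `fit_reference` and `novelty_score` as well as the feature values are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def fit_reference(reference):
    """Per-feature mean and standard deviation of reference samples."""
    columns = list(zip(*reference))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def novelty_score(sample, means, stds):
    """Euclidean norm of the sample's per-feature z-scores."""
    return math.sqrt(sum(((v - m) / s) ** 2
                         for v, m, s in zip(sample, means, stds)))


if __name__ == "__main__":
    # Made-up two-feature vectors standing in for control-cell features.
    controls = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
    means, stds = fit_reference(controls)
    for cell in ([2.0, 12.0], [3.0, 14.0], [10.0, 30.0]):
        print(cell, round(novelty_score(cell, means, stds), 2))
```

Samples whose score exceeds a chosen cutoff would be flagged as candidate novel phenotypes. The cutoff itself has to be picked by the user, for example from the score distribution of the controls.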
