Machine Learning Fundamentals

2021 ◽  
Author(s):  
Hui Jiang

This lucid, accessible introduction to supervised machine learning presents core concepts in a focused and logical way that is easy for beginners to follow. The author assumes basic calculus, linear algebra, probability, and statistics but no prior exposure to machine learning. Coverage includes widely used traditional methods such as SVMs, boosted trees, HMMs, and LDAs, plus popular deep learning methods such as convolutional neural nets, attention, transformers, and GANs. Organized in a coherent presentation framework that emphasizes the big picture, the text introduces each method clearly and concisely “from scratch” based on the fundamentals. All methods and algorithms are described in a clean and consistent style, with a minimum of unnecessary detail. Numerous case studies and concrete examples demonstrate how the methods can be applied in a variety of contexts.

2018 ◽  
Vol 68 (1) ◽  
pp. 161-181 ◽  
Author(s):  
Dan Guest ◽  
Kyle Cranmer ◽  
Daniel Whiteson

Machine learning has played an important role in the analysis of high-energy physics data for decades. The emergence of deep learning in 2012 allowed for machine learning tools which could adeptly handle higher-dimensional and more complex problems than previously feasible. This review is aimed at the reader who is familiar with high-energy physics but not machine learning. The connections between machine learning and high-energy physics data analysis are explored, followed by an introduction to the core concepts of neural networks, examples of the key results demonstrating the power of deep learning for analysis of LHC data, and discussion of future prospects and concerns.


Author(s):  
V Umarani ◽  
A Julian ◽  
J Deepa

Sentiment analysis has gained a lot of attention from researchers in recent years because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedback, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyze text patterns quickly; the supervised machine learning approach is the most commonly used mechanism for sentiment analysis. The proposed work discusses the flow of the sentiment analysis process and investigates common supervised machine learning techniques such as multinomial naive Bayes, Bernoulli naive Bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, and decision tree, as well as deep learning techniques such as Long Short-Term Memory and Convolutional Neural Network. The work examines these learning methods on a standard data set. The experimental results demonstrate the performance of the various classifiers in terms of precision, recall, F1-score, ROC curve, accuracy, running time, and k-fold cross validation; they help in appreciating the novelty of the several deep learning techniques and give the user an overview for choosing the right technique for their application.
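As a rough illustration of the supervised pipeline surveyed here, the sketch below compares two of the listed classifiers on a toy corpus with scikit-learn and 5-fold cross validation; the example texts, labels, and pipeline choices are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch (not the authors' code): comparing two of the supervised
# classifiers discussed above on a toy sentiment corpus with scikit-learn.
# The example texts and labels are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great service", "terrible product", "loved it", "not worth it"] * 25
labels = [1, 0, 1, 0] * 25  # 1 = positive, 0 = negative

for name, clf in [("MultinomialNB", MultinomialNB()),
                  ("LogisticRegression", LogisticRegression(max_iter=1000))]:
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipe, texts, labels, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```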


2020 ◽  
Author(s):  
John T. Halloran ◽  
Gregor Urban ◽  
David Rocke ◽  
Pierre Baldi

Semi-supervised machine learning post-processors critically improve peptide identification in shotgun proteomics data. Such post-processors accept the peptide-spectrum matches (PSMs) and feature vectors resulting from a database search, train a machine learning classifier, and recalibrate PSMs using the trained parameters, often yielding significantly more identified peptides across q-value thresholds. However, current state-of-the-art post-processors rely on shallow machine learning methods, such as support vector machines. In contrast, the powerful training capabilities of deep learning models have displayed superior performance to shallow models in an ever-growing number of other fields. In this work, we show that deep models significantly improve the recalibration of PSMs compared to the most accurate and widely used post-processors, such as Percolator and PeptideProphet. Furthermore, we show that deep learning is able to adaptively analyze complex datasets and features for more accurate universal post-processing, leading to both improved Prosit analysis and markedly better recalibration of recently developed database-search functions.
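For readers unfamiliar with the post-processing step, the following is a minimal sketch of the general idea, assuming synthetic target/decoy PSM feature vectors and a small feed-forward network in scikit-learn; it is not the authors' deep models nor the Percolator or PeptideProphet implementations.

```python
# Minimal sketch (assumed, not the authors' pipeline): re-scoring
# peptide-spectrum matches (PSMs) with a small feed-forward network.
# Synthetic target/decoy feature vectors stand in for database-search output.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_feat = 12                                   # e.g. search score, mass error, ...
targets = rng.normal(0.5, 1.0, size=(1000, n_feat))
decoys = rng.normal(-0.5, 1.0, size=(1000, n_feat))
X = np.vstack([targets, decoys])
y = np.array([1] * 1000 + [0] * 1000)         # 1 = target, 0 = decoy

X = StandardScaler().fit_transform(X)
clf = MLPClassifier(hidden_layer_sizes=(64, 64, 32), max_iter=500).fit(X, y)

# The classifier's probability becomes the recalibrated PSM score used for
# re-ranking before q-value estimation (the q-value step is omitted here).
new_scores = clf.predict_proba(X)[:, 1]
print(new_scores[:5])
```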


2019 ◽  
Vol 38 (7) ◽  
pp. 526-533 ◽  
Author(s):  
York Zheng ◽  
Qie Zhang ◽  
Anar Yusifov ◽  
Yunzhi Shi

Recent advances in machine learning and its applications in various sectors are generating a new wave of experiments and solutions to geophysical problems in the oil and gas industry. We present two separate case studies in which supervised deep learning is used as an alternative to conventional techniques. The first case is an example of image classification applied to seismic interpretation. A convolutional neural network (CNN) is trained to pick faults automatically in 3D seismic volumes. Every sample in the input seismic image is classified as either a nonfault or a fault with a certain dip and azimuth, which are predicted simultaneously. The second case is an example of elastic model building that casts prestack seismic inversion as a machine learning regression problem. A CNN is trained to predict 1D velocity and density profiles from input seismic records. In both case studies, we demonstrate that CNN models trained on synthetic data can be used to make efficient and effective predictions on field data. While results from the first example show that high-quality fault picks can be predicted from migrated seismic images, we find the prestack seismic inversion case more challenging: constraining the subsurface geologic variations and carefully preconditioning the input seismic data are important for obtaining reasonably reliable results. This observation matches our experience with conventional workflows and methods, which likewise benefit from the improved signal-to-noise ratio after migration and stacking and face the inherent subsurface ambiguity that makes a unique parameter inversion difficult.
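The regression formulation in the second case study can be sketched roughly as below, assuming an illustrative 1D convolutional network in Keras trained on synthetic records and profiles; the actual network, data shapes, and training data in the paper differ.

```python
# Minimal sketch (assumed architecture, not the authors' network): a 1D CNN
# that regresses a velocity profile from an input seismic record, trained on
# synthetic data. All shapes and random data are illustrative placeholders.
import numpy as np
import tensorflow as tf

n_samples, n_times, n_traces, n_depth = 512, 256, 32, 100
records = np.random.randn(n_samples, n_times, n_traces).astype("float32")
profiles = np.random.randn(n_samples, n_depth).astype("float32")  # target vp profile

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, 7, activation="relu", padding="same",
                           input_shape=(n_times, n_traces)),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(64, 5, activation="relu", padding="same"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(n_depth),            # one output value per depth sample
])
model.compile(optimizer="adam", loss="mse")
model.fit(records, profiles, epochs=2, batch_size=32, verbose=0)
```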


2019 ◽  
Vol 63 (11) ◽  
pp. 1658-1667
Author(s):  
M J Castro-Bleda ◽  
S España-Boquera ◽  
J Pastor-Pellicer ◽  
F Zamora-Martínez

This paper presents the ‘NoisyOffice’ database. It consists of images of printed text documents with noise mainly caused by everyday office use, such as coffee stains and footprints on documents or folded and wrinkled sheets with degraded printed text. This corpus is intended to train and evaluate supervised learning methods for the cleaning, binarization, and enhancement of noisy images of grayscale text documents. As an example, several image enhancement and binarization experiments using deep learning techniques are presented. Double-resolution images are also provided for testing super-resolution methods. The corpus is freely available at the UCI Machine Learning Repository. Finally, a challenge organized by Kaggle Inc. to denoise images using this database is described to show its suitability for benchmarking image processing systems.
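A minimal sketch of the kind of image-enhancement experiment described, assuming a small convolutional encoder-decoder in Keras trained on noisy/clean patch pairs; loading of the NoisyOffice images and the paper's exact models are not reproduced, and the random arrays below are placeholders.

```python
# Minimal sketch (assumed, not the paper's exact model): a small convolutional
# encoder-decoder that maps noisy grayscale text-document patches to cleaned
# ones, as could be trained on NoisyOffice image pairs. Patch size is illustrative.
import numpy as np
import tensorflow as tf

noisy = np.random.rand(64, 128, 128, 1).astype("float32")   # noisy input patches
clean = np.random.rand(64, 128, 128, 1).astype("float32")   # corresponding clean targets

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same",
                           input_shape=(128, 128, 1)),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
    tf.keras.layers.UpSampling2D(2),
    tf.keras.layers.Conv2D(1, 3, activation="sigmoid", padding="same"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(noisy, clean, epochs=2, batch_size=16, verbose=0)
```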


Author(s):  
Christian Knaak ◽  
Moritz Kröger ◽  
Frederic Schulze ◽  
Peter Abels ◽  
Arnold Gillner

An effective process monitoring strategy is a requirement for meeting the challenges posed by increasingly complex products and manufacturing processes. To address these needs, this study investigates a comprehensive scheme based on classical machine learning methods, deep learning algorithms, and feature extraction and selection techniques. In a first step, a novel deep learning architecture based on convolutional neural networks (CNN) and gated recurrent units (GRU) is introduced to predict the local weld quality from mid-wave infrared (MWIR) and near-infrared (NIR) image data. The developed technology is used to discover critical welding defects, including lack of fusion (false friends), sagging, lack of penetration, and geometric deviations of the weld seam. Additional work is conducted to investigate the significance of various geometrical, statistical, and spatio-temporal features extracted from the keyhole and weld pool regions. Furthermore, the performance of the proposed deep learning architecture is compared to that of classical supervised machine learning algorithms, such as multi-layer perceptron (MLP), logistic regression (LogReg), support vector machines (SVM), decision trees (DT), random forest (RF), and k-nearest neighbors (kNN). Optimal hyperparameters for each algorithm are determined by an extensive grid search. Ultimately, the three best classification models are combined into an ensemble classifier that yields the highest detection rates and achieves the most robust estimation of welding defects among all classifiers studied, validated on previously unseen welding trials.
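A rough sketch of the CNN-GRU idea, assuming illustrative tensor shapes and a simple Keras implementation rather than the authors' architecture: a CNN extracts per-frame features from the MWIR/NIR image sequence and a GRU aggregates them over time into a weld-quality class.

```python
# Minimal sketch (assumed, not the authors' architecture): a CNN encodes each
# frame of an infrared image sequence and a GRU aggregates the per-frame
# features over time to classify local weld quality. Dimensions are illustrative.
import numpy as np
import tensorflow as tf

n_seq, n_frames, h, w, c, n_classes = 32, 16, 64, 64, 2, 4   # c = MWIR + NIR channels
frames = np.random.rand(n_seq, n_frames, h, w, c).astype("float32")
labels = np.random.randint(0, n_classes, size=n_seq)

cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(h, w, c)),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
])
model = tf.keras.Sequential([
    tf.keras.layers.TimeDistributed(cnn, input_shape=(n_frames, h, w, c)),
    tf.keras.layers.GRU(64),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(frames, labels, epochs=2, batch_size=8, verbose=0)
```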


2017 ◽  
Vol 1 (3) ◽  
pp. 83 ◽  
Author(s):  
Chandrasegar Thirumalai ◽  
Ravisankar Koppuravuri

In this paper, we use deep neural networks to predict bike-sharing usage from previous years' usage data. We choose deep neural networks because they can deliver higher accuracy: unlike many other machine learning techniques, they let us add many hidden layers to improve prediction accuracy, and the model can be trained in the way we want so that we achieve the desired results. Many AI experts now consider deep learning the best AI technique available, capable of producing remarkable results. We apply it to predict the bike-sharing usage of a rental company so that the company can make sound business decisions based on previous years' data.
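A minimal sketch of such a feed-forward regression model in Keras, assuming synthetic tabular features in place of the rental company's data; the feature count, layer sizes, and training settings are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' model): a feed-forward network with
# several hidden layers that regresses bike-rental counts from tabular features
# such as season, weekday, temperature, and humidity. Data below is synthetic.
import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 8).astype("float32")      # 8 illustrative input features
y = np.random.rand(1000, 1).astype("float32")      # normalized rental counts

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(64, activation="relu"),  # further hidden layers can be
    tf.keras.layers.Dense(32, activation="relu"),  # stacked to increase capacity
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
```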


2017 ◽  
Author(s):  
Christoph Sommer ◽  
Rudolf Hoefler ◽  
Matthias Samwer ◽  
Daniel W. Gerlich

Supervised machine learning is a powerful and widely used method to analyze high-content screening data. Despite its accuracy, efficiency, and versatility, supervised machine learning has drawbacks, most notably its dependence on a priori knowledge of expected phenotypes and time-consuming classifier training. We provide a solution to these limitations with CellCognition Explorer, a generic novelty detection and deep learning framework. Application to several large-scale screening data sets on nuclear and mitotic cell morphologies demonstrates that CellCognition Explorer enables discovery of rare phenotypes without user training, which has broad implications for improved assay development in high-content screening.
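As a generic illustration of label-free novelty detection (not the CellCognition Explorer implementation), the sketch below fits a one-class model to control-cell feature vectors and flags deviating morphologies; the features, sample sizes, and parameters are assumptions.

```python
# Minimal sketch (not the CellCognition Explorer implementation): generic
# novelty detection on per-cell feature vectors, flagging morphologies that
# deviate from the negative-control population without any phenotype labels.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
control_cells = rng.normal(0, 1, size=(2000, 50))             # control-cell features
screen_cells = np.vstack([rng.normal(0, 1, size=(990, 50)),
                          rng.normal(4, 1, size=(10, 50))])    # 10 rare phenotypes

scaler = StandardScaler().fit(control_cells)
detector = OneClassSVM(nu=0.01, gamma="scale").fit(scaler.transform(control_cells))

flags = detector.predict(scaler.transform(screen_cells))       # -1 marks novel cells
print("candidate rare phenotypes:", int(np.sum(flags == -1)))
```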


SPE Journal ◽  
2020 ◽  
Vol 25 (05) ◽  
pp. 2778-2800 ◽  
Author(s):  
Harpreet Singh ◽  
Yongkoo Seol ◽  
Evgeniy M. Myshakin

The application of specialized machine learning (ML) in petroleum engineering and geoscience is increasingly gaining attention as a route to rapid and efficient substitutes for existing methods. Existing ML-based studies that use well logs have two inherent limitations. The first is that they start with one predefined combination of well logs, implicitly assuming that this combination will give the best predictive outcome, even though the accuracy obtained from different combinations of well logs can vary substantially. The second is that most studies apply unsupervised learning (UL) for classification problems, although it underperforms nearly all supervised learning (SL) algorithms by a substantial margin. In this context, this study investigates a variety of UL and SL algorithms applied to multiple well-log combinations (WLCs) to automate the traditional workflow of well-log processing and classification, including an optimization step to achieve the best output. The workflow begins by processing the measured well logs, which includes developing different combinations of measured well logs and their physics-motivated augmentations, followed by removal of potential outliers from the input WLCs. Reservoir lithology with four different rock types is investigated using eight UL and seven SL algorithms in two case studies, whose results are used to identify the optimal set of well logs and the ML algorithm that best matches the reservoir lithology to its ground truth. The workflow is demonstrated using two wells from two different reservoirs on the Alaska North Slope to distinguish four rock types along the well (brine-dominated sand, hydrate-dominated sand, shale, and others/mixed compositions). The results show that the automated workflow can recover the ground-truth lithology with up to 80% accuracy with UL and up to 90% accuracy with SL using six routine well logs [vp, vs, ρb, ϕneut, Rt, and gamma ray (GR)], a significant improvement over the less than 70% accuracy reported in the current state of the art.
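A schematic contrast of the UL and SL routes on a single well-log combination, assuming synthetic stand-ins for the six logs and scikit-learn's KMeans and random forest; the study's eight UL and seven SL algorithms and its optimization loop are not reproduced.

```python
# Minimal sketch (assumed, not the study's workflow): one unsupervised and one
# supervised classifier on a synthetic well-log combination (vp, vs, rho_b,
# phi_neut, Rt, GR) with four rock-type labels as ground truth.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 2000
logs = rng.normal(size=(n, 6))                 # stand-ins for the six well logs
rock_type = rng.integers(0, 4, size=n)         # brine sand, hydrate sand, shale, mixed

X = StandardScaler().fit_transform(logs)

clusters = KMeans(n_clusters=4, n_init=10).fit_predict(X)      # UL: labels unused
acc = cross_val_score(RandomForestClassifier(), X, rock_type, cv=5).mean()  # SL
print(f"supervised CV accuracy: {acc:.2f}")
```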

