A New Linear Classifier Based on Combining Supervised and Unsupervised Techniques

Luminiţa State; Iuliana Paraschiv-Munteanu

doi:10.15837/ijccc.2011.1.2212

A New Linear Classifier Based on Combining Supervised and Unsupervised Techniques

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2011.1.2212 ◽

2011 ◽

Vol 6 (1) ◽

pp. 175

Author(s):

Luminiţa State ◽

Iuliana Paraschiv-Munteanu

Keyword(s):

Machine Learning ◽

Parameter Estimation ◽

The Other ◽

Support Vector ◽

Learning Sequence ◽

Empirical Risk ◽

Linear Classifier ◽

Alternative Approach ◽

Suitable Structure ◽

Selection Of

The aim of the research reported in the paper is to obtain an alternative approach to using the Support Vector Machine (SVM) on nonlinearly separable data, based on the k-means algorithm instead of the standard kernel-based approach. The SVM is a relatively new concept in machine learning, introduced by Vapnik in 1995. In designing a classifier, two main problems have to be solved: on the one hand, the choice of a suitable structure, and on the other hand, the selection of an algorithm for parameter estimation. The algorithm for parameter estimation optimizes a conveniently selected cost function with respect to the empirical risk, which is directly related to the representativeness of the available learning sequence. The structure is chosen so as to maximize the generalization capacity, that is, to ensure good performance in classifying new data coming from the same classes. In solving these problems, one has to strike a balance between the accuracy in encoding the learning sequence and the generalization capacity, because usually the over-fitting prevents the minimization of the empirical risk.
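For readers unfamiliar with the terminology, the empirical risk mentioned in the abstract is conventionally defined as the average loss over the learning sequence, and a linear classifier as the sign of an affine function. The notation below (loss L, classifier f, sample size N, weight vector w, bias b) is a standard textbook form assumed here for illustration; it is not notation taken from the paper.

```latex
% Assumed textbook notation; not taken from the paper itself.
R_{\mathrm{emp}}(f) = \frac{1}{N} \sum_{i=1}^{N} L\bigl(y_i, f(x_i)\bigr),
\qquad
f(x) = \operatorname{sign}\bigl(w^{\top} x + b\bigr)
```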


Analysis of the Nosema Cells Identification for Microscopic Images

Sensors ◽

10.3390/s21093068 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3068

Author(s):

Soumaya Dghim ◽

Carlos M. Travieso-González ◽

Radim Burget

Keyword(s):

Neural Network ◽

Machine Learning ◽

Image Processing ◽

Deep Learning ◽

The Other ◽

Support Vector ◽

Learning Approaches ◽

Microscopic Images ◽

Trained Neural Network ◽

Nosema Disease

The use of image processing tools, machine learning, and deep learning approaches has become very useful and robust in recent years. This paper introduces the detection of the Nosema disease, which is considered to be one of the most economically significant diseases today. This work shows a solution for recognizing and identifying Nosema cells among the other objects in the microscopic image. Two main strategies are examined. The first strategy uses image processing tools to extract the most valuable information and features from the dataset of microscopic images. Then, machine learning methods, such as an artificial neural network (ANN) and a support vector machine (SVM), are applied to detect and classify the Nosema disease cells. The second strategy explores deep learning and transfer learning. Several approaches were examined, including a convolutional neural network (CNN) classifier and several transfer learning methods (AlexNet, VGG-16 and VGG-19), which were fine-tuned and applied to the object sub-images in order to distinguish the Nosema images from the other object images. The best accuracy, 96.25%, was reached by the VGG-16 pre-trained neural network.


Utilizing Data-Driven Models to Predict Brittleness in Tuscaloosa Marine Shale: A Machine Learning Approach

10.2118/208628-stu ◽

2021 ◽

Author(s):

Jamal Ahmadov

Keyword(s):

Machine Learning ◽

Random Forest ◽

Brittleness Index ◽

Estimation Methods ◽

Gradient Boosting ◽

Average Error ◽

Support Vector ◽

Marine Shale ◽

Effective Manner ◽

Selection Of

The Tuscaloosa Marine Shale (TMS) formation is a clay- and liquid-rich emerging shale play across central Louisiana and southwest Mississippi with recoverable resources of 1.5 billion barrels of oil and 4.6 trillion cubic feet of gas. The formation poses numerous challenges due to its high average clay content (50 wt%) and rapidly changing mineralogy, making the selection of fracturing candidates a difficult task. While brittleness plays an important role in screening potential intervals for hydraulic fracturing, typical brittleness estimation methods require the use of geomechanical and mineralogical properties from costly laboratory tests. Machine Learning (ML) can be employed to generate synthetic brittleness logs and may therefore serve as an inexpensive and fast alternative to the current techniques. In this paper, we propose the use of machine learning to predict the brittleness index of Tuscaloosa Marine Shale from conventional well logs. We trained ML models on a dataset containing conventional and brittleness index logs from 8 wells. The latter were estimated either from geomechanical logs or log-derived mineralogy. Moreover, to ensure mechanical data reliability, dynamic-to-static conversion ratios were applied to Young's modulus and Poisson's ratio. The predictor features included neutron porosity, density and compressional slowness logs to account for the petrophysical and mineralogical character of TMS. The brittleness index was predicted using algorithms such as Linear, Ridge and Lasso Regression, K-Nearest Neighbors, Support Vector Machine (SVM), Decision Tree, Random Forest, AdaBoost and Gradient Boosting. Models were shortlisted based on the Root Mean Square Error (RMSE) value and fine-tuned using the Grid Search method with a specific set of hyperparameters for each model.
Overall, Gradient Boosting and Random Forest outperformed the other algorithms and showed an average error reduction of 5%, a normalized RMSE of 0.06 and an R-squared value of 0.89. Gradient Boosting was chosen to evaluate the test set and successfully predicted the brittleness index with a normalized RMSE of 0.07 and an R-squared value of 0.83. This paper presents the practical use of machine learning to evaluate brittleness in a cost- and time-effective manner and can further provide valuable insights into the optimization of completion in TMS. The proposed ML model can be used as a tool for initial screening of fracturing candidates and selection of fracturing intervals in other clay-rich and heterogeneous shale formations.
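The error measures quoted in this abstract (RMSE, normalized RMSE, R-squared) can be sketched in a few lines of Python. This is an illustrative sketch only: the abstract does not say how the RMSE was normalized, so division by the range of the observed values is an assumption, and the sample values are made up.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range of the observed values.

    Assumption: range normalization. The abstract does not state which
    normalization the authors used (mean and standard deviation are
    other common choices).
    """
    span = max(y_true) - min(y_true)
    return rmse(y_true, y_pred) / span

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical brittleness-index values, for illustration only.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]

print(rmse(observed, predicted))
print(normalized_rmse(observed, predicted))
print(r_squared(observed, predicted))
```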


Application of Various Machine Learning Techniques in Predicting Total Organic Carbon from Well Logs

Computational Intelligence and Neuroscience ◽

10.1155/2021/7390055 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Osama Siddig ◽

Ahmed Farid Ibrahim ◽

Salaheldin Elkatatny

Keyword(s):

Machine Learning ◽

Organic Carbon ◽

Total Organic Carbon ◽

The Other ◽

Well Logs ◽

Machine Learning Techniques ◽

Percentage Error ◽

Average Error ◽

Support Vector ◽

Empirical Correlations

Unconventional resources have recently gained a lot of attention, and as a consequence, there has been an increase in research interest in predicting total organic carbon (TOC) as a crucial quality indicator. TOC is commonly measured experimentally; however, due to sampling restrictions, obtaining continuous data on TOC is difficult. Therefore, different empirical correlations for TOC have been presented. However, there are concerns about the generalization and accuracy of these correlations. In this paper, different machine learning (ML) techniques were utilized to develop models that predict TOC from well logs, including formation resistivity (FR), spontaneous potential (SP), sonic transit time (Δt), bulk density (RHOB), neutron porosity (CNP), gamma ray (GR), and spectrum logs of thorium (Th), uranium (Ur), and potassium (K). Over 1250 data points from the Devonian Duvernay shale were utilized to create and validate the model. These datasets were obtained from three wells; the first was used to train the models, while the datasets from the other two wells were utilized to test and validate them. Support vector machine (SVM), random forest (RF), and decision tree (DT) were the ML approaches tested, and their predictions were contrasted with three empirical correlations. Various parameters of the AI methods were tested to ensure the best possible accuracy in terms of correlation coefficient (R) and average absolute percentage error (AAPE) between the actual and predicted TOC. The three ML methods yielded good matches; however, the RF-based model had the best performance. The RF model was able to predict the TOC for the different datasets with R values ranging between 0.93 and 0.99 and AAPE values less than 14%. In terms of average error, the ML-based models outperformed the three empirical correlations.
This study shows the capability and robustness of ML models to predict the total organic carbon from readily available logging data without the need for core analysis or additional well interventions.
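The two accuracy measures named in this abstract, the correlation coefficient (R) and the average absolute percentage error (AAPE), can be sketched as follows. This is a minimal illustration using their usual definitions, which the abstract does not spell out; the function names and sample values are hypothetical.

```python
import math

def aape(actual, predicted):
    """Average absolute percentage error, in percent.

    Assumes every actual value is nonzero.
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

# Hypothetical TOC values (wt%), for illustration only.
actual_toc = [2.0, 4.0, 5.0]
predicted_toc = [2.2, 3.6, 5.0]

print(aape(actual_toc, predicted_toc))
print(pearson_r(actual_toc, predicted_toc))
```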


Resampled dimensional reduction for feature representation in machine learning

10.21203/rs.3.pex-1636/v1 ◽

2021 ◽

Author(s):

Herdiantri Sufriyana ◽

Yu Wei Wu ◽

Emily Chia-Yu Su

Keyword(s):

Machine Learning ◽

Parameter Estimation ◽

Prediction Model ◽

Sample Size ◽

Dimensional Reduction ◽

Latent Variables ◽

Feature Representation ◽

Estimated Parameters ◽

Representation Technique ◽

Selection Of

We aimed to provide a resampling protocol for dimensional reduction that results in a few latent variables. Its applicability focuses on, but is not limited to, developing a machine learning prediction model, in order to improve the sample size relative to the number of candidate predictors. With this feature representation technique, one can improve generalization by preventing the latent variables from overfitting the data used to conduct the dimensional reduction. However, this technique may require more computational capacity and time to conduct the procedure. The key stages consist of derivation of latent variables from multiple resampling subsets, parameter estimation of latent variables in population, and selection of latent variables transformed by the estimated parameters.


Modeling of Aboveground Biomass with Landsat 8 OLI and Machine Learning in Temperate Forests

Forests ◽

10.3390/f11010011 ◽

2019 ◽

Vol 11 (1) ◽

pp. 11

Author(s):

Pablito M. López-Serrano ◽

José Luis Cárdenas Domínguez ◽

José Javier Corral-Rivas ◽

Enrique Jiménez ◽

Carlos A. López-Sánchez ◽

...

Keyword(s):

Machine Learning ◽

Aboveground Biomass ◽

Goodness Of Fit ◽

Accurate Estimation ◽

Support Vector ◽

Landsat 8 ◽

Sensing Applications ◽

Learning Techniques ◽

Physical Variables ◽

Selection Of

An accurate estimation of forests’ aboveground biomass (AGB) is required because of its relevance to the carbon cycle, and because of its economic and ecological importance. The selection of appropriate variables from satellite information and physical variables is important for precise AGB prediction mapping. Because of the complex relationships for AGB prediction, non-parametric machine-learning techniques represent potentially useful techniques for AGB estimation, but their use and comparison in forest remote-sensing applications is still relatively limited. The objective of the present study was to evaluate the performance of automatic learning techniques, support vector regression (SVR) and random forest (RF), to predict the observed AGB (from 318 permanent sampling plots) from the Landsat 8 Operational Land Imager (OLI) sensor, spectral indexes, texture indexes and physical variables in the Sierra Madre Occidental in Mexico. The results showed that the best SVR model explained 80% of the total variance (root mean square error (RMSE) = 8.20 Mg ha−1). The variables that best predicted AGB, in order of importance, were the bands that belong to the region of red and near and middle infrared, and the average temperature. The results show that the SVR technique has good potential for the estimation of the AGB and that the selection of the model hyperparameters has important implications for optimizing the goodness of fit.


Intelligent Symbiotic Relay Selection Technique for 5G Networks

International Journal of Engineering Research in Africa ◽

10.4028/www.scientific.net/jera.43.84 ◽

2019 ◽

Vol 43 ◽

pp. 84-100

Author(s):

Maryleen U. Ndubuaku ◽

Kennedy Chinedu Okafor ◽

Chidiebele Chinwendu Udeze ◽

Omar Salih

Keyword(s):

Relay Selection ◽

Network Capacity ◽

The Other ◽

Link Quality ◽

Support Vector ◽

Second Phase ◽

Signal To Noise ◽

Relay Nodes ◽

Geographical Locations ◽

Selection Of

The growing demand for bandwidth and spectrum has inspired the ongoing efforts to establish the future 5G network supporting vertical sectors such as cyber-physical systems (CPS). Cooperative communication is one of the requisite techniques to improve coverage and network capacity and to reduce power consumption in the network. In this paper, a symbiotic two-phase intelligent transmission is considered. The first phase occurs between the source and the candidate relays, and involves the selection of a set of “reliable relays”. The second phase occurs between the reliable relays and the destination, and involves the selection of the “best relay” for transmission. Dynamic relay selection using k-means clustering is used to detect the most significant correlation between all the channel state information (CSI) attributes in the system. The work identified the reliable relays while reducing the number of relay nodes for the second transmission phase. Contextual scenarios are created with a typical network configuration using three geographical locations: Coventry, Birmingham and London. An experimental validation is done in the OMNeT++ environment for the scenarios of the three geographical locations. A natural grouping of mobile users is carried out leveraging the relay capabilities. The results are validated using a support vector machine (SVM) classification algorithm. Considering urban environment deployment of relay nodes, metrics such as signal-to-interference-plus-noise ratio (SINR), attenuation, signal-to-noise ratio (SNR), link quality, k-means clustering, accuracy, and root mean square error (RMSE) are investigated for the Direct-2-Direct (D2D) capable relays. It was observed that the proposed technique both outperforms the other fixed-parameter relay selection techniques and, unlike them, improves with larger datasets.


Combinatorial Algorithm In Linear Model

MATEC Web of Conferences ◽

10.1051/matecconf/201819603017 ◽

2018 ◽

Vol 196 ◽

pp. 03017

Author(s):

Jana Ižvoltová ◽

Peter Pisca

Keyword(s):

Parameter Estimation ◽

Linear Model ◽

Markov Models ◽

Nonlinear Models ◽

The Other ◽

Estimation Methods ◽

Combinatorial Algorithm ◽

Alternative Approach ◽

Parameter Estimation Methods ◽

Iterative Numerical Methods

The Gauss-Jacobi combinatorial algorithm is an alternative approach to traditional iterative numerical methods and is primarily oriented toward parameter estimation in nonlinear models. The combinatorial algorithm is often exploited for outlier diagnosis in nonlinear models, where other parameter estimation methods lose their efficiency. The paper compares the Gauss-Jacobi combinatorial and Gauss-Markov models applied to the parameter estimation of a levelling network, in order to assess the efficiency of the combinatorial algorithm in a simple linear model.


Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning

Science ◽

10.1126/science.aau5631 ◽

2019 ◽

Vol 363 (6424) ◽

pp. eaau5631 ◽

Cited By ~ 66

Author(s):

Andrew F. Zahrt ◽

Jeremy J. Henle ◽

Brennan T. Rose ◽

Yang Wang ◽

William T. Darrow ◽

...

Keyword(s):

Machine Learning ◽

Catalyst Design ◽

Machine Learning Algorithms ◽

Support Vector ◽

Asymmetric Reaction ◽

Stage Of Development ◽

Chiral Phosphoric Acid ◽

Vector Machines ◽

Feed Forward Neural Networks ◽

Selection Of

Catalyst design in asymmetric reaction development has traditionally been driven by empiricism, wherein experimentalists attempt to qualitatively recognize structural patterns to improve selectivity. Machine learning algorithms and chemoinformatics can potentially accelerate this process by recognizing otherwise inscrutable patterns in large datasets. Herein we report a computationally guided workflow for chiral catalyst selection using chemoinformatics at every stage of development. Robust molecular descriptors that are agnostic to the catalyst scaffold allow for selection of a universal training set on the basis of steric and electronic properties. This set can be used to train machine learning methods to make highly accurate predictive models over a broad range of selectivity space. Using support vector machines and deep feed-forward neural networks, we demonstrate accurate predictive modeling in the chiral phosphoric acid–catalyzed thiol addition to N-acylimines.


A Systematic Methodology to Evaluate Prediction Models for Driving Style Classification

Sensors ◽

10.3390/s20061692 ◽

2020 ◽

Vol 20 (6) ◽

pp. 1692 ◽

Cited By ~ 6

Author(s):

Iván Silva ◽

José Eugenio Naranjo

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Performance Metrics ◽

Prediction Models ◽

Statistical Tests ◽

Area Under The Curve ◽

The Other ◽

Support Vector ◽

Classification Models ◽

K Nearest Neighbor

Identifying driving styles using classification models with in-vehicle data can provide automated feedback to drivers on their driving behavior, particularly on whether they are driving safely. Although several classification models have been developed for this purpose, there is no consensus on which classifier performs better at identifying driving styles. Therefore, more research is needed to evaluate classification models by comparing performance metrics. In this paper, a data-driven machine-learning methodology for classifying driving styles is introduced. This methodology is grounded in well-established machine-learning (ML) methods and literature related to driving-styles research. The methodology is illustrated through a study involving data collected from 50 drivers from two different cities in a naturalistic setting. Five features were extracted from the raw data. Fifteen experts were involved in the data labeling to derive the ground truth of the dataset. The dataset was fed to five different models: Support Vector Machines (SVM), Artificial Neural Networks (ANN), fuzzy logic, k-Nearest Neighbor (kNN), and Random Forests (RF). These models were evaluated in terms of a set of performance metrics and statistical tests. The experimental results from performance metrics showed that SVM outperformed the other four models, achieving an average accuracy of 0.96, F1-Score of 0.9595, Area Under the Curve (AUC) of 0.9730, and Kappa of 0.9375. In addition, Wilcoxon tests indicated that ANN predicts differently from the other four models. These promising results demonstrate that the proposed methodology may support researchers in making informed decisions about which ML model performs better for driving-styles classification.
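Three of the performance metrics reported above (accuracy, F1-score and Cohen's kappa) can be computed directly from true and predicted labels, as in the sketch below. It is only an illustration: the abstract does not say how F1 was averaged across classes, so macro-averaging is an assumption, and the class labels are hypothetical.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical driving-style labels, for illustration only.
truth = ["calm", "calm", "aggressive", "aggressive"]
guess = ["calm", "aggressive", "aggressive", "aggressive"]

print(accuracy(truth, guess))
print(macro_f1(truth, guess))
print(cohen_kappa(truth, guess))
```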


Implementation of feature selection techniques in Android malware classification using support vector machine (SVM)

Repositor ◽

10.22219/repositor.v1i1.1 ◽

2019 ◽

Vol 1 (1) ◽

pp. 1

Author(s):

Hendra Saputra ◽

Setio Basuki ◽

Mahar Faiqurahman

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Chi Square ◽

Android Malware ◽

Correlation Based Feature Selection ◽

Selection Of

Android malware has grown significantly with the passage of time and the increasing variety of techniques in Android development. Machine learning is a method that can currently be used to model the patterns of static and dynamic features of Android malware. To assess the accuracy of malware type classification, the researchers relate application features to the features required by each malware category. The malware categories used are types that are widely circulating today. To classify the malware types, this study uses a Support Vector Machine (SVM). The SVM type used is a multi-class one-against-one SVM with the RBF kernel. The features used in the classification are Permission and Broadcast Receiver. To improve classification accuracy, this study applies feature selection methods: Correlation-based Feature Selection (CFS), Gain Ratio (GR) and Chi-Square (CHI). The results with feature selection are evaluated together with the results without feature selection. Classification with CFS yields an accuracy of 90.83%, GR and CHI yield 91.25%, and the data without feature selection yields 91.67%. The test results indicate that Permission and Broadcast Receiver can be used to classify malware types, but the feature selection methods used have accuracy slightly below that of the data without feature selection. Keywords: Android malware classification, feature selection, SVM, multi-class one-against-one SVM
