Comparison of Machine Learning Classification Methods for Determining the Geographical Origin of Raw Milk Using Vibrational Spectroscopy

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Aimen El Orche ◽  
Amine Mamad ◽  
Omar Elhamdaoui ◽  
Amine Cheikh ◽  
Miloud El Karbane ◽  
...  

One of the significant challenges in the food industry is the determination of geographical origin, since products from different regions can show great variance in raw milk composition. Monitoring the origin of raw milk has therefore become very relevant for producers and consumers worldwide. In this exploratory study, mid-infrared spectroscopy combined with machine learning classification methods was investigated as a rapid and nondestructive method for classifying milk according to its geographical origin. The curse of dimensionality makes some classification methods struggle to train efficient models; thus, principal component analysis (PCA) was applied to create a smaller set of features. Among the machine learning methods evaluated (PLS-DA, PCA-LDA, SVM, and PCA-SVM), the best results were obtained with PLS-DA, PCA-LDA, and PCA-SVM, which achieved correct classification rates (CCR) of 100%, 100%, and 94.95%, respectively, whereas SVM without feature extraction gave a low CCR of 66.67%. These findings demonstrate that FT-MIR spectroscopy, combined with machine learning methods, is an efficient and suitable approach for classifying the geographical origin of raw milk.
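
The PCA-then-classify pipeline described above can be sketched in a few lines. This is a minimal illustration on synthetic "spectra" (the class shifts, dimensions, and number of components are made-up stand-ins, not the paper's data or settings):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic "spectra": 60 samples x 500 wavenumbers, three origins whose
# mean spectra differ by a small offset (an assumption for illustration).
n_per, n_feat = 20, 500
X = np.vstack([rng.normal(loc=shift, scale=1.0, size=(n_per, n_feat))
               for shift in (0.0, 0.5, 1.0)])
y = np.repeat([0, 1, 2], n_per)

# PCA-LDA: reduce the 500-dimensional spectra to 10 PCs, then classify.
model = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
ccr = cross_val_score(model, X, y, cv=5).mean()  # correct classification rate
```

Fitting PCA and the classifier inside one pipeline ensures the components are re-estimated on each training fold, avoiding information leakage into the cross-validated CCR.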

2020 ◽  
Vol 36 (17) ◽  
pp. 4590-4598
Author(s):  
Robert Page ◽  
Ruriko Yoshida ◽  
Leon Zhang

Abstract Motivation Due to new technology for efficiently generating genome data, machine learning methods are urgently needed to analyze large sets of gene trees over the space of phylogenetic trees. However, the space of phylogenetic trees is not Euclidean, so ordinary machine learning methods cannot be directly applied. In 2019, Yoshida et al. introduced the notion of tropical principal component analysis (PCA), a statistical method for visualization and dimensionality reduction using a tropical polytope with a fixed number of vertices that minimizes the sum of tropical distances between each data point and its tropical projection. However, their work focused on the tropical projective space rather than the space of phylogenetic trees. We focus here on tropical PCA for dimension reduction and visualization over the space of phylogenetic trees. Results Our main results are 2-fold: (i) theoretical interpretations of the tropical principal components over the space of phylogenetic trees, namely, the existence of a tropical cell decomposition into regions of fixed tree topology; and (ii) the development of a stochastic optimization method to estimate tropical PCs over the space of phylogenetic trees using a Markov Chain Monte Carlo approach. This method performs well with simulation studies, and it is applied to three empirical datasets: Apicomplexa and African coelacanth genomes as well as sequences of hemagglutinin for influenza from New York. Availability and implementation Dataset: http://polytopes.net/Data.tar.gz. Code: http://polytopes.net/tropica_MCMC_codes.tar.gz. Supplementary information Supplementary data are available at Bioinformatics online.
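
The tropical distance that tropical PCA minimizes has a compact closed form on the tropical projective torus. A minimal sketch (the example vectors are arbitrary; this is the metric only, not the authors' MCMC optimization):

```python
import numpy as np

def tropical_distance(u, v):
    """Tropical metric on the tropical projective torus:
    d(u, v) = max_i (u_i - v_i) - min_i (u_i - v_i).
    It is invariant under adding a constant to all coordinates of u or v,
    matching the projective (quotient) structure of the space."""
    d = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(d.max() - d.min())

u = np.array([0.0, 1.0, 3.0])
v = np.array([0.0, 2.0, 1.0])
d_uv = tropical_distance(u, v)            # differences [0, -1, 2] -> 2 - (-1) = 3
d_shift = tropical_distance(u + 5.0, v)   # unchanged by an all-ones shift
```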


2016 ◽  
Vol 2016 ◽  
pp. 1-7 ◽  
Author(s):  
İlhan Umut ◽  
Güven Çentik

The number of channels used for polysomnographic recording frequently causes difficulties for patients because of the many cables connected. It also increases the risk of problems during the recording process and increases the storage volume. In this study, the aim is to detect periodic leg movement (PLM) in sleep using channels other than leg electromyography (EMG), by analysing polysomnography (PSG) data with digital signal processing (DSP) and machine learning methods. PSG records of 153 patients of different ages and genders with a PLM disorder diagnosis were examined retrospectively. Novel software was developed for the analysis of the PSG records; it utilizes machine learning algorithms, statistical methods, and DSP methods. To classify PLM, popular machine learning methods (multilayer perceptron, K-nearest neighbour, and random forests) and logistic regression were used. Comparison of the classification results showed that the K-nearest neighbour algorithm had the highest average classification rate (91.87%) and the lowest average classification error (RMSE = 0.2850), while the multilayer perceptron algorithm had the lowest average classification rate (83.29%) and the highest average classification error (RMSE = 0.3705). The results showed that PLM can be classified with high accuracy (91.87%) without a leg EMG record being present.
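
The evaluation pattern above (a K-nearest neighbour classifier scored by both classification rate and RMSE) can be sketched as follows. The two-class Gaussian features are a made-up stand-in for per-epoch PSG features, not the study's data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Toy stand-in for per-epoch PSG features ("PLM event" vs "no event"):
# two overlapping Gaussian classes in 8 dimensions.
X = np.vstack([rng.normal(0.0, 1.0, (200, 8)),
               rng.normal(1.5, 1.0, (200, 8))])
y = np.repeat([0, 1], 200)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
acc = knn.score(X_te, y_te)                      # classification rate
# RMSE of the predicted event probabilities against the 0/1 labels,
# analogous to the error value reported in the study.
p = knn.predict_proba(X_te)[:, 1]
rmse = float(np.sqrt(np.mean((p - y_te) ** 2)))
```

Reporting RMSE on the predicted probabilities alongside the raw classification rate penalizes confident mistakes more heavily than accuracy alone.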


Polymers ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 825
Author(s):  
Kaixin Liu ◽  
Zhengyang Ma ◽  
Yi Liu ◽  
Jianguo Yang ◽  
Yuan Yao

A growing number of machine learning methods are being applied to infrared non-destructive testing for the assessment of internal defects in composite materials. However, most of them extract only linear features, which does not accord with the nonlinear characteristics of infrared data. Moreover, limited infrared images tend to restrict the data analysis capabilities of machine learning methods. In this work, a novel generative kernel principal component thermography (GKPCT) method is proposed for defect detection in carbon fiber reinforced polymer (CFRP) composites. Specifically, a spectral normalization generative adversarial network is proposed to augment the thermograms for model construction. Subsequently, the KPCT method maps all thermogram data into feature space using kernel principal component analysis, which allows defects and background to be differentiated in the dimensionality-reduced data. Additionally, a defect-background separation metric is designed to aid the performance evaluation of data analysis methods. Experimental results on CFRP demonstrate the feasibility and advantages of the proposed GKPCT method.
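
The advantage of kernel PCA over linear PCA for nonlinear data can be shown on a toy example. The ring-shaped clusters, the RBF kernel width, and the simple separation score below are all illustrative assumptions, not the GKPCT method or the paper's metric:

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

rng = np.random.default_rng(2)

def ring(radius, n):
    # Noisy points on a circle: a toy nonlinear class structure.
    t = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.c_[radius * np.cos(t), radius * np.sin(t)] \
        + rng.normal(0.0, 0.05, (n, 2))

# "Defect" points on an inner ring, "background" on an outer ring.
X = np.vstack([ring(1.0, 100), ring(3.0, 100)])
y = np.repeat([0, 1], 100)

# Linear PCA cannot separate concentric rings; an RBF kernel PCA can.
z_lin = PCA(n_components=1).fit_transform(X)[:, 0]
z_rbf = KernelPCA(n_components=1, kernel="rbf", gamma=0.5).fit_transform(X)[:, 0]

def separation(z):
    # Crude separation score: gap between class means relative to the
    # overall spread (an assumption for illustration only).
    m0, m1 = z[y == 0].mean(), z[y == 1].mean()
    return abs(m0 - m1) / (z.std() + 1e-12)

sep_lin, sep_rbf = separation(z_lin), separation(z_rbf)
```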


2020 ◽  
Author(s):  
Serge Dolgikh

An analysis of a combined dataset of Wave 1 and 2 cases, aligned at approximately Local Time Zero + 2 months, with unsupervised machine learning methods such as principal component analysis and deep autoencoder dimensionality reduction allows milder background cases to be clearly separated from those with a more rapid and aggressive onset of the epidemics. The analysis and findings of the study can be used in the evaluation of possible epidemiological scenarios and as an effective modeling tool to design corrective and preventative measures to avoid developments with potentially heavy impact.
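
The unsupervised separation described above rests on projecting the data onto its leading principal directions. A minimal from-scratch PCA via SVD, on synthetic two-group data (group means, sizes, and dimensions are illustrative assumptions, not the study's dataset):

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy stand-in for case-trajectory features: a "mild" and a "rapid onset"
# group differing along one latent direction (purely illustrative).
X = np.vstack([rng.normal(0.0, 1.0, (30, 20)),
               rng.normal(2.0, 1.0, (30, 20))])

# PCA via SVD of the mean-centered data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T            # projection onto the first two PCs
explained = S**2 / np.sum(S**2)   # variance ratio per component

# The two groups separate along PC1 (sign of the axis is arbitrary).
gap = abs(scores[30:, 0].mean() - scores[:30, 0].mean())
```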


2020 ◽  
Vol 66 (6) ◽  
pp. 2495-2522 ◽  
Author(s):  
Duncan Simester ◽  
Artem Timoshenko ◽  
Spyros I. Zoumpoulis

We investigate how firms can use the results of field experiments to optimize the targeting of promotions when prospecting for new customers. We evaluate seven widely used machine-learning methods using a series of two large-scale field experiments. The first field experiment generates a common pool of training data for each of the seven methods. We then validate the seven optimized policies provided by each method together with uniform benchmark policies in a second field experiment. The findings not only compare the performance of the targeting methods, but also demonstrate how well the methods address common data challenges. Our results reveal that when the training data are ideal, model-driven methods perform better than distance-driven methods and classification methods. However, the performance advantage vanishes in the presence of challenges that affect the quality of the training data, including the extent to which the training data captures details of the implementation setting. The challenges we study are covariate shift, concept shift, information loss through aggregation, and imbalanced data. Intuitively, the model-driven methods make better use of the information available in the training data, but the performance of these methods is more sensitive to deterioration in the quality of this information. The classification methods we tested performed relatively poorly. We explain the poor performance of the classification methods in our setting and describe how the performance of these methods could be improved. This paper was accepted by Matthew Shum, marketing.
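
One of the data challenges named above, concept shift, is easy to demonstrate: a model trained on one response rule fails when the rule changes at deployment. The synthetic response rule and the sign flip below are illustrative assumptions, not the paper's experiments:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

def experiment(n, flip=False):
    # Response depends on x1 + x2 plus noise; `flip` reverses the rule,
    # mimicking concept shift between training and deployment.
    X = rng.normal(0.0, 1.0, (n, 2))
    score = X[:, 0] + X[:, 1] + rng.normal(0.0, 0.5, n)
    y = ((score < 0) if flip else (score > 0)).astype(int)
    return X, y

X_tr, y_tr = experiment(1000)
clf = LogisticRegression().fit(X_tr, y_tr)

X_te, y_te = experiment(1000)               # same concept as training
acc_stable = clf.score(X_te, y_te)
X_cs, y_cs = experiment(1000, flip=True)    # concept shift: rule reversed
acc_shifted = clf.score(X_cs, y_cs)
```

The point mirrors the paper's finding: a model that exploits the training data well is exactly the one most exposed when the relationship it learned no longer holds.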


Logistics ◽  
2020 ◽  
Vol 4 (4) ◽  
pp. 35
Author(s):  
Sidharth Sankhye ◽  
Guiping Hu

The rising popularity of smart factories and Industry 4.0 has made it possible to collect large amounts of data from production stages. Thus, supervised machine learning methods such as classification can viably predict product compliance quality using manufacturing data collected during production. Elimination of uncertainty via accurate prediction provides significant benefits at any stage in a supply chain. In particular, early knowledge of product batch quality can save costs associated with recalls, packaging, and transportation. While there has been thorough research on predicting the quality of specific manufacturing processes, the adoption of classification methods to predict the overall compliance of production batches has not been extensively investigated. This paper aims to design machine-learning-based classification methods for quality compliance and to validate the models via a case study of a multi-model appliance production line. The proposed classification model could achieve an accuracy of 0.99 and a Cohen’s Kappa of 0.91 for the compliance quality of unit batches. The proposed method would thus enable the implementation of a predictive model for compliance quality. The case study also highlights the importance of feature construction and dataset knowledge in training classification models.
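
Cohen's Kappa, reported alongside accuracy above, corrects agreement for chance and is therefore informative on imbalanced quality data, where accuracy alone can mislead. A minimal sketch with made-up labels (not the case-study data):

```python
import numpy as np

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance,
    kappa = (p_o - p_e) / (1 - p_e)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.union1d(y_true, y_pred)
    p_o = np.mean(y_true == y_pred)                       # observed agreement
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c)
              for c in classes)                           # chance agreement
    return (p_o - p_e) / (1.0 - p_e)

# Imbalanced batch-quality labels: 10% of batches are non-compliant.
y_true = np.array([0] * 90 + [1] * 10)
y_all_pass = np.zeros(100, dtype=int)        # classifier that passes everything
kappa_naive = cohen_kappa(y_true, y_all_pass)  # 0.0 despite 90% accuracy
```

A trivial "pass everything" model scores 90% accuracy here but kappa 0, which is why the paper's kappa of 0.91 is a meaningful complement to its 0.99 accuracy.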


2008 ◽  
Vol 17 (2) ◽  
pp. 121-142 ◽  
Author(s):  
Guido Heumer ◽  
Heni Ben Amor ◽  
Bernhard Jung

This paper presents a comparison of various machine learning methods applied to the problem of recognizing grasp types involved in object manipulations performed with a data glove. Conventional wisdom holds that data gloves need calibration in order to obtain accurate results. However, calibration is a time-consuming process, inherently user-specific, and its results are often not perfect. In contrast, the present study aims at evaluating recognition methods that do not require prior calibration of the data glove. Instead, raw sensor readings are used as input features that are directly mapped to different categories of hand shapes. An experiment was carried out in which test persons wearing a data glove had to grasp physical objects of different shapes corresponding to the various grasp types of the Schlesinger taxonomy. The collected data was comprehensively analyzed using numerous classification techniques provided in an open-source machine learning toolbox. Evaluated machine learning methods are composed of (a) 38 classifiers including different types of function learners, decision trees, rule-based learners, Bayes nets, and lazy learners; (b) data preprocessing using principal component analysis (PCA) with varying degrees of dimensionality reduction; and (c) five meta-learning algorithms under various configurations where selection of suitable base classifier combinations was informed by the results of the foregoing classifier evaluation. Classification performance was analyzed in six different settings, representing various application scenarios with differing generalization demands. The results of this work are twofold: (1) We show that a reasonably good to highly reliable recognition of grasp types can be achieved—depending on whether or not the glove user is among those training the classifier—even with uncalibrated data gloves. (2) We identify the best performing classification methods for the recognition of various grasp types. 
To conclude, cumbersome calibration processes before productive usage of data gloves can be spared in many situations.
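
The core setup — mapping raw, uncalibrated sensor readings directly to grasp classes — can be sketched with a tree classifier on synthetic glove data. The grasp prototypes, per-user offsets, and sensor counts are illustrative assumptions, not the study's recordings or its 38-classifier evaluation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
# Toy "raw glove readings": 5 grasp types x 3 users x 20 repetitions,
# 22 sensors. Each user carries an unknown per-sensor offset, standing in
# for the missing calibration.
n_grasps, n_users, n_rep, n_sens = 5, 3, 20, 22
grasp_proto = rng.normal(0.0, 1.0, (n_grasps, n_sens))   # grasp hand shapes
user_offset = rng.normal(0.0, 0.3, (n_users, n_sens))    # uncalibrated bias

X_parts, y = [], []
for g in range(n_grasps):
    for u in range(n_users):
        X_parts.append(grasp_proto[g] + user_offset[u]
                       + rng.normal(0.0, 0.1, (n_rep, n_sens)))
        y.extend([g] * n_rep)
X, y = np.vstack(X_parts), np.array(y)

# "Glove user is among the training users" setting: plain cross-validation.
acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
```

When the user offsets are modest relative to the differences between grasp shapes, the classifier absorbs them without explicit calibration, which is the paper's central observation.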


2021 ◽  
Vol 8 ◽  
Author(s):  
Si Yang ◽  
Chenxi Li ◽  
Yang Mei ◽  
Wen Liu ◽  
Rong Liu ◽  
...  

Different geographical origins can lead to great variance in coffee quality, taste, and commercial value. Hence, controlling the authenticity of the origin of coffee beans is of great importance for producers and consumers worldwide. In this study, terahertz (THz) spectroscopy combined with machine learning methods was investigated as a fast and non-destructive way to classify the geographic origin of coffee beans, comparing popular machine learning methods, including convolutional neural network (CNN), linear discriminant analysis (LDA), and support vector machine (SVM), to obtain the best model. The curse of dimensionality makes some classification methods struggle to train effective models. Thus, principal component analysis (PCA) and genetic algorithm (GA) were applied for LDA and SVM to create a smaller set of features. The first nine principal components (PCs), with an accumulative contribution rate of 99.9%, extracted by PCA, and 21 variables selected by GA were the inputs of the LDA and SVM models. The results demonstrate that excellent classification (90% accuracy on a prediction set) could be achieved using the CNN method. The results also indicate that variable selection is an important step in creating an accurate and robust discrimination model: the performance of the LDA and SVM algorithms could be improved with spectral features extracted by PCA and GA. The GA-SVM achieved 75% accuracy on the prediction set, while the SVM and PCA-SVM achieved 50% and 65% accuracy, respectively. These results demonstrate that THz spectroscopy, together with machine learning methods, is an effective and satisfactory approach for classifying the geographical origin of coffee beans, suggesting the potential of deep learning for authenticating agricultural products while expanding the applications of THz spectroscopy.


Mathematics ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. 1481 ◽  
Author(s):  
Samed Jukic ◽  
Muzafer Saracevic ◽  
Abdulhamit Subasi ◽  
Jasmin Kevric

This research presents epileptic focus region localization during epileptic seizures, achieved by applying different signal processing and ensemble machine learning techniques to intracranial electroencephalogram (EEG) recordings. Multi-scale Principal Component Analysis (MSPCA) is used for denoising the EEG signals, and an autoregressive (AR) model is used to extract useful features from the EEG signal. The performance of the ensemble machine learning methods is measured with accuracy, F-measure, and the area under the receiver operating characteristic (ROC) curve (AUC). EEG-based focus area localization with the proposed methods reaches 98.9% accuracy using the Rotation Forest classifier. Therefore, our results suggest that ensemble machine learning methods can differentiate, with high accuracy, EEG signals from epileptogenic brain areas and signals recorded from non-epileptogenic brain regions.
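
The evaluation protocol above (an ensemble classifier scored with accuracy, F-measure, and AUC) can be sketched as follows. A Random Forest stands in for Rotation Forest, which is not in scikit-learn (Rotation Forest additionally rotates feature subsets with PCA), and the Gaussian features are a made-up stand-in for AR coefficients of denoised EEG segments:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
# Toy stand-in for AR features of iEEG segments: focal vs non-focal
# classes as shifted Gaussians (illustrative; not real EEG data).
X = np.vstack([rng.normal(0.0, 1.0, (300, 10)),
               rng.normal(0.8, 1.0, (300, 10))])
y = np.repeat([0, 1], 300)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)                               # accuracy
f1 = f1_score(y_te, clf.predict(X_te))                    # F-measure
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])  # AUC
```

AUC uses the ensemble's class probabilities rather than hard labels, so it captures ranking quality that accuracy and F-measure miss.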

