CLASSIFICATION MODEL BUILDING USING MACHINE LEARNING METHODS

Author(s):  
Oleksandr PISKUN ◽  
2021 ◽  
Vol 15 ◽  
Author(s):  
Jacob Tryon ◽  
Ana Luisa Trejos

Wearable robotic exoskeletons have emerged as an exciting new treatment tool for disorders affecting mobility; however, the human–machine interface, used by the patient for device control, requires further improvement before robotic assistance and rehabilitation can be widely adopted. One method, made possible through advancements in machine learning technology, is the use of bioelectrical signals, such as electroencephalography (EEG) and electromyography (EMG), to classify the user's actions and intentions. While classification using these signals has been demonstrated for many relevant control tasks, such as motion intention detection and gesture recognition, challenges in decoding the bioelectrical signals have caused researchers to seek methods for improving the accuracy of these models. One such method is the use of EEG–EMG fusion, creating a classification model that decodes information from both EEG and EMG signals simultaneously to increase the amount of available information. So far, EEG–EMG fusion has been implemented using traditional machine learning methods that rely on manual feature extraction; however, new machine learning methods have emerged that can automatically extract relevant information from a dataset, which may prove beneficial during EEG–EMG fusion. In this study, Convolutional Neural Network (CNN) models were developed using combined EEG–EMG inputs to determine if they have potential as a method of EEG–EMG fusion that automatically extracts relevant information from both signals simultaneously. EEG and EMG signals were recorded during elbow flexion–extension and used to develop CNN models based on time–frequency (spectrogram) and time (filtered signal) domain image inputs. The results show a mean accuracy of 80.51 ± 8.07% for a three-class output (33.33% chance level), with an F-score of 80.74%, using time–frequency domain-based models. 
This work demonstrates the viability of CNNs as a new method of EEG–EMG fusion and evaluates different signal representations to determine the best implementation of a combined EEG–EMG CNN. It leverages modern machine learning methods to advance EEG–EMG fusion, which will ultimately lead to improvements in the usability of wearable robotic exoskeletons.
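As a rough illustration of the time–frequency input described above, the sketch below turns each EEG and EMG channel into a magnitude spectrogram and stacks the results into one multi-channel "image" of the kind a CNN could consume. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `spectrogram` and `fuse`, the window and hop sizes, the sampling rate, and the synthetic sine-wave signals are all hypothetical, and the CNN itself is not shown.

```python
import cmath
import math


def spectrogram(signal, window=8, hop=4):
    """Magnitude spectrogram via a naive DFT over Hann-windowed frames.

    Returns a list of frames; each frame holds window // 2 + 1 magnitudes.
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window - 1))
            for n in range(window)]
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        chunk = [signal[start + n] * hann[n] for n in range(window)]
        mags = []
        for k in range(window // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / window)
                      for n in range(window))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def fuse(eeg_channels, emg_channels, **kwargs):
    """Stack per-channel spectrograms, indexed as [channel][frame][bin]."""
    return [spectrogram(ch, **kwargs) for ch in eeg_channels + emg_channels]


# Synthetic stand-ins for recorded signals (hypothetical, not study data).
fs = 64  # assumed sampling rate in Hz
eeg = [[math.sin(2 * math.pi * 8 * t / fs) for t in range(64)]]
emg = [[math.sin(2 * math.pi * 16 * t / fs) for t in range(64)]]

image = fuse(eeg, emg, window=16, hop=8)
print(len(image), len(image[0]), len(image[0][0]))  # channels, frames, bins
```

With a 16-sample window at 64 Hz, each frequency bin spans 4 Hz, so the two synthetic tones land in different bins of their respective channels. A production implementation would normally use an FFT routine rather than this naive DFT.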


2021 ◽  
Vol 25 (5) ◽  
pp. 1291-1322
Author(s):  
Sandeep Kumar Singla ◽  
Rahul Dev Garg ◽  
Om Prakash Dubey

Recent advances in information technology and statistical techniques have enabled sophisticated and reliable analysis based on machine learning methods. A number of machine learning data analysis tools may be exploited for classification and regression problems. These tools and techniques can be used effectively for highly data-intensive operations such as agricultural and meteorological applications, bioinformatics, and stock market analysis based on daily market prices. Machine learning ensemble methods such as Decision Tree (C5.0), Classification and Regression Trees (CART), Gradient Boosting Machine (GBM) and Random Forest (RF) have been investigated in the proposed work. The proposed work demonstrates that temporal variations in the spectral data, together with the computational efficiency of machine learning methods, may be used effectively to discriminate between types of sugarcane. The discrimination has been treated as a binary classification problem to segregate ratoon from plantation sugarcane. Variable importance selection based on Mean Decrease in Accuracy (MDA) and Mean Decrease in Gini (MDG) has been used to create the appropriate dataset for the classification. The binary classification model based on RF performs best across all possible combinations of input images. Feature selection based on the MDA and MDG measures of RF is also important for dimensionality reduction. The RF model performed best, with 97% accuracy, whereas the GBM method performed worst. Binary classification based on remotely sensed data can be handled effectively using the random forest method.
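The Mean Decrease in Gini measure mentioned above aggregates, over the trees of a random forest, how much each split on a variable reduces node impurity. The sketch below computes that per-split quantity for a single threshold. It is only an illustration under stated assumptions: the band values, the thresholds, and the names `gini` and `gini_decrease` are hypothetical, and no forest is trained here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity, 1 - sum(p_k ** 2), of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting samples at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Hypothetical toy values, not data from the study.
labels = ["ratoon", "ratoon", "ratoon", "plant", "plant", "plant"]
band_a = [0.21, 0.25, 0.24, 0.41, 0.45, 0.39]  # separates the two classes
band_b = [0.30, 0.52, 0.41, 0.33, 0.50, 0.44]  # mixes the two classes

print(gini_decrease(band_a, labels, 0.30))  # 0.5
print(gini_decrease(band_b, labels, 0.42))
```

A variable whose splits produce large decreases, like `band_a` here, would rank high on such a measure and be kept during feature selection.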


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Paolo Fusar-Poli ◽  
Dominic Stringer ◽  
Alice M. S. Durieux ◽  
Grazia Rutigliano ◽  
Ilaria Bonoldi ◽  
...  

Abstract. Predicting the onset of psychosis in individuals at-risk is based on robust prognostic model building methods including a priori clinical knowledge (also termed clinical-learning) to preselect predictors or machine-learning methods to select predictors automatically. To date, there is no empirical research comparing the prognostic accuracy of these two methods for the prediction of psychosis onset. In a first experiment, no improved performance was observed when machine-learning methods (LASSO and RIDGE) were applied, using the same predictors, to an individualised, transdiagnostic, clinically based, risk calculator previously developed on the basis of clinical-learning (predictors: age, gender, age by gender, ethnicity, ICD-10 diagnostic spectrum), and externally validated twice. In a second experiment, two refined versions of the published model which expanded the granularity of the ICD-10 diagnosis were introduced: ICD-10 diagnostic categories and ICD-10 diagnostic subdivisions. Although these refined versions showed an increase in apparent performance, their external performance was similar to the original model. In a third experiment, the three refined models were analysed under machine-learning and clinical-learning with a variable events per variable (EPV) ratio. The best performing model under low EPVs was obtained through machine-learning approaches. The development of prognostic models on the basis of a priori clinical knowledge, large samples and adequate events per variable is a robust clinical prediction method to forecast psychosis onset in patients at-risk, and is comparable to machine-learning methods, which are more difficult to interpret and implement. Machine-learning methods should be preferred for high dimensional data when no a priori knowledge is available.
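The events per variable ratio referred to above is the number of outcome events divided by the number of candidate predictor parameters, so a finer-grained diagnostic coding lowers the EPV for a fixed sample. A minimal sketch follows; the function name `events_per_variable` and all counts are hypothetical and are not taken from the study.

```python
def events_per_variable(n_events, n_parameters):
    """EPV: outcome events divided by candidate predictor parameters."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return n_events / n_parameters


# Hypothetical counts: the same 120 events with increasingly granular coding.
for n_parameters in (5, 20, 60):
    print(n_parameters, events_per_variable(120, n_parameters))
```

As the parameter count grows, the ratio falls, which is the low-EPV regime in which the abstract reports that machine-learning approaches performed best.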


Logistics ◽  
2020 ◽  
Vol 4 (4) ◽  
pp. 35
Author(s):  
Sidharth Sankhye ◽  
Guiping Hu

The rising popularity of smart factories and Industry 4.0 has made it possible to collect large amounts of data from production stages. Supervised machine learning methods such as classification can therefore viably predict product compliance quality using manufacturing data collected during production. Elimination of uncertainty via accurate prediction provides significant benefits at any stage in a supply chain: early knowledge of product batch quality can save costs associated with recalls, packaging, and transportation. While there has been thorough research on predicting the quality of specific manufacturing processes, the adoption of classification methods to predict the overall compliance of production batches has not been extensively investigated. This paper aims to design machine-learning-based classification methods for quality compliance and validate the models via a case study of a multi-model appliance production line. The proposed classification model could achieve an accuracy of 0.99 and Cohen’s Kappa of 0.91 for the compliance quality of unit batches. The proposed method would therefore enable implementation of a predictive model for compliance quality. The case study also highlights the importance of feature construction and dataset knowledge in training classification models.
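Accuracy and Cohen's Kappa can differ noticeably when one class dominates, because Kappa discounts the agreement expected by chance. The sketch below computes both from a confusion matrix. The counts are hypothetical and chosen only to show the effect; they are not the study's data, and the function name `accuracy_and_kappa` is illustrative.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] counts samples with true class i predicted as class j.
    Kappa is undefined (division by zero) when chance agreement equals 1.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts: 950 compliant and 50 non-compliant batches.
matrix = [[945, 5],
          [5, 45]]
acc, kappa = accuracy_and_kappa(matrix)
print(round(acc, 3), round(kappa, 3))
```

In this imbalanced toy case, always predicting "compliant" would already be right for 95% of batches, so Kappa comes out lower than accuracy even though only ten batches are misclassified.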


2021 ◽  
Vol 11 ◽  
Author(s):  
Mengya Li ◽  
Haiyan He ◽  
Guorong Huang ◽  
Bo Lin ◽  
Huiyan Tian ◽  
...  

Gastric cancer (GC) is the fifth most common cancer in the world and a serious threat to human health. Due to its high morbidity and mortality, a simple, rapid and accurate early screening method for GC is urgently needed. In this study, the potential of Raman spectroscopy combined with different machine learning methods was explored to distinguish serum samples from GC patients and healthy controls. Serum Raman spectra were collected from 109 patients with GC (including 35 in stage I, 14 in stage II, 35 in stage III, and 25 in stage IV) and 104 healthy volunteers matched for age, presenting for a routine physical examination. We analyzed the difference in serum metabolism between GC patients and healthy people through a comparative study of the average Raman spectra of the two groups. Four machine learning methods (one-dimensional convolutional neural network, random forest (RF), support vector machine, and K-nearest neighbor) were used to distinguish the two sets of Raman spectral data. The classification model was established by using 70% of the data as a training set and 30% as a test set. When tested on unseen data, the RF model yielded an accuracy of 92.8%, with sensitivity and specificity of 94.7% and 90.8%, respectively. The performance of the RF model was further confirmed by the receiver operating characteristic (ROC) curve, with an area under the curve (AUC) of 0.9199. This exploratory work shows that serum Raman spectroscopy combined with RF has great potential in the machine-assisted classification of GC, and is expected to provide a non-destructive and convenient technology for the screening of GC patients.
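The 70/30 evaluation protocol and the reported sensitivity and specificity can be sketched as follows. This is a minimal sketch under stated assumptions: the abstract does not say whether the split was stratified by class, so the per-class splitting here is an assumption, as are the seed and the function names `stratified_split` and `sensitivity_specificity`. Only the group sizes (109 patients, 104 controls) come from the text.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = int(len(idx) * train_fraction)
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# 1 = GC patient, 0 = healthy control (group sizes from the abstract).
labels = [1] * 109 + [0] * 104
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))

# Hypothetical predictions, only to show the metric calculation.
print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))  # (0.5, 1.0)
```

Splitting each class separately keeps the patient-to-control ratio roughly equal in the training and test sets, which matters when the test set is as small as it is here.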


2018 ◽  
Vol 226 (4) ◽  
pp. 259-273 ◽  
Author(s):  
Ranjith Vijayakumar ◽  
Mike W.-L. Cheung

Abstract. Machine learning tools are increasingly used in social sciences and policy fields due to their increase in predictive accuracy. However, little research has been done on how well the models of machine learning methods replicate across samples. We compare machine learning methods with regression on the replicability of variable selection, along with predictive accuracy, using an empirical dataset as well as simulated data with additive, interaction, and non-linear squared terms added as predictors. Methods analyzed include support vector machines (SVM), random forests (RF), multivariate adaptive regression splines (MARS), and the regularized regression variants, least absolute shrinkage and selection operator (LASSO), and elastic net. In simulations with additive and linear interactions, machine learning methods performed similarly to regression in replicating predictors; they also performed mostly equal or below regression on measures of predictive accuracy. In simulations with square terms, machine learning methods SVM, RF, and MARS improved predictive accuracy and replicated predictors better than regression. Thus, in simulated datasets, the gap between machine learning methods and regression on predictive measures foreshadowed the gap in variable selection. In replications on the empirical dataset, however, improved prediction by machine learning methods was not accompanied by a visible improvement in replicability in variable selection. This disparity is explained by the overall explanatory power of the models. When predictors have small effects and noise predominates, improved global measures of prediction in a sample by machine learning methods may not lead to the robust selection of predictors; thus, in the presence of weak predictors and noise, regression remains a useful tool for model building and replication.
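One simple way to quantify how well variable selection replicates across samples is the overlap between the predictor sets chosen in each replication. The sketch below uses mean pairwise Jaccard similarity for this purpose. This is an illustrative measure and not necessarily the one used by the authors; the predictor names and selections are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two predictor sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(selections):
    """Average Jaccard similarity over all pairs; needs at least two sets."""
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical selections from three replications of two methods.
stable = [{"x1", "x2"}, {"x1", "x2"}, {"x1", "x2", "x3"}]
unstable = [{"x1", "x4"}, {"x2", "x5"}, {"x1", "x3"}]

print(mean_pairwise_jaccard(stable))
print(mean_pairwise_jaccard(unstable))
```

A method can score well on predictive accuracy and still produce low overlap of this kind, which is the disparity the abstract reports for the empirical dataset.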

