THE USE OF MACHINE LEARNING METHODS FOR BINARY CLASSIFICATION OF THE WORKING CONDITION OF BEARINGS USING THE SIGNALS OF VIBRATION ACCELERATION

Bulletin of National Technical University KhPI Series System Analysis Control and Information Technologies ◽

10.20998/2079-0023.2021.02.03 ◽

2021 ◽

pp. 15-22

Author(s):

Ruslan Babudzhan ◽

Konstantyn Isaienkov ◽

Danilo Krasiy ◽

Oleksii Vodka ◽

Ivan Zadorozhny ◽

...

Keyword(s):

Machine Learning ◽

Binary Classification ◽

Fractal Dimensions ◽

Feature Space ◽

Training Data ◽

Supervised Machine Learning ◽

Support Vector ◽

Data Sets ◽

Vibration Acceleration ◽

K Nearest Neighbors

The paper investigates the relationship between the vibration acceleration of bearings and their operational state. To determine these dependencies, a test bench was built and 112 experiments were carried out with different bearings: 100 bearings that developed an internal defect during operation and 12 bearings without a defect. From the obtained records, a dataset was formed, which was used to build classifiers. The dataset is freely available. A method for classifying new and used bearings was proposed, which consists in searching for dependencies and regularities of the signal using descriptive functions: statistical, entropy, fractal dimensions and others. In addition to processing the signal itself, the frequency domain of the bearing operation signal was also used to complement the feature space. The paper considered the possibility of generalizing the classification for its application to signals that were not obtained in the course of laboratory experiments. An extraneous dataset was found in the public domain. This dataset was used to determine how accurate a classifier was when it was trained and tested on significantly different signals. Training and validation were carried out using the bootstrapping method to eradicate the effect of randomness, given the small amount of training data available. To estimate the quality of the classifiers, the F1-measure was used as the main metric due to the imbalance of the data sets. The following supervised machine learning methods were chosen as classifier models: logistic regression, support vector machine, random forest, and K nearest neighbors. The results are presented in the form of plots of density distribution and diagrams.
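
As a rough illustration of the "statistical" descriptive functions mentioned above, the sketch below computes a few descriptors often used for vibration signals. The particular choice of RMS, kurtosis and crest factor, the function name `describe_signal`, and the synthetic test signals are assumptions made for illustration; the abstract does not list the exact features the authors used.

```python
import math


def describe_signal(samples):
    """Return simple statistical descriptors of a 1-D signal.

    The feature set here is an illustrative assumption, not the paper's list.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    peak = max(abs(x) for x in samples)
    # Non-excess kurtosis: fourth central moment divided by squared variance.
    kurtosis = (sum(c ** 4 for c in centered) / n) / variance ** 2 if variance > 0 else 0.0
    # Crest factor: how far the largest peak stands above the RMS level.
    crest_factor = peak / rms if rms > 0 else 0.0
    return {
        "mean": mean,
        "rms": rms,
        "peak": peak,
        "kurtosis": kurtosis,
        "crest_factor": crest_factor,
    }


# Synthetic stand-ins for real recordings (hypothetical data):
# a clean sine wave, and the same wave with one impulsive spike.
smooth = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
spiky = list(smooth)
spiky[50] = 5.0

smooth_features = describe_signal(smooth)
spiky_features = describe_signal(spiky)
```

In this toy case the spike raises the kurtosis and crest factor well above the values for the clean sine wave, which is why impulsiveness measures of this kind are common candidates for a feature space.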

Detection of Smoking in Indoor Environment Using Machine Learning

Applied Sciences ◽

10.3390/app10248912 ◽

2020 ◽

Vol 10 (24) ◽

pp. 8912

Author(s):

Jae Hyuk Cho

Keyword(s):

Machine Learning ◽

Indoor Air ◽

Indoor Environment ◽

Indoor Air Pollution ◽

Binary Classification ◽

Training Data ◽

Support Vector ◽

K Nearest Neighbors ◽

Data Set ◽

Indoor Smoking

As the effects of indoor pollutants on the human body have become apparent, interest in indoor air quality management is increasing. In particular, indoor smoking is one of the common sources of indoor air pollution, and its harmfulness has been well studied. Accordingly, regulation of indoor smoking is emerging all over the world. Technical approaches to regulating indoor smoking are also being pursued, but research is focused on detection hardware. This study presents an analytical and machine learning approach to cigarette detection based on typical gases (total volatile organic compounds, CO2, etc.) collected from IoT sensors. Specifically, a data set for machine learning was built using IoT sensors, including a training data set securely collected from a rotary smoking machine and a test data set obtained from an actual indoor environment with spontaneous smokers. Prediction performance was evaluated with accuracy, precision, and recall. As a result, the non-linear support vector machine (SVM) model showed the best performance, with 93% accuracy and an F1 score of 88%. The supervised k-nearest neighbors (KNN) and multilayer perceptron (MLP) models also showed relatively good results, and simplifying the prediction to a binary classification proved effective for improving accuracy and speed.
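
The evaluation measures named above can be derived from the four confusion-matrix counts of a binary classifier. The following is a minimal sketch; the function name `binary_metrics` and the label vectors are hypothetical and are not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = smoking detected, 0 = no smoking.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

With these made-up labels the accuracy is 0.8 while precision, recall and F1 are each 2/3, which shows how accuracy can look better than the positive-class measures when negatives dominate.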

Artificially Generated Training Data-sets for Supervised Machine Learning Techniques in Magnetic Resonance Imaging: An Example in Myocardial Segmentation

2019 Computing in Cardiology Conference (CinC) ◽

10.22489/cinc.2019.220 ◽

2019 ◽

Author(s):

Christos Xanthis ◽

Kostas Haris ◽

Dimitrios Filos ◽

Anthony Aletras

Keyword(s):

Magnetic Resonance Imaging ◽

Machine Learning ◽

Magnetic Resonance ◽

Training Data ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Sets ◽

Resonance Imaging ◽

Learning Techniques ◽

Myocardial Segmentation

Automatic recognition of self-acknowledged limitations in clinical research literature

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocy038 ◽

2018 ◽

Vol 25 (7) ◽

pp. 855-861 ◽

Cited By ~ 4

Author(s):

Halil Kilicoglu ◽

Graciela Rosemblat ◽

Mario Malički ◽

Gerben ter Riet

Keyword(s):

Machine Learning ◽

Clinical Research ◽

Binary Classification ◽

Classification Performance ◽

Research Literature ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Rule Based ◽

Research Transparency

Objective: To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency. Methods: To develop our recognition methods, we used a set of 8431 sentences from 1197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing, and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM). Results: Annotators had good agreement in labeling limitation sentences (Krippendorff’s α = 0.781). Of the three methods used, the rule-based method yielded the best performance with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs 89.6%, 95% CI [88.1-91.1]). Conclusions: The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve reporting of limitations in clinical studies.
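
The self-training idea mentioned above (use confident predictions on unlabeled items to expand the training set, then refit) can be sketched in a few lines. This is only an illustration of the loop: it swaps in a one-dimensional nearest-centroid classifier in place of the SVM used in the study, and the function names, the `min_margin` threshold and the toy numbers are all assumptions.

```python
def fit_centroids(points, labels):
    """Fit a 1-D nearest-centroid model: the mean feature value per class."""
    centroids = {}
    for label in set(labels):
        values = [x for x, y in zip(points, labels) if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def predict_with_margin(centroids, x):
    """Return (label, margin); assumes at least two classes.

    The margin is the distance to the second-closest centroid minus the
    distance to the closest one, used here as a crude confidence score.
    """
    ranked = sorted(centroids.items(), key=lambda item: abs(x - item[1]))
    best_label, best_centroid = ranked[0]
    margin = abs(x - ranked[1][1]) - abs(x - best_centroid)
    return best_label, margin


def self_train(points, labels, unlabeled, min_margin=1.0, max_rounds=10):
    """Repeatedly add confidently pseudo-labeled items to the training set."""
    points, labels, pool = list(points), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(points, labels)
        confident, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from; stop early
        for x, label in confident:
            points.append(x)
            labels.append(label)
        pool = remaining
    return fit_centroids(points, labels), pool


# Hypothetical 1-D features: four labeled items and four unlabeled ones.
model, still_unlabeled = self_train(
    points=[0.0, 1.0, 9.0, 10.0],
    labels=[0, 0, 1, 1],
    unlabeled=[0.5, 2.0, 8.0, 5.2],
)
```

In this toy run three unlabeled items are absorbed in the first round, while the ambiguous value 5.2 never clears the margin threshold and stays unlabeled.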

Abstract MP10: Microbiome-based Diagnostic Screening Of Cardiovascular Disease Using A Machine Learning Approach

Hypertension ◽

10.1161/hyp.76.suppl_1.mp10 ◽

2020 ◽

Vol 76 (Suppl_1) ◽

Author(s):

Sachin Aryal ◽

Ahmad Alimadadi ◽

Ishan Manandhar ◽

Bina Joe ◽

Xi Cheng

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Cardiovascular Health ◽

Characteristic Curve ◽

Feature Space ◽

Supervised Machine Learning ◽

Support Vector ◽

Diagnostic Screening ◽

New Strategy ◽

Metagenomics Data

In recent years, the microbiome has been recognized as an important factor associated with cardiovascular disease (CVD), which is the leading cause of human mortality worldwide. Disparities in gut microbial compositions between individuals with and without CVD were reported, whereby, we hypothesized that utilizing such microbiome-based data for training with supervised machine learning (ML) models could be exploited as a new strategy for evaluation of cardiovascular health. To test our hypothesis, we analyzed the metagenomics data extracted from the American Gut Project. Specifically, 16S rRNA reads from stool samples of 478 CVD and 473 non-CVD control samples were analyzed using five supervised ML algorithms: random forest (RF), support vector machine with radial kernel (svmRadial), decision tree (DT), elastic net (ENet) and neural networks (NN). Thirty-nine differential bacterial taxa (LEfSe: LDA > 2) were identified between CVD and non-CVD groups. ML classifications, using these taxonomic features, achieved an AUC (area under the receiver operating characteristic curve) of ~0.58 (RF). However, by choosing the top 500 high-variance features of operational taxonomic units (OTUs) for training ML models, an improved AUC of ~0.65 (RF) was achieved. Further, by limiting the selection to only the top 25 highly contributing OTU features to reduce the dimensionality of feature space, the AUC was further significantly enhanced to ~0.70 (RF). In summary, this study is the first to demonstrate the successful development of a ML model using microbiome-based datasets for a systematic diagnostic screening of CVD.
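
For readers unfamiliar with the AUC figures quoted above, the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition; the function name `roc_auc` and the scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Labels are assumed to be 0 (control) or 1 (case); ties count as 0.5.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample from each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for two controls and two cases.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values of roughly 0.58 to 0.70 in context.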

Supervised machine learning methods in psychology: A practical introduction with annotated R code

10.31234/osf.io/s72vu ◽

2019 ◽

Author(s):

Hannes Rosenbusch ◽

Felix Soldner ◽

Anthony M Evans ◽

Marcel Zeelenberg

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Psychological Research ◽

Supervised Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Comprehensive Overview ◽

K Nearest Neighbors ◽

Machine Learning Methods ◽

Out Of Sample

Machine learning methods for pattern detection and prediction are increasingly prevalent in psychological research. We provide a comprehensive overview of machine learning, its applications, and how to implement models for research. We review fundamental concepts of machine learning, such as prediction accuracy and out-of-sample evaluation, and summarize four standard prediction algorithms: linear regressions, ridge regressions, decision trees, and random forests (plus k-nearest neighbors, Naïve Bayes classifiers, and support vector machines in the supplementary material). This selection provides a set of powerful models that are implemented regularly in machine learning projects. We demonstrate each method with examples and annotated R code, and discuss best practices for determining sample sizes; comparing model performances; tuning prediction models; preregistering prediction models; and reporting results. Finally, we discuss the value of machine learning methods in maintaining psychology’s status as a predictive science.
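
Out-of-sample evaluation, one of the concepts named above, means fitting a model on one part of the data and measuring its error on a part it has never seen. The paper itself provides annotated R code; the sketch below is a separate Python illustration with simple linear regression, and the data, the split point and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between predictions and observations."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: a line y = 2x + 1 with a deterministic +/-0.5 wobble.
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]

# Fit on the first 14 points, evaluate on the 6 held-out points.
train_x, train_y = xs[:14], ys[:14]
test_x, test_y = xs[14:], ys[14:]
slope, intercept = fit_line(train_x, train_y)
train_mse = mean_squared_error(train_x, train_y, slope, intercept)
test_mse = mean_squared_error(test_x, test_y, slope, intercept)
```

In this example the held-out error is slightly larger than the training error, the usual pattern that motivates reporting out-of-sample rather than in-sample accuracy.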

Supervised binary classification methods for strawberry ripeness discrimination from bioimpedance data

Scientific Reports ◽

10.1038/s41598-021-90471-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Pietro Ibba ◽

Christian Tronstad ◽

Roberto Moscetti ◽

Tanja Mimmo ◽

Giuseppe Cantarella ◽

...

Keyword(s):

Time Management ◽

Binary Classification ◽

Time Estimation ◽

Supervised Machine Learning ◽

Support Vector ◽

Strawberry Fruit ◽

K Nearest Neighbors ◽

Promising Alternative ◽

Machine Learning Classification ◽

Unseen Data

Strawberry is one of the most popular fruits in the market. To meet the demanding consumer and market quality standards, there is a strong need for an on-site, accurate and reliable grading system during the whole harvesting process. In this work, a total of 923 strawberry fruit were measured directly on-plant at different ripening stages by means of bioimpedance data, collected at frequencies between 20 Hz and 300 kHz. The fruit batch was then split into 2 classes (i.e. ripe and unripe) based on surface color data. Starting from these data, six of the most commonly used supervised machine learning classification techniques, i.e. Logistic Regression (LR), Binary Decision Trees (DT), Naive Bayes Classifiers (NBC), K-Nearest Neighbors (KNN), Support Vector Machine (SVM) and Multi-Layer Perceptron Networks (MLP), were employed, optimized, tested and compared in view of their performance in predicting the strawberry fruit ripening stage. Such models were trained to develop a complete feature selection and optimization pipeline, not yet available for bioimpedance data analysis of fruit. The classification results highlighted that, among all the tested methods, MLP networks had the best performances on the test set, with 0.72, 0.82 and 0.73 for the F1-, F0.5- and F2-score, respectively, and improved the training results, showing good generalization capability, adapting well to new, previously unseen data. Consequently, the MLP models, trained with bioimpedance data, are a promising alternative for real-time estimation of strawberry ripeness directly on-field, which could be a potential application technique for evaluating the harvesting time management for farmers and producers.
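
The three scores reported above are members of the F-beta family, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. Setting beta = 0.5 weights precision more heavily and beta = 2 weights recall more heavily. The sketch below evaluates the formula; the precision and recall values are hypothetical and are not taken from the paper.

```python
def fbeta(precision, recall, beta):
    """F-beta score: weighted harmonic mean of precision and recall."""
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta_sq) * precision * recall / denominator


# Hypothetical classifier with higher precision than recall.
precision, recall = 0.8, 0.5
f1 = fbeta(precision, recall, 1.0)
f_half = fbeta(precision, recall, 0.5)
f2 = fbeta(precision, recall, 2.0)
```

When precision exceeds recall, as in this made-up case, the scores are ordered F0.5 > F1 > F2; the ordering reverses when recall is the larger of the two.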

COVID-19 Prediction Applying Supervised Machine Learning Algorithms with Comparative Analysis Using WEKA

Algorithms ◽

10.3390/a14070201 ◽

2021 ◽

Vol 14 (7) ◽

pp. 201

Author(s):

Charlyn Nayve Villavicencio ◽

Julio Jerison Escudero Macrohon ◽

Xavier Alphonse Inbaraj ◽

Jyh-Horng Jeng ◽

Jer-Guang Hsieh

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Mean Absolute Error ◽

Learning Algorithms ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

K Nearest Neighbors ◽

Use Of Technology

Early diagnosis is crucial to prevent the development of a disease that may cause danger to human lives. COVID-19, which is a contagious disease that has mutated into several variants, has become a global pandemic that demands to be diagnosed as soon as possible. With the use of technology, available information concerning COVID-19 increases each day, and extracting useful information from massive data can be done through data mining. In this study, authors utilized several supervised machine learning algorithms in building a model to analyze and predict the presence of COVID-19 using the COVID-19 Symptoms and Presence dataset from Kaggle. J48 Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbors and Naïve Bayes algorithms were applied through WEKA machine learning software. Each model’s performance was evaluated using 10-fold cross validation and compared according to major accuracy measures, correctly or incorrectly classified instances, kappa, mean absolute error, and time taken to build the model. The results show that Support Vector Machine using Pearson VII universal kernel outweighs other algorithms by attaining 98.81% accuracy and a mean absolute error of 0.012.
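
Two of the evaluation ingredients named above, k-fold cross validation and the kappa statistic, can be sketched without any library. The study ran them inside WEKA; the code below is an independent illustration, and the function names, the interleaved fold assignment and the label vectors are assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists so every sample is tested exactly once.

    Folds are assigned by interleaving indices, an arbitrary choice made here;
    real tools usually shuffle or stratify first.
    """
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for i, test in enumerate(folds):
        train = [j for other in folds[:i] + folds[i + 1:] for j in other]
        yield train, test


def cohen_kappa(y_true, y_pred):
    """Agreement between predictions and labels, corrected for chance."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    labels = set(y_true) | set(y_pred)
    # Chance agreement: probability both pick the same class independently.
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if expected == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is an arbitrary placeholder
    return (observed - expected) / (1.0 - expected)


# Hypothetical example: 25 samples split into 10 folds.
splits = list(k_fold_indices(25, 10))

# Hypothetical labels: 80% raw agreement, 58% expected by chance.
kappa = cohen_kappa([1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
                    [1, 1, 0, 1, 0, 0, 0, 0, 0, 0])
```

The made-up labels agree on 8 of 10 items, yet kappa is only about 0.52 because most of that agreement would be expected by chance given the class proportions.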

Machine Learning Models for COVID-19 Detection in Brazil Based on Symptoms (Preprint)

10.2196/preprints.27293 ◽

2021 ◽

Author(s):

Íris Viana dos Santos Santana ◽

Andressa C. M. da Silveira ◽

Álvaro Sobrinho ◽

Lenardo Chaves e Silva ◽

Leandro Dias da Silva ◽

...

Keyword(s):

Machine Learning ◽

Early Stage ◽

Area Under The Curve ◽

Supervised Machine Learning ◽

Gradient Boosting ◽

Support Vector ◽

Accuracy Score ◽

K Nearest Neighbors ◽

Runny Nose ◽

Extreme Gradient Boosting

Background: Controlling the COVID-19 outbreak in Brazil is considered a challenge of continental proportions due to the high population and urban density, weak implementation and maintenance of social distancing strategies, and limited testing capabilities. Objective: To contribute to addressing such a challenge, we present the implementation and evaluation of supervised Machine Learning (ML) models to assist COVID-19 detection in Brazil based on early-stage symptoms. Methods: First, we conducted data preprocessing and applied the Chi-squared test to a Brazilian dataset, mainly composed of early-stage symptoms, to perform statistical analyses. Afterward, we implemented ML models using the Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), Decision Tree (DT), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost) algorithms. We evaluated the ML models using precision, accuracy score, recall, the area under the curve, and the Friedman and Nemenyi tests. Based on the comparison, we grouped the top five ML models and measured feature importance. Results: The MLP model presented the highest mean accuracy score, with more than 97.85%, when compared to GBM (> 97.39%), RF (> 97.36%), DT (> 97.07%), XGBoost (> 97.06%), KNN (> 95.14%), and SVM (> 94.27%). Based on the statistical comparison, we grouped MLP, GBM, DT, RF, and XGBoost as the top five ML models, because their evaluation results are statistically indistinguishable. The features the ML models relied on during prediction include gender, profession, fever, sore throat, dyspnea, olfactory disorder, cough, runny nose, taste disorder, and headache. Conclusions: Supervised ML models effectively assist decision making in medical diagnosis and public administration (e.g., testing strategies), based on early-stage symptoms that do not require advanced and expensive exams.
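
The Chi-squared test mentioned in the methods checks whether a categorical feature, such as the presence of a symptom, is associated with the outcome. A minimal sketch for a 2x2 table follows, without continuity correction and assuming no empty row or column. The function name and the counts are hypothetical; they are not taken from the Brazilian dataset.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    Rows: symptom present / absent. Columns: test positive / negative.
    No continuity correction is applied; all margins must be non-zero.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = (a, b, c, d)
    # Expected count under independence: row total * column total / n.
    expected = (
        row_totals[0] * col_totals[0] / n,
        row_totals[0] * col_totals[1] / n,
        row_totals[1] * col_totals[0] / n,
        row_totals[1] * col_totals[1] / n,
    )
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 40 patients with fever (30 positive, 10 negative)
# and 60 without fever (20 positive, 40 negative).
statistic = chi_squared_2x2(30, 10, 20, 40)

# Critical value of the chi-squared distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_VALUE_DF1 = 3.841
is_associated = statistic > CRITICAL_VALUE_DF1
```

For these made-up counts the statistic is about 16.67, well above 3.841, so independence between the symptom and the outcome would be rejected at the 5% level.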

Opinion Mining using Machine Learning Techniques

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b4108.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 4287-4292

Keyword(s):

Machine Learning ◽

Opinion Mining ◽

Predictive Ability ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbors ◽

Significant Information ◽

Learning Techniques

Sentiment analysis, or opinion mining, has gained much attention in recent years. With the constantly evolving social networks and internet marketing sites, reviews and blogs have accumulated, and they act as a significant source for future analysis and better decision making. These reviews are naturally unstructured and thus require preprocessing and further classification to extract the significant information for future use. These reviews and blogs can be of different types, such as positive, negative and neutral. Supervised machine learning techniques help to classify these reviews. In this paper, five machine learning algorithms (K-Nearest Neighbors (KNN), Decision Tree, Artificial Neural Networks (ANNs), Naïve Bayes and Support Vector Machine (SVM)) are used for classification of sentiments. These algorithms are analyzed using a Twitter dataset. Performance analysis of these algorithms is done using various performance measures such as accuracy, precision, recall and F-measure. The evaluation of these techniques on the Twitter dataset showed the predictive ability of machine learning in opinion mining.

What can machine learning do for seismic data processing? An interpolation application

Geophysics ◽

10.1190/geo2016-0300.1 ◽

2017 ◽

Vol 82 (3) ◽

pp. V163-V177 ◽

Cited By ~ 55

Author(s):

Yongna Jia ◽

Jianwei Ma

Keyword(s):

Machine Learning ◽

Seismic Data ◽

Input Data ◽

Interpolation Method ◽

Window Size ◽

Tight Frame ◽

Training Data ◽

Low Rank ◽

Support Vector ◽

Data Sets

Machine learning (ML) systems can automatically mine data sets for hidden features or relationships. Recently, ML methods have become increasingly used within many scientific fields. We have evaluated common applications of ML, and then we developed a novel method based on the classic ML method of support vector regression (SVR) for reconstructing seismic data from under-sampled or missing traces. First, the SVR method mines a continuous regression hyperplane from training data that indicates the hidden relationship between input data with missing traces and output completed data, and then it interpolates missing seismic traces for other input data by using the learned hyperplane. The key idea of our new ML method is significantly different from that of many previous interpolation methods. Our method depends on the characteristics of the training data, rather than the assumptions of linear events, sparsity, or low rank. Therefore, it can break out the previous assumptions or constraints and show universality to different data sets. In addition, our method dramatically reduces the manual workload; for example, it allows users to avoid selecting the window size parameters, as is required for methods based on the assumption of linear events. The ML method facilitates intelligent interpolation between data sets with similar geomorphological structures, which can significantly reduce costs in engineering applications. Furthermore, we combine a sparse transform called the data-driven tight frame (so-called compressed learning) with the SVR method to improve the training performance, in which the training is implemented in a sparse coefficient domain rather than in the data domain. Numerical experiments show the competitive performance of our method in comparison with the traditional [Formula: see text]-[Formula: see text] interpolation method.
