Application of Convolutional Neural Network Algorithms for Advancing Sedentary and Activity Bout Classification

Journal for the Measurement of Physical Behaviour ◽

10.1123/jmpb.2020-0016 ◽

2020 ◽

pp. 1-9

Author(s):

Supun Nakandala ◽

Marta M. Jankowska ◽

Fatima Tuz-Zahra ◽

John Bellettiere ◽

Jordan A. Carlson ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Logistic Regression ◽

Random Forest ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Feature Engineering ◽

Free Living ◽

Data Set

Background: Machine learning has been used for classification of physical behavior bouts from hip-worn accelerometers; however, this research has been limited due to the challenges of directly observing and coding human behavior “in the wild.” Deep learning algorithms, such as convolutional neural networks (CNNs), may offer better representation of data than other machine learning algorithms without the need for engineered features and may be better suited to dealing with free-living data. The purpose of this study was to develop a modeling pipeline for evaluation of a CNN model on a free-living data set and compare CNN inputs and results with the commonly used machine learning random forest and logistic regression algorithms. Method: Twenty-eight free-living women wore an ActiGraph GT3X+ accelerometer on their right hip for 7 days. A concurrently worn thigh-mounted activPAL device captured ground truth activity labels. The authors evaluated logistic regression, random forest, and CNN models for classifying sitting, standing, and stepping bouts. The authors also assessed the benefit of performing feature engineering for this task. Results: The CNN classifier performed best (average balanced accuracy for bout classification of sitting, standing, and stepping was 84%) compared with the other methods (56% for logistic regression and 76% for random forest), even without performing any feature engineering. Conclusion: Using the recent advancements in deep neural networks, the authors showed that a CNN model can outperform other methods even without feature engineering. This has important implications for both the model’s ability to deal with the complexity of free-living data and its potential transferability to new populations.
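
The abstract reports balanced accuracy rather than plain accuracy. A minimal sketch of the usual definition (the mean of per-class recall) is given below; the function name and the toy labels are assumptions for illustration, and the paper's exact computation is not stated here.

```python
from collections import Counter

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall: every class counts equally, however rare."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# Hypothetical labels for illustration only (not data from the study).
y_true = ["sit"] * 6 + ["stand"] * 3 + ["step"] * 1
y_pred = ["sit"] * 6 + ["stand", "stand", "sit"] + ["stand"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # dominated by the large "sit" class
print(balanced_accuracy(y_true, y_pred))  # penalises the missed "step" class
```

With imbalanced classes, as free-living sitting, standing, and stepping bouts are likely to be, plain accuracy can look good while a rare class is never detected; the per-class average makes that visible.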


Classification of hazelnut cultivars: comparison of DL4J and ensemble learning algorithms

Notulae Botanicae Horti Agrobotanici Cluj-Napoca ◽

10.15835/nbha48412041 ◽

2020 ◽

Vol 48 (4) ◽

pp. 2316-2327

Author(s):

Caner KOC ◽

Dilara GERDAN ◽

Maksut B. EMİNOĞLU ◽

Uğur YEGÜL ◽

Bulent KOC ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Random Forest ◽

Ensemble Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Performance Criteria ◽

Gradient Boosting ◽

Data Set

Classification of hazelnuts is one of the value-adding processes that increase the marketability and profitability of hazelnut production. While traditional classification methods are commonly used, machine learning and deep learning can be implemented to enhance hazelnut classification. This paper presents the results of a comparative study of machine learning frameworks to classify hazelnut (Corylus avellana L.) cultivars (‘Sivri’, ‘Kara’, ‘Tombul’) using DL4J and ensemble learning algorithms. For each cultivar, 50 samples were used for evaluation. Maximum length, width, compression strength, and weight of the hazelnuts were measured using a caliper and a force transducer. Gradient boosting machine (boosting), random forest (bagging), and DL4J feedforward (deep learning) algorithms were applied. The models were evaluated using 10-fold cross-validation. The classifier performance criteria of accuracy (%), error percentage (%), F-measure, Cohen’s kappa, recall, precision, and true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values are provided in the results section. The results showed classification accuracies of 94% for gradient boosting, 100% for random forest, and 94% for DL4J feedforward.
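
The 10-fold cross-validation mentioned above can be sketched as follows. This is an illustrative partitioning only: the function names are hypothetical, the split is not stratified by cultivar, and the abstract does not say how the authors' tooling formed its folds.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices once, then deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validation_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 3 cultivars x 50 samples = 150 samples, as described in the abstract.
splits = list(cross_validation_splits(150, 10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 135 15
```

Each sample is used for testing exactly once and for training nine times, so the reported accuracy is an average over ten held-out folds of 15 samples each.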


Machine learning in the diagnosis of Myocardial Infarction with Non-Obstructive Coronary Arteries

European Heart Journal ◽

10.1093/eurheartj/ehab724.3067 ◽

2021 ◽

Vol 42 (Supplement_1) ◽

Author(s):

M J Espinosa Pascual ◽

P Vaquero Martinez ◽

V Vaquero Martinez ◽

J Lopez Pais ◽

B Izquierdo Coronel ◽

...

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Obstructive Coronary Artery Disease ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector

Introduction: Out of all patients admitted with Myocardial Infarction, 10 to 15% have Myocardial Infarction with Non-Obstructive Coronary Arteries (MINOCA). Classification algorithms based on deep learning substantially exceed traditional diagnostic algorithms. Therefore, numerous machine learning models have been proposed as useful tools for the detection of various pathologies, but to date no study has proposed a diagnostic algorithm for MINOCA. Purpose: The aim of this study was to estimate the diagnostic accuracy of several automated learning algorithms (Support-Vector Machine [SVM], Random Forest [RF] and Logistic Regression [LR]) in discriminating people suffering from MINOCA from those with Myocardial Infarction with Obstructive Coronary Artery Disease (MICAD) at the time of admission and before performing a coronary angiography, whether invasive or not. Methods: A Diagnostic Test Evaluation study was carried out applying the proposed algorithms to a database of 553 consecutive patients admitted to our Hospital with Myocardial Infarction. According to the definitions of the 2016 ESC Position Paper on MINOCA, patients were classified into two groups: MICAD and MINOCA. Out of the total 553 patients, 214 were discarded due to the lack of complete data. The set of machine learning algorithms was trained on 244 patients (training sample: 75%) and tested on 80 patients (test sample: 25%). A total of 64 variables were available for each patient, including demographic, clinical and laboratory features before the angiographic procedure. Finally, the diagnostic precision of each architecture was assessed. Results: The most accurate classification model was the Random Forest algorithm (Specificity [Sp] 0.88, Sensitivity [Se] 0.57, Negative Predictive Value [NPV] 0.93, Area Under the Curve [AUC] 0.85 [CI 0.83–0.88]), followed by the standard Logistic Regression (Sp 0.76, Se 0.57, NPV 0.92, AUC 0.74) and Support-Vector Machine (Sp 0.84, Se 0.38, NPV 0.90, AUC 0.78) (see graph). The variables that contributed the most to discriminating MINOCA from MICAD were the traditional cardiovascular risk factors, biomarkers of myocardial injury, hemoglobin and gender. Results were similar when the 19 patients with Takotsubo syndrome were excluded from the analysis. Conclusion: A prediction system for diagnosing MINOCA before performing coronary angiographies was developed using machine learning algorithms. Results show higher accuracy of diagnosing MINOCA than conventional statistical methods. This study supports the potential of machine learning algorithms in clinical cardiology. However, further studies are required in order to validate our results. Funding Acknowledgement: Type of funding sources: None. Figure: ROC curves of different algorithms.
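
The sensitivity, specificity, and negative predictive value quoted above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the counts are hypothetical and chosen only for illustration, not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic-test ratios from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
        "npv": tn / (tn + fn),          # trust in a negative prediction
        "ppv": tp / (tp + fp),          # trust in a positive prediction
    }

# Hypothetical counts: a rare positive class (10 of 100 patients).
m = diagnostic_metrics(tp=6, fp=10, tn=80, fn=4)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

When the positive class is rare, NPV can remain high even with modest sensitivity, because most negative predictions fall on true negatives. This is one plausible reading of a pattern such as Se 0.57 alongside NPV 0.93, though the abstract does not give the underlying counts.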


A Novel Ensemble Stacking Classification of Genetic Variations Using Machine Learning Algorithms

International Journal of Image and Graphics ◽

10.1142/s0219467823500158 ◽

2021 ◽

Author(s):

Jahnavi Yeturu ◽

Poongothai Elango ◽

S. P. Raja ◽

P. Nagendra Kumar

Keyword(s):

Machine Learning ◽

Heart Diseases ◽

Learning Algorithms ◽

Genetic Mutation ◽

Machine Learning Algorithms ◽

Support Vector ◽

Genetic Mutations ◽

Validation Data ◽

Data Set

Genetics is the clinical study of congenital mutation, where the principal advantage of analyzing human genetic mutations is the exploration, analysis, interpretation and description of the genetically transmitted and inherited effects of several diseases such as cancer, diabetes and heart diseases. Cancer is the most troublesome and disordered affliction, as the proportion of cancer sufferers is growing massively. Identifying and discriminating the mutations that contribute to tumor growth from the unbiased mutations is difficult, as the majority of cancer tumors are able to exercise genetic mutations. The genetic mutations are systematized and categorized to sort the cancer by way of medical observations and clinical studies. At present, genetic mutations are annotated, and these interpretations are made either manually or using the existing primary algorithms. Evaluation and classification of each individual genetic mutation has been based on evidence from documented content in the medical literature. Consequently, classifying genetic mutations on the basis of clinical evidence remains a challenging task. Various techniques exist: one-hot encoding is used to derive features from genes and their variations, and TF-IDF is used to extract features from the clinical text data. In order to increase the accuracy of the classification, machine learning algorithms such as support vector machine, logistic regression, Naive Bayes, etc., were tested. A stacking model classifier has been developed to increase the accuracy. The proposed stacking model classifier obtained log losses of 0.8436 and 0.8572 for the cross-validation data set and the test data set, respectively. The experiments show that the proposed stacking model classifier outperforms the existing algorithms in terms of log loss. A lower log loss indicates a better model; here the log loss has been reduced to less than 1 by using the proposed stacking model classifier. The performance of these algorithms can be gauged on the basis of various measures such as multi-class log loss.
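
Multi-class log loss, the metric used above, is the mean negative log of the probability a model assigns to the true class. A minimal sketch under that standard definition follows; the three-class toy data are hypothetical and unrelated to the paper's data set.

```python
import math

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p)
    return total / len(y_true)

y_true = [0, 2, 1]  # hypothetical class indices
confident = [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8], [0.2, 0.7, 0.1]]
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 3

confident_loss = multiclass_log_loss(y_true, confident)
uniform_loss = multiclass_log_loss(y_true, uniform)
print(confident_loss < uniform_loss)  # True
```

A model that guesses uniformly over K classes scores ln K, so a log-loss value is easiest to interpret against that baseline for the number of classes in the task.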


Performance of Machine Learning Algorithms and Diversity in Data

MATEC Web of Conferences ◽

10.1051/matecconf/201821004019 ◽

2018 ◽

Vol 210 ◽

pp. 04019 ◽

Cited By ~ 1

Author(s):

Hyontai SUG

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Real World ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Real World Data ◽

Random Data ◽

Data Set ◽

World Data

Recent Go matches between humans and the artificial intelligence AlphaGo showed the great advancement in machine learning technologies. While AlphaGo was trained using real-world data, AlphaGo Zero was trained using massive random data, and the fact that AlphaGo Zero defeated AlphaGo completely revealed that diversity and size in training data are important for better performance of machine learning algorithms, especially deep learning algorithms based on neural networks. On the other hand, artificial neural networks and decision trees are widely accepted machine learning algorithms because of their robustness to errors and their comprehensibility, respectively. In this paper, in order to prove empirically that diversity and size in data are important factors for better performance of machine learning algorithms, these two representative algorithms are used for experiments. A real-world data set called breast tissue was chosen because it consists of real numbers, which is a very good property for artificial random data generation. The result of the experiment proved that the diversity and size of data are very important factors for better performance.


Prediction on Domestic Violence in Bangladesh during the COVID-19 Outbreak Using Machine Learning Methods

Applied System Innovation ◽

10.3390/asi4040077 ◽

2021 ◽

Vol 4 (4) ◽

pp. 77

Author(s):

Md. Murad Hossain ◽

Md. Asadullah ◽

Abidur Rahaman ◽

Md. Sipon Miah ◽

M. Zahid Hasan ◽

...

Keyword(s):

Machine Learning ◽

Domestic Violence ◽

Logistic Regression ◽

Random Forest ◽

Family Violence ◽

Violence Against Women ◽

Machine Learning Algorithms ◽

Data Set ◽

Domestic Violence Against Women ◽

Women And Children

The COVID-19 outbreak resulted in preventative measures and restrictions for Bangladesh during the summer of 2020—these unstable and stressful times led to multiple social problems (e.g., domestic violence and divorce). Globally, researchers, policymakers, governments, and civil societies have been concerned about the increase in domestic violence against women and children during the ongoing COVID-19 pandemic. In Bangladesh, domestic violence against women and children has increased during the COVID-19 pandemic. In this article, we investigated family violence among 511 families during the COVID-19 outbreak. Participants were given questionnaires to answer, for a period of over ten days; we predicted family violence using a machine learning-based model. To predict domestic violence from our data set, we applied random forest, logistic regression, and Naive Bayes machine learning algorithms to our model. We employed an oversampling strategy named the Synthetic Minority Oversampling Technique (SMOTE) and the chi-squared statistical test to, respectively, solve the imbalance problem and discover the feature importance of our data set. The performances of the machine learning algorithms were evaluated based on accuracy, precision, recall, and F-score criteria. Finally, the receiver operating characteristic (ROC) and confusion matrices were developed and analyzed for three algorithms. On average, our model, with the random forest, logistic regression, and Naive Bayes algorithms, predicted family violence with 77%, 69%, and 62% accuracy for our data set. The findings of this study indicate that domestic violence has increased and is highly related to two features: family income level during the COVID-19 pandemic and education level of the family members.
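
The core idea of SMOTE, as named above, is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates only that idea; the function name, parameters, and data points are hypothetical, and the abstract does not say which implementation the authors used.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Hypothetical two-feature minority samples (illustration only).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote_sketch(minority, n_new=4)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation data remain untouched.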


Prediction of Cardiovascular Disease using Machine Learning Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b3986.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2404-2414

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Logistic Regression ◽

Human Life ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Noise Data ◽

The Individual

Background/Aim: Healthcare is an unavoidable necessity in human life. Cardiovascular disease is a general category for a range of conditions that affect the heart and blood vessels. Early methods for assessing cardiovascular disease helped in making decisions about the changes needed in high-risk patients, which resulted in a reduction of their risks. Methods: In the proposed research, we used a data set from Kaggle that does not require pre-processing steps such as removal of noisy data, removal of missing data, filling in default values where applicable, and classification of attributes for prediction and decision making at different levels. The performance of the diagnosis model is obtained by using methods like classification, accuracy, sensitivity and specificity analysis. This paper proposes a prediction model to predict whether a person has a cardiovascular disease or not and to provide awareness or a diagnosis of it. This is done by comparing the accuracies of applying rules to the individual results of Support Vector Machine, Random Forest, Naive Bayes classifier and Logistic Regression on the data set taken in a region, in order to present an accurate model for predicting cardiovascular disease. Results: The machine learning algorithms under study were able to predict cardiovascular disease in patients with accuracy between 58.71% and 77.06%. Conclusions: Logistic Regression showed better accuracy (77.06%) than the other machine learning algorithms.


Waste Management System Fraud Detection Using Machine Learning Algorithms to Minimize Penalties Avoidance and Redemption Abuse

Recycling ◽

10.3390/recycling6040065 ◽

2021 ◽

Vol 6 (4) ◽

pp. 65

Author(s):

Ali Hewiagh ◽

Kannan Ramakrishnan ◽

Timothy Tzen Vun Yap ◽

Ching Seong Tan

Keyword(s):

Machine Learning ◽

Random Forest ◽

Waste Management ◽

Management System ◽

Learning Algorithms ◽

Fraud Detection ◽

Machine Learning Algorithms ◽

Support Vector ◽

Data Set ◽

Waste Management System

Online fraud has pernicious impacts on different system domains, including waste management systems. Fraudsters illegally obtain rewards for their recycling activities or avoid penalties for those who are required to recycle their own waste. Although some approaches have been introduced to prevent such fraudulent activities, the fraudsters continuously seek new ways to commit illegal actions. Machine learning technology has shown significant and impressive results in identifying new online fraud patterns in different system domains such as e-commerce, insurance, and banking. The purpose of this paper, therefore, is to analyze a waste management system and develop a machine learning model to detect fraud in the system. The intended system allows consumers, individuals, and organizations to track, monitor, and update their performance in their recycling activities. The data set provided by a waste management organization is used for the analysis and the model training. This data set contains transactions of users’ recycling activities and behaviors. Three machine learning algorithms (random forest, support vector machine, and multi-layer perceptron) are used in the experiments, and the best detection model is selected based on the model’s performance. Results show that each of these algorithms can be used for fraud detection in waste management with high accuracy. The random forest algorithm produces the optimal model with an accuracy of 96.33%, F1-score of 95.20%, and ROC of 98.92%.


Classification of Medical Thermograms Belonging Neonates by Using Segmentation, Feature Engineering and Machine Learning Algorithms

Traitement du signal ◽

10.18280/ts.370409 ◽

2020 ◽

Vol 37 (4) ◽

pp. 611-617

Author(s):

Ahmet H. Ornek ◽

Saim Ervural ◽

Murat Ceylan ◽

Murat Konak ◽

Hanifi Soylu ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Intensive Care Unit ◽

Neonatal Intensive Care ◽

Local Binary Pattern ◽

Intelligent System ◽

Machine Learning Algorithms ◽

Feature Engineering ◽

Harmful Radiation

Monitoring and evaluating the skin temperature value are considerably important for neonates. A system detecting diseases without any harmful radiation in early stages could be developed thanks to thermography. This study is aimed at detecting healthy/unhealthy neonates in the neonatal intensive care unit (NICU). We used 40 different thermograms belonging to 20 healthy and 20 unhealthy neonates. Thermograms were exported to thermal maps, and subsequently, the thermal maps were converted to segmented thermal maps. Local binary pattern and fast correlation-based filter (FCBF) were applied to extract salient features from thermal maps and to select significant features, respectively. Finally, the obtained features were classified as healthy or unhealthy with decision tree, artificial neural networks (ANN), logistic regression, and random forest algorithms. The best result obtained was 92.5% accuracy (100% sensitivity and 85% specificity). This study proposes a fast and reliable intelligent system for the detection of healthy/unhealthy neonates in the NICU.
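
The three figures reported above are mutually consistent, which a short calculation makes explicit. The sketch assumes that the metrics were computed over all 40 thermograms (for example, pooled across validation folds) and that one group of 20 is the positive class; the abstract states neither.

```python
n_positive = 20  # assumed positive class (one group of 20 neonates)
n_negative = 20  # the other group of 20 neonates

sensitivity = 1.00  # reported: 100%
specificity = 0.85  # reported: 85%

true_positives = round(sensitivity * n_positive)  # all 20 positives found
true_negatives = round(specificity * n_negative)  # 17 of 20 negatives correct

accuracy = (true_positives + true_negatives) / (n_positive + n_negative)
print(true_positives, true_negatives, accuracy)  # 20 17 0.925
```

Under these assumptions, 37 of the 40 thermograms are classified correctly, matching the reported 92.5% accuracy.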


Deducing of Optimal Machine Learning Algorithms for Heterogeneity

10.36227/techrxiv.17162147 ◽

2021 ◽

Author(s):

Omar Alfarisi ◽

Zeyar Aung ◽

Mohamed Sassi

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithm ◽

Learning Algorithms ◽

Synthetic Data ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Machine Learning Algorithm ◽

Data Set ◽

Optimal Machine

Choosing the optimal machine learning algorithm is not an easy decision. To help future researchers, we describe in this paper the optimal among the best of the algorithms. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, to be the best algorithm.


Statistical Analysis for Selective Identifications of VOCs by Using Surface Functionalized MoS2 Based Sensor Array

Chemistry Proceedings ◽

10.3390/csac2021-10451 ◽

2021 ◽

Vol 5 (1) ◽

pp. 35

Author(s):

Uttam Narendra Thakur ◽

Radha Bhardwaj ◽

Arnab Hazra

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

Sensor Array ◽

Multinomial Logistic Regression ◽

Learning Algorithms ◽

Disease Diagnosis ◽

Machine Learning Algorithms ◽

Human Breath

Disease diagnosis through breath analysis has attracted significant attention in recent years due to its noninvasive nature, rapid testing ability, and applicability for patients of all ages. More than 1000 volatile organic compounds (VOCs) exist in human breath, but only selected VOCs are associated with specific diseases. Selective identification of those disease marker VOCs using an array of multiple sensors is highly desirable in the current scenario. The use of efficient sensors and suitable classification algorithms is essential for the selective and reliable detection of those disease markers in complex breath. In the current study, we fabricated a noble metal (Au, Pd and Pt) nanoparticle-functionalized MoS2 (Chalcogenides, Sigma Aldrich, St. Louis, MO, USA)-based sensor array for the selective identification of different VOCs. Four sensors, i.e., pure MoS2, Au/MoS2, Pd/MoS2, and Pt/MoS2, were tested under exposure to different VOCs, such as acetone, benzene, ethanol, xylene, 2-propenol, methanol and toluene, at 50 °C. Initially, principal component analysis (PCA) and linear discriminant analysis (LDA) were used to discriminate those seven VOCs. Compared with PCA, LDA discriminated well between the seven VOCs. Four machine learning algorithms, namely k-nearest neighbors (kNN), decision tree, random forest, and multinomial logistic regression, were used to further identify those VOCs. The classification accuracy for those seven VOCs using kNN, decision tree, random forest, and multinomial logistic regression was 97.14%, 92.43%, 84.1%, and 98.97%, respectively. These results confirmed that multinomial logistic regression performed best among the four machine learning algorithms in discriminating the multiple VOCs that generally exist in human breath.
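
One of the classifiers named above, k-nearest neighbors, can be sketched in a few lines: a query is labelled by majority vote among its closest training points. The four-value feature vectors below stand in for the four sensor responses; all numbers are invented for illustration and are not measurements from the study.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training points."""
    order = sorted(
        range(len(train_X)), key=lambda i: math.dist(train_X[i], query)
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical responses of (pure MoS2, Au/MoS2, Pd/MoS2, Pt/MoS2) sensors.
train_X = [
    (0.10, 0.80, 0.30, 0.20), (0.12, 0.78, 0.33, 0.22), (0.09, 0.82, 0.28, 0.19),
    (0.60, 0.20, 0.70, 0.50), (0.62, 0.18, 0.72, 0.48), (0.58, 0.22, 0.69, 0.52),
]
train_y = ["acetone"] * 3 + ["ethanol"] * 3

print(knn_predict(train_X, train_y, (0.11, 0.79, 0.31, 0.21)))  # acetone
```

The approach relies on each VOC producing a distinct response pattern across the array, which is the motivation for functionalizing the sensors with different noble metals.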
