Hybrid gene selection method based on mutual information technique and dragonfly optimization algorithm

One of the most prevalent problems with big data is that many of the features are irrelevant. Gene selection has been shown to improve the outcomes of many algorithms, but it is a difficult task in microarray data mining because most microarray datasets have only a few hundred records but thousands of variables. This type of dataset increases the likelihood of incorrect predictions that arise by chance. Finding the most relevant genes is generally the most difficult part of creating a reliable classification model. Irrelevant and redundant attributes have a negative impact on the accuracy of classification algorithms. Many machine learning-based gene selection methods have been explored in the literature, with the aim of improving the precision of dimensionality reduction. Gene selection is a technique for extracting the most relevant information from a set of datasets. Further developments in gene selection will benefit classification, which is used in machine learning, pattern recognition, and signal processing. The goal of feature selection is to select the smallest subset of features that carries as much information about the class as possible. This paper models gene selection in discrete space as a binary optimization problem, which guides the binary dragonfly optimization algorithm (BDA), and verifies it with a chosen fitness function based on the accuracy of a k-nearest neighbors classifier on the dataset. The experimental results revealed that the proposed algorithm, dubbed MI-BDA, outperforms other algorithms in terms of the precision of its results, as measured by computational cost and classification accuracy.
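
The mutual information side of such a hybrid can be illustrated with a small filter that scores each gene against the class label. This is only a sketch under assumptions: the abstract does not say how MI is estimated, so the code assumes expression values have already been discretized (for example into low/high bins), and the names `mutual_information` and `rank_genes` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Estimate I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(feature)
    px = Counter(feature)               # marginal counts of feature values
    py = Counter(labels)                # marginal counts of class labels
    pxy = Counter(zip(feature, labels)) # joint counts
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_genes(columns, labels, top_k):
    """Return the indices of the top_k columns with the highest MI score."""
    scores = [(mutual_information(col, labels), idx)
              for idx, col in enumerate(columns)]
    scores.sort(key=lambda t: (-t[0], t[1]))  # high MI first, index breaks ties
    return [idx for _, idx in scores[:top_k]]


# Toy data: one gene that tracks the class and one that does not.
labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
noise = [0, 1, 0, 1]
best = rank_genes([noise, informative], labels, top_k=1)
```

A filter like this would typically shrink the candidate set before a wrapper search such as BDA runs, which is one plausible reading of the hybrid described in the abstract.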

Machine Learning Approach to Dysphonia Detection

Applied Sciences ◽

10.3390/app8101927 ◽

2018 ◽

Vol 8 (10) ◽

pp. 1927 ◽

Cited By ~ 1

Author(s):

Zuzana Dankovičová ◽

Dávid Sovák ◽

Peter Drotár ◽

Liberios Vokorokos

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Nearest Neighbors ◽

Classification Model ◽

Support Vector ◽

Learning Approach ◽

K Nearest Neighbors ◽

Machine Learning Methods ◽

Machine Learning Approach ◽

Speech Features

This paper addresses the processing of speech data and their utilization in a decision support system. The main aim of this work is to utilize machine learning methods to recognize pathological speech, particularly dysphonia. We extracted 1560 speech features and used these to train the classification model. As classifiers, three state-of-the-art methods were used: K-nearest neighbors, random forests, and support vector machine. We analyzed the performance of classifiers with and without gender taken into account. The experimental results showed that it is possible to recognize pathological speech with as high as a 91.3% classification accuracy.

Hybrid Binary Dragonfly Optimization Algorithm with Statistical Dependence for Feature Selection

International Journal of Mathematical Engineering and Management Sciences ◽

10.33889/ijmems.2020.5.6.105 ◽

2020 ◽

Vol 5 (6) ◽

pp. 1420-1428

Author(s):

Omar S. Qasim ◽

Mohammed Sabah Mahmoud ◽

Fatima Mahmood Hasan

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Fitness Function ◽

Feature Selection Method ◽

Statistical Dependence ◽

K Nearest Neighbors ◽

Feature Selection Technique ◽

Selection Technique ◽

Dragonfly Algorithm ◽

The Cost

The aim of the feature selection technique is to obtain the most important information from a specific set of datasets. Further elaborations in the feature selection technique will positively affect the classification process, which can be applied in various areas such as machine learning, pattern recognition, and signal processing. In this study, a hybrid algorithm between the binary dragonfly algorithm (BDA) and the statistical dependence (SD) is presented, whereby the feature selection method in discrete space is modeled as a binary-based optimization algorithm, guiding BDA and using the accuracy of the k-nearest neighbors classifier on the dataset to verify it in the chosen fitness function. The experimental results demonstrated that the proposed algorithm, which we refer to as SD-BDA, outperforms other algorithms in terms of the accuracy of the results represented by the cost of the calculations and the accuracy of the classification.
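
A wrapper fitness function of the kind described, scoring a binary feature mask by k-nearest neighbors accuracy, might look like the following sketch. The weighting between error rate and subset size (`alpha`), the leave-one-out evaluation, and the Euclidean distance are assumptions for illustration; the abstract does not state the exact fitness formula or validation scheme.

```python
import math
from collections import Counter


def knn_loo_accuracy(rows, labels, mask, k=3):
    """Leave-one-out accuracy of a k-NN classifier using only masked-in features."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, row in enumerate(rows):
        dists = []
        for m, other in enumerate(rows):
            if m == i:
                continue  # hold out the sample being classified
            d = math.sqrt(sum((row[j] - other[j]) ** 2 for j in selected))
            dists.append((d, labels[m]))
        dists.sort(key=lambda t: t[0])
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(rows)


def fitness(rows, labels, mask, alpha=0.99, k=3):
    """Lower is better: weighted sum of error rate and fraction of features kept."""
    error = 1.0 - knn_loo_accuracy(rows, labels, mask, k)
    ratio = sum(mask) / len(mask)
    return alpha * error + (1.0 - alpha) * ratio


# Toy data: feature 0 separates the classes, feature 1 is noise.
rows = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
        [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
labels = [0, 0, 0, 1, 1, 1]
good = fitness(rows, labels, [1, 0])
bad = fitness(rows, labels, [0, 1], k=1)
```

A binary optimizer would evaluate this function for each candidate mask and move the population towards masks with lower values; the small size penalty breaks ties in favor of smaller subsets.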

Classification of Aggressive Movements Using Smartwatches

Sensors ◽

10.3390/s20216377 ◽

2020 ◽

Vol 20 (21) ◽

pp. 6377

Author(s):

Franck Tchuente ◽

Natalie Baddour ◽

Edward D. Lemaire

Keyword(s):

Machine Learning ◽

Random Forest ◽

Aggressive Behavior ◽

Naive Bayes ◽

Poor Performance ◽

Naïve Bayes ◽

Classification Model ◽

Support Vector ◽

Care Providers ◽

K Nearest Neighbors

Recognizing aggressive movements is a challenging task in human activity recognition. Wearable smartwatch technology with machine learning may be a viable approach for human aggressive behavior classification. This research identified a viable classification model and feature selector (CM-FS) combination for separating aggressive from non-aggressive movements using smartwatch data and determined if only one smartwatch is sufficient for this task. A ranking method was used to select relevant CM-FS models across accuracy, sensitivity, specificity, precision, F-score, and Matthews correlation coefficient (MCC). The Waikato environment for knowledge analysis (WEKA) was used to run 6 machine learning classifiers (random forest, k-nearest neighbors (kNN), multilayer perceptron neural network (MP), support vector machine, naïve Bayes, decision tree) coupled with three feature selectors (ReliefF, InfoGain, Correlation). Microsoft Band 2 accelerometer and gyroscope data were collected during an activity circuit that included aggressive (punching, shoving, slapping, shaking) and non-aggressive (clapping hands, waving, handshaking, opening/closing a door, typing on a keyboard) tasks. A combination of kNN and ReliefF was the best CM-FS model for separating aggressive actions from non-aggressive actions, with 99.6% accuracy, 98.4% sensitivity, 99.8% specificity, 98.9% precision, 0.987 F-score, and 0.984 MCC. kNN and random forest classifiers, combined with any of the feature selectors, generated the top models. Models with naïve Bayes or support vector machines had poor performance for sensitivity, F-score, and MCC. Wearing the smartwatch on the dominant wrist produced the best single-watch results. The kNN and ReliefF combination demonstrated that this smartwatch-based approach is a viable solution for identifying aggressive behavior. 
This wrist-based wearable sensor approach could be used by care providers in settings where people suffer from dementia or mental health disorders, where random aggressive behaviors often occur.
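
The six ranking criteria named in this abstract all derive from a binary confusion matrix. The sketch below uses the standard definitions; the function name and the convention of returning 0.0 when a denominator is zero are assumptions, and the counts in the example are made up rather than taken from the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # assumed convention for empty cases

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall on the positive class
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "f_score": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts: 40 true positives, 20 false positives,
# 30 true negatives, 10 false negatives.
metrics = classification_metrics(tp=40, fp=20, tn=30, fn=10)
```

MCC uses all four cells of the confusion matrix, which is why it can stay low for a model whose accuracy looks acceptable on imbalanced data.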

Using Machine Learning to Build a Classification Model for IoT Networks to Detect Attack Signatures

International journal of Computer Networks & Communications ◽

10.5121/ijcnc.2020.12607 ◽

2020 ◽

Vol 12 (6) ◽

pp. 99-116

Author(s):

Mousa Al-Akhras ◽

Mohammed Alawairdhi ◽

Ali Alkoudari ◽

Samer Atawneh

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Denial Of Service ◽

Learning Algorithms ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Classification Model ◽

Security And Privacy ◽

K Nearest Neighbors ◽

Detection Model

The Internet of Things (IoT) has led to several security threats and challenges within society. Despite the benefits it has brought to society, IoT could compromise the security and privacy of individuals and companies at various levels. Denial of Service (DoS) and Distributed DoS (DDoS) attacks, among others, are the most common attack types facing IoT networks. To counter such attacks, companies should implement an efficient classification/detection model, which is not an easy task. This paper proposes a classification model to examine the effectiveness of several machine learning algorithms, namely Random Forest (RF), k-Nearest Neighbors (KNN), and Naïve Bayes. The machine learning algorithms are used to detect attacks on the UNSW-NB15 benchmark dataset, which contains normal network traffic and malicious traffic instances. The experimental results reveal that the RF and KNN classifiers give the best performance, with an accuracy of 100% (without noise injection) and 99% (with 10% noise filtering), while the Naïve Bayes classifier gives the worst performance, with an accuracy of 95.35% without noise and 82.77% with 10% noise. Other evaluation metrics, such as precision and recall, also show the effectiveness of the RF and KNN classifiers over Naïve Bayes.

Comparison of Classification Methods used in Machine Learning for Dysgraphia Identification

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i11.6142 ◽

2021 ◽

Vol 12 (11) ◽

pp. 1886-1891

Author(s):

Sarthika Dutt et al.

Keyword(s):

Machine Learning ◽

Random Forest ◽

Classification Model ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbors ◽

Feature Selection Technique ◽

Random Forest Classification ◽

Visual Spatial ◽

Forest Classification

Dysgraphia is a disorder that affects writing skills. Identifying dysgraphia at an early stage of a child's development is a difficult task. It can be identified from the impaired skills associated with the disorder. In this study, motor ability, space knowledge, copying skill, and visual-spatial response are among the features used for dysgraphia identification. The features that affect dysgraphia are analyzed using the Elastic Net (EN) feature selection technique. The significant features are then classified using machine learning techniques. The classification models compared on the dysgraphia dataset are K-Nearest Neighbors (KNN), Naïve Bayes, decision tree, random forest, and Support Vector Machine (SVM). The results indicate that the random forest classification model achieves the highest performance for dysgraphia identification.

A Study on Optimization Algorithm (OA) in Machine Learning and Hierarchical Information

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.8434 ◽

2020 ◽

Vol 17 (4) ◽

pp. 1733-1736

Author(s):

N. Pooranam ◽

M. Nithya ◽

D. Praveen Kumar ◽

Rashmi P. Nayak ◽

G. Rakesh

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Optimization Algorithm ◽

Fitness Function ◽

Second Step ◽

Local Maxima ◽

Specific Direction ◽

Random Probability ◽

The Given ◽

Over Time

The genetic algorithm is a branch of machine learning in which computers are programmed to teach themselves to complete a given task over time. In our project, we simulate many rockets that fly towards a specified target. The genetic algorithm revolves around three main concepts. First, we generate a population of random rockets that fly in random directions. Each rocket is implemented as an array of vectors, where each vector points in a specific direction at a given time. We then apply a fitness function that identifies the best-performing rockets in each generation. Using the fitness function, we select the best rockets, from which we form the next population. This involves two steps. The first step is crossover: choose two parents, i.e., two rockets, and use their vector values to create a child rocket. This is done by taking the first half of the vectors from the first parent and the second half from the second parent and fusing them to build the child rocket. The second step is mutation, which is crucial. If mutation is not applied, the new population is built only around the best performers of the previous population. We would then land in local maxima and might never reach the target. Mutation helps create individual rockets that go beyond the local maxima to reach the target. However, excessive mutation leads to too much diversity, which is not beneficial to the system. Thus, we define a mutation rate that is optimally balanced. In mutation, we choose a rocket with random probability and alter its vector values randomly. This new population of rockets forms the next generation.
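
The selection, crossover, and mutation steps described in this abstract can be sketched in a few lines. Several details here are assumptions rather than the authors' implementation: rockets are lists of 2D thrust vectors, fitness is the inverse of the final distance to the target, the top half of the population serves as parents, and mutation is applied per vector rather than per rocket.

```python
import math
import random


def random_rocket(rng, steps):
    """A rocket is a list of (dx, dy) thrust vectors, one per time step."""
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(steps)]


def fitness(rocket, target):
    """Higher is better: inverse distance from the final position to the target."""
    x = sum(v[0] for v in rocket)
    y = sum(v[1] for v in rocket)
    return 1.0 / (1.0 + math.hypot(target[0] - x, target[1] - y))


def crossover(parent_a, parent_b):
    """First half of the vectors from parent_a, second half from parent_b."""
    mid = len(parent_a) // 2
    return parent_a[:mid] + parent_b[mid:]


def mutate(rocket, rng, rate):
    """Replace each vector with a random one with probability `rate`."""
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) if rng.random() < rate else v
            for v in rocket]


def next_generation(population, target, rng, rate=0.05, keep=0.5):
    """Select the fittest rockets, then breed a population of the same size."""
    ranked = sorted(population, key=lambda r: fitness(r, target), reverse=True)
    parents = ranked[:max(2, int(len(ranked) * keep))]
    children = []
    while len(children) < len(population):
        a, b = rng.sample(parents, 2)
        children.append(mutate(crossover(a, b), rng, rate))
    return children


rng = random.Random(0)  # fixed seed so runs are repeatable
population = [random_rocket(rng, steps=20) for _ in range(30)]
for _ in range(50):
    population = next_generation(population, target=(5.0, 5.0), rng=rng)
```

Setting `rate` to zero reproduces the failure mode the abstract warns about: children can only recombine vectors already present in the parents, so the population cannot escape a local maximum.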

An Effecient Fake News Detection System Using Machine Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9453.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 3125-3129 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Social Media ◽

Language Processing ◽

Negative Impact ◽

Detection System ◽

Vital Role ◽

Machine Learning Algorithms ◽

Easy Access ◽

Fake News ◽

K Nearest Neighbors

Social media plays a major role in many aspects of our lives. It helps us find important news at low cost and provides easy access in little time. But social media also enables the fast spread of fake news, so low-quality news containing false information may spread through it. This has a negative impact on many people, and sometimes on society as a whole. Detecting fake news is therefore of great importance. Machine learning algorithms play a vital role in fake news detection; NLP (Natural Language Processing) algorithms in particular are very useful for detecting fake news. In this paper, we employed the machine learning classifiers SVM, K-Nearest Neighbors, decision tree, and random forest. Using these classifiers, we successfully built a model to detect fake news in the given dataset. Python was used for the experiments.

Transformer Oil Quality Assessment Using Random Forest with Feature Engineering

Energies ◽

10.3390/en14071809 ◽

2021 ◽

Vol 14 (7) ◽

pp. 1809

Author(s):

Mohammed El Amine Senoussaoui ◽

Mostefa Brahami ◽

Issouf Fofana

Keyword(s):

Machine Learning ◽

Random Forest ◽

Oil Quality ◽

Principal Component ◽

Condition Assessment ◽

Classification Performance ◽

Transformer Oil ◽

Classification Model ◽

Insulation Degradation ◽

Transformer Oils

Machine learning is widely used as a panacea in many engineering applications including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on some aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: the oils that can be maintained in service, the oils that should be reconditioned or filtered, the oils that should be reclaimed, and the oils that must be discarded. From the two algorithms, random forest exhibited a better performance and high accuracy with only a small amount of data. Good performance was achieved through not only the application of the proposed algorithm but also the approach of data preprocessing. Before feeding the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were again retransformed by conducting the principal component analysis and were passed through the CFsSubset filter. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of the datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.

Use of Machine Learning to Investigate the Quantitative Checklist for Autism in Toddlers (Q-CHAT) towards Early Autism Screening

Diagnostics ◽

10.3390/diagnostics11030574 ◽

2021 ◽

Vol 11 (3) ◽

pp. 574

Author(s):

Gennaro Tartarisco ◽

Giovanni Cicceri ◽

Davide Di Pietro ◽

Elisa Leonardi ◽

Stefania Aiello ◽

...

Keyword(s):

Machine Learning ◽

High Performance ◽

Behavioral Science ◽

Autistic Traits ◽

Classification Performance ◽

Recursive Feature Elimination ◽

Diagnostic Tools ◽

Support Vector ◽

K Nearest Neighbors ◽

Autism Screening

In the past two decades, several screening instruments were developed to detect toddlers who may be autistic both in clinical and unselected samples. Among others, the Quantitative CHecklist for Autism in Toddlers (Q-CHAT) is a quantitative and normally distributed measure of autistic traits that demonstrates good psychometric properties in different settings and cultures. Recently, machine learning (ML) has been applied to behavioral science to improve the classification performance of autism screening and diagnostic tools, but mainly in children, adolescents, and adults. In this study, we used ML to investigate the accuracy and reliability of the Q-CHAT in discriminating young autistic children from those without. Five different ML algorithms (random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN)) were applied to investigate the complete set of Q-CHAT items. Our results showed that ML achieved an overall accuracy of 90%, and the SVM was the most effective, being able to classify autism with 95% accuracy. Furthermore, using the SVM–recursive feature elimination (RFE) approach, we selected a subset of 14 items ensuring 91% accuracy, while 83% accuracy was obtained from the 3 best discriminating items in common to ours and the previously reported Q-CHAT-10. This evidence confirms the high performance and cross-cultural validity of the Q-CHAT, and supports the application of ML to create shorter and faster versions of the instrument, maintaining high classification accuracy, to be used as a quick, easy, and high-performance tool in primary-care settings.

Using Machine Learning for Quantum Annealing Accuracy Prediction

Algorithms ◽

10.3390/a14060187 ◽

2021 ◽

Vol 14 (6) ◽

pp. 187

Author(s):

Aaron Barbosa ◽

Elijah Pelofske ◽

Georg Hahn ◽

Hristo N. Djidjev

Keyword(s):

Machine Learning ◽

Maximum Clique ◽

Classification Model ◽

Maximum Clique Problem ◽

Problem Instance ◽

Np Hard ◽

Machine Learning Classification ◽

Hard Problems ◽

Problem Instances ◽

D Wave

Quantum annealers, such as the device built by D-Wave Systems, Inc., offer a way to compute solutions of NP-hard problems that can be expressed in Ising or quadratic unconstrained binary optimization (QUBO) form. Although such solutions are typically of very high quality, problem instances are usually not solved to optimality due to imperfections of the current generation of quantum annealers. In this contribution, we aim to understand some of the factors contributing to the hardness of a problem instance, and to use machine learning models to predict the accuracy of the D-Wave 2000Q annealer for solving specific problems. We focus on the maximum clique problem, a classic NP-hard problem with important applications in network analysis, bioinformatics, and computational chemistry. By training a machine learning classification model on basic problem characteristics, such as the number of edges in the graph, or annealing parameters, such as the D-Wave’s chain strength, we are able to rank certain features in the order of their contribution to the solution hardness, and present a simple decision tree that predicts whether a problem will be solvable to optimality with the D-Wave 2000Q. We extend these results by training a machine learning regression model that predicts the clique size found by D-Wave.
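
To make the QUBO form concrete, the sketch below builds a common textbook QUBO for maximum clique: reward each selected vertex with -1 and penalize each selected pair that is not joined by an edge. The penalty weight of 2.0 and the brute-force solver are assumptions for illustration; the abstract does not give the formulation the authors submitted to the annealer, and no D-Wave API is used here.

```python
from itertools import combinations, product


def max_clique_qubo(n, edges, penalty=2.0):
    """Build QUBO coefficients {(i, j): weight} for maximum clique on n vertices."""
    edge_set = {frozenset(e) for e in edges}
    q = {(i, i): -1.0 for i in range(n)}  # reward for selecting a vertex
    for i, j in combinations(range(n), 2):
        if frozenset((i, j)) not in edge_set:
            q[(i, j)] = penalty  # selecting a non-adjacent pair is penalized
    return q


def energy(q, x):
    """Evaluate the QUBO objective for a 0/1 assignment x."""
    return sum(weight * x[i] * x[j] for (i, j), weight in q.items())


def brute_force_minimum(q, n):
    """Exhaustively find a minimum-energy assignment (only viable for small n)."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(q, x))


# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
q = max_clique_qubo(4, edges)
solution = brute_force_minimum(q, 4)
```

With a penalty larger than 1, dropping one endpoint of any selected non-edge always lowers the energy, so minimum-energy assignments correspond to cliques, and the lowest energy equals minus the maximum clique size.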
