classification algorithms
Recently Published Documents





2022 ◽  
Vol 16 (3) ◽  
pp. 1-37
Robert A. Sowah ◽  
Bernard Kuditchar ◽  
Godfrey A. Mills ◽  
Amevi Acakpovi ◽  
Raphael A. Twum ◽  

Class imbalance problem is prevalent in many real-world domains. It has become an active area of research. In binary classification problems, imbalance learning refers to learning from a dataset with a high degree of skewness to the negative class. This phenomenon causes classification algorithms to perform woefully when predicting positive classes with new examples. Data resampling, which involves manipulating the training data before applying standard classification techniques, is among the most commonly used techniques to deal with the class imbalance problem. This article presents a new hybrid sampling technique that improves the overall performance of classification algorithms for solving the class imbalance problem significantly. The proposed method called the Hybrid Cluster-Based Undersampling Technique (HCBST) uses a combination of the cluster undersampling technique to under-sample the majority instances and an oversampling technique derived from Sigma Nearest Oversampling based on Convex Combination, to oversample the minority instances to solve the class imbalance problem with a high degree of accuracy and reliability. The performance of the proposed algorithm was tested using 11 datasets from the National Aeronautics and Space Administration Metric Data Program data repository and University of California Irvine Machine Learning data repository with varying degrees of imbalance. Results were compared with classification algorithms such as the K-nearest neighbours, support vector machines, decision tree, random forest, neural network, AdaBoost, naïve Bayes, and quadratic discriminant analysis. Tests results revealed that for the same datasets, the HCBST performed better with average performances of 0.73, 0.67, and 0.35 in terms of performance measures of area under curve, geometric mean, and Matthews Correlation Coefficient, respectively, across all the classifiers used for this study. The HCBST has the potential of improving the performance of the class imbalance problem, which by extension, will improve on the various applications that rely on the concept for a solution.

Shankar Shambhu ◽  
Deepika Koundal ◽  
Prasenjit Das ◽  
Chetan Sharma

COVID-19 pandemic has hit the world with such a force that the world's leading economies are finding it challenging to come out of it. Countries with the best medical facilities are even cannot handle the increasing number of cases and fatalities. This disease causes significant damage to the lungs and respiratory system of humans, leading to their death. Computed tomography (CT) images of the respiratory system are analyzed in the proposed work to classify the infected people with non-infected people. Deep learning binary classification algorithms have been applied, which have shown an accuracy of 86.9% on 746 CT images of chest having COVID-19 related symptoms.

Hassan Najadat ◽  
Mohammad A. Alzubaidi ◽  
Islam Qarqaz

Reviews or comments that users leave on social media have great importance for companies and business entities. New product ideas can be evaluated based on customer reactions. However, this use of social media is complicated by those who post spam on social media in the form of reviews and comments. Designing methodologies to automatically detect and block social media spam is complicated by the fact that spammers continuously develop new ways to leave their spam comments. Researchers have proposed several methods to detect English spam reviews. However, few studies have been conducted to detect Arabic spam reviews. This article proposes a keyword-based method for detecting Arabic spam reviews. Keywords or Features are subsets of words from the original text that are labelled as important. A term's weight, Term Frequency–Inverse Document Frequency (TF-IDF) matrix, and filter methods (such as information gain, chi-squared, deviation, correlation, and uncertainty) have been used to extract keywords from Arabic text. The method proposed in this article detects Arabic spam in Facebook comments. The dataset consists of 3,000 Arabic comments extracted from Facebook pages. Four different machine learning algorithms are used in the detection process, including C4.5, kNN, SVM, and Naïve Bayes classifiers. The results show that the Decision Tree classifier outperforms the other classification algorithms, with a detection accuracy of 92.63%.

PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0262417
Cédric Simar ◽  
Robin Petit ◽  
Nichita Bozga ◽  
Axelle Leroy ◽  
Ana-Maria Cebolla ◽  

Objective Different visual stimuli are classically used for triggering visual evoked potentials comprising well-defined components linked to the content of the displayed image. These evoked components result from the average of ongoing EEG signals in which additive and oscillatory mechanisms contribute to the component morphology. The evoked related potentials often resulted from a mixed situation (power variation and phase-locking) making basic and clinical interpretations difficult. Besides, the grand average methodology produced artificial constructs that do not reflect individual peculiarities. This motivated new approaches based on single-trial analysis as recently used in the brain-computer interface field. Approach We hypothesize that EEG signals may include specific information about the visual features of the displayed image and that such distinctive traits can be identified by state-of-the-art classification algorithms based on Riemannian geometry. The same classification algorithms are also applied to the dipole sources estimated by sLORETA. Main results and significance We show that our classification pipeline can effectively discriminate between the display of different visual items (Checkerboard versus 3D navigational image) in single EEG trials throughout multiple subjects. The present methodology reaches a single-trial classification accuracy of about 84% and 93% for inter-subject and intra-subject classification respectively using surface EEG. Interestingly, we note that the classification algorithms trained on sLORETA sources estimation fail to generalize among multiple subjects (63%), which may be due to either the average head model used by sLORETA or the subsequent spatial filtering failing to extract discriminative information, but reach an intra-subject classification accuracy of 82%.

Ignace T. C. Hooge ◽  
Diederick C. Niehorster ◽  
Marcus Nyström ◽  
Richard Andersson ◽  
Roy S. Hessels

AbstractEye trackers are applied in many research fields (e.g., cognitive science, medicine, marketing research). To give meaning to the eye-tracking data, researchers have a broad choice of classification methods to extract various behaviors (e.g., saccade, blink, fixation) from the gaze signal. There is extensive literature about the different classification algorithms. Surprisingly, not much is known about the effect of fixation and saccade selection rules that are usually (implicitly) applied. We want to answer the following question: What is the impact of the selection-rule parameters (minimal saccade amplitude and minimal fixation duration) on the distribution of fixation durations? To answer this question, we used eye-tracking data with high and low quality and seven different classification algorithms. We conclude that selection rules play an important role in merging and selecting fixation candidates. For eye-tracking data with good-to-moderate precision (RMSD < 0.5∘), the classification algorithm of choice does not matter too much as long as it is sensitive enough and is followed by a rule that selects saccades with amplitudes larger than 1.0∘ and a rule that selects fixations with duration longer than 60 ms. Because of the importance of selection, researchers should always report whether they performed selection and the values of their parameters.

Ding Li ◽  
Scott Dick

AbstractGraph-based algorithms are known to be effective approaches to semi-supervised learning. However, there has been relatively little work on extending these algorithms to the multi-label classification case. We derive an extension of the Manifold Regularization algorithm to multi-label classification, which is significantly simpler than the general Vector Manifold Regularization approach. We then augment our algorithm with a weighting strategy to allow differential influence on a model between instances having ground-truth vs. induced labels. Experiments on four benchmark multi-label data sets show that the resulting algorithm performs better overall compared to the existing semi-supervised multi-label classification algorithms at various levels of label sparsity. Comparisons with state-of-the-art supervised multi-label approaches (which of course are fully labeled) also show that our algorithm outperforms all of them even with a substantial number of unlabeled examples.

SinkrOn ◽  
2022 ◽  
Vol 7 (1) ◽  
pp. 59-65
Artika Arista

Many people today are unsure whether they have COVID-19. The frequent fever, dry cough, and sore throat are all signs and symptoms of COVID-19. If a person has signs or symptoms of coronavirus disease 2019 (COVID-19), he/she should see the doctor or go to a clinic as soon as possible. As a result, it's vital to learn and comprehend the fundamental differences. COVID-19 can cause a wide range of symptoms. The experiments were carried out using two Machine Learning Classification Algorithms, namely Decision Tree (DT) and Logistic Regression (LR). Both algorithms were written and analyzed using the Python program in Jupyter Notebook 6.4.5. From the results obtained in the experiments of covid symptoms dataset, on average, the DT model has obtained the best cross-validation average and the testing performance average compared to the LR machine learning models. For cross-validation results, the DT model has achieved an accuracy of 98.0%. For performance testing, the DT model has achieved an accuracy of 98.0%. The LR has obtained the second-best result on the average of cross-validation performance and the testing results. For cross-validation results, the LR model has achieved an accuracy of 96.0%. For performance testing, the LR model has achieved an accuracy of 97.0%. Consequently, the DT for the COVID-19 symptoms dataset is outperforming the LR for cross-validation and testing results.

The IoT is a new concept that provides a world where smart, connected, embedded systems operate, giving rise to the amount of data from different sources that are considered to have highly useful and valuable information. Data mining would play a critical role in creating smarter IoT. Traditional care of an elderly person is a difficult and complex task. The need to have a caregiver with the elderly person almost all the time drains the human and financial resources of the health care system. The emergence of Artificial intelligence has allowed the conception of technical assistance where it helps and reduces the time spent by the caregiver with the elderly person. This work aims to focus on analyzing techniques that are used for prediction purposes of falls in the elderly. We examine the applicability of three classification algorithms for IoT data. These algorithms are analyzed and a comparative study is undertaken to find the classifier that performs the best analysis on the dataset using a set of predefined performance metrics to compare the results of each classifier.

Sign in / Sign up

Export Citation Format

Share Document