Feature Selection using K-Means Genetic Clustering To Predict Rheumatoid Arthritis Disease

An aging society poses serious problems for health and medical care. Rheumatoid arthritis is a common disease that causes pain in the musculoskeletal system and reduces patients' quality of life. It usually begins in middle age but can also affect children and young adults. If the disease is not monitored and treated as early as possible, it can cause serious joint deformities. Cluster analysis is an unsupervised learning technique in data mining that explores the structure of data without knowing the class labels. Many clustering algorithms have been proposed to analyze high volumes of data, but many of them do not evaluate cluster quality well because of unsuitable features present in the dataset. Feature selection is therefore a prime task when analyzing high-dimensional datasets; an optimal subset of features is enough to cluster the data. In this study, rheumatoid arthritis clinical data were analyzed with the K-Means clustering algorithm to predict which patients are affected by the disease. A genetic algorithm is used to filter the features, and at the end of the process it finds optimal clusters for K-Means. Depending on the initial centroids, K-Means may produce empty clusters, and it does not handle outliers or noisy data effectively. Combined with a genetic algorithm, K-Means shows higher clustering quality and a faster evolution process than K-Means alone. In this paper, we use the machine learning algorithm FSKG to diagnose rheumatoid arthritis and explore a predictive FSKG model. After data analysis and pre-processing, the genetic algorithm and the K-Means clustering algorithm are integrated to choose the relevant features among all the features. Experimental results from this study indicate improved accuracy compared with the K-Means algorithm alone for rheumatoid arthritis prediction.
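The integration of a genetic algorithm with K-Means described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the FSKG method itself: the abstract does not state the fitness function, the chromosome encoding, or the GA operators, so the bit-mask encoding, the between/within dispersion ratio used as fitness, truncation selection, one-point crossover, single-bit mutation, and the toy data are all assumptions, as are the names `kmeans`, `fitness`, and `select_features`.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # Update step: an empty cluster simply keeps its previous centroid.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def fitness(mask, data, k):
    """Assumed fitness: between-cluster / within-cluster dispersion on the masked features."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # a mask that selects nothing is worthless
    pts = [[row[i] for i in cols] for row in data]
    labels = kmeans(pts, k, random.Random(0))  # fixed seed keeps fitness deterministic
    mean = [sum(col) / len(pts) for col in zip(*pts)]
    within = between = 0.0
    for c in range(k):
        members = [p for p, l in zip(pts, labels) if l == c]
        if not members:
            continue
        cen = [sum(col) / len(members) for col in zip(*members)]
        within += sum(sum((a - b) ** 2 for a, b in zip(p, cen)) for p in members)
        between += len(members) * sum((a - b) ** 2 for a, b in zip(cen, mean))
    return between / (within + 1e-9)

def select_features(data, k, pop_size=8, generations=10, seed=1):
    """Tiny GA over feature bit masks (needs at least two features)."""
    rng = random.Random(seed)
    n = len(data[0])
    # Start from the all-features mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, data, k), reverse=True)
        parents = pop[: pop_size // 2]       # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)        # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= 1     # mutation: flip one random bit
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, data, k))

# Toy data (hypothetical): columns 0 and 1 separate two groups, column 2 is noise.
_rng = random.Random(42)
data = ([[_rng.gauss(0, 0.3), _rng.gauss(0, 0.3), _rng.uniform(-5, 5)] for _ in range(15)]
        + [[_rng.gauss(3, 0.3), _rng.gauss(3, 0.3), _rng.uniform(-5, 5)] for _ in range(15)])
best = select_features(data, k=2)
print("selected feature mask:", best)
```

Because the top half of each generation is carried over unchanged, the best mask found so far is never lost, so the result scores at least as well as using all features under this assumed fitness.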


Relationship Between Rheumatoid Arthritis Disease Activity Assessed with the US7 Score and Quality of Life Measured with Questionnaires (HAQ, EQ-5D, WPAI)

Current Rheumatology Reviews ◽

10.2174/1573397113666170517160726 ◽

2017 ◽

Vol 13 (3) ◽

Cited By ~ 2

Author(s):

Martina Skacelova ◽

Horak Pavel ◽

Hermanova Zuzana ◽

Langova Katerina

Keyword(s):

Quality Of Life ◽

Rheumatoid Arthritis ◽

Disease Activity ◽

Rheumatoid Arthritis Disease Activity ◽

Rheumatoid Arthritis Disease


Improved feature selection and classification for rheumatoid arthritis disease using weighted decision tree approach (REACT)

The Journal of Supercomputing ◽

10.1007/s11227-019-02800-1 ◽

2019 ◽

Vol 75 (8) ◽

pp. 5507-5519 ◽

Cited By ~ 2

Author(s):

S. Shanmugam ◽

J. Preethi

Keyword(s):

Rheumatoid Arthritis ◽

Feature Selection ◽

Decision Tree ◽

Rheumatoid Arthritis Disease ◽

Tree Approach


Online Feature Selection for Robust Classification of the Microbiological Quality of Traditional Vanilla Cream by Means of Multispectral Imaging

Sensors ◽

10.3390/s19194071 ◽

2019 ◽

Vol 19 (19) ◽

pp. 4071 ◽

Cited By ~ 1

Author(s):

Alexandra Lianou ◽

Arianna Mencattini ◽

Alexandro Catini ◽

Corrado Di Natale ◽

George-John E. Nychas ◽

...

Keyword(s):

Feature Selection ◽

Multispectral Imaging ◽

Dairy Product ◽

Real Life ◽

Microbiological Quality ◽

Support Vector ◽

Isothermal Conditions ◽

Online Feature Selection ◽

Model Training

The performance of an Unsupervised Online feature Selection (UOS) algorithm was investigated for the selection of training features of multispectral images acquired from a dairy product (vanilla cream) stored under isothermal conditions. The selected features were further used as input in a support vector machine (SVM) model with linear kernel for the determination of the microbiological quality of vanilla cream. Model training (n = 65) was based on two batches of cream samples provided directly by the manufacturer and stored at different isothermal conditions (4, 8, 12, and 15 °C), whereas model testing (n = 132) and validation (n = 48) were based on real life conditions by analyzing samples from different retail outlets as well as expired samples from the market. Qualitative analysis was performed for the discrimination of cream samples in two microbiological quality classes based on the values of total viable counts [TVC ≤ 2.0 log CFU/g (fresh samples) and TVC ≥ 6.0 log CFU/g (spoiled samples)]. Results exhibited good performance with an overall accuracy of classification for the two classes of 91.7% for model validation. Further on, the model was extended to include the samples in the TVC range 2–6 log CFU/g, using 1 log step to define the microbiological quality of classes in order to assess the potential of the model to estimate increasing microbial populations. Results demonstrated that high rates of correct classification could be obtained in the range of 2–5 log CFU/g, whereas the percentage of erroneous classification increased in the TVC class (5,6) that was close to the spoilage level of the product. Overall, the results of this study demonstrated that the UOS algorithm in tandem with spectral data acquired from multispectral imaging could be a promising method for real-time assessment of the microbiological quality of vanilla cream samples.


A Novel on Transmission Line Tower Big Data Analysis Model Using Altered K-means and ADQL

Sustainability ◽

10.3390/su11133499 ◽

2019 ◽

Vol 11 (13) ◽

pp. 3499 ◽

Cited By ~ 5

Author(s):

Se-Hoon Jung ◽

Jun-Ho Huh

Keyword(s):

Big Data ◽

Data Analysis ◽

Transmission Line ◽

Clustering Algorithm ◽

Learning Algorithm ◽

Principal Component ◽

Big Data Analysis ◽

Standard Normal Distribution ◽

Analysis Model ◽

Q Learning

This study proposes a big data analysis and prediction model for transmission line tower outliers, based on deep reinforcement learning, to assess when something is wrong with transmission line tower big data. The model automatically chooses the cluster count K from unlabeled sensor big data. It also allows measuring the distance of action between data inside a cluster, with the Q-value representing the network output, in the altered transmission line tower big data clustering algorithm containing transmission line tower outliers and the old Deep Q Network. Specifically, this study performed principal component analysis to categorize transmission line tower data and proposed an automatic initial central point approach based on the standard normal distribution. It also proposed the A-Deep Q-Learning algorithm, altered from the deep Q-Learning algorithm, to explore policies based on the experience of learning from clustered data. It can be used to learn transmission line tower outlier data based on the distance between data within a cluster. The performance evaluation shows that the proposed model achieved an approximately 2.29% to 4.19% higher prediction rate and around 0.8% to 4.3% higher accuracy rate than the old transmission line tower big data analysis model.


Feature Selection for Fatigue Segment Classification System Using Elitist Non-Dominated Sorting in Genetic Algorithm

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.165.232 ◽

2012 ◽

Vol 165 ◽

pp. 232-236 ◽

Cited By ~ 1

Author(s):

Mohd Haniff Osman ◽

Z.M. Nopiah ◽

S. Abdullah

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Classification System ◽

Learning Algorithm ◽

Selection Procedure ◽

Classification Error ◽

Algorithm Selection ◽

NSGA-II ◽

Nearest Neighbours ◽

Good Set

Having relevant features to represent a dataset helps learning algorithms deliver a highly accurate classification system in less time. Unfortunately, one good set of features does not always suit every learning algorithm. To confirm that the choice of learning algorithm does not affect system accuracy, the user has to validate that the given dataset is a feature-oriented dataset. Thus, in this study we propose a simple verification procedure based on a multi-objective approach by means of the elitist Non-dominated Sorting in Genetic Algorithm (NSGA-II). The way NSGA-II is used in this work is quite similar to a feature selection procedure, except for the interpretation of the results, i.e., the set of optimal solutions. Two conflicting quantities to be minimized, classification error and number of features used, are taken as objective functions. A case study of fatigue segment classification was chosen, in which simulations were repeated using four single classifiers: Naive Bayes, k-nearest neighbours, decision tree and radial basis function. The proposed procedure demonstrates that only two features are needed to classify a fatigue segment, regardless of the learning algorithm.
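The idea of a "set of optimal solutions" over the two objectives named in this abstract (classification error and number of features, both minimized) can be illustrated with a non-dominated filter. This is only a sketch of the first Pareto front, not a full NSGA-II (no ranking of later fronts, crowding distance, or genetic operators), and the candidate values are hypothetical numbers chosen for illustration, not results from the paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Hypothetical (classification_error, number_of_features) pairs.
candidates = [(0.10, 5), (0.12, 2), (0.12, 3), (0.30, 1), (0.10, 6), (0.25, 2)]
front = pareto_front(candidates)
print(front)  # [(0.1, 5), (0.12, 2), (0.3, 1)]
```

Each point on the front is a different trade-off: no other candidate has both a lower error and fewer features.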


A Dynamic Genetic Algorithm for Clustering Problems

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.411-414.1884 ◽

2013 ◽

Vol 411-414 ◽

pp. 1884-1893

Author(s):

Yong Chun Cao ◽

Ya Bin Shao ◽

Shuang Liang Tian ◽

Zheng Qi Cai

Keyword(s):

Genetic Algorithm ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Life ◽

Search Space ◽

Adaptive Mutation ◽

Data Sets ◽

Data Set ◽

Local Optima ◽

Clustering Problems

Because many GA-based clustering algorithms suffer from degeneracy and easily fall into local optima, a novel dynamic genetic algorithm for clustering problems (DGA) is proposed. The algorithm adopts variable-length coding to represent individuals and performs the crossover operation in parallel within subpopulations of individuals of the same length, which allows DGA to explore the search space more effectively and to obtain automatically the proper number of clusters and the proper partition of a given data set. It also uses a dynamic crossover probability and an adaptive mutation probability, which prevent the algorithm from getting stuck at a local optimum. Clustering results on three artificial data sets and two real-life data sets show that DGA achieves better performance and higher accuracy on clustering problems.


MS3: CORRELATION OF A GENERIC HEALTH-RELATED QUALITY OF LIFE QUESTIONNAIRE AND SELF-ADMINISTERED RHEUMATOID ARTHRITIS DISEASE ACTIVITY INSTRUMENT

Value in Health ◽

10.1046/j.1524-4733.2001.40201-43.x ◽

2001 ◽

Vol 4 (2) ◽

pp. 63-64

Author(s):

SS Kim ◽

AM Drabinski ◽

GR Williams ◽

CA Formica

Keyword(s):

Quality Of Life ◽

Rheumatoid Arthritis ◽

Life Questionnaire ◽

Health Related Quality ◽

Quality Of Life Questionnaire ◽

Rheumatoid Arthritis Disease Activity ◽

Rheumatoid Arthritis Disease ◽

Related Quality ◽

Health Related


Data Analysis Using Representation Theory and Clustering Algorithms

WSEAS TRANSACTIONS ON COMPUTERS ◽

10.37394/23205.2020.19.38 ◽

2021 ◽

Vol 19 ◽

pp. 310-320

Author(s):

Suboh Alkhushayni ◽

Taeyoung Choi ◽

Du’a Alzaleq

Keyword(s):

Data Analysis ◽

Random Forest ◽

Hierarchical Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Optimal Number ◽

Categorical Variables ◽

Common Disease ◽

Agglomerative Hierarchical Clustering ◽

Data Set

This work aims to expand knowledge in the area of data analysis through both persistent homology and representations of directed graphs. Specifically, we looked at how homology cluster groups can be analyzed using agglomerative hierarchical clustering algorithms and methods. Additionally, the Wine data, which is available in RStudio, was analyzed using various clustering algorithms such as hierarchical clustering, K-Means clustering, and PAM clustering. The goal of the analysis was to find out which clustering method is appropriate for a given numerical data set. By testing the data, we tried to find the agglomerative hierarchical clustering method that would be the optimal clustering algorithm among these three: the K-Means, PAM, and Random Forest methods. By comparing each model's accuracy value with cultivar coefficients, we came to the conclusion that K-Means methods are the most helpful when working with numerical variables. On the other hand, PAM clustering and Gower with random forest are the most beneficial approaches when working with categorical variables. All these tests can determine the optimal number of clustering groups for a given data set, provided the proper analysis is done. Building on this project, our method can be applied to several areas, such as clinical and business settings. For example, in a clinical setting, patients can be divided into groups according to a common disease, required therapy, and other factors. Likewise, in business, several clustered groups can be obtained based on marginal profit, marginal cost, or other economic indicators.


A Hybrid Feature Selection Based on Mutual Information and Genetic Algorithm

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v7.i1.pp214-225 ◽

2017 ◽

Vol 7 (1) ◽

pp. 214

Author(s):

Yuan-Dong Lan

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Mutual Information ◽

Learning Algorithm ◽

Selection Process ◽

Optimal Subset ◽

Benchmark Datasets ◽

Two Phases ◽

Feature Dimension ◽

Necessary And Sufficient

Feature selection aims to choose an optimal subset of features that is necessary and sufficient to improve the generalization performance and running efficiency of the learning algorithm. To obtain the optimal subset, this paper proposes a hybrid feature selection method based on mutual information and a genetic algorithm. To make full use of the advantages of the filter and wrapper models, the algorithm is divided into two phases: a filter phase and a wrapper phase. In the filter phase, the algorithm first uses mutual information to rank the features, which provides heuristic information that accelerates the search of the subsequent genetic algorithm. In the wrapper phase, the genetic algorithm serves as the search strategy and searches for the best subset of features, with classifier performance and subset dimension as the evaluation criteria. Experimental results on benchmark datasets show that the proposed algorithm achieves higher classification accuracy and a smaller feature dimension, and that its running time is less than that of using a genetic algorithm alone.
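The filter phase described in this abstract ranks features by their mutual information with the class label. A minimal sketch for discrete features is given below; the feature names and values are hypothetical, and how the ranking is passed to the genetic algorithm as heuristic information is not specified in the abstract, so that step is left out.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """I(X; Y) in bits for two equally long sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed pairs of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

# Hypothetical toy data: class labels and three discrete features.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the label
    "B": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
    "C": [0, 0, 0, 1, 1, 1, 1, 1],  # partly informative
}

scores = {name: mutual_information(col, labels) for name, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['A', 'C', 'B']
```

A wrapper phase could then, for example, bias its initial population toward the top-ranked features; that choice is an assumption rather than something the abstract states.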


Efficient Semi-Supervised Learning and Sparse Structural Learning for Feature Selection of Leukemia Dataset

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2020.3110 ◽

2020 ◽

Vol 10 (8) ◽

pp. 1815-1824

Author(s):

S. Nithya Roopa ◽

N. Nagarajan

Keyword(s):

Feature Selection ◽

Supervised Learning ◽

Health Informatics ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Research Work ◽

Real Life ◽

Support Vector ◽

Structural Learning ◽

Huge Amount

The amount of data produced in health informatics is growing rapidly, and analyzing this huge amount of data requires considerable expertise. The basic aim of health informatics is to take in real-world medical data from all levels of human existence to help improve our understanding of medicine and medical practice. Huge amounts of unlabeled data are available in many real-life data-mining tasks, e.g., uncategorized messages in an automatic email categorization system or genes of unknown function in gene function prediction. Labelled data is often scarce and expensive to produce, because labelling typically requires human expertise. Consequently, semi-supervised learning has become a topic of significant recent interest. This research proposes a new semi-supervised clustering approach in which the performance of unsupervised clustering algorithms is enhanced with a limited amount of supervision in the form of labels or constraints on the data. The previous system used Clustering Guided Hybrid support vector machine based Sparse Structural Learning (CGHSSL) for feature selection, but it does not produce satisfactory accuracy. This research proposes a clustering-guided sparse structural learning algorithm based on a Convolutional Neural Network (CNN). The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to learn the cluster labels of the input samples more accurately while guiding feature selection at the same time. Concurrently, the CNN also predicts cluster labels using a hidden structure shared by the different features. The parameters of the CNN are then optimized with the Multi-objective Bee Colony (MBO) algorithm, which can uncover feature correlations to make the results more consistent. Row-wise sparse models are then balanced to yield a model suited to feature selection. This semi-supervised algorithm is used to select important features from the Leukemia1 dataset more efficiently, so the dataset size is reduced significantly. The proposed Semi Supervised Clustering-Guided Sparse Structural Learning (SSCGSSL) technique is also used to improve clustering performance. The experimental results show that the proposed system achieves better performance than the existing system in terms of Accuracy, Entropy, Purity, Normalized Mutual Information (NMI) and F-measure.
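Two of the evaluation measures listed in this abstract, Purity and NMI, can be computed from a list of true labels and a list of cluster assignments as sketched below. The abstract does not say which NMI normalization was used; this sketch assumes the arithmetic mean of the two entropies, and the label vectors are hypothetical.

```python
from collections import Counter
from math import log2

def purity(true_labels, cluster_ids):
    """Fraction of samples that belong to the majority class of their cluster."""
    clusters = {}
    for t, c in zip(true_labels, cluster_ids):
        clusters.setdefault(c, []).append(t)
    majority = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return majority / len(true_labels)

def entropy(labels):
    """Shannon entropy in bits of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def nmi(true_labels, cluster_ids):
    """Mutual information divided by the mean entropy (assumed normalization)."""
    n = len(true_labels)
    pt, pc = Counter(true_labels), Counter(cluster_ids)
    joint = Counter(zip(true_labels, cluster_ids))
    mi = sum((c / n) * log2(c * n / (pt[t] * pc[k])) for (t, k), c in joint.items())
    denom = (entropy(true_labels) + entropy(cluster_ids)) / 2
    return mi / denom if denom > 0 else 1.0  # both partitions trivial: treat as identical

# Hypothetical example: six samples, two classes.
truth = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]   # same grouping, cluster ids swapped
mixed = [0, 0, 1, 1, 1, 1]     # one sample placed in the wrong group

print(purity(truth, perfect), nmi(truth, perfect))
print(purity(truth, mixed), nmi(truth, mixed))
```

Both measures ignore the actual cluster ids, so the `perfect` assignment scores 1.0 even though its ids are swapped relative to the class labels.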
