Feature Selection using K-Means Genetic Clustering To Predict Rheumatoid Arthritis Disease

2019 ◽  
Vol 8 (3) ◽  
pp. 7020-7023

In our society, an aging population poses serious problems for health and medical care. Compared with other diseases, rheumatoid arthritis is common in real life; it causes pain in the musculoskeletal system and affects people's quality of life. Rheumatoid arthritis typically begins in middle age but can also affect children and young adults. If the disease is not monitored and treated as early as possible, it can cause serious joint deformities. Cluster analysis is an unsupervised learning technique in data mining for identifying or exploring the structure of data without known class labels. Many clustering algorithms have been proposed to analyze high volumes of data, but many of them do not evaluate cluster quality because of unsuitable features present in the dataset. Feature selection is a prime task in the analysis of high-dimensional datasets, since an optimal subset of features is sufficient to cluster the data. In this study, rheumatoid arthritis clinical data were analyzed, and the K-Means clustering algorithm was used to predict which patients are affected by rheumatoid arthritis. A genetic algorithm is used to filter the features, and at the end of the process it finds optimal clusters for the K-Means algorithm. Depending on the initial centroids, K-Means may produce empty clusters, and it does not handle outliers or noisy data in the dataset effectively. When combined with a genetic algorithm, K-Means shows higher clustering quality and a fast evolution process compared with K-Means alone. In this paper, we use a machine learning algorithm, FSKG, to diagnose rheumatoid arthritis, and we explore a predictive FSKG model for the disease.
After completing data analysis and pre-processing, the genetic algorithm and the K-Means clustering algorithm are integrated to select the relevant features from among all the features. Experimental results from this study indicate improved accuracy compared with the K-Means algorithm alone for rheumatoid arthritis prediction.
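The abstract does not specify FSKG's chromosome encoding, fitness function, or genetic operators. As a rough illustration of the general idea only (a genetic algorithm searching over binary feature masks, with each mask scored by running K-Means on the selected columns), here is a minimal stdlib-only sketch. It assumes a labeled evaluation set, so that cluster purity can serve as the fitness, plus a small per-feature penalty; the names `kmeans`, `purity`, `fitness`, and `ga_select` and all parameter values are hypothetical. It also re-seeds any empty cluster at the worst-fitted point, which is one common remedy for the empty-cluster problem noted in the abstract.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c])) for p in points]

def kmeans(points, k, iters=50, rng=None):
    """Lloyd's K-Means; empty clusters are re-seeded at the worst-fitted point."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
            else:
                # Empty cluster (e.g. two identical initial centroids): move it to
                # the point farthest from the centroid it is currently assigned to.
                far = max(range(len(points)),
                          key=lambda i: sq_dist(points[i], centroids[labels[i]]))
                centroids[c] = list(points[far])
    return assign(points, centroids)

def purity(labels, truth, k):
    """Fraction of points that belong to their cluster's majority class."""
    hits = 0
    for c in range(k):
        members = [t for lab, t in zip(labels, truth) if lab == c]
        if members:
            hits += max(members.count(v) for v in set(members))
    return hits / len(truth)

def fitness(mask, data, truth, k, penalty=0.01):
    """Assumed fitness: cluster purity on the selected columns minus a per-feature penalty."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    projected = [[row[j] for j in cols] for row in data]
    return purity(kmeans(projected, k), truth, k) - penalty * len(cols)

def ga_select(data, truth, k, pop_size=12, generations=15, p_mut=0.1, seed=1):
    """Evolve binary feature masks and return the best one found."""
    rng = random.Random(seed)
    n = len(data[0])  # one-point crossover below needs n >= 2

    def score(mask):
        return fitness(mask, data, truth, k)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # truncation selection; elites survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            # Bit-flip mutation applied gene by gene.
            children.append([1 - g if rng.random() < p_mut else g for g in a[:cut] + b[cut:]])
        pop = elite + children
    return max(pop, key=score)
```

Using purity as the fitness is only possible when diagnoses are known for the evaluation data; with unlabeled data an internal index would have to take its place.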

Sensors ◽  
2019 ◽  
Vol 19 (19) ◽  
pp. 4071 ◽  
Author(s):  
Alexandra Lianou ◽  
Arianna Mencattini ◽  
Alexandro Catini ◽  
Corrado Di Natale ◽  
George-John E. Nychas ◽  
...  

The performance of an Unsupervised Online Feature Selection (UOS) algorithm was investigated for selecting training features from multispectral images acquired from a dairy product (vanilla cream) stored under isothermal conditions. The selected features were then used as input to a support vector machine (SVM) model with a linear kernel to determine the microbiological quality of vanilla cream. Model training (n = 65) was based on two batches of cream samples provided directly by the manufacturer and stored at different isothermal conditions (4, 8, 12, and 15 °C), whereas model testing (n = 132) and validation (n = 48) were based on real-life conditions, analyzing samples from different retail outlets as well as expired samples from the market. Qualitative analysis was performed to discriminate cream samples into two microbiological quality classes based on total viable counts [TVC ≤ 2.0 log CFU/g (fresh samples) and TVC ≥ 6.0 log CFU/g (spoiled samples)]. Results showed good performance, with an overall two-class classification accuracy of 91.7% for model validation. The model was then extended to include samples in the TVC range 2–6 log CFU/g, using 1-log steps to define the microbiological quality classes, in order to assess its potential to estimate increasing microbial populations. Results demonstrated that high rates of correct classification could be obtained in the range of 2–5 log CFU/g, whereas the percentage of misclassification increased in the TVC class (5, 6), which was close to the spoilage level of the product. Overall, the results of this study demonstrated that the UOS algorithm, in tandem with spectral data acquired from multispectral imaging, could be a promising method for real-time assessment of the microbiological quality of vanilla cream samples.
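The quality classes in this abstract are simple thresholds on TVC (log CFU/g). The sketch below encodes the two-class scheme and the extended 1-log-step scheme; the function names are hypothetical, and the abstract does not say how a value falling exactly on an integer boundary was assigned, so that choice is an assumption. It covers only the labeling step, not the UOS feature selection or the linear-kernel SVM.

```python
import math

def two_class_label(tvc):
    """Fresh if TVC <= 2.0 log CFU/g, spoiled if TVC >= 6.0; otherwise outside this scheme."""
    if tvc <= 2.0:
        return "fresh"
    if tvc >= 6.0:
        return "spoiled"
    return None  # intermediate samples were not part of the two-class analysis

def one_log_class(tvc, low=2.0, high=6.0):
    """Extended scheme with 1-log-wide classes between `low` and `high`.

    Returns (lower, upper) bounds, with None marking an open end. Placing an
    exact integer such as 3.0 in the class above it is an assumption.
    """
    if tvc <= low:
        return (None, low)
    if tvc >= high:
        return (high, None)
    lower = float(math.floor(tvc))
    return (lower, lower + 1.0)
```

For example, a sample at 5.5 log CFU/g falls in the (5, 6) class, the one the abstract reports as hardest to classify.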


2019 ◽  
Vol 11 (13) ◽  
pp. 3499 ◽  
Author(s):  
Se-Hoon Jung ◽  
Jun-Ho Huh

This study proposes a big data analysis and prediction model, based on deep reinforcement learning, for transmission line tower outliers, i.e., for assessing when something is wrong in transmission line tower big data. The model automatically chooses the number of clusters K from unlabeled sensor big data. It also measures the action distance between data within a cluster using the Q-value, which represents the network output, in a transmission line tower big data clustering algorithm that handles outliers and is modified from the original Deep Q-Network. Specifically, this study performed principal component analysis to categorize transmission line tower data and proposed an approach that automatically selects the initial central points using the standard normal distribution. It also proposed the A-Deep Q-Learning algorithm, a modification of the deep Q-learning algorithm, to explore policies based on the experience gained from learning on clustered data. It can be used to learn from transmission line tower outlier data based on the distance between data within a cluster. The performance evaluation results show that the proposed model achieved an approximately 2.29% to 4.19% higher prediction rate and an approximately 0.8% to 4.3% higher accuracy rate than the existing transmission line tower big data analysis model.
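Deep Q-learning trains a neural network to approximate the action-value function that tabular Q-learning updates directly. As background only (this is the textbook one-step Q-learning rule, not the paper's A-Deep Q-Learning, whose modifications the abstract does not detail), the update and an ε-greedy exploration policy can be sketched as follows; the function names and the default `alpha` and `gamma` are illustrative assumptions.

```python
import random

def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    actions = list(q[state])
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[state][a])

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    `next_state=None` marks a terminal transition (no bootstrapped future value).
    """
    future = max(q[next_state].values()) if next_state is not None else 0.0
    td_error = reward + gamma * future - q[state][action]
    q[state][action] += alpha * td_error
    return q[state][action]
```

In a Deep Q-Network the table lookup is replaced by the network's output for Q(s, ·), and the same temporal-difference error drives the gradient updates.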


2012 ◽  
Vol 165 ◽  
pp. 232-236 ◽  
Author(s):  
Mohd Haniff Osman ◽  
Z.M. Nopiah ◽  
S. Abdullah

Having relevant features to represent a dataset enables learning algorithms to provide a highly accurate classification system in less time. Unfortunately, a good set of features does not always suit every learning algorithm. To confirm that the choice of learning algorithm does not affect system accuracy, the user has to validate that the given dataset is feature-oriented. Thus, in this study we propose a simple verification procedure based on a multi-objective approach using the elitist Non-dominated Sorting Genetic Algorithm (NSGA-II). The way NSGA-II is used in this work is quite similar to the feature selection procedure, except in the interpretation of the results, i.e., the set of optimal solutions. Two conflicting minimization objectives, namely classification error and the number of features used, are taken as the objective functions. A fatigue segment classification case study was chosen, and simulations were repeated using four single classifiers: naive Bayes, k-nearest neighbours, decision tree, and radial basis function. The proposed procedure demonstrates that only two features are needed for the fatigue segment classification task, regardless of the learning algorithm.
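NSGA-II ranks candidate solutions into Pareto fronts before applying crowding distance. A minimal sketch of the fast non-dominated sorting step, applied to the two minimization objectives named in the abstract (classification error, number of features), is given below; the function names are hypothetical, the candidate values in the example are made up, and crowding distance, selection, and variation operators are omitted.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_fronts(objs):
    """Fast non-dominated sort: returns a list of fronts, each a list of indices into objs."""
    n = len(objs)
    dominated_sets = [[] for _ in range(n)]  # indices that solution p dominates
    dom_counts = [0] * n                     # how many solutions dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated_sets[p].append(q)
            elif dominates(objs[q], objs[p]):
                dom_counts[p] += 1
        if dom_counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_sets[p]:
                dom_counts[q] -= 1
                if dom_counts[q] == 0:  # every dominator of q is in an earlier front
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front
```

For example, with candidates (error, features) = (0.10, 2), (0.08, 4), (0.12, 2), (0.08, 5), (0.30, 1), the first front contains indices 0, 1, and 4, and the second contains 2 and 3.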


2013 ◽  
Vol 411-414 ◽  
pp. 1884-1893
Author(s):  
Yong Chun Cao ◽  
Ya Bin Shao ◽  
Shuang Liang Tian ◽  
Zheng Qi Cai

Because many GA-based clustering algorithms suffer from degeneracy and easily fall into local optima, a novel dynamic genetic algorithm for clustering problems (DGA) is proposed. The algorithm adopts variable-length coding to represent individuals and performs parallel crossover within subpopulations of individuals of the same length, which allows DGA to explore the search space more effectively and to automatically obtain the proper number of clusters and a proper partition of a given data set. The algorithm also uses a dynamic crossover probability and an adaptive mutation probability, which prevent the dynamic clustering algorithm from getting stuck at a locally optimal solution. Clustering experiments on three artificial data sets and two real-life data sets show that DGA achieves better performance and higher accuracy on clustering problems.
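The abstract names three DGA mechanisms (variable-length chromosomes, crossover restricted to same-length subpopulations, and dynamic/adaptive operator probabilities) without giving formulas. A sketch under stated assumptions: each individual is a list of centroid tuples, so its length is the number of clusters it encodes; crossover is one-point at a centroid boundary; the linear crossover schedule and the fitness-based mutation rule are common choices assumed here, not taken from the paper. All function names and constants are hypothetical.

```python
import random
from collections import defaultdict

def dynamic_crossover_prob(gen, max_gen, p_max=0.9, p_min=0.6):
    """Assumed schedule: crossover probability decays linearly over the run."""
    return p_max - (p_max - p_min) * gen / max_gen

def adaptive_mutation_prob(f, f_avg, f_max, p_high=0.5, p_low=0.05):
    """Assumed rule (fitness is maximized): below-average individuals mutate at
    p_high; above-average ones interpolate down to p_low for the current best."""
    if f < f_avg or f_max == f_avg:
        return p_high  # also covers a converged population, to help escape local optima
    return p_low + (p_high - p_low) * (f_max - f) / (f_max - f_avg)

def parallel_crossover(population, p_cross, rng):
    """Pair individuals only within groups of equal length (same number of clusters)."""
    groups = defaultdict(list)
    for ind in population:
        groups[len(ind)].append(ind)
    offspring = []
    for group in groups.values():
        rng.shuffle(group)
        for a, b in zip(group[::2], group[1::2]):
            if len(a) > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, len(a))  # cut between whole centroids
                offspring += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
    return offspring
```

Because parents are always the same length, every child encodes the same number of clusters as its parents; changing the number of clusters would be left to other operators.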


2021 ◽  
Vol 19 ◽  
pp. 310-320
Author(s):  
Suboh Alkhushayni ◽  
Taeyoung Choi ◽  
Du’a Alzaleq

This work aims to expand knowledge in the area of data analysis through both persistent homology and representations of directed graphs. Specifically, we investigated how homology cluster groups can be analyzed using agglomerative hierarchical clustering algorithms and methods. Additionally, the Wine data set available in RStudio was analyzed using various clustering algorithms, such as hierarchical clustering, K-Means clustering, and PAM clustering. The goal of the analysis was to find out which clustering method is appropriate for a given numerical data set. By testing the data, we compared agglomerative hierarchical clustering with three other methods (K-Means, PAM, and Random Forest) to find the optimal clustering algorithm. By comparing each model's accuracy value with the cultivar coefficients, we concluded that K-Means methods are the most helpful when working with numerical variables. On the other hand, PAM clustering and Gower distance with random forest are the most beneficial approaches when working with categorical variables. Given a data set and the proper analysis, these tests can determine the optimal number of cluster groups. Our method can be applied to several industrial areas, such as clinical and business settings. For example, in a clinical setting, patients can be grouped by a common disease, required therapy, and other attributes. Likewise, in business, clustered groups can be formed based on marginal profit, marginal cost, or other economic indicators.
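As a minimal illustration of agglomerative hierarchical clustering (the bottom-up merging that the abstract compares with K-Means and PAM), here is a naive pure-Python version supporting single, complete, or average linkage. It is a sketch for small data only (roughly cubic time), the function name is hypothetical, and it is not the authors' R workflow.

```python
import math

def agglomerative(points, n_clusters, linkage="average"):
    """Bottom-up clustering: start from singletons and repeatedly merge the closest pair."""
    clusters = [[i] for i in range(len(points))]

    def cluster_dist(c1, c2):
        ds = [math.dist(points[a], points[b]) for a in c1 for b in c2]
        if linkage == "single":
            return min(ds)
        if linkage == "complete":
            return max(ds)
        return sum(ds) / len(ds)  # average linkage (UPGMA)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)
```

Stopping the merge loop at different `n_clusters` values corresponds to cutting the dendrogram at different heights, which is how the number of groups is chosen in practice.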


Author(s):  
Yuan-Dong Lan

Feature selection aims to choose an optimal subset of features that is necessary and sufficient to improve the generalization performance and running efficiency of the learning algorithm. To obtain the optimal subset, this paper proposes a hybrid feature selection method based on mutual information and a genetic algorithm. To make full use of the advantages of both the filter and wrapper models, the algorithm is divided into two phases: a filter phase and a wrapper phase. In the filter phase, the algorithm first uses mutual information to rank the features, providing heuristic information to the subsequent genetic algorithm and thereby accelerating its search. In the wrapper phase, using the genetic algorithm as the search strategy and taking both classifier performance and subset dimension as the evaluation criterion, it searches for the best feature subset. Experimental results on benchmark datasets show that the proposed algorithm achieves higher classification accuracy with a smaller feature dimension, and its running time is shorter than that of the genetic algorithm alone.
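The filter phase can be pictured as ranking features by their mutual information with the class and letting that ranking bias the genetic algorithm's starting population. The sketch below assumes discrete feature values and uses a linear rank-to-probability rule for seeding; that rule, the function names, and the probability bounds are illustrative assumptions, since the abstract does not specify how the heuristic information is passed to the GA. The wrapper phase (classifier-based fitness) is omitted.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits between two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def rank_features(data, labels):
    """Filter phase: feature indices sorted by decreasing MI with the class label."""
    scores = [mutual_information([row[j] for row in data], labels) for j in range(len(data[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j]), scores

def seeded_population(ranking, pop_size, rng, p_top=0.9, p_bottom=0.1):
    """Initial GA masks in which higher-ranked features are more likely to be switched on."""
    n = len(ranking)
    prob = [0.0] * n
    for rank, j in enumerate(ranking):
        prob[j] = p_top if n == 1 else p_top - (p_top - p_bottom) * rank / (n - 1)
    return [[int(rng.random() < prob[j]) for j in range(n)] for _ in range(pop_size)]
```

A feature that copies the class label scores the full class entropy (1 bit for a balanced binary label), while a feature independent of the label scores 0.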


2020 ◽  
Vol 10 (8) ◽  
pp. 1815-1824
Author(s):  
S. Nithya Roopa ◽  
N. Nagarajan

The amount of data produced in health informatics is growing rapidly, and analyzing this huge amount of data requires considerable knowledge. The basic aim of health informatics is to take in real-world medical data from all levels of human existence to help improve our understanding of medicine and medical practice. Huge amounts of unlabeled data are available in many real-life data-mining tasks, e.g., uncategorized messages in an automatic email categorization system or genes of unknown function in gene function prediction. Labeled data are often limited and expensive to produce, because labeling typically requires human expertise. Consequently, semi-supervised learning has become a topic of significant recent interest. This research proposes a new semi-supervised clustering approach, in which the performance of unsupervised clustering algorithms is enhanced with a limited amount of supervision in the form of labels or constraints. A previous system, Clustering-Guided Hybrid support vector machine based Sparse Structural Learning (CGHSSL), was designed for feature selection; however, it does not produce satisfactory accuracy. This research therefore proposes a clustering-guided sparse structural learning clustering algorithm based on a Convolutional Neural Network (CNN). The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to learn more accurate cluster labels for the input samples, which guide feature selection at the same time. Concurrently, the CNN predicts cluster labels using a hidden structure shared by the various features. The CNN parameters are then optimized with a Multi-objective Bee Colony (MBO) algorithm, which can uncover feature correlations to produce more consistent results. Row-wise sparsity is then imposed on the learned model so that it is suitable for feature selection.
This semi-supervised algorithm is used to select important features from the Leukemia1 dataset more efficiently, so the dataset size is reduced significantly. The proposed Semi-Supervised Clustering-Guided Sparse Structural Learning (SSCGSSL) technique is also used to further improve clustering performance. The experimental results show that the proposed system achieves better performance than the existing system in terms of Accuracy, Entropy, Purity, Normalized Mutual Information (NMI), and F-measure.
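DBSCAN, which the proposed method uses to learn cluster labels, grows clusters from core points that have at least `min_pts` neighbours (counting themselves) within radius `eps`, and marks points reachable from no core point as noise. A minimal pure-Python version of the standard algorithm follows; it shows only the clustering step (not the CNN, bee colony optimization, or sparse structural learning), and the function name and parameter values in the example are illustrative.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: cluster ids 0, 1, ... or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force region query; includes i itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:  # not a core point (may later become a border point)
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point previously marked as noise
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding through it
                queue.extend(nbrs)
        cluster += 1
    return labels
```

For instance, with `eps=1.5` and `min_pts=3`, three tight points near the origin, three near (10, 10), and an isolated point at (50, 50) yield labels `[0, 0, 0, 1, 1, 1, -1]`.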

