Proficient Normalised Fuzzy K-Means With Initial Centroids Methodology

This article describes how data is relevant and if it can be organized, linked with other data and grouped into a cluster. Clustering is the process of organizing a given set of objects into a set of disjoint groups called clusters. There are a number of clustering algorithms like k-means, k-medoids, normalized k-means, etc. So, the focus remains on efficiency and accuracy of algorithms. The focus is also on the time it takes for clustering and reducing overlapping between clusters. K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering problem. The k-means algorithm partitions data into K clusters and the centroids are randomly chosen resulting numeric values prohibits it from being used to cluster real world data containing categorical values. Poor selection of initial centroids can result in poor clustering. This article deals with a proposed algorithm which is a variant of k-means with some modifications resulting in better clustering, reduced overlapping and lesser time required for clustering by selecting initial centres in k-means and normalizing the data.

Download Full-text

Challenges in benchmarking stream learning algorithms with real-world data

Data Mining and Knowledge Discovery ◽

10.1007/s10618-020-00698-5 ◽

2020 ◽

Vol 34 (6) ◽

pp. 1805-1858

Author(s):

Vinicius M. A. Souza ◽

Denis M. dos Reis ◽

André G. Maletzke ◽

Gustavo E. A. P. A. Batista

Keyword(s):

Real World ◽

Learning Algorithms ◽

Real World Data ◽

World Data

Download Full-text

Performance of Machine Learning Algorithms and Diversity in Data

MATEC Web of Conferences ◽

10.1051/matecconf/201821004019 ◽

2018 ◽

Vol 210 ◽

pp. 04019 ◽

Cited By ~ 1

Author(s):

Hyontai SUG

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Real World ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Real World Data ◽

Random Data ◽

Data Set ◽

World Data

Recent world events in go games between human and artificial intelligence called AlphaGo showed the big advancement in machine learning technologies. While AlphaGo was trained using real world data, AlphaGo Zero was trained using massive random data, and the fact that AlphaGo Zero won AlphaGo completely revealed that diversity and size in training data is important for better performance for the machine learning algorithms, especially in deep learning algorithms of neural networks. On the other hand, artificial neural networks and decision trees are widely accepted machine learning algorithms because of their robustness in errors and comprehensibility respectively. In this paper in order to prove that diversity and size in data are important factors for better performance of machine learning algorithms empirically, the two representative algorithms are used for experiment. A real world data set called breast tissue was chosen, because the data set consists of real numbers that is very good property for artificial random data generation. The result of the experiment proved the fact that the diversity and size of data are very important factors for better performance.

Download Full-text

Real-World Data Difficulty Estimation with the Use of Entropy

Entropy ◽

10.3390/e23121621 ◽

2021 ◽

Vol 23 (12) ◽

pp. 1621

Author(s):

Przemysław Juszczuk ◽

Jan Kozak ◽

Grzegorz Dziczkowski ◽

Szymon Głowania ◽

Tomasz Jach ◽

...

Keyword(s):

Real World ◽

Real Estate Market ◽

Entropy Measure ◽

Real World Data ◽

World Data ◽

The Real Estate ◽

The Real Estate Market ◽

The Internet Of Things ◽

Selection Of

In the era of the Internet of Things and big data, we are faced with the management of a flood of information. The complexity and amount of data presented to the decision-maker are enormous, and existing methods often fail to derive nonredundant information quickly. Thus, the selection of the most satisfactory set of solutions is often a struggle. This article investigates the possibilities of using the entropy measure as an indicator of data difficulty. To do so, we focus on real-world data covering various fields related to markets (the real estate market and financial markets), sports data, fake news data, and more. The problem is twofold: First, since we deal with unprocessed, inconsistent data, it is necessary to perform additional preprocessing. Therefore, the second step of our research is using the entropy-based measure to capture the nonredundant, noncorrelated core information from the data. Research is conducted using well-known algorithms from the classification domain to investigate the quality of solutions derived based on initial preprocessing and the information indicated by the entropy measure. Eventually, the best 25% (in the sense of entropy measure) attributes are selected to perform the whole classification procedure once again, and the results are compared.

Download Full-text

AB0430 DEVELOP A REPLICABLE MODEL FOR RATIONAL SELECTION OF STRATEGIES IN TREAT-TO-TARGET AND MAINTAIN-BEING-TARGET: REAL WORLD DATA MINING VIA SMART SYSTEM OF DISEASE MANAGEMENT (SSDM)

10.1136/annrheumdis-2019-eular.6512 ◽

2019 ◽

Author(s):

Rong Mu ◽

LI Chun ◽

Jing Yang ◽

Xiaohan Wang ◽

Bin Wu ◽

...

Keyword(s):

Data Mining ◽

Disease Management ◽

Real World ◽

Treat To Target ◽

Rational Selection ◽

Real World Data ◽

Smart System ◽

World Data ◽

Selection Of

Download Full-text

Comparison of Fuzzy Clustering Methods and Their Applications to Geophysics Data

Applied Computational Intelligence and Soft Computing ◽

10.1155/2009/876361 ◽

2009 ◽

Vol 2009 ◽

pp. 1-16 ◽

Cited By ~ 4

Author(s):

David J. Miller ◽

Carl A. Nelson ◽

Molly Boeka Cannon ◽

Kenneth P. Cannon

Keyword(s):

Fuzzy Clustering ◽

Real World ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Optimal Number ◽

Optimum Number ◽

Clustering Methods ◽

Real World Data ◽

Data Set ◽

World Data

Fuzzy clustering algorithms are helpful when there exists a dataset with subgroupings of points having indistinct boundaries and overlap between the clusters. Traditional methods have been extensively studied and used on real-world data, but require users to have some knowledge of the outcome a priori in order to determine how many clusters to look for. Additionally, iterative algorithms choose the optimal number of clusters based on one of several performance measures. In this study, the authors compare the performance of three algorithms (fuzzy c-means, Gustafson-Kessel, and an iterative version of Gustafson-Kessel) when clustering a traditional data set as well as real-world geophysics data that were collected from an archaeological site in Wyoming. Areas of interest in the were identified using a crisp cutoff value as well as a fuzzyα-cut to determine which provided better elimination of noise and non-relevant points. Results indicate that theα-cut method eliminates more noise than the crisp cutoff values and that the iterative version of the fuzzy clustering algorithm is able to select an optimum number of subclusters within a point set (in both the traditional and real-world data), leading to proper indication of regions of interest for further expert analysis

Download Full-text

Applications of clustering algorithms and self organizing maps as data mining and business intelligence tools on real world data sets

2010 International Conference on Methods and Models in Computer Science (ICM2CS-2010) ◽

10.1109/icm2cs.2010.5706714 ◽

2010 ◽

Cited By ~ 1

Author(s):

L Singh ◽

S Singh ◽

P K Dubey

Keyword(s):

Data Mining ◽

Real World ◽

Business Intelligence ◽

Clustering Algorithms ◽

Data Sets ◽

Real World Data ◽

Self Organizing Maps ◽

World Data ◽

Self Organizing

Download Full-text

Enhancing of DBSCAN based on Sampling and Densitybased Separation

Iraqi Journal for Computers and Informatics ◽

10.25195/ijci.v42i1.82 ◽

2016 ◽

Vol 42 (1) ◽

pp. 38-47

Author(s):

Safaa Al-mamory ◽

Israa Kamil

Keyword(s):

Real World ◽

Clustering Algorithms ◽

Experimental Results ◽

New Technique ◽

Real World Data ◽

New Techniques ◽

World Data ◽

Density Based Clustering ◽

A New Technique

DBSCAN (Density-Based Clustering of Applications with Noise )is one of the attractive algorithms among densitybased clustering algorithms. It characterized by its ability to detect clusters of various sizes and shapes with the presence of noise, but its performance degrades when data have different densities .In this paper, we proposed a new technique to separate data based on its density with a new samplingtechnique , the purpose of these new techniques is for getting data with homogenous density .The experimental results onsynthetic data and real world data show that the new technique enhanced the clustering of DBSCAN to large extent.

Download Full-text

Recurrent Adaptive Classifier Ensemble for Handling Recurring Concept Drifts

Applied Computational Intelligence and Soft Computing ◽

10.1155/2021/5533777 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Tinofirei Museba ◽

Fulufhelo Nelwamondo ◽

Khmaies Ouahada ◽

Ayokunle Akinola

Keyword(s):

Machine Learning ◽

Real World ◽

Concept Drift ◽

Learning Algorithms ◽

Learning Model ◽

Machine Learning Algorithms ◽

Classifier Ensemble ◽

Series Data ◽

Real World Data ◽

World Data

For most real-world data streams, the concept about which data is obtained may shift from time to time, a phenomenon known as concept drift. For most real-world applications such as nonstationary time-series data, concept drift often occurs in a cyclic fashion, and previously seen concepts will reappear, which supports a unique kind of concept drift known as recurring concepts. A cyclically drifting concept exhibits a tendency to return to previously visited states. Existing machine learning algorithms handle recurring concepts by retraining a learning model if concept is detected, leading to the loss of information if the concept was well learned by the learning model, and the concept will recur again in the next learning phase. A common remedy for most machine learning algorithms is to retain and reuse previously learned models, but the process is time-consuming and computationally prohibitive in nonstationary environments to appropriately select any optimal ensemble classifier capable of accurately adapting to recurring concepts. To learn streaming data, fast and accurate machine learning algorithms are needed for time-dependent applications. Most of the existing algorithms designed to handle concept drift do not take into account the presence of recurring concept drift. To accurately and efficiently handle recurring concepts with minimum computational overheads, we propose a novel and evolving ensemble method called Recurrent Adaptive Classifier Ensemble (RACE). The algorithm preserves an archive of previously learned models that are diverse and always trains both new and existing classifiers. The empirical experiments conducted on synthetic and real-world data stream benchmarks show that RACE significantly adapts to recurring concepts more accurately than some state-of-the-art ensemble classifiers based on classifier reuse.

Download Full-text