evolutionary clustering
Recently Published Documents


TOTAL DOCUMENTS

173
(FIVE YEARS 46)

H-INDEX

19
(FIVE YEARS 4)

2021 ◽  
Author(s):  
Christian Nordahl ◽  
Veselka Boeva ◽  
Håkan Grahn ◽  
Marie Persson Netz

Abstract: Data has become an integral part of our society in recent years, arriving faster and in larger quantities than before. Traditional clustering algorithms rely on the availability of entire datasets to model them correctly and efficiently. Such requirements cannot be met in the data stream clustering scenario, where data arrives and needs to be analyzed continuously. This paper proposes a novel evolutionary clustering algorithm, entitled EvolveCluster, capable of modeling evolving data streams. We compare EvolveCluster against two other evolutionary clustering algorithms, PivotBiCluster and Split-Merge Evolutionary Clustering, by conducting experiments on three different datasets. Furthermore, we perform additional experiments on EvolveCluster to further evaluate its capabilities on clustering evolving data streams. Our results show that EvolveCluster manages to capture evolving data stream behaviors and adapts accordingly.
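To make the data stream setting concrete, here is a minimal sketch of clustering data that arrives in segments, where each new segment is clustered starting from the centroids found on the previous one. This is a generic k-means-style illustration, not the EvolveCluster algorithm; the function names, the two toy segments and the fixed iteration count are all assumptions made for the example.

```python
def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each point."""
    return [
        min(range(len(centroids)),
            key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
        for point in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of its points; keep it unchanged if it has none."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def cluster_segment(points, seed_centroids, iterations=10):
    """Cluster one segment of the stream, starting from the given seed centroids."""
    centroids = list(seed_centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return assign(points, centroids), centroids


# Hypothetical toy stream: the second segment has drifted relative to the first.
segment_1 = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
segment_2 = [(1.0, 1.0), (1.0, 2.0), (12.0, 12.0), (12.0, 13.0)]

labels_1, centroids_1 = cluster_segment(segment_1, [(0.0, 0.0), (10.0, 10.0)])
# Seed the next segment with the previous result instead of starting from scratch.
labels_2, centroids_2 = cluster_segment(segment_2, centroids_1)
```

Carrying the previous centroids forward lets the model follow gradual drift without needing the whole dataset at once, which is the constraint the abstract describes.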


2021 ◽  
Author(s):  
Bryar A. Hassan ◽  
Tarik A. Rashid ◽  
Seyedali Mirjalili

Abstract: This article presents the data used to evaluate the performance of evolutionary clustering algorithm star (ECA*) compared to five traditional and modern clustering algorithms. Two experimental methods are employed to examine the performance of ECA* against genetic algorithm for clustering++ (GENCLUST++), learning vector quantisation (LVQ), expectation maximisation (EM), K-means++ (KM++) and K-means (KM). These algorithms are applied to 32 heterogeneous and multi-featured datasets to determine which one performs well on the three tests. First, the paper examines the efficiency of ECA* against its counterpart algorithms using clustering evaluation measures. These validation criteria are objective function and cluster quality measures. Second, it suggests a performance rating framework to measure the performance sensitivity of these algorithms to various dataset features (cluster dimensionality, number of clusters, cluster overlap, cluster shape and cluster structure). The contributions of these experiments are twofold: (i) ECA* exceeds its counterpart algorithms in its ability to find the right number of clusters; (ii) ECA* is less sensitive to dataset features than its competing techniques. Nonetheless, the results of the experiments demonstrate some limitations of ECA*: (i) ECA* is not fully applied based on the premise that no prior knowledge exists; (ii) adapting and utilising ECA* in several real applications has not been achieved yet.


Author(s):  
Benjamin Mario Sainz-Tinajero ◽  
Andres Eduardo Gutierrez-Rodriguez ◽  
Hector G. Ceballos ◽  
Francisco J. Cantu-Ortiz

Author(s):  
Ibrahim Arpaci ◽  
Shadi Alshehabi ◽  
Ibrahim Mahariq ◽  
Ahmet E. Topcu

This study investigates the impact of global infection rates on social media posts during the COVID-19 pandemic. The study analysed over 179 million tweets posted between March 22 and April 13, 2020 and the global COVID-19 infection rates using evolutionary clustering analysis. Results showed six clusters constructed for each term type, including three-level n-grams (unigrams, bigrams and trigrams). The frequent occurrences of unigrams (“COVID-19”, “virus”, “government”, “people”, etc.), bigrams (“COVID 19”, “COVID-19 cases”, “times share”, etc.) and trigrams (“COVID 19 crisis”, “things help stop” and “trying times share”) were identified. The results demonstrated that the unigram trends on Twitter were up to about two times and 54 times more common than the bigram terms and trigram terms, respectively. Unigrams like “home” or “need” also became important as these terms reflected the main concerns of people during this period. Taken together, the present findings confirm that many tweets were used to broadcast people’s prevalent topics of interest during the COVID-19 pandemic. Furthermore, the results indicate that the number of COVID-19 infections had a significant effect on all clusters, being strong on 86% of clusters and moderate on 16% of clusters. The downward slope in global infection rates reflected the start of the trending of “social distancing” and “stay at home”. These findings suggest that infection rates have had a significant impact on social media posting during the COVID-19 pandemic.
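The three term types mentioned above (unigrams, bigrams and trigrams) can be illustrated with a short counting sketch. This is only an illustration of n-gram extraction under simple assumptions (lowercasing and whitespace tokenisation); the sample texts and function names are hypothetical and do not reproduce the study's pipeline or data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-token tuples found in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(texts, n):
    """Count n-grams over a collection of texts (lowercased, split on whitespace)."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical sample posts, for illustration only.
sample = ["stay at home", "please stay at home", "stay safe"]

unigram_counts = ngram_counts(sample, 1)
bigram_counts = ngram_counts(sample, 2)
trigram_counts = ngram_counts(sample, 3)
```

Because every trigram contains three unigrams, unigram totals are naturally higher than bigram or trigram totals, which is consistent with the direction of the frequency gap reported above.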


Author(s):  
Bryar A. Hassan ◽  
Tarik A. Rashid ◽  
Seyedali Mirjalili

Abstract: It is beneficial to automate the process of deriving concept hierarchies from corpora, since manual construction of concept hierarchies is typically time-consuming and resource-intensive. The overall process of learning concept hierarchies from corpora encompasses a set of steps: parsing the text into sentences, splitting the sentences and then tokenising them. After the lemmatisation step, the pairs are extracted using formal concept analysis (FCA). However, the formal context might contain some uninteresting and erroneous pairs. Generating the formal context can be time-consuming, so formal context size reduction is required to remove uninteresting and erroneous pairs, so that less time is needed to extract the concept lattice and the concept hierarchies. On this premise, this study proposes two frameworks: (1) a framework to review the current process of deriving concept hierarchies from corpora utilising FCA; (2) a framework to decrease the ambiguity of the first framework's formal context using an adaptive version of evolutionary clustering algorithm (ECA*). Experiments are conducted by applying 385 sample corpora from Wikipedia to the two frameworks to examine the reduction in formal context size, which leads to yielding the concept lattice and concept hierarchy. The resulting lattice of the formal context is evaluated against the standard one using concept lattice invariants. Accordingly, the homomorphism between the two lattices preserves the quality of the resulting concept hierarchies by 89% in contrast to the basic ones, and the reduced concept lattice inherits the structural relation of the standard one. The adaptive ECA* is examined against its four counterpart baseline algorithms (Fuzzy K-means, JBOS approach, AddIntent algorithm, and FastAddExtent) to measure the execution time on random datasets with different densities (fill ratios). The results show that adaptive ECA* derives the concept lattice faster than the other mentioned competitive techniques across different fill ratios.
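As a small illustration of the terms used here, the sketch below represents a formal context as a mapping from objects to their attribute sets, computes its fill ratio (the share of filled object-attribute cells), and derives one formal concept with the two standard FCA derivation operators. The toy context and all function names are hypothetical; this is not the adaptive ECA* reduction step, only the underlying data structure.

```python
def fill_ratio(context, attributes):
    """Fraction of (object, attribute) cells that are filled in the context."""
    total = len(context) * len(attributes)
    filled = sum(len(attrs) for attrs in context.values())
    return filled / total if total else 0.0


def common_attributes(context, objects, attributes):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(attributes)
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(context, attrs):
    """Objects that possess every attribute in `attrs`."""
    return {obj for obj, own in context.items() if attrs <= own}


def concept_from_objects(context, objects, attributes):
    """Close a set of objects into a formal concept (extent, intent)."""
    intent = common_attributes(context, objects, attributes)
    extent = objects_having(context, intent)
    return extent, intent


# Hypothetical toy context of extracted pairs (object -> attributes).
attributes = {"bookable", "rentable", "driveable"}
context = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
}

density = fill_ratio(context, attributes)
extent, intent = concept_from_objects(context, {"apartment"}, attributes)
```

In this toy context six of the nine cells are filled, and closing {"apartment"} yields a concept whose extent also contains "car", since both share the attributes "bookable" and "rentable".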

