Enhancing Machine Learning Aptitude Using Significant Cluster Identification for Augmented Image Refining

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800142051009x ◽

2019 ◽

Vol 34 (09) ◽

pp. 2051009

Author(s):

Dhanalakshmi Samiappan ◽

S. Latha ◽

T. Rama Rao ◽

Deepak Verma ◽

CSA Sriharsha

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Speckle Noise ◽

Computational Time ◽

Significant Cluster ◽

Edge Preservation ◽

Cluster Identification ◽

Base Method ◽

Window Selection

Enhancing an image to remove noise while preserving its useful features and edges is among the most important tasks in image analysis. This paper proposes Significant Cluster Identification for Maximum Edge Preservation (SCI-MEP), which works in parallel with clustering algorithms and improves the efficiency of the machine learning aptitude. Affinity propagation (AP) is the base method used to obtain clusters from a learnt dictionary, with an adaptive window selection; the clusters are then refined using SCI-MEP to preserve the semantic components of the image. Since only the significant clusters are worked upon, the computational time is drastically reduced. The flexibility of SCI-MEP allows it to be integrated with any clustering algorithm to improve its efficiency. The method is tested and verified to remove Gaussian noise, rain noise and speckle noise from images. The results show that SCI-MEP considerably improves the existing algorithms in terms of performance evaluation metrics.


Improved minimum-minimum roughness algorithm for clustering categorical data

International Journal of ADVANCED AND APPLIED SCIENCES ◽

10.21833/ijaas.2021.10.006 ◽

2021 ◽

Vol 8 (10) ◽

pp. 43-50

Author(s):

Truong et al. ◽

Keyword(s):

Machine Learning ◽

Data Mining ◽

Hierarchical Clustering ◽

Categorical Data ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Experimental Results ◽

Data Sets ◽

Top Down ◽

Hierarchical Clustering Algorithm

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR), a top-down hierarchical clustering algorithm that can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome such shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on actual data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.


Hamming Distance based Clustering Algorithm

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2012010102 ◽

2012 ◽

Vol 2 (1) ◽

pp. 11-20 ◽

Cited By ~ 3

Author(s):

Ritu Vijay ◽

Prerna Mahajan ◽

Rekha Kandwal

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Hamming Distance ◽

Promising Result ◽

Clustering Algorithms ◽

Distribution Patterns ◽

Mixed Data ◽

Binary Representation ◽

Data Sets ◽

Performance Study

Cluster analysis has been extensively used in machine learning and data mining to discover distribution patterns in data. Clustering algorithms are generally based on a distance metric that partitions the data into small groups such that data instances in the same group are more similar to each other than to instances belonging to different groups. In this paper the authors extend the concept of Hamming distance to categorical data. As a data processing step they transform the data into a binary representation. The authors use the proposed algorithm to group data points into clusters. The experiments are carried out on data sets from the UCI machine learning repository to analyze performance. They conclude that the proposed algorithm shows promising results and can be extended to handle numeric as well as mixed data.
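
The abstract does not say how the binary representation is built, so the following is only a minimal sketch of the general idea, assuming a plain one-hot encoding (one bit per attribute/value pair) and the standard Hamming distance. The helper names `one_hot_encode` and `hamming` and the toy records are hypothetical and are not taken from the paper.

```python
from itertools import chain

def one_hot_encode(records):
    """Turn categorical records into binary vectors (one bit per attribute/value pair)."""
    n_attrs = len(records[0])
    # A sorted value list per attribute keeps the bit layout deterministic.
    values = [sorted({r[i] for r in records}) for i in range(n_attrs)]
    return [
        list(chain.from_iterable(
            [1 if r[i] == v else 0 for v in values[i]] for i in range(n_attrs)
        ))
        for r in records
    ]

def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical toy data: (colour, size)
records = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "large"),
]
encoded = one_hot_encode(records)
# Pairwise distance matrix that a distance-based clustering step could consume.
distances = [[hamming(a, b) for b in encoded] for a in encoded]
```

Under this encoding, two records that differ in one attribute are at distance 2 (one bit switched off, one switched on), so the distance is twice the number of mismatching attributes.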


Design of an Unsupervised Machine Learning-Based Movie Recommender System

10.20944/preprints202001.0124.v1 ◽

2020 ◽

Author(s):

Debby Cintia Ganesha Putri ◽

Jenq-Shiou Leu ◽

Pavel Seda

Keyword(s):

Recommender System ◽

Clustering Algorithm ◽

System Development ◽

Clustering Algorithms ◽

Mean Shift ◽

Computational Time ◽

Agglomerative Clustering ◽

Method Performance ◽

Cluster Validity Indices ◽

Validity Indices

This research aims to determine the similarities in groups of people to build a film recommender system for users. Users often have difficulty in finding suitable movies due to the increasing amount of movie information. The recommender system is very useful for helping customers choose a preferred movie with the existing features. In this study, the recommender system development is established by using several algorithms to obtain groupings, such as the K-Means algorithm, BIRCH algorithm, mini-batch K-Means algorithm, mean-shift algorithm, affinity propagation algorithm, agglomerative clustering algorithm, and spectral clustering algorithm. We propose methods optimizing K so that each cluster may not significantly increase variance. We are limited to using groupings based on Genre and Tags for movies. This research can discover better methods for evaluating clustering algorithms. To verify the quality of the recommender system, we adopted the mean square error (MSE), such as the Dunn Matrix and Cluster Validity Indices, and social network analysis (SNA), such as Degree Centrality, Closeness Centrality, and Betweenness Centrality. We also used Average Similarity, Computational Time, Association Rule with Apriori algorithm, and Clustering Performance Evaluation as evaluation measures to compare method performance of recommender systems using Silhouette Coefficient, Calinski-Harabasz Index, and Davies-Bouldin Index.
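
Of the validity indices named above, the Silhouette Coefficient is the easiest to state in a few lines. The sketch below uses its standard definition, s = (b - a) / max(a, b) averaged over all points, with Euclidean distance. The function name, the toy points and the two labelings are hypothetical and only illustrate how the score separates a sensible grouping from a poor one; nothing here is taken from the paper's experiments.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes s = 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical toy data: two tight pairs that are far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups the nearby pairs together
bad = silhouette_score(points, [0, 1, 0, 1])   # splits each nearby pair
```

A score close to 1 indicates compact, well-separated clusters, and a negative score indicates that points sit closer to another cluster than to their own, which is what the second labeling produces.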


Ensemble Hybrid K-Means and DBSCAN Clustering Algorithm – HDKA for Cancer Dataset

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8257.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 6036-6040

Keyword(s):

Machine Learning ◽

Data Mining ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Experimental Results ◽

Cancer Dataset ◽

DBSCAN Clustering ◽

Selection Of

Data mining is a vital area of research that is used in practice across many different domains. It has become a highly demanding field because huge amounts of data have been collected in various applications. A database can be clustered in many ways, depending on the clustering algorithm used, the parameter settings and other factors. Multiple clustering algorithms can be combined to obtain a final partitioning of the data that provides better clustering results. In this paper, the ensemble hybrid KMeans and DBSCAN (HDKA) algorithm is proposed to overcome the drawbacks of the DBSCAN and KMeans clustering algorithms. The proposed algorithm improves the selection of centroid points through its centroid selection strategy. For the experimental results we used two datasets, Colon and Leukemia, from the UCI machine learning repository.


Hierarchical Sparse Subspace Clustering (HESSC): An Automatic Approach for Hyperspectral Image Analysis

Remote Sensing ◽

10.3390/rs12152421 ◽

2020 ◽

Vol 12 (15) ◽

pp. 2421

Author(s):

Kasra Rafiezadeh Shahi ◽

Mahdi Khodadadzadeh ◽

Laura Tusa ◽

Pedram Ghamisi ◽

Raimon Tolosana-Delgado ◽

...

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Hyperspectral Image ◽

Imaging Techniques ◽

Clustering Algorithms ◽

Machine Learning Techniques ◽

Mixed Data ◽

Number Of Clusters ◽

Hyperspectral Image Analysis ◽

Learning Techniques

Hyperspectral imaging techniques are becoming one of the most important tools to remotely acquire fine spectral information on different objects. However, hyperspectral images (HSIs) require dedicated processing for most applications. Therefore, several machine learning techniques were proposed in the last decades. Among the proposed machine learning techniques, unsupervised learning techniques have become popular as they do not need any prior knowledge. Specifically, sparse subspace-based clustering algorithms have drawn special attention to cluster the HSI into meaningful groups since such algorithms are able to handle high dimensional and highly mixed data, as is the case in real-world applications. Nonetheless, sparse subspace-based clustering algorithms usually tend to demand high computational power and can be time-consuming. In addition, the number of clusters is usually predefined. In this paper, we propose a new hierarchical sparse subspace-based clustering algorithm (HESSC), which handles the aforementioned problems in a robust and fast manner and estimates the number of clusters automatically. In the experiment, HESSC is applied to three real drill-core samples and one well-known rural benchmark (i.e., Trento) HSI datasets. In order to evaluate the performance of HESSC, the performance of the new proposed algorithm is quantitatively and qualitatively compared to the state-of-the-art sparse subspace-based algorithms. In addition, in order to have a comparison with conventional clustering algorithms, HESSC’s performance is compared with K-means and FCM. The obtained clustering results demonstrate that HESSC performs well when clustering HSIs compared to the other applied clustering algorithms.


SVM Classifier on K-means Clustering Algorithm with Normalization in Data Mining for Prediction

International Journal on Recent and Innovation Trends in Computing and Communication ◽

10.17762/ijritcc.v7i6.5318 ◽

2019 ◽

Vol 7 (6) ◽

pp. 29-34

Author(s):

Vasu Deep ◽

Himanshu Sharma

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Research Work ◽

Initial Point ◽

Computational Time ◽

SVM Classifier ◽

Point Selection ◽

Cancer Dataset ◽

Number Of Iterations

This work concerns the K-means clustering algorithm: an SVM classifier is used together with K-means to classify the data, and Min-Max normalization is applied to improve the results over plain K-means. K-means is a clustering algorithm used to discover the clusters within a dataset. A cancer dataset is used for this research, and after execution of the implemented algorithm with SVM and normalization the data is classified into two categories, Cancer and Non-Cancer. The selection of the initial points affects the results of the algorithm, both in the number of clusters found and in their centroids. Methods for enhancing the K-means clustering algorithm are discussed in this work. These techniques help to improve efficiency, accuracy, performance and computational time, and some enhanced variations improve the efficiency and accuracy of the algorithm. The main aim of all these methods is to decrease the number of iterations, which reduces computational time. K-means is the most popular clustering technique and is widely used in data mining. Various enhancements to K-means are collected here, so that they can be used to build a new algorithm that is more efficient, more accurate and less time-consuming than previous work. The study focuses first on decreasing the number of iterations, which saves time, and second on gaining accuracy through normalization, so that both time and accuracy improve over previous studies.
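
The abstract combines Min-Max normalization with K-means but gives no implementation details, so the following is only a rough sketch under stated assumptions: the textbook rescaling x' = (x - min) / (max - min) per feature, and plain Lloyd-style K-means started from explicitly chosen initial points. The SVM step is left out. The function names and the four-row toy dataset are hypothetical and are not the authors' code or data.

```python
import math

def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        tuple((v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans))
        for row in rows
    ]

def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm from explicit starting centroids.

    Returns (labels, centroids, iterations used)."""
    centroids = [tuple(c) for c in centroids]
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            updated.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            )
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return labels, centroids, iteration

# Hypothetical data: the first feature separates two groups, the second has a much larger scale.
rows = [(1.0, 1000.0), (2.0, 1100.0), (9.0, 1050.0), (10.0, 950.0)]

raw_labels, _, raw_iters = k_means(rows, [rows[0], rows[3]])

normalized = min_max_normalize(rows)
norm_labels, _, norm_iters = k_means(normalized, [normalized[0], normalized[3]])
```

On this toy input the unnormalized run is dominated by the large-scale second feature, while the normalized run groups the rows by both features on an equal footing. The returned iteration count is the quantity the abstract aims to reduce, although this small example is not meant to demonstrate any such reduction.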


eSCIFI: An Energy Saving Mechanism for WLANs Based on Machine Learning

Energies ◽

10.3390/en15020462 ◽

2022 ◽

Vol 15 (2) ◽

pp. 462

Author(s):

Guilherme Henrique Apostolo ◽

Flavia Bernardini ◽

Luiz C. Schara Magalhães ◽

Débora C. Muchaluat-Saade

Keyword(s):

Machine Learning ◽

Energy Saving ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Local Area Networks ◽

Local Area ◽

Wireless Local Area Networks ◽

Machine Learning Algorithms ◽

Decision Algorithm ◽

User Coverage

As wireless local area networks grow in size to provide access to users, power consumption becomes an important issue. Power savings in a large-scale Wi-Fi network, with low impact on user service, are undoubtedly desirable. In this work, we propose and evaluate the eSCIFI energy saving mechanism for Wireless Local Area Networks (WLANs). eSCIFI is an energy saving mechanism that uses machine learning algorithms as occupancy demand estimators. The eSCIFI mechanism is designed to cope with a broader range of WLANs, which includes Wi-Fi networks such as the Fluminense Federal University (UFF) SCIFI network. eSCIFI can cope with WLANs that cannot acquire data in real time and/or have limited CPU power. The eSCIFI design also includes two clustering algorithms, named cSCIFI and cSCIFI+, that help to guarantee the network’s coverage. eSCIFI uses those network clusters and machine learning predictions as input features to an energy state decision algorithm that then decides which Access Points (AP) can be switched off during the day. To evaluate eSCIFI performance, we conducted several trace-driven simulations comparing the eSCIFI mechanism using both clustering algorithms with other energy saving mechanisms found in the literature using the UFF SCIFI network traces. The results showed that the eSCIFI mechanism using the cSCIFI+ clustering algorithm achieves the best performance and that it can save up to 64.32% of the UFF SCIFI network energy without affecting the user coverage.


High-Dimensional Text Datasets Clustering Algorithm Based on Cuckoo Search and Latent Semantic Indexing

Journal of Information & Knowledge Management ◽

10.1142/s0219649218500338 ◽

2018 ◽

Vol 17 (03) ◽

pp. 1850033 ◽

Cited By ~ 2

Author(s):

Saida Ishak Boushaki ◽

Nadjet Kamel ◽

Omar Bendjeghaba

Keyword(s):

Clustering Algorithm ◽

Sparse Matrix ◽

Clustering Algorithms ◽

Document Clustering ◽

Cuckoo Search ◽

Latent Semantic Indexing ◽

Computational Time ◽

High Dimensional ◽

Semantic Indexing ◽

Number Of Clusters

Clustering is an important data analysis technique. However, clustering high-dimensional data such as documents requires more effort in order to extract the rich, relevant information hidden in the multidimensional space. Recently, document clustering algorithms based on metaheuristics have demonstrated their efficiency in exploring the search area and achieving the global best solution rather than a local one. However, most of these algorithms are not practical and suffer from several limitations: they require the number of clusters to be known in advance, they are neither incremental nor extensible, and the documents are indexed by a high-dimensional and sparse matrix. In order to overcome these limitations, we propose in this paper a new dynamic and incremental approach (CS_LSI) for document clustering based on the recent cuckoo search (CS) optimization and latent semantic indexing (LSI). Experiments conducted on four well-known high-dimensional text datasets show the efficiency of the LSI model in reducing the dimensionality of the space with more precision and less computational time. Also, the proposed CS_LSI determines the number of clusters automatically by employing a newly proposed index based on a significant distance measure. The latter is also used in the incremental mode and to detect outlier documents by maintaining more coherent clusters. Furthermore, comparison with conventional document clustering algorithms shows the superiority of CS_LSI in achieving a high quality of clustering.


An Ultra-Fast Method for Clustering of Big Genomic Data

International Journal of Applied Metaheuristic Computing ◽

10.4018/ijamc.2020010104 ◽

2020 ◽

Vol 11 (1) ◽

pp. 45-60

Author(s):

Billel Kenidra ◽

Mohamed Benmohammed

Keyword(s):

DNA Methylation ◽

Large Scale ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Computational Time ◽

Fast Method ◽

Running Time ◽

Cancer Subtypes ◽

Biologically Relevant ◽

High Dimensional Datasets

Clustering is used to identify cancer subtypes from gene expression and DNA methylation datasets. Cancer subtype information is critically important for understanding tumor heterogeneity, and detecting previously unknown clusters of biological samples, which are usually associated with unknown types of cancer, in turn makes it possible to prescribe more effective treatments for patients. This is because cancer has varying subtypes which often respond disparately to the same treatment. Because DNA methylation datasets are extremely large-scale, running time remains a major challenge. Traditional clustering algorithms are too slow to handle high-dimensional biological datasets; they usually require large amounts of computational time. The proposed clustering algorithm outperforms all others in terms of running time: it can rapidly identify a set of biologically relevant clusters in large-scale DNA methylation datasets, and its superiority over the others has been demonstrated in terms of relative speed.


Design of an Unsupervised Machine Learning-Based Movie Recommender System

Symmetry ◽

10.3390/sym12020185 ◽

2020 ◽

Vol 12 (2) ◽

pp. 185 ◽

Cited By ~ 3

Author(s):

Debby Cintia Ganesha Putri ◽

Jenq-Shiou Leu ◽

Pavel Seda

Keyword(s):

Recommender System ◽

Clustering Algorithm ◽

System Development ◽

Clustering Algorithms ◽

Mean Shift ◽

Computational Time ◽

Agglomerative Clustering ◽

Method Performance ◽

Cluster Validity Indices ◽

Validity Indices

This research aims to determine the similarities in groups of people to build a film recommender system for users. Users often have difficulty in finding suitable movies due to the increasing amount of movie information. The recommender system is very useful for helping customers choose a preferred movie with the existing features. In this study, the recommender system development is established by using several algorithms to obtain groupings, such as the K-Means algorithm, BIRCH algorithm, mini-batch K-Means algorithm, mean-shift algorithm, affinity propagation algorithm, agglomerative clustering algorithm, and spectral clustering algorithm. We propose methods optimizing K so that each cluster may not significantly increase variance. We are limited to using groupings based on Genre and Tags for movies. This research can discover better methods for evaluating clustering algorithms. To verify the quality of the recommender system, we adopted the mean square error (MSE), such as the Dunn Matrix and Cluster Validity Indices, and social network analysis (SNA), such as Degree Centrality, Closeness Centrality, and Betweenness Centrality. We also used average similarity, computational time, association rule with Apriori algorithm, and clustering performance evaluation as evaluation measures to compare method performance of recommender systems using Silhouette Coefficient, Calinski-Harabasz Index, and Davies–Bouldin Index.
