A MODIFIED KOHONEN SELF-ORGANIZING MAP (KSOM) CLUSTERING FOR FOUR CATEGORICAL DATA

2016 ◽  
Vol 78 (6-13) ◽  
Author(s):  
Azlin Ahmad ◽  
Rubiyah Yusof

The Kohonen Self-Organizing Map (KSOM) is an unsupervised neural network learning algorithm. It is used to solve problems in various areas, especially clustering complex data sets. Despite its advantages, the KSOM algorithm has a few drawbacks, such as overlapping clusters and non-linearly separable problems. Therefore, this paper proposes a modified KSOM inspired by the pheromone approach in Ant Colony Optimization. The modification focuses on the distance calculation among objects. The proposed algorithm was tested on four real categorical data sets obtained from the UCI Machine Learning Repository: Iris, Seeds, Glass and Wisconsin Breast Cancer Database. The results show that the modified KSOM produces accurate clustering results and that all clusters can be clearly identified.
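For orientation, the baseline that such a modification starts from can be sketched as follows. This is a minimal, illustrative one-dimensional Kohonen map using the standard update rule with plain Euclidean distance. It is not the pheromone-inspired distance proposed in the paper, which the abstract does not specify. The function names, the linear decay schedules and all default parameters are assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Plain Euclidean distance; a modified KSOM would swap in another measure here."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(weights, sample):
    """Index of the node whose weight vector is closest to the sample."""
    return min(range(len(weights)), key=lambda i: euclidean(weights[i], sample))


def train_som(data, n_nodes=3, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D SOM (nodes arranged on a line) with the standard Kohonen update.

    The learning rate and neighbourhood radius decay linearly (an assumption).
    """
    rng = random.Random(seed)
    # Initialise each node from a randomly chosen sample.
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 1e-3)
        # Present the samples in a fresh random order each epoch.
        for sample in rng.sample(data, len(data)):
            bmu = best_matching_unit(weights, sample)
            for j in range(n_nodes):
                # Gaussian neighbourhood measured along the line of nodes.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                weights[j] = [w + lr * h * (x - w) for w, x in zip(weights[j], sample)]
    return weights
```

After training, a sample is assigned to a cluster by calling `best_matching_unit(weights, sample)`. Changing the clustering behaviour through the distance calculation, as the abstract describes, would amount to replacing `euclidean` in this sketch.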

2016 ◽  
pp. 203-214 ◽  
Author(s):  
Ahmad Al-Khasawneh

Breast cancer is the second leading cause of cancer deaths in women worldwide. Early diagnosis of this illness can increase the chances of long-term survival for cancer patients. To aid in this, computerized breast cancer diagnosis systems are being developed. Machine learning algorithms and data mining techniques play a central role in the diagnosis. This paper describes neural network based approaches to breast cancer diagnosis. The aim of this research is to investigate and compare the performance of supervised and unsupervised neural networks in diagnosing breast cancer. A multilayer perceptron has been implemented as a supervised neural network and a self-organizing map as an unsupervised one. Both models were simulated using a variety of parameters and tested using several combinations of those parameters in independent experiments. It was concluded that the multilayer perceptron neural network outperforms Kohonen's self-organizing maps in diagnosing breast cancer even with small data sets.


2015 ◽  
Vol 3 (2) ◽  
pp. 94-101 ◽  
Author(s):  
Masashi Ikeda ◽  
Kazuki Kumon ◽  
Kazuya Omoto ◽  
Yuh Sugii ◽  
Akifumi Mizutani ◽  
...  

2005 ◽  
Vol 15 (01n02) ◽  
pp. 101-110 ◽  
Author(s):  
TIMO SIMILÄ ◽  
SAMPSA LAINE

Practical data analysis often encounters data sets with both relevant and useless variables. Supervised variable selection is the task of selecting the relevant variables based on some predefined criterion. We propose a robust method for this task. The user manually selects a set of target variables and trains a Self-Organizing Map with these data. This sets a criterion for variable selection and is an illustrative description of the user's problem, even for multivariate target data. The user also defines another set of variables that are potentially related to the problem. Our method returns a subset of these variables, which best corresponds to the description provided by the Self-Organizing Map and, thus, agrees with the user's understanding of the problem. The method is conceptually simple and, based on experiments, allows an accessible approach to supervised variable selection.


2011 ◽  
pp. 24-32 ◽  
Author(s):  
Nicoleta Rogovschi ◽  
Mustapha Lebbah ◽  
Younès Bennani

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for clustering, analysis and visualization of mixed data (continuous/binary). The weights and prototypes are learned simultaneously, ensuring an optimized data clustering. The higher a variable's weight, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process for the different variables, computing weights that influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set and three other mixed data sets. The results show a good quality of topological ordering and homogeneous clustering.
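The role of per-variable weights can be illustrated with a small sketch. It assumes a weighted distance that uses the squared difference for continuous variables and a 0/1 mismatch for binary variables. That form, the function names and the fixed weights are assumptions for illustration: the abstract does not give the actual distance, and in the paper the weights are learned together with the prototypes rather than set by hand.

```python
def weighted_mixed_distance(x, prototype, weights, is_binary):
    """Weighted distance between a sample and a prototype over mixed variables.

    Continuous variables contribute a squared difference; binary variables
    contribute 1.0 on mismatch and 0.0 on match (an assumed form).
    """
    total = 0.0
    for xj, pj, wj, binary in zip(x, prototype, weights, is_binary):
        if binary:
            dj = 0.0 if xj == pj else 1.0
        else:
            dj = (xj - pj) ** 2
        total += wj * dj
    return total


def nearest_prototype(x, prototypes, weights, is_binary):
    """Index of the prototype with the smallest weighted distance to x."""
    return min(
        range(len(prototypes)),
        key=lambda i: weighted_mixed_distance(x, prototypes[i], weights, is_binary),
    )


# One continuous variable followed by one binary variable.
sample = [1.0, 1]
prototypes = [[1.0, 0], [3.0, 1]]
is_binary = [False, True]

# Equal weights: the continuous variable dominates and prototype 0 wins.
equal = nearest_prototype(sample, prototypes, [1.0, 1.0], is_binary)
# Down-weighting the continuous variable lets the binary match decide instead.
reweighted = nearest_prototype(sample, prototypes, [0.1, 1.0], is_binary)
```

With equal weights the sample is assigned to the first prototype. Once the continuous variable is down-weighted, it is assigned to the second, which is the sense in which a heavily weighted variable has more influence on the clustering.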


Author(s):  
Abou_el_ela Abdou Hussein

Advanced web technologies have led to tremendous growth in the volume of data generated daily. This mountain of huge, dispersed data sets has led to the phenomenon called big data: a collection of massive, heterogeneous, unstructured and complex data sets. The big data life cycle can be represented as collecting (capturing), storing, distributing, manipulating, interpreting, analyzing, investigating and visualizing big data. Traditional techniques such as Relational Database Management Systems (RDBMS) cannot handle big data because of their inherent limitations, so advances in computing architecture are required to handle both the data storage requirements and the heavy processing needed to analyze huge volumes and varieties of data economically. Many technologies manipulate big data; one of them is Hadoop. Hadoop can be understood as an open-source distributed data processing framework that is one of the prominent and well-known solutions to the problem of handling big data. Apache Hadoop was based on the Google File System and the MapReduce programming paradigm. In this paper we survey big data characteristics, starting from the first three V's, which researchers have extended over time to more than fifty-six V's, and compare researchers' accounts to reach the best representation and a precise clarification of all the big data V's. We highlight the challenges facing big data processing and how to overcome them using Hadoop, and its use in processing big data sets as a solution to various problems in a distributed cloud-based environment. This paper mainly focuses on different components of Hadoop such as Hive, Pig and HBase. We also give a full description of Hadoop's pros and cons, and of improvements that address Hadoop's problems through a proposed cost-efficient scheduler algorithm for heterogeneous Hadoop systems.


Author(s):  
Avinash Navlani ◽  
V. B. Gupta

In the last couple of decades, clustering has become a crucial research problem in the data mining research community. Clustering refers to the partitioning of data objects, such as records and documents, into groups or clusters with similar characteristics. Clustering is unsupervised learning, and because of this unsupervised nature there is no unique solution for all problems. Complex data sets often require explanation through multiple sets of clusterings, yet all traditional clustering approaches generate a single clustering. A data set can contain more than one pattern, and each pattern can be interesting from a different perspective. Alternative clustering aims to find all the different groupings of a data set such that each grouping is of high quality and distinct from the others. This chapter gives an overall view of alternative clustering: its various approaches, related work, comparison with easily confused related terms such as subspace, multi-view and ensemble clustering, applications, issues, and challenges.


Author(s):  
Phillip L. Manning ◽  
Peter L. Falkingham

Dinosaurs successfully conjure images of lost worlds and forgotten lives. Our understanding of these iconic, extinct animals now comes from many disciplines, not just the science of palaeontology. In recent years palaeontology has benefited from the application of new and existing techniques from physics, biology, chemistry, engineering, but especially computational science. The application of computers in palaeontology is highlighted in this chapter as a key area of development in studying fossils. The advances in high performance computing (HPC) have greatly aided and abetted multiple disciplines and technologies that are now feeding palaeontological research, especially when dealing with large and complex data sets. We also give examples of how such multidisciplinary research can be used to communicate not only specific discoveries in palaeontology, but also the methods and ideas from interrelated disciplines, to wider audiences. Dinosaurs represent a useful vehicle that can help enable wider public engagement, communicating complex science in digestible chunks.


2010 ◽  
pp. 1797-1803
Author(s):  
Lisa Friedland

In traditional data analysis, data points lie in a Cartesian space, and an analyst asks certain questions: (1) What distribution can I fit to the data? (2) Which points are outliers? (3) Are there distinct clusters or substructure? Today, data mining treats richer and richer types of data. Social networks encode information about people and their communities; relational data sets incorporate multiple types of entities and links; and temporal information describes the dynamics of these systems. With such semantically complex data sets, a greater variety of patterns can be described and views constructed of the data. This article describes a specific social structure that may be present in such data sources and presents a framework for detecting it. The goal is to identify tribes, or small groups of individuals that intentionally coordinate their behavior—individuals with enough in common that they are unlikely to be acting independently. While this task can only be conceived of in a domain of interacting entities, the solution techniques return to the traditional data analysis questions. In order to find hidden structure (3), we use an anomaly detection approach: develop a model to describe the data (1), then identify outliers (2).
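The closing two-step recipe, fit a model to the data and then flag what the model finds unlikely, can be sketched in its simplest form. The example assumes a single numeric score per entity and a normal model summarized by mean and standard deviation. This is a generic illustration, not the article's tribe-detection model, and the function name, threshold and scores are hypothetical.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the indices of values lying more than `threshold` standard
    deviations from the mean.

    Step (1): describe the data with a simple model (mean and standard deviation).
    Step (2): flag the points that the model considers unlikely.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values are identical, so nothing stands out.
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


# Hypothetical scores, e.g. how often each pair of individuals appears together.
scores = [10, 11, 9, 10, 12, 10, 11, 50]
flagged = find_outliers(scores)
```

Here only the last score is flagged. In a relational setting the model would describe how entities are expected to interact, but the pattern of modelling first and testing for anomalies second is the same.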


2022 ◽  
pp. 67-76
Author(s):  
Dineshkumar Bhagwandas Vaghela

The term big data has arisen from the rapid generation of data in various organizations. In big data, "big" is the buzzword. Here the data are so large and complex that traditional database applications are unable to process them (i.e., they are inadequate to deal with such volumes of data). Big data are usually described by the 5Vs (volume, velocity, variety, variability, veracity). Big data can be structured, semi-structured, or unstructured. Big data analytics is the process of uncovering hidden patterns and unknown correlations, and predicting future values, from large and complex data sets. This chapter covers the following topics in more detail: the history of big data and business analytics, big data analytics technologies and tools, and big data analytics uses and challenges.

