Multiagent Based Large Data Clustering Scheme for Data Mining Applications

With the rapid development of computing in recent years, “Internet +”, cloud platforms, and similar technologies have been adopted across industries, and data of many types has grown to very large volumes. These data often contain rich information, but traditional data retrieval and analysis methods and data management models can no longer meet the need to acquire and manage them. Data mining has therefore become one of the ways to obtain useful information quickly. Clustering large-scale data effectively is an important research direction in data mining, and the k-means algorithm is the simplest and most basic method for it. K-means is simple to run, fast, and scales well to large data, but it also exposes serious defects in data processing. In view of these defects in the traditional k-means algorithm, this paper improves and analyzes the algorithm from two aspects.
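
The abstract does not say which two improvements the paper makes, so the following is only a minimal sketch of the traditional k-means (Lloyd) iteration that it takes as its starting point. The function names, the toy data, and the random-sample initialization are assumptions made for illustration and are not taken from the paper.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Baseline Lloyd iteration: assign to nearest centroid, then recompute means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k distinct data points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups in the plane.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(data, k=2)
```

Every iteration compares each point with each centroid, and the result depends on the initial centroids. These are commonly cited weaknesses of the baseline algorithm, though the abstract does not state which defects the paper targets.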

Big Data Clustering And Its Applications Examination

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1466.0982s1119 ◽

2019 ◽

Vol 8 (2S11) ◽

pp. 3687-3693

Keyword(s):

Data Mining ◽

Big Data ◽

Data Clustering ◽

Clustering Algorithms ◽

Large Data ◽

Data Sets ◽

Clustering Methods ◽

Time Saving ◽

Data Set ◽

The Many

Clustering is a mining process in which a data set is divided into sub-classes. It is essential in classification, grouping, exploratory pattern analysis, image segmentation, and decision making. Big data can be described as very large data sets that are examined computationally to reveal patterns and associations, including those relating to human behavior and interactions. Big data is essential to many organisations, but in some cases it is complex to store and time-consuming to process. One way to overcome these issues is to develop clustering methods, although these suffer from high complexity. Data mining is a technique for extracting useful information, but data mining models cannot be applied directly to big data because of its inherent complexity. The main aim here is to give an overview of the categories of data clustering for big data and to describe some related work. This survey concentrates on clustering algorithms that operate on the elements of big data and gives a short overview of clustering algorithms grouped as partitioning, hierarchical, grid-based, and model-based. Clustering is a major data mining task and is used for analyzing big data. The survey also covers the problems of applying clustering to big data and the new issues that arise with big data.

Outlier data Mining of large Data Sets relying on fast decomposition simulated annealing algorithm

10.1109/icris52159.2020.00170 ◽

2020 ◽

Author(s):

Wenjie Jia ◽

Zhihong He

Keyword(s):

Data Mining ◽

Simulated Annealing ◽

Simulated Annealing Algorithm ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Annealing Algorithm ◽

Outlier Data ◽

Fast Decomposition

Knowledge Discovery in Large Data Sets: A Primer for Data Mining Applications in Health Care

Health Informatics - Nursing Informatics ◽

10.1007/978-1-4757-3252-8_10 ◽

2000 ◽

pp. 139-148 ◽

Cited By ~ 2

Author(s):

Patricia A. Abbott

Keyword(s):

Data Mining ◽

Health Care ◽

Knowledge Discovery ◽

Large Data ◽

Large Data Sets ◽

Data Sets

A Pattern Storage System using Pattern Warehouse along with Sources of Pattern Generation and Applications

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j1063.08810s19 ◽

2019 ◽

Vol 8 (10S) ◽

pp. 357-362

Keyword(s):

Data Mining ◽

Data Warehouse ◽

Storage System ◽

Critical Evaluation ◽

Large Data ◽

Vital Role ◽

Pattern Generation ◽

Data Repository ◽

Storage Unit ◽

Data Mining Algorithms

Nowadays, data mining algorithms can generate a specific set of data, known as a pattern, from a huge data repository, but there is no infrastructure or system for storing the generated patterns persistently. A pattern warehouse provides a foundation for keeping these patterns safe in a dedicated environment for long-term use. Most organizations are more interested in information or patterns than in raw, unprocessed data, because extracted knowledge plays a vital role in making the right decisions for an organization's growth. We have examined the sources of patterns generated from large data sets. In this paper, we briefly discuss the application areas of patterns and the idea of a pattern warehouse, then present the architecture of the pattern warehouse, the correlation between data warehouse and data mining, the association between data mining and the pattern warehouse, and a critical evaluation of existing, theoretically published approaches, with more stress on a review of association-rule-related elements. We also analyze the pattern warehouse and the data warehouse with respect to factors such as storage space, type of storage unit, and characteristics, and we identify several research domains.

A Novel Cosine Similarity Like Data Clustering Method for Effective Data Classification in Data Mining

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.h6417.069820 ◽

2020 ◽

Vol 9 (8) ◽

pp. 340-346

Keyword(s):

Data Mining ◽

Similarity Measure ◽

Categorical Data ◽

Data Clustering ◽

Similarity Measures ◽

Numerical Data ◽

Data Classification ◽

Fundamental Goal ◽

Learning Technique ◽

Categorical Data Clustering

In data mining, many techniques use distance-based measures for data clustering. Improving clustering performance is the fundamental goal of cluster-related tasks. Many techniques are available for clustering numerical as well as categorical data. Clustering is an unsupervised learning technique in which objects are grouped based on their similarity. This paper proposes a new cluster similarity measure, the cosine-like cluster similarity measure (CLCSM), and uses it for data classification. Extensive experiments are conducted on UCI machine learning datasets. The experimental results show that the proposed cosine-like cluster similarity measure is superior to many existing cluster similarity measures for data classification.
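
The abstract does not define CLCSM, so the sketch below shows only the standard cosine similarity between two numeric vectors, which the measure's name suggests as its point of reference. The function name and the example vectors are hypothetical, and the paper's measure may differ from this.

```python
import math


def cosine_similarity(u, v):
    """Standard cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical feature vectors, used only for illustration.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors that point in the same direction score 1 regardless of their length, and orthogonal vectors score 0. A distance-based measure, by contrast, is sensitive to magnitude.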

A PSO based time series data clustering using modified S-transform for data mining

International Journal of Data Mining Modelling and Management ◽

10.1504/ijdmmm.2011.041810 ◽

2011 ◽

Vol 3 (3) ◽

pp. 277

Author(s):

Ranjeeta Bisoi ◽

P.K. Dash

Keyword(s):

Data Mining ◽

Time Series ◽

Data Clustering ◽

Time Series Data ◽

Series Data ◽

S Transform

Application of Data Mining to Cluster Tourist Visits in the City of Yogyakarta Using the K-Means Method

Journal of Computer Science and Technology (JCS-TECH) ◽

10.54840/jcstech.v1i1.9 ◽

2021 ◽

Vol 1 (1) ◽

pp. 27-32

Author(s):

Bambang Setio ◽

Putri Prasetyaningrum

Keyword(s):

Data Mining ◽

Unsupervised Learning ◽

Data Clustering ◽

Cluster 2

Yogyakarta is one of the cities in Indonesia with strong tourist appeal and is the destination most favored by tourists, as shown by the number of tourist visits, which rises from year to year. Besides being a tourist city, Yogyakarta is a city of students, a city of culture, and a city of struggle. Because Yogyakarta is known as a tourist city, it offers a wide variety of tourist attractions. In this context, data mining can serve as a solution for analyzing the data. Clustering belongs to the descriptive methods and is also a form of unsupervised learning, in which object classes are not defined beforehand. Clustering can therefore be used to determine class labels for data whose classes are not yet known. The K-Means method is a partitioning clustering method that separates data into distinct regions. K-Means is well known for its simplicity and its ability to cluster large data and outliers very quickly. The input data were processed with the K-Means algorithm in 5 iterations, with cluster 1, cluster 2, and cluster 3 chosen at random: cluster 1 contains 24 data points (50%), cluster 2 contains 11 data points (23%), and cluster 3 contains 13 data points (27%).
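
As a quick arithmetic check, the cluster sizes reported in the abstract above (24, 11, and 13) can be converted to shares of the total. The variable names are illustrative, and rounding to whole percentages is assumed because the abstract reports whole numbers.

```python
# Cluster sizes as reported in the abstract.
cluster_sizes = {"cluster 1": 24, "cluster 2": 11, "cluster 3": 13}

total = sum(cluster_sizes.values())  # 48 records in all

# Share of each cluster, rounded to the nearest whole percent.
percentages = {
    name: round(100 * size / total) for name, size in cluster_sizes.items()
}
# percentages == {"cluster 1": 50, "cluster 2": 23, "cluster 3": 27}
```

The three clusters together cover 48 records, and the rounded shares match the reported 50%, 23%, and 27%.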

Data Mining: A Bagged Decision Tree Classifier Algorithm For Ids Intrusion Detection System Based Attacks Classification

Design Engineering ◽

10.17762/de.v2021i04.1800 ◽

2021 ◽

pp. 1826-1839

Author(s):

Sandeep Adhikari ◽

Dr. Sunita Chaudhary

Keyword(s):

Data Mining ◽

Intrusion Detection ◽

Decision Tree ◽

Intrusion Detection System ◽

Detection System ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Decision Tree Classifier ◽

Tree Classifier

The exponential growth in the use of computers over networks, as well as the proliferation of applications that run on different platforms, has drawn attention to network security. This paradigm exploits security flaws in all operating systems that are both technically difficult and costly to fix. As a result, intrusion is used as a means to compromise the credibility, availability, and confidentiality of computer resources. The Intrusion Detection System (IDS) is critical in detecting network anomalies and attacks. In this paper, data mining is combined with IDS to identify important, hidden data of interest to the user efficiently and quickly. The proposed algorithm addresses four issues: data classification, high levels of human interaction, lack of labeled data, and the effectiveness of distributed denial-of-service attacks. We are also working on a decision tree classifier with a variety of parameters. The previous algorithm classified IDS data correctly up to 90% of the time and was not appropriate for large data sets. Our proposed algorithm is designed to classify large data sets accurately. In addition, we quantify a few more decision tree classifier parameters.

Uncertainty-Based Clustering Algorithms for Large Data Sets

Modern Technologies for Big Data Classification and Clustering - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-2805-0.ch001 ◽

2018 ◽

pp. 1-33 ◽

Cited By ~ 1

Author(s):

B. K. Tripathy ◽

Hari Seetha ◽

M. N. Murty

Keyword(s):

Big Data ◽

Data Clustering ◽

Clustering Algorithms ◽

Large Data ◽

Large Data Sets ◽

Mining Machine ◽

Data Sets ◽

Fuzzy C Means ◽

Intuitionistic Fuzzy ◽

New Algorithms

Data clustering plays a very important role in data mining, machine learning, and image processing. Because modern databases have inherent uncertainties, many uncertainty-based data clustering algorithms have been developed. These include fuzzy c-means, rough c-means, and intuitionistic fuzzy c-means, as well as algorithms based on hybrid models, such as rough fuzzy c-means and rough intuitionistic fuzzy c-means. There are also many variants that improve these algorithms in different directions, such as kernelised, possibilistic, and possibilistic kernelised versions. However, none of these algorithms is effective on big data, for various reasons. Researchers have therefore been trying for the past few years to improve them so that they can be applied to cluster big data. Such algorithms are relatively few compared with those for datasets of reasonable size. Our aim in this chapter is to present the uncertainty-based clustering algorithms developed so far and to propose a few new algorithms that can be developed further.
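
To make the first algorithm named above concrete, here is a minimal sketch of the membership computation in standard fuzzy c-means, where a point's membership in cluster i is 1 / sum over k of (d_i / d_k)^(2 / (m - 1)). It does not cover the rough, intuitionistic, or kernelised variants the chapter discusses. The function names, the fuzzifier default m = 2, and the sample centers are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fcm_memberships(point, centers, m=2.0):
    """Membership of one point in each cluster under standard fuzzy c-means."""
    dists = [euclidean(point, c) for c in centers]
    # A point that coincides with one or more centers belongs fully to them.
    zero = [d == 0.0 for d in dists]
    if any(zero):
        share = 1.0 / sum(zero)
        return [share if z else 0.0 for z in zero]
    exponent = 2.0 / (m - 1.0)  # m > 1 is the fuzzifier
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


# Hypothetical centers and point: distances are 1 and 3.
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships((1.0, 0.0), centers)
```

With distances of 1 and 3 and m = 2, the memberships are 0.9 and 0.1, and they sum to 1. Unlike the hard assignment in k-means, every point keeps a graded share in each cluster.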
