DATA MINING CLUSTERING OF BUILDING-MATERIAL SALES USING THE K-MEANS METHOD (CASE STUDY AT THE ADI BANGUNAN STORE)

2018 ◽  
Vol 1 (2) ◽  
pp. 83-91
Author(s):  
M. Hasyim Siregar

In today's competitive business environment, companies must develop continually in order to survive. This can be achieved by improving product quality, broadening the product range, and reducing operational costs, all of which depend on analysis of the company's data. Data mining is a technology that automates the discovery of interesting and meaningful patterns in large data sets, supporting both human interpretation of those patterns and scalable analysis techniques. The Adi Bangunan store sells building materials and household goods using a supermarket-style, self-service system in which buyers collect the goods they intend to purchase themselves. Its records of sales, purchases, and returns are poorly organized, so the data serves only as an archive for the store and cannot be used to develop a marketing strategy. In this research, data mining is applied using the K-Means method, which is widely used for grouping data because its results are easy to understand and interpret.
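The K-Means procedure the abstract refers to can be sketched in a few lines. The following is a minimal, illustrative implementation on hypothetical sales records; the data and variable names are invented for illustration, not taken from the study:

```python
def kmeans(points, k, iters=100):
    """Cluster numeric tuples into k groups; returns (centroids, labels)."""
    # Initialise centroids with the first k points (simple and deterministic;
    # production code would use random or k-means++ initialisation).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical (units_sold, revenue) figures for six products.
sales = [(5, 0.5), (6, 0.6), (7, 0.7), (50, 5.0), (55, 5.5), (60, 6.0)]
centroids, labels = kmeans(sales, k=2)
```

The algorithm simply alternates the assignment and update steps until the centroids stop moving, which is why its clusters are so easy to inspect and explain.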

2020 ◽  
Vol 3 (2) ◽  
Author(s):  
Yoga Religia ◽  
Gatot Tri Pranoto ◽  
Egar Dika Santosa

Normally, most of a bank's wealth is obtained from providing credit loans, so a bank must be able to reduce the risk of non-performing loans. This risk can be minimized by studying patterns in existing lending data. One technique that can be used to solve this problem is data mining, which makes it possible to find hidden information in large data sets by way of classification. The Random Forest (RF) algorithm is a classification algorithm that can also cope with data imbalance. The purpose of this study is to discuss the use of the RF algorithm for classification of the South German Credit data. This research is needed because no previous study has applied the RF algorithm to the South German Credit data specifically. Based on the tests performed, the RF classifier performs best on the South German Credit data with a split of 85% training data and 15% testing data, achieving an accuracy of 78.33%.
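Random Forest's core idea, training many randomized trees on bootstrap samples and letting them vote, can be illustrated with single-split trees (decision stumps). This is a toy sketch on invented loan records, not the authors' experiment or the South German Credit data:

```python
import random

def fit_stump(X, y, feat):
    """Find the (threshold, sign) on one feature that minimises training errors."""
    best = None
    for t in sorted({row[feat] for row in X}):
        for sign in (1, -1):
            preds = [1 if sign * (row[feat] - t) >= 0 else 0 for row in X]
            err = sum(p != yi for p, yi in zip(preds, y))
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return (feat, t, sign)

def predict_stump(stump, row):
    feat, t, sign = stump
    return 1 if sign * (row[feat] - t) >= 0 else 0

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # bootstrap sample (with replacement)
        feat = rng.randrange(len(X[0]))            # random feature for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feat))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps (ties broken towards class 1)."""
    votes = sum(predict_stump(s, row) for s in forest)
    return 1 if 2 * votes >= len(forest) else 0

# Invented records: (income, debt); label 1 = loan repaid, 0 = default.
X = [[20, 8], [25, 9], [28, 7], [22, 10], [55, 2], [60, 1], [52, 3], [70, 2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
```

Because each tree sees a different resample and feature, the ensemble's vote is more robust than any single tree, which is also what helps RF tolerate imbalanced data.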


Author(s):  
Alisa Bilal Zorić

We live in a world where we collect huge amounts of data, but unless this data is analyzed further, it remains just that: huge amounts of data. With new methods and techniques we can analyze this data and gain a great advantage, and data mining is the ideal method for doing so. Data mining is the process of extracting hidden, useful information and patterns from large data sets. Its application in areas such as finance, telecommunications, healthcare, sales, marketing, and banking is already well known. In this paper, we introduce a specialized use of data mining in education, called educational data mining. Educational Data Mining (EDM) is an interdisciplinary research area that emerged from the application of data mining to the educational field. It uses methods and techniques from machine learning, statistics, data mining, and data analysis to analyze data collected during teaching and learning. EDM is the process of transforming raw data from large educational databases into useful, meaningful information that can be used to better understand students and their learning conditions, to improve teaching support, and to inform decision making in educational systems. The goal of this paper is to introduce educational data mining and to present its applications and benefits.


Author(s):  
Adam Kiersztyn ◽  
Pawe Karczmarek ◽  
Krystyna Kiersztyn ◽  
Witold Pedrycz

1997 ◽  
Vol 1997 ◽  
pp. 143-143
Author(s):  
B.L. Nielsen ◽  
R.F. Veerkamp ◽  
J.E. Pryce ◽  
G. Simm ◽  
J.D. Oldham

High-producing dairy cows have been found to be more susceptible to disease (Jones et al., 1994; Göhn et al., 1995), raising concerns about the welfare of the modern dairy cow. Genotype and number of lactations may affect various health problems differently, and their relative importance may vary. The categorical nature and low incidence of health events necessitate large data sets, but the use of data collected across herds may introduce unwanted variation. An analysis of a comprehensive data set from a single herd was therefore carried out to investigate the effects of genetic line and lactation number on the incidence of various health and reproductive problems.


2021 ◽  
Vol 251 ◽  
pp. 02054
Author(s):  
Olga Sunneborn Gudnadottir ◽  
Daniel Gedon ◽  
Colin Desmarais ◽  
Karl Bengtsson Bernander ◽  
Raazesh Sainudiin ◽  
...  

In recent years, machine-learning methods have become increasingly important for the experiments at the Large Hadron Collider (LHC). They are utilised in everything from trigger systems to reconstruction and data analysis. The recent UCluster method is a general model that provides unsupervised clustering of particle-physics data and can easily be modified to provide solutions for a variety of different decision problems. In the current paper, we improve on the UCluster method by adding the option of training the model in a scalable and distributed fashion, thereby extending its utility to learn from arbitrarily large data sets. UCluster combines a graph-based neural network called ABCnet with a clustering step, using a combined loss function in the training phase. The original code is publicly available in TensorFlow v1.14 and has previously been trained on a single GPU. It shows a clustering accuracy of 81% when applied to the problem of multi-class classification of simulated jet events. Our implementation adds distributed training by utilising the Horovod distributed training framework, which necessitated a migration of the code to TensorFlow v2. Together with the use of parquet files for splitting the data between compute nodes, distributed training makes the model scalable to any amount of input data, something that will be essential for use with real LHC data sets. We find that the model is well suited for distributed training, with the training time decreasing in direct relation to the number of GPUs used. However, further improvement through a more exhaustive, and possibly distributed, hyper-parameter search is required in order to achieve the reported accuracy of the original UCluster method.
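The data-splitting idea, giving each worker its own slice of the input so that compute nodes never contend for the same records, can be sketched independently of Horovod. The function below is an illustrative stand-in, not code from the UCluster repository:

```python
def shard(records, rank, size):
    """Round-robin shard: worker `rank` of `size` workers takes every
    size-th record, so the shards partition the data set exactly."""
    return records[rank::size]

# Stand-in for a list of simulated jet events, split across 4 workers.
events = list(range(10))
shards = [shard(events, r, size=4) for r in range(4)]
```

In the Horovod setting, `rank` and `size` would come from `hvd.rank()` and `hvd.size()` after `hvd.init()`, and in the paper's setup each worker's slice corresponds to its own parquet file rather than an in-memory list.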


2021 ◽  
pp. 1826-1839
Author(s):  
Sandeep Adhikari ◽  
Sunita Chaudhary

The exponential growth in the use of computers over networks, together with the proliferation of applications that run on different platforms, has drawn attention to network security. Attackers exploit security flaws present in all operating systems, flaws that are both technically difficult and costly to fix. As a result, intrusion has become a key worldwide threat to the confidentiality, integrity, and availability of computer resources. The Intrusion Detection System (IDS) is critical in detecting network anomalies and attacks. In this paper, data mining principles are combined with an IDS to efficiently and quickly identify important, confidential data of interest to the user. The proposed algorithm addresses four issues: data classification, high levels of human interaction, lack of labeled data, and the effectiveness of distributed denial-of-service attacks. We also develop a decision tree classifier with a variety of parameters. The previous algorithm classified intrusions with up to 90% accuracy and was not appropriate for large data sets; our proposed algorithm is designed to classify large data sets accurately. In addition, we quantify several further parameters of the decision tree classifier.
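Decision tree classifiers of the kind described grow by repeatedly splitting on whichever feature yields the highest information gain. A minimal sketch of that gain computation on invented connection records follows; the features and labels are hypothetical, not drawn from the paper's data set:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feat):
    """Entropy reduction from splitting on categorical feature `feat`."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[feat], []).append(lab)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Invented connection records: (protocol, TCP flag); label = traffic class.
rows = [("tcp", "S0"), ("tcp", "SF"), ("udp", "S0"), ("udp", "SF")]
labels = ["attack", "normal", "attack", "normal"]
```

Here the flag feature separates the two classes perfectly (gain of 1 bit) while the protocol feature carries no information (gain 0), so a tree built greedily on gain would split on the flag first.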

