Determining the Number of Data Clusters in Any Dataset in the Presence of Considerable Noise

2018 ◽  
Vol 6 (1) ◽  
pp. 41-48
Author(s):  
Santoso Setiawan

Abstract: Inaccurate stock management leads to high and uneconomical storage costs, since certain products may run out or accumulate in surplus. This poses a serious risk for any business. The K-Means method is one technique that can help in designing an effective inventory strategy by making use of the sales transaction data already available in the company. The K-Means algorithm groups the products sold into several clusters of transaction data, and is therefore expected to help business owners design stock inventory strategies.

Keywords: inventory, k-means, product transaction data, rapidminer, data mining
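The clustering step described above can be illustrated with a small, self-contained sketch. The original study was carried out in RapidMiner; the sketch below uses scikit-learn instead, and the feature columns (units sold, number of transactions, average price) and the choice of k = 3 are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: grouping products into transaction-volume clusters with K-Means.
# Feature columns and k = 3 are illustrative assumptions, not taken from the study.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-product features derived from sales transactions:
# [units_sold, number_of_transactions, average_price]
products = np.array([
    [1200, 340,  15.0],
    [  90,  12,  99.0],
    [ 860, 290,  22.5],
    [  40,   8, 120.0],
    [1500, 410,  12.0],
])

features = StandardScaler().fit_transform(products)   # put features on one scale
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)

for product_row, label in zip(products, model.labels_):
    print(product_row, "-> cluster", label)
```

Running the sketch prints each product row with its assigned cluster label; in practice the features would be aggregated per product from the company's sales transactions before clustering.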


2021 ◽  
Vol 102 ◽  
pp. 102751
Author(s):  
Chiheb-Eddine Ben Ncir ◽  
Abdallah Hamza ◽  
Waad Bouaguel
Keyword(s):  
Big Data ◽  

2000 ◽  
Vol 12 (10) ◽  
pp. 2331-2353 ◽  
Author(s):  
H. Lipson ◽  
H. T. Siegelmann

This article introduces a method for clustering irregularly shaped data arrangements using high-order neurons. Complex analytical shapes are modeled by replacing the classic synaptic weight of the neuron by high-order tensors in homogeneous coordinates. In the first- and second-order cases, this neuron corresponds to a classic neuron and to an ellipsoidal-metric neuron. We show how high-order shapes can be formulated to follow the maximum-correlation activation principle and permit simple local Hebbian learning. We also demonstrate decomposition of spatial arrangements of data clusters, including very close and partially overlapping clusters, which are difficult to distinguish using classic neurons. Superior results are obtained for the Iris data.
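As a rough illustration of the second-order (ellipsoidal-metric) case, the sketch below assigns points to "neurons" by Mahalanobis distance and re-estimates each neuron's mean and covariance from its members. This is a simplified stand-in, assuming a k-means-style assign/update loop, not the paper's homogeneous-coordinate tensor formulation or its Hebbian learning rule.

```python
# Simplified sketch of an ellipsoidal-metric (second-order) clustering step:
# each "neuron" keeps a mean and a covariance, and points are assigned to the
# neuron with the smallest Mahalanobis distance.  Illustration only.
import numpy as np

def mahalanobis_sq(x, mean, cov_inv):
    d = x - mean
    return float(d @ cov_inv @ d)

def assign(points, means, covariances):
    """Return, for each point, the index of the closest ellipsoidal neuron."""
    inverses = [np.linalg.inv(c) for c in covariances]
    labels = []
    for x in points:
        dists = [mahalanobis_sq(x, m, ci) for m, ci in zip(means, inverses)]
        labels.append(int(np.argmin(dists)))
    return np.array(labels)

def update(points, labels, k, reg=1e-3):
    """Re-estimate each neuron's mean and covariance from its assigned points."""
    means, covs = [], []
    for j in range(k):
        cluster = points[labels == j]
        means.append(cluster.mean(axis=0))
        covs.append(np.cov(cluster, rowvar=False) + reg * np.eye(points.shape[1]))
    return means, covs
```

Alternating assign and update for a few iterations gives an ellipsoidal analogue of k-means; the regularization term keeps each covariance invertible when a cluster holds few points.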


2019 ◽  
pp. 85-105
Author(s):  
Benjamin Weissman ◽  
Enrico van de Laar
Keyword(s):  
Big Data ◽  

2020 ◽  
pp. 147-202
Author(s):  
Benjamin Weissman ◽  
Enrico van de Laar
Keyword(s):  
Big Data ◽  

2020 ◽  
Vol 2 (1) ◽  
pp. 1-14
Author(s):  
Torkis Nasution

Selection is a college's effort to obtain qualified prospective students. Entrance-test data for new students can describe their academic quality and is related to graduating on time. Recognizing the academic quality of students is needed when running lectures in order to obtain optimal results. In practice, on-time graduation has not yet been achieved optimally and needs to be improved to within reasonable limits. The existing data therefore needs to be classified by academic quality in order to predict on-time graduation. This study proposes to address the problem by applying the K-Nearest Neighbor algorithm to cluster the entrance-test results of new students. The procedure is to determine the number of data clusters, determine the cluster center points, calculate the distance of each object to the centroids, and assign the objects to clusters; the calculation stops when the new group assignments no longer change. The data used for clustering are the entrance-exam results of students admitted to STMIK Amik Riau over the last three years. The study aims to predict whether students graduate on time or not. In tests of the value of k, the maximum accuracy was obtained at k = 5, reaching 99.25%; accuracy declines as k grows larger.
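The prediction step with k = 5 can be sketched as follows, assuming each student's entrance-test record has been reduced to a numeric feature vector with an on-time/late graduation label. The data here are synthetic placeholders; the study's actual STMIK Amik Riau records are not reproduced.

```python
# Minimal sketch of the k-NN prediction step (k = 5) on synthetic placeholder data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))             # hypothetical entrance-test score features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # 1 = graduated on time (synthetic label)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("accuracy:", knn.score(X_test, y_test))
```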


2020 ◽  
Vol 25 (6) ◽  
pp. 755-769
Author(s):  
Noorullah R. Mohammed ◽  
Moulana Mohammed

Text data clustering organizes a set of text documents into a desired number of coherent and meaningful sub-clusters. Modeling the documents in terms of derived topics is a vital task in text data clustering. Each tweet is treated as a text document, and various topic models are used to model tweets. In existing topic models, the clustering tendency of tweets is initially assessed using Euclidean dissimilarity features. The cosine metric is more suitable and gives a more informative assessment, especially for text clustering. This paper therefore develops a novel cosine-based internal and external validity assessment of cluster tendency to improve the computational efficiency of tweet data clustering. In the experiments, tweet clustering results are evaluated using cluster validity indices, and the cosine-based internal and external validity metrics are shown to outperform the others on benchmark and Twitter-based datasets.
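A minimal sketch of the cosine-based internal validity idea: cluster TF-IDF vectors of short texts and compare cosine and Euclidean silhouette values. The tweets and k = 2 below are placeholders, and the silhouette score is used only as a stand-in for the paper's own validity indices.

```python
# Sketch: compare cosine and Euclidean internal validity on TF-IDF vectors
# of a few placeholder tweets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

tweets = [
    "stock prices fell sharply today",
    "the market closed lower on trade fears",
    "great goal in the final minute of the match",
    "what a comeback win for the home team",
]
X = TfidfVectorizer().fit_transform(tweets)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print("cosine silhouette:   ", silhouette_score(X, labels, metric="cosine"))
print("euclidean silhouette:", silhouette_score(X, labels, metric="euclidean"))
```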


Radiocarbon ◽  
1997 ◽  
Vol 40 (1) ◽  
pp. 561-569
Author(s):  
Herbert Haas ◽  
Matthew R. Doubrava

Application of radiocarbon dating to a short chronology is often limited by the wide probability ranges of calibrated dates. These wide ranges are caused by multiple intersections of the 14C age with the tree-ring curve. For a single unrelated 14C date, each intersection presents a probable solution. When several dates on different events are available, identification of the most probable solution for each event is possible if one can obtain some information on the relation between these events. We present here a method for such identifications. To demonstrate the method, we selected a series of 14C dates from mortuary monuments of the Egyptian Old Kingdom. Corrected 14C dates from seven monuments were used. Calibration of these dates produced three absolute ages with single intersections and four ages with 3–5 intersections. These data are compared to a historical chronology, which places the dated events at a younger age. If each intersection is chosen as a potential anchor point of the “correct” chronology, 17 solutions must be tested for the best fit against the historical chronology. The latter is based on the length of the reign of each pharaoh during the studied time span. A spreadsheet determines the probability of fit for each of the solutions. In a second step, the 17 probability values and their offsets between the historical and the 14C chronologies are analyzed graphically to find the most probable offset. This offset is then applied as a correction to the estimated chronology to obtain an absolute time scale for the dated events.
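The offset search can be sketched schematically: each candidate offset shifts the historically derived dates, and the offset whose shifted dates fall closest to the calibrated 14C solutions is preferred. All dates below are invented placeholders, not the paper's measurements, and the squared-gap score stands in for the spreadsheet's probability of fit.

```python
# Schematic sketch of the offset search: score each candidate offset by how
# closely the shifted historical dates match the nearest calibrated solution.
# All numbers are invented placeholders.
import numpy as np

def misfit(offset, historical_dates, calibrated_solutions):
    """Sum of squared gaps between shifted historical dates and the nearest
    calibrated intersection for each monument."""
    total = 0.0
    for hist, solutions in zip(historical_dates, calibrated_solutions):
        shifted = hist + offset
        total += min((shifted - s) ** 2 for s in solutions)
    return total

historical_dates = [-2550, -2500, -2460]   # hypothetical reign-based dates (BC as negative)
calibrated_solutions = [                   # hypothetical calibration intersections per monument
    [-2620, -2580],
    [-2570],
    [-2540, -2510, -2490],
]

offsets = np.arange(-150, 151, 5)
scores = [misfit(o, historical_dates, calibrated_solutions) for o in offsets]
best = offsets[int(np.argmin(scores))]
print("best-fitting offset (years):", best)
```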


Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 123
Author(s):  
Anderson Gregório Marques Soares ◽  
Elvis Thermo Carvalho Miranda ◽  
Rodrigo Santos do Amor Divino Lima ◽  
Carlos Gustavo Resque dos Santos ◽  
Bianchi Serique Meiguins

The Treemap is one of the most relevant information visualization (InfoVis) techniques for supporting the analysis of large hierarchical data structures or data clusters. Despite that, the Treemap still presents some challenges for data representation, such as the few options for visual data mappings and the inability to represent zero and negative values. Additionally, visualizing high-dimensional data requires many hierarchies, which can impair the visualization. Thus, this paper proposes adding layered glyphs to the Treemap's items to mitigate these issues. Layered glyphs are composed of N partially visible layers, and each layer maps one data dimension to a visual variable. Since the area of the upper layers is always smaller than that of the bottom ones, the layers can be stacked to compose a multidimensional glyph. To validate this proposal, we conducted a user study comparing three scenarios of visual data mappings for Treemaps: only Glyphs (G), Glyphs and Hierarchy (GH), and only Hierarchy (H). Thirty-six volunteers with a background in InfoVis techniques, organized into three groups of twelve (one group per scenario), performed eight InfoVis tasks using only one of the proposed scenarios. The results indicate that scenario GH presented the best accuracy while having a task-solving time similar to scenario H, which suggests that representing more data in Treemaps with layered glyphs enriched the Treemap's visualization capabilities without impairing data readability.
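A toy sketch of the layered-glyph stacking for a single Treemap cell follows: each data dimension becomes one centred layer, drawn from largest to smallest so every layer stays partially visible. The three-dimension example, colours, and scaling rule are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of a layered glyph for one Treemap cell: nested, centred layers,
# one per data dimension, with upper layers smaller than lower ones.
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

def draw_layered_glyph(ax, x, y, w, h, values, colors):
    """Draw len(values) nested layers inside the cell (x, y, w, h).
    Layer i occupies a footprint that shrinks with i, scaled by its value."""
    n = len(values)
    vmax = max(values)
    for i, (v, c) in enumerate(zip(values, colors)):
        scale = (n - i) / n * (0.4 + 0.6 * v / vmax)  # later layers are smaller
        lw, lh = w * scale, h * scale
        ax.add_patch(Rectangle((x + (w - lw) / 2, y + (h - lh) / 2),
                               lw, lh, facecolor=c, edgecolor="black"))

fig, ax = plt.subplots()
draw_layered_glyph(ax, 0, 0, 1, 1, values=[0.9, 0.6, 0.3],
                   colors=["#cfe8ff", "#7fbfff", "#1f77b4"])
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_aspect("equal")
plt.show()
```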

