Stock Data Clustering of Food and Beverage Company

Shofwatul Uyun; Subanar Subanar

doi:10.22146/ijccs.2279

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.2279 ◽

2007 ◽

Vol 1 (2) ◽

Author(s):

Shofwatul Uyun ◽

Subanar Subanar

Keyword(s):

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Clustering Methods ◽

Cluster Validity ◽

Fuzzy C Means ◽

Food And Beverage ◽

Hard Clustering ◽

Clustering Data ◽

Support Decision Making

Abstract: Cluster analysis can be defined as identifying groups of similar objects in order to discover the distribution of patterns and interesting correlations in large data sets. Cluster analysis is important in the fields of pattern recognition and pattern classification. Over the years, many methods have been developed for clustering data. In general, clustering methods fall into two categories: fuzzy clustering and hard clustering. Fuzzy C-Means is one of many clustering methods based on the fuzzy approach, while K-Means and K-Medoid are clustering methods based on the crisp approach. This study aims to apply the Fuzzy C-Means, K-Means and K-Medoid methods to cluster stock data in a food and beverage company. The main goal is to find a clustering method that can produce optimal clusters. The resulting clusters are validated using Dunn's Index (DI). It is expected that the results of this research can be used to support decision making in the food and beverage company.

Keywords: Clustering, Fuzzy C-Means, K-Means, K-Medoid, Cluster Validity, Dunn's Index (DI)
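To make the validation step concrete, here is a minimal sketch of Dunn's Index in plain Python. It assumes the classic definition (smallest distance between points in different clusters, divided by the largest distance between points in the same cluster) and Euclidean distance; the abstract does not say which variant the authors used, and the function name `dunn_index` and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Dunn's Index needs at least two clusters")

    # Diameter: largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Separation: smallest distance between points of different clusters.
    min_separation = min(
        dist(a, b)
        for first, second in combinations(clusters.values(), 2)
        for a in first
        for b in second
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point
    return min_separation / max_diameter


# Toy data (made up): two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = dunn_index(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = dunn_index(points, [0, 1, 0, 1])   # stretched, interleaved clusters
```

A higher value indicates more compact and better separated clusters, so labelings produced by different methods on the same data can be ranked by this score.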

clusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences

10.1101/2021.02.22.432291 ◽

2021 ◽

Author(s):

Sebastiaan Valkiers ◽

Max Van Houcke ◽

Kris Laukens ◽

Pieter Meysman

Keyword(s):

T Cell ◽

Large Data ◽

Cell Receptor ◽

Amino Acid Sequences ◽

Large Data Sets ◽

Data Sets ◽

Clustering Methods ◽

Link Type ◽

Large Sets ◽

Similar Accuracy

The T-cell receptor (TCR) determines the specificity of a T-cell towards an epitope. As of yet, the rules for antigen recognition remain largely undetermined. Current methods for grouping TCRs according to their epitope specificity remain limited in performance and scalability. Multiple methodologies have been developed, but all of them fail to efficiently cluster large data sets exceeding 1 million sequences. To account for this limitation, we developed clusTCR, a rapid TCR clustering alternative that efficiently scales up to millions of CDR3 amino acid sequences. Benchmarking comparisons revealed similar accuracy of clusTCR with other TCR clustering methods. clusTCR offers a drastic improvement in clustering speed, which allows clustering of millions of TCR sequences in just a few minutes through efficient similarity searching and sequence hashing. clusTCR was written in Python 3. It is available as an Anaconda package (https://anaconda.org/svalkiers/clustcr) and on GitHub (https://github.com/svalkiers/clusTCR).
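As an illustration of why hashing helps at this scale, the sketch below finds all pairs of sequences that differ at exactly one position without comparing every pair. This is a generic masking trick, not clusTCR's actual implementation or API; the function name `hamming1_pairs` and the example sequences are made up for the illustration.

```python
from collections import defaultdict


def hamming1_pairs(sequences):
    """Return pairs of sequences that differ at exactly one position."""
    buckets = defaultdict(set)
    for seq in set(sequences):
        for i in range(len(seq)):
            # Mask one position. Two distinct sequences that share a masked
            # key have the same length and differ only at that position.
            buckets[seq[:i] + "_" + seq[i + 1:]].add(seq)

    pairs = set()
    for group in buckets.values():
        members = sorted(group)
        for idx, first in enumerate(members):
            for second in members[idx + 1:]:
                pairs.add((first, second))
    return pairs


# Made-up CDR3-like strings for the example.
example = ["CASSLGTDTQYF", "CASSLGADTQYF", "CASSPGTDTQYF", "CAWSVGQETQYF"]
neighbours = hamming1_pairs(example)
```

Each sequence of length L is hashed L times, so the work grows roughly linearly with the number of sequences instead of quadratically, provided the buckets stay small.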

Uncertainty-Based Clustering Algorithms for Large Data Sets

Modern Technologies for Big Data Classification and Clustering - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-2805-0.ch001 ◽

2018 ◽

pp. 1-33 ◽

Cited By ~ 1

Author(s):

B. K. Tripathy ◽

Hari Seetha ◽

M. N. Murty

Keyword(s):

Big Data ◽

Data Clustering ◽

Clustering Algorithms ◽

Large Data ◽

Large Data Sets ◽

Mining Machine ◽

Data Sets ◽

Fuzzy C Means ◽

Intuitionistic Fuzzy ◽

New Algorithms

Data clustering plays a very important role in data mining, machine learning and image processing. As modern databases have inherent uncertainties, many uncertainty-based data clustering algorithms have been developed. These include fuzzy c-means, rough c-means and intuitionistic fuzzy c-means, as well as algorithms based on hybrid models, such as rough fuzzy c-means and rough intuitionistic fuzzy c-means. There are also many variants that improve these algorithms in different directions, such as their kernelised, possibilistic, and possibilistic kernelised versions. However, none of the above algorithms is effective on big data, for various reasons. Researchers have therefore been trying for the past few years to improve these algorithms so that they can be applied to cluster big data. Such algorithms are relatively few in comparison to those for datasets of reasonable size. Our aim in this chapter is to present the uncertainty-based clustering algorithms developed so far and to propose a few new algorithms which can be developed further.
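For readers unfamiliar with the baseline these variants build on, the following is a minimal sketch of standard fuzzy c-means in plain Python. It assumes Euclidean distance, a fuzzifier `m` greater than 1, and a fixed number of iterations instead of a convergence test; the function names and the toy data are hypothetical and are not taken from the chapter.

```python
from math import dist


def update_memberships(points, centers, m):
    """u[k][i] = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1))."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        d = [dist(x, c) for c in centers]
        if min(d) == 0.0:
            # The point sits on a center: give it full membership there.
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append([1.0 / sum((di / dj) ** exponent for dj in d)
                         for di in d])
    return rows


def update_centers(points, memberships, m, old_centers):
    """Each center is the mean of all points weighted by u ** m."""
    centers = []
    for i, old in enumerate(old_centers):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        if total == 0.0:
            centers.append(old)  # no point has any weight for this center
            continue
        centers.append(tuple(
            sum(w * x[k] for w, x in zip(weights, points)) / total
            for k in range(len(old))
        ))
    return centers


def fuzzy_c_means(points, initial_centers, m=2.0, iterations=50):
    """Alternate the two updates, then return centers and final memberships."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(iterations):
        memberships = update_memberships(points, centers, m)
        centers = update_centers(points, memberships, m, centers)
    return centers, update_memberships(points, centers, m)


# Toy data (made up): two groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
centers, memberships = fuzzy_c_means(points, [(1.0, 0.5), (9.0, 0.5)])
```

Every pass touches every point for every center, which hints at why the plain algorithm becomes expensive on very large data sets.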

Parallel Fuzzy c-Means Clustering for Large Data Sets

Euro-Par 2002 Parallel Processing - Lecture Notes in Computer Science ◽

10.1007/3-540-45706-2_48 ◽

2002 ◽

pp. 365-374 ◽

Cited By ~ 52

Author(s):

Terence Kwok ◽

Kate Smith ◽

Sebastian Lozano ◽

David Taniar

Keyword(s):

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering

A FAST IMPLEMENTATION OF THE ISODATA CLUSTERING ALGORITHM

International Journal of Computational Geometry & Applications ◽

10.1142/s0218195907002252 ◽

2007 ◽

Vol 17 (01) ◽

pp. 71-103 ◽

Cited By ~ 93

Author(s):

NARGESS MEMARSADEGHI ◽

DAVID M. MOUNT ◽

NATHAN S. NETANYAHU ◽

JACQUELINE LE MOIGNE

Keyword(s):

Clustering Algorithm ◽

Empirical Studies ◽

Synthetic Data ◽

Large Data ◽

Large Data Sets ◽

Cluster Center ◽

Data Sets ◽

Clustering Methods ◽

Sensing Applications ◽

Remote Sensing Applications

Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
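The trade-off between speed and fidelity in nearest-center queries can be sketched with a small kd-tree. Note the simplification: the paper stores the data points in the kd-tree, whereas this sketch indexes the cluster centers, which is easier to show in a few lines. The names `build_kdtree` and `nearest`, the `eps` parameter, and the sample centers are assumptions made for the illustration.

```python
from math import dist


def build_kdtree(points, depth=0):
    """Split on alternating axes; a node is (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (ordered[mid], axis,
            build_kdtree(ordered[:mid], depth + 1),
            build_kdtree(ordered[mid + 1:], depth + 1))


def nearest(node, query, eps=0.0, best=None):
    """Return (distance, point). With eps > 0 the result may be up to
    (1 + eps) times farther away than the true nearest neighbour."""
    if node is None:
        return best
    point, axis, left, right = node
    d = dist(query, point)
    if best is None or d < best[0]:
        best = (d, point)
    gap = query[axis] - point[axis]
    near, far = (left, right) if gap < 0 else (right, left)
    best = nearest(near, query, eps, best)
    # Every point on the far side is at least abs(gap) away, so that side
    # is skipped unless it could still hold a sufficiently closer point.
    if abs(gap) * (1.0 + eps) < best[0]:
        best = nearest(far, query, eps, best)
    return best


# Made-up cluster centers and query points.
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0), (2.0, 8.0)]
tree = build_kdtree(centers)
queries = [(9.0, 1.0), (1.0, 1.0), (3.0, 7.0), (4.0, 4.0)]
exact = [nearest(tree, q)[1] for q in queries]
```

Raising `eps` prunes more of the tree, which is one common way to trade accuracy for running time in this kind of search.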

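Analisis Algoritma K-Means Clustering Untuk Menentukan Strategi Promosi Penjualan Sepeda Motor Studi Kasus PT. Alfa Scorpii (Analysis of the K-Means Clustering Algorithm for Determining a Motorcycle Sales Promotion Strategy: A Case Study of PT. Alfa Scorpii)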

JUTI UNISI ◽

10.32520/juti.v4i1.1087 ◽

2020 ◽

Vol 4 (1) ◽

pp. 1-8

Author(s):

Abdul Muni

Keyword(s):

Data Mining ◽

Large Data ◽

Sales Promotion ◽

Large Data Sets ◽

Data Sets ◽

Promotion Strategy ◽

Corporate Earnings ◽

Clustering Data ◽

The Right ◽

The Cost

PT. Alfa Scorpii is a private-sector company engaged in motorcycle sales. Its data is not used to the fullest: sales reports serve only as reports. A promotion strategy aims to increase the company's income, which is directly related to cost. Data mining turns existing data into knowledge extracted from large data sets, a process also known as knowledge discovery or pattern recognition. Of the many methods in data mining, this study uses the K-Means clustering algorithm. Clustering the data allows the marketing department to direct motorcycle sales promotion strategies at new customers appropriately and to improve corporate earnings.

Identifying Single Clusters in Large Data Sets

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch110 ◽

2011 ◽

pp. 582-585 ◽

Cited By ~ 1

Author(s):

Frank Klawonn ◽

Olga Georgieva

Keyword(s):

Least Squares ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Clustering Methods ◽

Clustering Techniques ◽

Noise Data ◽

Data Points ◽

Arbitrary Noise ◽

Fuzzy C Means Algorithm

Most clustering methods have to face the problem of characterizing good clusters among noise data. The arbitrary noise points that simply do not belong to any class being searched for are a real concern. Outliers, or noise data points, are data that severely deviate from the pattern set by the majority of the data, while rounding and grouping errors result from the inherent inaccuracy in the collection and recording of data. In fact, a single outlier can completely spoil the least squares (LS) estimate and thus the results of most LS-based clustering techniques, such as the hard C-means (HCM) and the fuzzy C-means (FCM) algorithms (Bezdek, 1999).
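A small numeric illustration of the outlier problem: the arithmetic mean is the least squares estimate of a single location, since it minimizes the sum of squared deviations, and one extreme value is enough to drag it far from the bulk of the data. The numbers below are made up, and the median is shown only as a familiar contrast; it is not the method proposed in the article.

```python
from statistics import mean, median

# Five readings close to 10, then the same readings plus one gross outlier.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
noisy = clean + [1000.0]

ls_clean = mean(clean)          # least squares location estimate, clean data
ls_noisy = mean(noisy)          # the same estimate after adding one outlier
robust_noisy = median(noisy)    # the median barely moves

print(f"mean without outlier: {ls_clean:.2f}")
print(f"mean with outlier:    {ls_noisy:.2f}")
print(f"median with outlier:  {robust_noisy:.2f}")
```

Cluster centers in HCM and FCM are (weighted) means of this kind, which is why a few noise points can shift them noticeably.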

Time Series Clustering- Introduction to Healthcare System

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a9115.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 2958-2963

Keyword(s):

Time Series ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Data Set ◽

Pull Out ◽

Clustering Approach ◽

Class Labels ◽

Clustering Data ◽

Extract Information

Clustering is an appropriate approach for classifying information when no prior information about class labels exists, and in recent years it has been combined with promising techniques such as cloud-based computing and big data. Research interest has gradually built up around unsupervised methods such as clustering, which pull useful information out of the available data set. Time series clustering has been used in most technical domains to extract information-rich patterns that power data analysis and draw useful insight from complicated and large data sets. A classification approach is mostly not feasible for large data sets, whereas a clustering approach resolves the problem with the aid of unsupervised techniques. The proposed methodology focuses on time series healthcare data sets, one of the most popular kinds of data in clustering approaches. This summary presents four major components of time series approaches.

[21] Clustering Methods for Analyzing Large Data Sets: Gonad Development, A Study Case

Methods in Enzymology - DNA Microarrays, Part B: Databases and Statistics ◽

10.1016/s0076-6879(06)11021-6 ◽

2006 ◽

pp. 387-407 ◽

Cited By ~ 3

Author(s):

Jérôme Hennetin ◽

Michel Bellis

Keyword(s):

Gonad Development ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Clustering Methods ◽

Study Case

Introduction to Clustering

Dynamic and Advanced Data Mining for Progressing Technological Development ◽

10.4018/978-1-60566-908-3.ch010 ◽

2010 ◽

pp. 224-254

Author(s):

Raymond Greenlaw ◽

Sanpawat Kantabutra

Keyword(s):

Clustering Algorithms ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Clustering Methods ◽

Research Directions ◽

History Of ◽

Representative Points ◽

Parallel Clustering ◽

Extensive List

This chapter provides the reader with an introduction to clustering algorithms and applications. A number of important well-known clustering methods are surveyed. The authors present a brief history of the development of the field of clustering, discuss various types of clustering, and mention some of the current research directions in the field of clustering. Algorithms are described for top-down and bottom-up hierarchical clustering, as are algorithms for K-Means clustering and for K-Medians clustering. The technique of representative points is also presented. Given the large data sets involved with clustering, the need to apply parallel computing to clustering arises, so they discuss issues related to parallel clustering as well. Throughout the chapter references are provided to works that contain a large number of experimental results. A comparison of the various clustering methods is given in tabular format. They conclude the chapter with a summary and an extensive list of references.
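The difference between K-Means and K-Medians can be sketched with one shared loop in which only the center update and the distance change. This is a simplified illustration, not the chapter's pseudocode: it assumes the common pairing of K-Medians with Manhattan distance, runs a fixed number of iterations, and uses hypothetical names and made-up data.

```python
from math import dist
from statistics import mean, median


def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster(points, initial_centers, center_of=mean, distance=dist,
            iterations=20):
    """Assign each point to its nearest center, then recompute every
    center coordinate-wise with center_of (mean or median)."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            closest = min(range(len(centers)),
                          key=lambda i: distance(p, centers[i]))
            groups[closest].append(p)
        centers = [
            tuple(center_of(coords) for coords in zip(*group)) if group
            else old  # keep a center that attracted no points
            for group, old in zip(groups, centers)
        ]
    return centers


# Made-up data: the last point is an outlier along the x axis.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0),
          (8.0, 8.0), (9.0, 8.0), (30.0, 8.0)]
start = [(0.0, 0.0), (10.0, 10.0)]

k_means = cluster(points, start)
k_medians = cluster(points, start, center_of=median, distance=manhattan)
```

On this data the second K-Means center is pulled toward the outlier, while the K-Medians center stays at a typical point of its group.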

An example of spectrum imaging used for comparison of EELS quantitative analysis techniques on Al-Li

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s042482010008794x ◽

1991 ◽

Vol 49 ◽

pp. 726-727

Author(s):

John A. Hunt

Keyword(s):

Quantitative Analysis ◽

Large Data ◽

Difference Spectrum ◽

Large Data Sets ◽

Foil Thickness ◽

Data Sets ◽

Analysis Techniques ◽

Spectrum Imaging ◽

Normal Spectrum ◽

Electron Energy Loss

Spectrum-imaging is a useful technique for comparing different processing methods on very large data sets which are identical for each method. This paper is concerned with comparing methods of electron energy-loss spectroscopy (EELS) quantitative analysis on the Al-Li system. The spectrum-image analyzed here was obtained from an Al-10at%Li foil aged to produce δ' precipitates that can span the foil thickness. Two 1024-channel EELS spectra offset in energy by 1 eV were recorded and stored at each pixel in the 80x80 spectrum-image (25 Mbytes). An energy range of 39-89 eV (20 channels/eV) is represented. During processing, the spectra are either subtracted to create an artifact-corrected difference spectrum, or the energy offset is numerically removed and the spectra are added to create a normal spectrum. The spectrum-images are processed into 2D floating-point images using methods and software described in [1].
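The two per-pixel processing routes can be sketched as simple list operations. The channel density and energy offset come from the abstract (20 channels/eV and 1 eV, hence a 20-channel shift); the direction of the shift, the function names, and the synthetic spectra are assumptions made for this illustration and do not reproduce the software cited in [1].

```python
CHANNELS_PER_EV = 20                   # stated in the abstract
OFFSET_EV = 1                          # stated in the abstract
SHIFT = CHANNELS_PER_EV * OFFSET_EV    # 20 channels


def difference_spectrum(first, second):
    """Subtract the two recorded spectra channel by channel."""
    return [a - b for a, b in zip(first, second)]


def normal_spectrum(first, second, shift=SHIFT):
    """Remove the energy offset and add the overlapping channels.
    Assumption: second[i + shift] records the same energy as first[i]."""
    return [first[i] + second[i + shift]
            for i in range(len(first) - shift)]


# Synthetic 1024-channel spectra for one pixel (made-up counts).
first = [float((i * i) % 97) for i in range(1024)]
second = [0.0] * SHIFT + first[:1024 - SHIFT]

diff = difference_spectrum(first, second)
summed = normal_spectrum(first, second)
```

The difference route keeps all 1024 channels, while the aligned sum covers only the 1004 channels where the two shifted spectra overlap.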
